Setup
- Get API Keys: Obtain an API key from Memobase or run a local server.
- Configure Environment Variables: point the client at your Memobase server and OpenAI account (see the sketch after this list).
- Install Dependencies: install the `openai` and `memobase` Python packages.
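A minimal setup sketch; the variable names `MEMOBASE_PROJECT_URL` and `MEMOBASE_API_KEY` and the local-server defaults are assumptions, so adjust them to your deployment:

```bash
pip install openai memobase

# Assumed names; point these at your Memobase deployment.
export OPENAI_API_KEY="sk-..."
export MEMOBASE_PROJECT_URL="http://localhost:8019"  # assumed local-server default
export MEMOBASE_API_KEY="secret"
```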
Code Breakdown

(Diagram: OpenAI API with memory)
- Wrap the OpenAI client: This allows us to intercept chat messages and inject memory context into prompts.
- Integrate Memobase APIs: Use the wrappers to store chat history and retrieve user memories.
- Test: Verify that the memory feature functions correctly.
You can find the full source code on GitHub.
Basic Setup
First, initialize the OpenAI and Memobase clients.
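A sketch of the initialization, assuming the `MemoBaseClient` constructor takes `project_url` and `api_key` and the environment variables from the setup step:

```python
import os

from openai import OpenAI
from memobase import MemoBaseClient

# Standard OpenAI client; reads OPENAI_API_KEY from the environment.
client = OpenAI()

# Memobase client pointed at your server (constructor args assumed).
mb_client = MemoBaseClient(
    project_url=os.environ["MEMOBASE_PROJECT_URL"],
    api_key=os.environ["MEMOBASE_API_KEY"],
)
```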
Wrapping the OpenAI Client
We use duck typing to wrap the OpenAI client. This approach avoids altering the original client’s class structure. The wrapper does two things:
- It checks whether the client has already been patched, to prevent applying the wrapper multiple times.
- It replaces the standard `chat.completions.create` method with our custom `_sync_chat` function, which will contain the memory logic.
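A minimal sketch of the patch; the function name `openai_memory` and the attributes used as patch markers are assumptions:

```python
from functools import partial

def openai_memory(openai_client, mb_client):
    """Patch an OpenAI client in place so chat completions gain memory."""
    if getattr(openai_client, "_memobase_patched", False):
        return openai_client  # already patched; don't wrap twice

    openai_client._memobase_patched = True
    openai_client._mb_client = mb_client

    # Duck typing: keep the original bound method, then swap in our wrapper.
    openai_client._original_create = openai_client.chat.completions.create
    openai_client.chat.completions.create = partial(_sync_chat, openai_client)
    return openai_client
```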
The New `chat.completions.create` Method
Our new `chat.completions.create` method must meet several requirements:
- Accept a `user_id` to enable user-specific memory.
- Support all original arguments to ensure backward compatibility.
- Return the same data types, including support for streaming.
- Maintain performance comparable to the original method.
Calls without a `user_id` are passed directly to the original method: we forward all arguments (`*args`, `**kwargs`) to the original function, preserving its behavior. Memobase uses UUIDs to identify users, so we convert the provided `user_id` (which can be any string) into a UUID.
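A deterministic conversion can be done with `uuid5`; the namespace choice here is an assumption:

```python
import uuid

def string_to_uuid(s: str) -> str:
    """Map an arbitrary user string to a stable UUID (same input, same UUID)."""
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, s))
```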
If a `user_id` is present, the workflow is:
- Get or create the user in Memobase.
- Inject the user’s memory context into the message list.
- Call the original `create` method with the modified messages.
- Save the new conversation to Memobase for future recall.
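A sketch of that workflow; the Memobase user-lookup calls (`get_user`, `add_user`) and the helper names are assumptions, with the helpers defined in the following sections:

```python
def _sync_chat(openai_client, *args, user_id=None, **kwargs):
    """Memory-aware replacement for chat.completions.create."""
    if user_id is None:
        # No user_id: delegate untouched, identical to the stock client.
        return openai_client._original_create(*args, **kwargs)

    mb = openai_client._mb_client
    uid = string_to_uuid(user_id)
    try:
        user = mb.get_user(uid)
    except Exception:        # assumption: get_user raises for unknown users
        mb.add_user(id=uid)  # assumption: add_user accepts an explicit id
        user = mb.get_user(uid)

    # Inject memory, then call the original method with the modified messages.
    messages = user_context_insert(user, kwargs.pop("messages"))
    response = openai_client._original_create(*args, messages=messages, **kwargs)

    if kwargs.get("stream"):
        return _stream_and_save(user, messages, response)
    save_conversation(user, messages, response.choices[0].message.content)
    return response
```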
Enhancing Messages with User Context
The `user_context_insert` function injects the user’s memory into the prompt.
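A sketch of the injection; it assumes the Memobase user object exposes a `context()` method that returns the memory as a string:

```python
def user_context_insert(user, messages):
    """Prepend the user's Memobase memory to the system prompt."""
    context = user.context()  # assumed accessor for the user's memory string
    if not context:
        return messages

    note = f"<user_memory>\n{context}\n</user_memory>"
    messages = list(messages)
    if messages and messages[0]["role"] == "system":
        # Extend an existing system prompt rather than adding a second one.
        messages[0] = {**messages[0], "content": messages[0]["content"] + "\n" + note}
    else:
        messages.insert(0, {"role": "system", "content": note})
    return messages
```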
Saving Conversations
After receiving a response from OpenAI, we save the conversation to Memobase to build the user’s memory. This is done asynchronously using a background thread to avoid blocking.
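A sketch of the non-blocking save, assuming Memobase’s `ChatBlob` and `user.insert` for storing chat messages:

```python
import threading

from memobase import ChatBlob

def save_conversation(user, messages, assistant_reply):
    """Persist the latest exchange to Memobase on a daemon thread."""
    last_user = next(
        (m["content"] for m in reversed(messages) if m["role"] == "user"), ""
    )

    def _save():
        user.insert(ChatBlob(messages=[
            {"role": "user", "content": last_user},
            {"role": "assistant", "content": assistant_reply},
        ]))

    # Daemon thread: the API call returns immediately while the save runs.
    threading.Thread(target=_save, daemon=True).start()
```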
Non-Streaming Responses
For standard responses, we extract the content from `response.choices[0].message.content` and save it, as shown in the `_sync_chat` sketch above.
Streaming Responses
For streaming, we yield each chunk as it arrives and accumulate the full response. Once the stream is complete, we save the entire conversation.
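A sketch of the streaming wrapper; the generator name is an assumption:

```python
def _stream_and_save(user, messages, stream):
    """Yield chunks to the caller while accumulating the full reply."""
    pieces = []
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            pieces.append(chunk.choices[0].delta.content)
        yield chunk  # pass each chunk through unchanged

    # The stream is exhausted: now persist the complete conversation.
    save_conversation(user, messages, "".join(pieces))
```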
Utility Functions
The patch also adds several helper functions to the client for managing user memory:
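For example (the helper names here are illustrative; `flush` maps to Memobase’s user-level flush, which forces buffered conversations to be processed):

```python
def add_utilities(openai_client):
    """Attach convenience helpers to the patched client (names assumed)."""
    mb = openai_client._mb_client

    def get_memory(user_id):
        # Return the user's current memory context.
        return mb.get_user(string_to_uuid(user_id)).context()

    def flush(user_id):
        # Force Memobase to process any buffered conversations now.
        return mb.get_user(string_to_uuid(user_id)).flush()

    openai_client.get_memory = get_memory
    openai_client.flush = flush
```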
Usage Example
Here’s how to use the patched OpenAI client.
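A sketch, assuming the patch code above lives in a local module (here called `memory_patch`):

```python
import os

from openai import OpenAI
from memobase import MemoBaseClient

from memory_patch import openai_memory  # hypothetical local module

mb_client = MemoBaseClient(
    project_url=os.environ["MEMOBASE_PROJECT_URL"],
    api_key=os.environ["MEMOBASE_API_KEY"],
)
client = openai_memory(OpenAI(), mb_client)

# Identical to the stock API; adding user_id switches memory on.
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hi, I'm Gus. I love gelato."}],
    user_id="gus",
)
print(resp.choices[0].message.content)

# In a later session, stored memories are injected automatically.
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What dessert should I order?"}],
    user_id="gus",
)
print(resp.choices[0].message.content)
```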
Conclusion
This guide demonstrates a powerful method for adding persistent user memory to the OpenAI client. The patched client:
- Is fully compatible: It works identically to the original client.
- Enables memory: Adds memory capabilities when a `user_id` is provided.
- Supports all modes: Handles both streaming and non-streaming responses.
- Is automatic: Seamlessly saves conversations and injects context without extra code.