Documentation Index
Fetch the complete documentation index at: https://docs.memobase.io/llms.txt
Use this file to discover all available pages before exploring further.
Full Code
Memobase supports any OpenAI-compatible LLM provider as its backend. This tutorial demonstrates how to use Ollama to run a local LLM for both the Memobase server and your chat application.
Setup
- Install Ollama on your local machine.
- Verify the installation by running
ollama -v.
- Pull a model to use. For this example, we’ll use
qwen2.5:7b.
To use a local LLM provider with the Memobase server, you need to modify your config.yaml file.
Set the following fields to point to your local Ollama instance:
llm_api_key: ollama
llm_base_url: http://host.docker.internal:11434/v1
best_llm_model: qwen2.5:7b
Note: Since the Memobase server runs in a Docker container, we use host.docker.internal to allow it to access the Ollama server running on your local machine at port 11434.
Code Breakdown
This example uses Memobase’s OpenAI Memory Patch for a clear demonstration.
Client Initialization
First, we set up the OpenAI client to point to our local Ollama server and then apply the Memobase memory patch.
from memobase import MemoBaseClient
from openai import OpenAI
from memobase.patch.openai import openai_memory
# Point the OpenAI client to the local Ollama server
client = OpenAI(
base_url="http://localhost:11434/v1",
api_key="ollama",
)
# Initialize the Memobase client
mb_client = MemoBaseClient(
project_url="http://localhost:8019", # Assuming local Memobase server
api_key="secret",
)
# Apply the memory patch
client = openai_memory(client, mb_client)
After patching, your OpenAI client is now stateful. It will automatically manage memory for each user, recalling information from past conversations.
Chat Function
Next, we create a chat function that uses the patched client. The key is to pass a user_id to trigger the memory functionality.
def chat(message, user_id=None, close_session=False):
print(f"Q: {message}")
response = client.chat.completions.create(
messages=[{"role": "user", "content": message}],
model="qwen2.5:7b",
stream=True,
user_id=user_id, # Pass user_id to enable memory
)
# Display the response
for chunk in response:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
print()
# Flush the memory buffer at the end of a session
if close_session and user_id:
client.flush(user_id)
Testing the Memory
Now, you can test the difference between a stateless and a stateful conversation:
# --- Without Memory ---
print("-- Ollama without memory --")
chat("My name is Gus.")
chat("What is my name?") # The model won't know
# --- With Memory ---
print("\n--- Ollama with memory ---")
user = "gus_123"
chat("My name is Gus.", user_id=user, close_session=True)
# In a new conversation, the model remembers
chat("What is my name?", user_id=user)
For the complete code, see the full example on GitHub.