Memobase supports any OpenAI-compatible LLM provider as its backend. This tutorial demonstrates how to use Ollama to run a local LLM for both the Memobase server and your chat application.

Setup

1. Configure Ollama

  • Install Ollama on your local machine.
  • Verify the installation by running ollama -v.
  • Pull a model to use. For this example, we’ll use qwen2.5:7b; a quick connectivity check follows this list.
    ollama pull qwen2.5:7b
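
Optionally, confirm that Ollama’s OpenAI-compatible endpoint is up before wiring it into Memobase. A minimal check, assuming Ollama’s default port 11434:

from openai import OpenAI

# Ollama exposes an OpenAI-compatible API under /v1 by default.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# qwen2.5:7b should appear in this list after the pull completes.
for model in client.models.list():
    print(model.id)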
    

2. Configure Memobase

To use a local LLM provider with the Memobase server, you need to modify your config.yaml file.

Learn more about config.yaml.

Set the following fields to point to your local Ollama instance:

config.yaml
llm_api_key: ollama
llm_base_url: http://host.docker.internal:11434/v1
best_llm_model: qwen2.5:7b

Note: Since the Memobase server runs in a Docker container, we use host.docker.internal to allow it to access the Ollama server running on your local machine at port 11434.
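
After restarting the Memobase server with this configuration, you can check that it is reachable from your application. A minimal sketch, assuming the same local server URL and API key used later in this tutorial:

from memobase import MemoBaseClient

# Assumes the server runs locally on port 8019 with the default
# api_key "secret", matching the client setup later in this tutorial.
mb_client = MemoBaseClient(
    project_url="http://localhost:8019",
    api_key="secret",
)
assert mb_client.ping(), "Memobase server is not reachable"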

Code Breakdown

This example uses Memobase’s OpenAI Memory Patch for a clear demonstration.

Client Initialization

First, we set up the OpenAI client to point to our local Ollama server and then apply the Memobase memory patch.

from memobase import MemoBaseClient
from openai import OpenAI
from memobase.patch.openai import openai_memory

# Point the OpenAI client to the local Ollama server
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",
)

# Initialize the Memobase client
mb_client = MemoBaseClient(
    project_url="http://localhost:8019", # Assuming local Memobase server
    api_key="secret",
)

# Apply the memory patch
client = openai_memory(client, mb_client)

After patching, your OpenAI client is stateful: it automatically manages memory for each user, recalling relevant information from past conversations.
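
Under the hood, the patch wraps chat.completions.create. The sketch below is a simplified illustration of that flow, not Memobase’s actual implementation; recall_memory and buffer_exchange are hypothetical stand-ins for the patch’s internals, and streaming is glossed over for brevity:

# Simplified illustration only -- NOT Memobase's real code.
def recall_memory(mb_client, user_id):
    # Hypothetical: fetch this user's memory from Memobase and
    # format it as context messages.
    ...

def buffer_exchange(mb_client, user_id, messages, response):
    # Hypothetical: send the new exchange to Memobase so it can
    # update the user's long-term memory.
    ...

def patch_create(original_create, mb_client):
    def create(*, user_id=None, messages, **kwargs):
        if user_id is not None:
            # Prepend recalled memory before calling the model.
            messages = (recall_memory(mb_client, user_id) or []) + messages
        response = original_create(messages=messages, **kwargs)
        if user_id is not None:
            # Record the exchange so future sessions can recall it.
            buffer_exchange(mb_client, user_id, messages, response)
        return response
    return create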

Chat Function

Next, we create a chat function that uses the patched client. The key is to pass a user_id to trigger the memory functionality.

def chat(message, user_id=None, close_session=False):
    print(f"Q: {message}")
    
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": message}],
        model="qwen2.5:7b",
        stream=True,
        user_id=user_id, # Pass user_id to enable memory
    )
    
    # Display the response
    for chunk in response:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

    # Flush the memory buffer at the end of a session
    if close_session and user_id:
        client.flush(user_id)
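
Memobase buffers incoming messages and processes them into long-term memory in the background; calling flush at the end of a session forces that buffer to be processed right away, so the memory is ready when the user returns.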

Testing the Memory

Now, you can test the difference between a stateless and a stateful conversation:

# --- Without Memory ---
print("-- Ollama without memory --")
chat("My name is Gus.")
chat("What is my name?") # The model won't know

# --- With Memory ---
print("\n--- Ollama with memory ---")
user = "gus_123"
chat("My name is Gus.", user_id=user, close_session=True)

# In a new conversation, the model remembers
chat("What is my name?", user_id=user)

For the complete code, see the full example on GitHub.