Memobase supports any OpenAI-compatible LLM provider as its backend. In this example, we will use Ollama as our LLM provider.

Ollama is a tool to pull and run LLMs locally on your computer.

This tutorial uses Ollama both as the LLM backend of the Memobase Server and as the chat model.

Set up Ollama

  • Make sure you have installed Ollama.

  • Run ollama -v to verify the installation (the version used in this tutorial is 0.3.8).

  • Download Qwen2.5 by running ollama pull qwen2.5:7b.

You can use any LLM you like here; just make sure it is available in Ollama.
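
If you want to double-check that Ollama is serving the model through its OpenAI-compatible API, a quick check could look like the sketch below (it assumes Ollama's default port 11434; the api_key value is a placeholder, since Ollama ignores it):

from openai import OpenAI

# Ollama exposes an OpenAI-compatible API on http://localhost:11434/v1 by default.
check = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
# List the locally available models; "qwen2.5:7b" should appear after the pull.
print([m.id for m in check.models.list().data])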

Set up Memobase to use Ollama as the backend

We need to modify the config.yaml of the Memobase Server to use a different LLM backend.

What is config.yaml?

To use a local LLM provider, you need to set the following fields:

config.yaml
llm_api_key: ollama
llm_base_url: http://host.docker.internal:11434/v1
best_llm_model: qwen2.5:7b

Since the Memobase Server runs in a Docker container, it must use host.docker.internal instead of localhost to reach the local LLM provider. http://host.docker.internal:11434/v1 tells the server that Ollama is running on your local machine at port 11434.
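
To illustrate what these fields mean: the Memobase Server simply talks to Ollama through its OpenAI-compatible API. The sketch below sends one request with the same settings; since it runs on your host machine rather than inside the container, it uses localhost in place of host.docker.internal:

from openai import OpenAI

# Same endpoint and model as in config.yaml, reached from the host machine.
llm = OpenAI(api_key="ollama", base_url="http://localhost:11434/v1")
resp = llm.chat.completions.create(
    model="qwen2.5:7b",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(resp.choices[0].message.content)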

Code Breakdown

We use Memobase’s OpenAI Memory feature here for a clear demonstration.

First, we need to set up the libraries and clients:

from memobase import MemoBaseClient
from openai import OpenAI
from memobase.patch.openai import openai_memory
from time import sleep

stream = True
user_name = "test35"
model = "qwen2.5:7b"

# 1. Patch the OpenAI client to use MemoBase
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",
)
mb_client = MemoBaseClient(
    project_url="http://localhost:8019",
    api_key="secret",
)
client = openai_memory(client, mb_client)
  • We also use Ollama as the chat model here.
  • We use the openai_memory function to patch the OpenAI client to use Memobase as the memory backend.
  • After this patch, your OpenAI client becomes stateful with respect to the user, meaning it can recall things beyond the current conversation.

We can then write our chat function to perform a single question-answer exchange with the user:

def chat(message, close_session=False, use_users=True):
    print("Q: ", message)
    # 2. Use OpenAI client as before 🚀
    r = client.chat.completions.create(
        messages=[
            {"role": "user", "content": message},
        ],
        model=model,
        stream=stream,
        # 3. Adding a unique user string here will trigger memory.
        # Comment out this line and this call will behave like a normal OpenAI ChatCompletion.
        user_id=user_name if use_users else None,
    )
    # Below is just displaying response from OpenAI
    if stream:
        for i in r:
            if not i.choices[0].delta.content:
                continue
            print(i.choices[0].delta.content, end="", flush=True)
        print()
    else:
        print(r.choices[0].message.content)

    # 4. Once the chat session is closed, remember to flush to keep memory updated.
    if close_session:
        sleep(0.1)  # Wait for the last message to be processed
        client.flush(user_name)
  • Memobase keeps a buffer for every user and won’t trigger memory processing immediately after each insertion. This can help you save some cost. We use the close_session parameter to flush the buffer when the chat session is closed (see the sketch after this list).
  • use_users controls whether we pass the user id to trigger memory. If you don’t care about memory, you can set it to False.
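
For example, in a longer session you can buffer several turns and flush only once at the end. Below is a minimal sketch reusing the chat function and the patched client from above (the example questions are made up):

# Several turns are buffered first; memory processing is triggered once by flush.
for question in ["I live in Berlin.", "I work as a data engineer."]:
    chat(question)            # buffered, memory is not processed yet
client.flush(user_name)       # process the buffered messages for this user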

You can run some tests to see how Memobase is helping:

print("--------Use Ollama without memory--------")
chat("I'm Gus, how are you?", use_users=False)
chat("What's my name?", use_users=False)

print("--------Use Ollama with memory--------")
chat("I'm Gus, how are you?", close_session=True)
print("User Profiles:", [p.describe for p in client.get_profile(user_name)])
chat("What's my name?")

For further details, you can check Full Code.