Use Ollama as LLM
Memobase supports any OpenAI-compatible LLM provider as the backend. In this example, we will use Ollama as our LLM provider.

Ollama is a tool to pull and run LLMs locally on your computer. This tutorial uses Ollama as the LLM for both the Memobase Server and the chat model.
Set up Ollama
- Make sure you have installed `ollama`.
- Run `ollama -v` to check that it works (the version used in this tutorial is `0.3.8`).
- Download `qwen2.5` by running `ollama pull qwen2.5:7b`.
You can use any LLM you like here; just make sure it is available in Ollama.
Set up Memobase to use Ollama as the backend
We need to modify the `config.yaml` of the Memobase Server to use another LLM backend. To use a local LLM provider, you need to set the following fields in `config.yaml`:
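Here is a sketch of what those fields might look like; the key names below are based on the Memobase config at the time of writing and may differ in your version, so check the server's configuration reference before copying them:

```yaml
# Illustrative config.yaml fields (key names may differ between Memobase versions)
llm_api_key: ollama                                  # Ollama ignores the key, but the field must be non-empty
llm_base_url: http://host.docker.internal:11434/v1   # Ollama's OpenAI-compatible endpoint, reached from inside Docker
best_llm_model: qwen2.5:7b                           # the model pulled in the previous step
```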
Since the Memobase Server will be running in a Docker container, we need to use `host.docker.internal` instead of `localhost` as the host so that the server can reach the local LLM provider. `http://host.docker.internal:11434/v1` means the Ollama server is running on your local machine at port 11434.
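Before wiring Ollama into Memobase, you can sanity-check its OpenAI-compatible endpoint from your host machine with a quick, illustrative snippet (note `localhost` here, since this check runs outside Docker):

```python
from openai import OpenAI

# Talk to Ollama's OpenAI-compatible API directly from the host machine.
ollama = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # Ollama does not check the key, but the client requires a non-empty value
)

resp = ollama.chat.completions.create(
    model="qwen2.5:7b",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```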
Code Breakdown
We use Memobase’s OpenAI Memory feature here for a clear demonstration. First, we need to set up the libraries and clients:
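A sketch of that setup is below; the `openai_memory` patch comes from the Memobase Python SDK, but the exact import path and constructor arguments shown here are assumptions that may differ in your SDK version:

```python
from openai import OpenAI
from memobase import MemoBaseClient
from memobase.patch.openai import openai_memory  # import path may differ by SDK version

# Chat model: Ollama's OpenAI-compatible endpoint, called from the host machine.
openai_client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # Ollama ignores the key, but it must be non-empty
)

# Memory backend: your Memobase Server (URL and token below are placeholders).
mb_client = MemoBaseClient(
    project_url="http://localhost:8019",
    api_key="secret",
)

# Patch the OpenAI client so chat completions can read and write user memory.
client = openai_memory(openai_client, mb_client)
```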
- We also use Ollama as the chat model here.
- We use the `openai_memory` function to patch the OpenAI client to use Memobase as the memory backend.
- After this patch, your OpenAI client becomes stateful per user, meaning it can recall things beyond the current conversation.
We can then write our chat function to perform a single round of Q&A with the patched client:
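A minimal sketch of such a function is shown here, assuming the patched `client` from the setup above; the `user_id` keyword and the flush call are assumptions based on the Memobase OpenAI patch and may differ in your SDK version:

```python
def chat(user_id: str, question: str, close_session: bool = False, use_users: bool = True) -> str:
    """Perform a single round of Q&A, optionally backed by Memobase memory."""
    kwargs = {}
    if use_users:
        # Passing a user id lets the patched client attach and collect memory for this user.
        kwargs["user_id"] = user_id

    resp = client.chat.completions.create(
        model="qwen2.5:7b",
        messages=[{"role": "user", "content": question}],
        **kwargs,
    )
    answer = resp.choices[0].message.content

    if close_session and use_users:
        # Memobase buffers inserted messages; flush when the session ends so memory gets processed.
        # The exact flush API may differ by SDK version.
        client.flush(user_id)

    return answer
```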
- Memobase keeps a buffer for every user and won't trigger memory processing immediately after insertion, which can help you save some cost. We use the `close_session` parameter to flush the buffer when the chat session is closed.
- `use_users` is a parameter that controls whether we pass the user id to trigger memory. If you don't care about memory, you can set it to `False`.
You can run some tests to see how Memobase is helping:
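For example (illustrative user id and facts; the second answer should reflect the earlier session once memory has been processed):

```python
# First session: tell the assistant something about the user, then close the session.
print(chat("test_user", "I'm Gus. I live in Seattle and work as a barista.", close_session=True))

# A later session: with memory enabled, the model can recall the earlier facts.
print(chat("test_user", "What do you remember about me?"))

# Without memory (use_users=False), there is nothing to recall beyond this message.
print(chat("test_user", "What do you remember about me?", use_users=False))
```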
For further details, you can check the Full Code.