LLM retrieval-augmented generation (RAG)

GT includes a proof-of-concept implementation that uses a vector DB to index the GT Book and to provide RAG-enhanced queries to an LLM.

Currently two vector DBs are supported:

GtSimpleInMemoryVectorDatabase (package Gt4Llm-VectorDb) stores all information in the image and has no external dependencies.

ChromaDB (LeChromaDb) is a lightweight vector database that scales better than the in-memory option. GT assumes that it is already installed and that its server is running.
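To show what an in-memory store does at query time, here is a minimal sketch of brute-force nearest-neighbour search. Every stored embedding is compared with the query embedding, and the keys of the closest ones are returned. The sketch assumes cosine similarity as the distance metric and uses made-up three-dimensional vectors keyed by hypothetical page names. It is plain Pharo and is not the implementation of GtSimpleInMemoryVectorDatabase.

```smalltalk
"Hypothetical sketch: brute-force nearest-neighbour search with cosine similarity.
The keys and vectors are made-up illustration data, not real GT Book embeddings."
| cosine embeddingsByKey query ranked |
cosine := [ :a :b |
	| dot normA normB |
	dot := (a with: b collect: [ :x :y | x * y ]) inject: 0 into: [ :sum :each | sum + each ].
	normA := (a inject: 0 into: [ :sum :each | sum + (each * each) ]) sqrt.
	normB := (b inject: 0 into: [ :sum :each | sum + (each * each) ]) sqrt.
	dot / (normA * normB) ].
embeddingsByKey := Dictionary new.
embeddingsByKey
	at: 'page-a' put: #(1.0 0.0 0.0);
	at: 'page-b' put: #(0.7 0.7 0.0);
	at: 'page-c' put: #(0.0 0.0 1.0).
query := #(1.0 0.1 0.0).
"Score every stored embedding against the query, then sort by descending similarity"
ranked := (embeddingsByKey associations collect: [ :assoc |
		assoc key -> (cosine value: query value: assoc value) ])
	sorted: [ :x :y | x value >= y value ].
"Keys of the two closest embeddings"
(ranked first: 2) collect: [ :assoc | assoc key ].
```

A scan like this touches every stored embedding on each query, so its cost grows linearly with the number of indexed chunks.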

Limitations

In-memory database

The Vector DB is stored on disk in a simple STON file (around 400MB serialised) and held in memory during use.

Generating the embeddings and querying the LLM both assume that Ollama is installed with the default embedding and reasoning models (see below).

The context length of the running model needs to be significantly larger than the default explanation length (listed below). Note that the running context length in Ollama is not the same as the maximum context length of the underlying model. After the model has been loaded into Ollama, its running context length can be viewed with:

provider := GtLConnection new
	providerClass: GtLOllamaProvider;
	modelName: 'qwen3.5:9b';
	buildBareProvider.
provider runningModels.

In short, this implementation is not intended for production use.

ChromaDB

This assumes that ChromaDB is installed locally with the server listening on the port defined in GtChromaDbClient class>>#defaultBaseUrl. The code currently just uses the default tenant and database.

Load the database

If needed, this regenerates the entire DB from scratch. That will likely take 15 minutes or more, depending on the hardware and the models chosen (see below).

In-memory:

"To force a reload of the vector db, remove the old file"
GtLExperimentalRag cleanUp.
(GtLExperimentalRag localStore / GtLExperimentalRag gtBookBasename) ensureDelete.

"Accessing the RAG Db is enough to load it.
This can take some time, so do it in a background process."
[ gtRag := GtLExperimentalRag gtBookInMemory] 
	forkAt: Processor userBackgroundPriority - 1.

ChromaDB:

"For now it is up to you to empty the collection if needed"
GtLExperimentalRag cleanUp.

"The first time load is likely to take 15 minutes or more.
Subsequent startups will be subsecond."
[ GtLExperimentalRag gtBookChromaDb. ]
	forkAt: Processor userBackgroundPriority - 1.

Update the Vector DB

This re-indexes any pages that have been modified since the DB file was last written.

GtLExperimentalRag gtBook updateVectorDb.
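The incremental update can be pictured as a timestamp filter. The sketch below is hypothetical. The page names and dates are illustration data, and it shows only the selection step. It does not show how GtLExperimentalRag actually enumerates pages or re-embeds them.

```smalltalk
"Hypothetical sketch: select pages modified after the DB file's timestamp.
Page names and dates are made-up illustration data."
| dbFileTime pages stalePages |
dbFileTime := DateAndTime year: 2024 month: 5 day: 1.
pages := {
	'Getting started' -> (DateAndTime year: 2024 month: 4 day: 20).
	'Text styling' -> (DateAndTime year: 2024 month: 5 day: 3).
	'Roped text' -> (DateAndTime year: 2024 month: 5 day: 7) }.
"Only pages newer than the DB file need to be re-embedded"
stalePages := (pages select: [ :each | each value > dbFileTime ])
	collect: [ :each | each key ].
stalePages.
```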

Query with RAG

provider := GtLConnection new
	providerClass: GtLOllamaProvider;
	modelName: 'qwen3.5:9b';
	buildBareProvider.
chat := GtLChat new provider: provider.
chat
	queryMarkdown: 'How do I change the colour of text?'
	withRag: GtLExperimentalRag gtBook.
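Conceptually, a RAG query retrieves the chunks most relevant to the question and passes them to the model along with the question. The following sketch shows one way such a prompt could be assembled in plain Pharo. The prompt wording and the placeholder chunks are assumptions. This is not how GtLChat>>#queryMarkdown:withRag: actually builds its prompt.

```smalltalk
"Hypothetical sketch of RAG prompt assembly; not the GtLChat implementation."
| question retrievedChunks prompt |
question := 'How do I change the colour of text?'.
"Placeholder strings standing in for chunks returned by the vector search"
retrievedChunks := #('First retrieved GT Book chunk' 'Second retrieved GT Book chunk').
prompt := String streamContents: [ :stream |
	stream
		nextPutAll: 'Answer the question using only the context below.';
		cr; cr.
	"Number each retrieved chunk so the model can tell them apart"
	retrievedChunks withIndexDo: [ :chunk :index |
		stream
			nextPutAll: 'Context ';
			print: index;
			nextPutAll: ': ';
			nextPutAll: chunk;
			cr ].
	"The user's question goes last"
	stream
		cr;
		nextPutAll: 'Question: ';
		nextPutAll: question ].
prompt.
```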

Query with re-ranking

Re-ranking requires the sentence_transformers module to be installed in the local PythonBridge. The installation downloads the associated libraries and models, so it takes several minutes.

GtLSentenceTransformerCrossEncoder new installSentenceTransformers.

provider := GtLConnection new
	providerClass: GtLOllamaProvider;
	modelName: 'qwen3.5:9b';
	buildBareProvider.
chat := GtLChat new provider: provider.
chat
	queryMarkdown: 'How do I change the colour of roped text?'
	withRag: (GtLExperimentalRag gtBook asRagQueryContext
		optionAt: #reRank put: GtLCrossEncoderReranker new;
		yourself).
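Re-ranking is a second retrieval stage. The vector search returns candidate chunks, and a more expensive scorer (here, the sentence_transformers cross-encoder) scores each query/passage pair. Only the best-scoring candidates are kept. The sketch below illustrates that flow with a toy word-overlap score standing in for the cross-encoder. The score block, the candidate titles and the cut-off of two are all hypothetical.

```smalltalk
"Hypothetical two-stage sketch: re-score vector-search candidates and keep the best two.
The word-overlap block is a toy stand-in for a real cross-encoder model."
| query candidates crossScore reRanked |
query := 'change colour of roped text'.
candidates := #('Roped text basics' 'Changing the colour of roped text' 'Vector database setup').
"Toy score: number of passage words that also occur in the query (case-insensitive)"
crossScore := [ :q :passage |
	| queryWords |
	queryWords := (q substrings: ' ') collect: [ :w | w asLowercase ].
	((passage substrings: ' ') select: [ :w | queryWords includes: w asLowercase ]) size ].
"Sort candidates by descending score and keep the top two"
reRanked := (candidates asSortedCollection: [ :a :b |
		(crossScore value: query value: a) >= (crossScore value: query value: b) ])
	asArray first: 2.
reRanked.
```

A real cross-encoder reads the query and passage together. That is why it is typically more accurate than comparing precomputed embeddings, and also why it is too slow to apply to the whole collection rather than just a short candidate list.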

Indexing Strategy

The indexing strategy is described in the class comment of GtLLepiterRagDb (package Gt4Llm-VectorDb).

The default embedding parameters are: