LLM retrieval augmented generation (RAG)
GT includes a proof-of-concept implementation that uses a Vector DB to index the GT Book and provide RAG-enhanced queries to an LLM.
Currently two vector DBs are supported:
GtSimpleInMemoryVectorDatabase stores all information in the image and doesn't have any external dependencies.
ChromaDB (LeChromaDb) is a light-weight vector database that is more scalable. GT assumes that it is already installed and the server is running.
The Vector DB is stored on disk in a simple STON file (around 400MB serialised) and held in memory during use.
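As a rough illustration of what a vector DB held in memory does at query time, the sketch below ranks a few stored embeddings by their similarity to a query embedding. It is not the GtSimpleInMemoryVectorDatabase implementation. The entry names, the toy 3-dimensional vectors and the use of cosine similarity are all assumptions made for the example, and the metric GT actually uses is not stated here.

```smalltalk
"Hypothetical illustration only: none of these variable names are GT API."
| cosine entries query scored ranked |

"Cosine similarity of two equal-length numeric vectors."
cosine := [ :a :b |
	| dot normA normB |
	dot := (a with: b collect: [ :x :y | x * y ])
		inject: 0 into: [ :sum :each | sum + each ].
	normA := (a inject: 0 into: [ :sum :each | sum + (each * each) ]) sqrt.
	normB := (b inject: 0 into: [ :sum :each | sum + (each * each) ]) sqrt.
	dot / (normA * normB) ].

"Toy embeddings; real ones have hundreds of dimensions."
entries := {
	{ 'Styling text'. #(0.9 0.1 0.0) }.
	{ 'Working with files'. #(0.0 0.2 0.9) }.
	{ 'Colours'. #(0.8 0.3 0.1) } }.
query := #(1.0 0.2 0.0).

"Score every entry against the query, then keep the two best matches."
scored := entries collect: [ :each |
	each first -> (cosine value: each last value: query) ].
ranked := scored sorted: [ :x :y | x value >= y value ].
(ranked first: 2) collect: [ :each | each key ]
```

Inspecting the last expression in a playground shows the titles of the two closest entries. A linear scan like this is easy to keep in the image but scales with the number of stored vectors, which is one reason an external store may be preferred for larger collections.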
Generating the embeddings and querying the LLM both assume that Ollama is installed with the default embedding and reasoning models (see below).
The context length of the running model needs to be significantly larger than the default explanation length (listed below). Note that the running context length in Ollama is not the same as the maximum context length of the underlying model. The running context length may be viewed after the model has been loaded into Ollama with:
provider := GtLConnection new
	providerClass: GtLOllamaProvider;
	modelName: 'qwen3.5:9b';
	buildBareProvider.
provider runningModels.
This is not intended to be a production-ready implementation.
This assumes that ChromaDB is installed locally with the server listening on the port defined in GtChromaDbClient class>>#defaultBaseUrl. The code currently just uses the default tenant and database.
If needed, loading the DB regenerates it entirely from scratch, which will likely take 15 minutes or more depending on the hardware and models chosen (see below).
In-memory:
"To force a reload of the vector db, remove the old file"
GtLExperimentalRag cleanUp.
GtLExperimentalRag defaultVectorDbFile ensureDelete.
[ gtRag := GtLExperimentalRag gtBook.
gtRag loadInMemoryLepiterRagDb: GtLExperimentalRag defaultVectorDbFile. ]
	forkAt: Processor userBackgroundPriority - 1.
ChromaDB:
"For now it is up to you to empty the collection if needed"
[ gtRag := GtLExperimentalRag gtBook.
gtRag loadChromaDbLepiterRagDb. ]
	forkAt: Processor userBackgroundPriority - 1.
Updating the vector DB re-indexes any pages modified since the DB file's modification timestamp:
gtRag := GtLExperimentalRag gtBook.
gtRag updateVectorDb.
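The idea of re-indexing by timestamp can be sketched as a simple filter: only pages edited after the DB file was last written need new embeddings. This is a minimal sketch of the selection step and not the code behind updateVectorDb. The page titles and timestamps are made up for the example.

```smalltalk
"Hypothetical illustration only: page titles and timestamps are made up."
| dbModified pages stale |

"Pretend the DB file was last written one day ago."
dbModified := DateAndTime now - 1 day.

"Each association maps a page title to its last edit time."
pages := {
	'Unchanged page' -> (DateAndTime now - 2 days).
	'Edited page' -> (DateAndTime now - 1 hour) }.

"Select only the pages edited after the DB file was written."
stale := pages select: [ :each | each value > dbModified ].
stale collect: [ :each | each key ]
```

Only the selected pages would then be re-embedded, which is why an incremental update is typically much cheaper than regenerating the whole DB.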
provider := GtLConnection new
	providerClass: GtLOllamaProvider;
	modelName: 'qwen3.5:9b';
	buildBareProvider.
chat := GtLChat new provider: provider.
chat
	queryMarkdown: 'How do I change the colour of text?'
	withRag: GtLExperimentalRag gtBook.
provider := GtLConnection new
	providerClass: GtLOllamaProvider;
	modelName: 'qwen3.5:9b';
	buildBareProvider.
chat := GtLChat new provider: provider.
chat
	queryMarkdown: 'How do I change the colour of roped text?'
	withRag: (GtLExperimentalRag gtBook asRagQueryContext
		optionAt: #reRank put: GtLCrossEncoderReranker new;
		yourself).
The indexing strategy is described in GtLLepiterRagDb's class comments.
The default embedding parameters are: