Creating a dataset for fine-tuning
The dataset is the most important part of a fine-tuning run. In Glamorous Toolkit, examples are a natural source for one: they provide code and live objects as well as views.
On this page, we generate a dataset for a tutor that produces Phlow views. For that, we need objects together with their Phlow views.
To create a dataset from examples, we first need to gather the ones we require.
aCollection := Smalltalk gtExamplesContained
	collect: [ :eachExample | eachExample asCachedExampleWithResult ].
aGroup := GtExampleGroup withAll: aCollection
This selects all examples currently defined. We can then run all of them. Caution: running every example takes a long time, on the order of multiple hours, so it’s best to either let them run asynchronously or to select only a subset of examples.
If desired, the number of examples can be limited by selecting only the first N examples.
numberOfExamples := 1000.
aGroup := aGroup first: numberOfExamples
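If fewer examples are available than the requested limit, taking the first N elements of a sequenceable collection in Pharo signals an error. The following is a minimal sketch of clamping the limit first. It uses a plain array as a hypothetical stand-in for the collected examples, and the variable names `candidates` and `limited` are illustrative only.

```smalltalk
"Hypothetical stand-in for the collected examples:
 any sequenceable collection behaves the same way."
candidates := (1 to: 250) asArray.
numberOfExamples := 1000.

"Clamp the limit to the number of available elements,
 so that first: never asks for more than exist."
limited := candidates first: (numberOfExamples min: candidates size).
limited size
```

Assuming the collected examples are held in a sequenceable collection, the same clamp could be applied before wrapping them in a group.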
Finally, all the examples need to be run.
aGroup runNotYetExecuted
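Because a full run can take hours, it may be preferable not to block the environment while it proceeds. The sketch below shows one plain Pharo way to push long-running work into a background process. The `work` block is a hypothetical stand-in; in the workflow above it would be `[ aGroup runNotYetExecuted ]`. Whether running examples off the main process is safe for every example is an assumption that should be checked.

```smalltalk
"Stand-in for the real work. In the workflow above this would be
 [ aGroup runNotYetExecuted ]."
work := [ (1 to: 1000) inject: 0 into: [ :sum :each | sum + each ] ].

"Flag that can be inspected later to see whether the run has finished."
finished := false.

"Run the work at background priority so the UI stays responsive."
process := [ work value.
	finished := true ]
		forkAt: Processor userBackgroundPriority
		named: 'Run examples for dataset'
```

Inspecting `finished` afterwards tells whether the background run has completed.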
After all the example results have been collected, we can convert them into conversations for our dataset.
tutor := GtLlmTutor new.
conversations := (aGroup select: [ :anExample | anExample result isNotNil ])
	flatCollect: [ :anExample |
		(GtLlmExampleViewConversationCollector new
			example: anExample;
			instructions: tutor instruction) conversations ]
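For orientation, OpenAI's chat fine-tuning format expects one JSON object per line, each holding a list of messages with a role and a content. The sketch below builds one such line by hand with plain dictionaries and `STONJSON`. The message texts are hypothetical, and how the toolkit serializes its own conversation objects is not shown here; this only illustrates the target layout.

```smalltalk
"Hypothetical message contents; only the messages/role/content layout matters."
conversation := Dictionary new.
conversation
	at: 'messages'
	put: {
		{ 'role' -> 'system'. 'content' -> 'You write Phlow views.' } asDictionary.
		{ 'role' -> 'user'. 'content' -> 'Create a view for this object.' } asDictionary.
		{ 'role' -> 'assistant'. 'content' -> 'A Phlow view method for the object.' } asDictionary }.

"One conversation becomes one line of the .jsonl file."
line := STONJSON toString: conversation
```

The key order inside each JSON object is not significant, so the exact string may differ between runs of this sketch.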
Once this is done, we can create a fine-tuning file as specified in Creating a fine-tuned model.
file := GtLlmFineTuningFile new
	name: 'fine-tuning.jsonl';
	model: 'gpt-4o-mini-2024-07-18';
	conversations: conversations.
file costsPerEpoch
If you know how many epochs of training will be performed, you can also specify them:
file costsForEpochs: 3
Note that, because of some special discount cases surrounding newer models, this is not equivalent to computing costsPerEpoch * numberOfEpochs.
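To see why a discount can make the total differ from a simple multiplication, consider a purely hypothetical pricing rule with a free token allowance. All numbers and the rule itself are assumptions for illustration; they are not the toolkit's or any provider's actual pricing.

```smalltalk
"All values are hypothetical and only illustrate the non-linearity."
tokensPerEpoch := 1500000.
freeTokens := 2000000.   "hypothetical allowance that is not billed"
pricePerMillion := 3.    "hypothetical price per million billable tokens"

"Cost of a run with the given number of epochs under this made-up rule:
 only tokens beyond the free allowance are billed."
costFor := [ :epochs |
	(((tokensPerEpoch * epochs) - freeTokens) max: 0)
		* pricePerMillion / 1000000 ].

perEpoch := costFor value: 1.
threeEpochs := costFor value: 3
```

Under this made-up rule a single epoch stays within the allowance and costs nothing, so multiplying the per-epoch cost by three underestimates the three-epoch total.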
If the cost is not prohibitive, we can then start a fine-tuning job.
client := GtOpenAIClient withApiKeyFromFile.
openAiFile := client uploadFile: file withPurpose: 'fine-tune'.
fineTuningJob := client createFineTuningJobOnModel: file model withFile: openAiFile id