Moldable Chat: Multimodal inputs
Prompts can also contain images or PDFs.
Let’s first create an image. In this example, we save a screenshot of the currently focused world to a file.
fileReference := BlExporter png
	element: (GtWorldElement allInstances
		detect: [ :aWorldElement |
			aWorldElement space notNil and: [ aWorldElement space isFocused ] ]);
	export
We can pass the image together with a textual prompt using GtLChat>>#sendWith:. This method takes a block argument in which we can specify multiple inputs, such as TGtLWithInputModelsAPI>>#markdown:, TGtLWithInputModelsAPI>>#images:, or TGtLWithInputModelsAPI>>#pdfs:.
c := GtLChat new
	markdownResponse;
	sendWith: [ :m |
		m
			markdown: 'What do you see in the picture?';
			images: {fileReference} ]
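A PDF can presumably be sent the same way. The following is a minimal sketch, not a confirmed recipe: it assumes that TGtLWithInputModelsAPI>>#pdfs: accepts a collection of file references just like images: does, and the file name 'report.pdf' is a hypothetical placeholder for a PDF that exists in the working directory.

```smalltalk
"Hypothetical file: replace 'report.pdf' with the path of an existing PDF."
pdfReference := 'report.pdf' asFileReference.

"Assumption: pdfs: takes a collection of file references, mirroring images:."
pdfChat := GtLChat new
	markdownResponse;
	sendWith: [ :m |
		m
			markdown: 'Summarize this document in three sentences.';
			pdfs: {pdfReference} ]
```

Because the inputs are specified inside a single block, it should also be possible to combine markdown:, images:, and pdfs: in one cascade, provided the connected model supports those input types.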