Moldable Chat: Multimodal inputs

Prompts can also contain images or PDFs.

Let’s first create an image. In this example, we save a screenshot of the currently focused world to a file.

fileReference := BlExporter png
		element: (GtWorldElement allInstances
				detect: [ :aWorldElement | aWorldElement space notNil and: [ aWorldElement space isFocused ] ]);
		export
  

We can pass the image together with a textual prompt using GtLChat>>#sendWith:. This method creates a new user message, passes it to its block argument, and then sends the message. Inside the block we can add several inputs to the message. TGtLWithInputModelsAPI>>#markdown: adds a textual prompt as a GtLMarkdown input. TGtLWithInputModelsAPI>>#images: attaches each file reference in the given collection as a GtLInputImageFile. TGtLWithInputModelsAPI>>#pdfs: attaches each file reference as a GtLInputPdfFile.

c := GtLChat new
	markdownResponse;
	sendWith: [ :m | 
		m
			markdown: 'What do you see in the picture?';
			images: { fileReference } ]
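PDFs are passed the same way, using pdfs: instead of images:. The snippet below is a minimal sketch that reuses only the calls shown above. The file name 'report.pdf' under the home directory is a hypothetical placeholder, and the variable names pdfReference and chat are illustrative. Point the reference at any PDF on disk.

"Hypothetical path: replace 'report.pdf' with a PDF that exists on your machine."
pdfReference := FileLocator home / 'report.pdf'.

"Same pattern as the image example: a Markdown prompt plus a PDF attachment."
chat := GtLChat new
	markdownResponse;
	sendWith: [ :m | 
		m
			markdown: 'Summarize the attached document in three bullet points.';
			pdfs: { pdfReference } ]

Because sendWith: has no explicit return, it answers the chat itself, so chat can be inspected afterwards to follow the conversation.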