Molding CSV data — modeling domain objects

This is Part 2 of Tutorial: molding CSV data into an explainable system. In this part we take the JSON data that we extracted from the raw CSV datasets of the research project we are exploring, and wrap them in domain entities that model papers (i.e., visualization publications). We consider some specific questions we would like to answer about the data, and then mold the paper objects to answer them. In particular, we see how to design simple contextual views of these objects that answer questions by making interesting information easily accessible.

If you are jumping into the tutorial at this point, and have some previously saved work not already in this image, you should file it in now. Otherwise, you can file in a sample snapshot reflecting the work completed at the end of the previous part:

VRTutorialExamples new fileIn1DatasetExamples

Modeling papers

So far we have only transformed the datasets into a more convenient representation, and packaged them as examples. To do more we need to turn these data into domain entities, namely papers, authors, venues, etc.

Browsing the two datasets, we see that the first (...-all) dataset contains general information about all the papers, such as the authors, the venue in which each paper was presented, the type of paper, and so on, while the second (...-selected) dataset contains more detailed information about design studies, such as the paper's abstract and the kind of visualization task considered. We should therefore start by extracting the more general domain entities from the first dataset, before diving into the details in the second dataset.

There are several domain entities that we might like to recover. Domain entities are full-fledged objects (e.g., a paper) rather than attributes of objects (e.g., the title of a paper). The properties of these domain entities may consist of both attributes (data values) and relationships to other domain entities. It may happen that something we consider as just data (e.g., the type of a paper) turns out to be a domain entity in its own right. We'll discover these as we explore.

Here are a few of the most obvious domain entities to extract:
- paper: the publication, which has the properties title (string), ID, authors (persons), and venue
- person: an author or co-author of one or more papers
- venue: a conference (SOFTVIS or VISSOFT) with a date, location, etc.

There may be others, but let's start with paper and then add more.

Consider the JSON data we have extracted. It consists of an Array of Dictionary instances, each of which describes a paper.

VRDatasetExamples new allPapersJSON

We cannot do much with the raw dictionaries, because a dictionary has nothing to do with the visualization review domain per se. We need to turn this raw data into a moldable object by wrapping it. This is an example of another, closely-related pattern: Moldable Data Wrapper. Not all moldable objects are wrappers, but when we are creating an explainable system from existing data, this is the most common way to do it.

Let's introduce a new class VRPaper. Every instance will wrap (or encapsulate) the raw data dictionary of a paper, and provide a more domain-friendly interface. This is how we would like to instantiate papers from the raw data:

thePapers := VRDatasetExamples new allPapersJSON collect: [ :data | VRPaper for: data ]

As before, this code won't work yet, but we can create it by fixing it. We want to:
- create the new class VRPaper in the same VisualisationReview package, but tagged as a Model class
- add a data slot to hold the dictionary
- add accessors data and data: to get and set the data slot
- add a class-side instance creation method for: that instantiates an object and sets the data

Try these steps yourself. You can always browse the proposed changes below and accept them later.
⇒ Create VRPaper using the fixit dialog, as before.
⇒ Add the missing for: method using the fixit dialog. Change the proposed data argument to aDictionary; add the body ^ self new data: aDictionary; change the default method category from “accessing” to “instance creation”; accept the change.
⇒ In the for: method, fix the data: accessor, adding the body data := aDictionary.
⇒ In the data: setter method, add the missing data slot using the fixit dialog, change the method category to “initialization” and accept the change.
⇒ Add the data getter. This can be done in several ways. One way is to click on the class name from one of the methods you have created, opening a class coder view. Click on the grey bar below the class name to open the class specification. Primary-click on the data slot, select the Create accessors menu item, and accept the refactoring.

Your changes should be similar to the proposed changes below. You can also simply accept those:

Object << #VRPaper
	slots: { #data };
	tag: 'Model';
	package: 'VisualisationReview'.

Object class << VRPaper class
	slots: {}

"protocol: #accessing"

VRPaper >> data
	^ data

"protocol: #initialization"

VRPaper >> data: aDictionary
	data := aDictionary

"protocol: #'instance creation'"

VRPaper class >> for: aDictionary
	^ self new
		data: aDictionary;
		yourself

Now we have all the missing pieces and we can create our collection of VRPaper instances.
⇒ Inspect the result. Dive into any particular paper, and inspect its data slot.

VRDatasetExamples new allPapersJSON collect: [ :data | VRPaper for: data ]

⇒ Extract the collect: snippet above as an example method named allPapers in the VRDatasetExamples class. Add an assertion to check the number of papers created. After doing this, the following snippet should work:

VRDatasetExamples new allPapers

Your changes should look something like this:

"protocol: #examples"

VRDatasetExamples >> allPapers
	<gtExample>
	| papers |
	papers := self allPapersJSON collect: [ :data | VRPaper for: data ].
	self assert: papers size equals: 346.
	^ papers

So far the domain objects are not much of an improvement over our raw dictionaries. In fact they seem worse, because we have to navigate further to reach the interesting data. But now the fun starts as we begin to mold these objects, adding contextual tools to make them more useful.

To sync to this point in the tutorial (throwing away any other changes) evaluate:

VRTutorialExamples new fileIn2PaperWrapper

Which are the selected papers?

Moldable development should be driven by questions you have about the system that needs explaining. Let's start with an easy question: which are the selected design study papers? We already know the answer (the design study papers are the selected ones), but the important subquestion is whether the data are consistent: are the papers flagged as design study papers in the first dataset precisely those listed in the second dataset? To answer this question, we need to be able to query papers for their id, their type, and whether they are selected or not.

A good way to add new behavior to domain objects is to prototype the behavior first in a playground, and then extract a method. We can practice this in the snippets below:

aPaper := VRDatasetExamples new allPapers first.

aPaper data at: 'Type'

⇒ Evaluate and inspect the two snippets above to verify that we can get a paper's type this way.
⇒ Select the code in the second snippet and primary-click to Extract method. Call the new method type. After the refactoring, the second snippet should look like this:

aPaper type

Let's repeat the exercise by creating a Boolean testing method:
⇒ Extract a method called isDesignStudy from the snippet below.
⇒ Change the method category to testing. (Click the grey triangle to edit the method.)

aPaper type = 'Design Study'

The result should look like this:

aPaper isDesignStudy

Now let's select all the design study papers and make that an example.

⇒ Evaluate the snippet below, and extract it as a new designStudies example.
⇒ Add an assertion over the expected size of the result.
⇒ Change the method category to examples.

VRDatasetExamples new allPapers select: #isDesignStudy

We should get 65 papers:

VRDatasetExamples new designStudies

To check whether the datasets are consistent, we still need a paper's id. ⇒ Extract a method to access a paper's id.

aPaper data at: 'ID'

⇒ Change the id accessor to return a Number instead of a ByteString (i.e., send asNumber before returning the result).

Now we can get the ids of all design study papers:

designStudyIDs := VRDatasetExamples new designStudies collect: #id

We haven't created any domain objects from the second dataset yet, so we'll have to extract the ids from the raw data:

selectedPaperIDs := VRDatasetExamples new selectedPapersJSON
		collect: [ :d | d at: 'ID' ]

If we compare the two lists we get an error:

self assert: designStudyIDs equals: selectedPaperIDs

That's because the lists are not in the same order. If we convert them to sets, everything should be ok.

self assert: designStudyIDs asSet equals: selectedPaperIDs asSet
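Why the set conversion helps can be seen in a small, self-contained playground sketch. The literal ids below are made up for illustration and are not taken from the datasets. The sketch also shows that both sides must hold the same kind of object: a set of strings never equals a set of numbers, so ids that are still strings would need the same asNumber conversion that the id accessor applies.

```smalltalk
| idsA idsB sameOrder sameElements rawIds mixed converted |
"Hypothetical ids, for illustration only."
idsA := #(3 1 2).
idsB := #(1 2 3).

"Arrays compare element by element, in order, so this is false."
sameOrder := idsA = idsB.

"Sets ignore order, so this is true."
sameElements := idsA asSet = idsB asSet.

"Ids read from raw data may be strings rather than numbers."
rawIds := #('1' '2' '3').

"A string is never equal to a number, so this is false."
mixed := rawIds asSet = idsB asSet.

"After converting each string with asNumber, the sets are equal."
converted := (rawIds collect: #asNumber) asSet = idsB asSet.
```

If the assertion over the sets still fails, comparing the class of an element on each side is a quick way to check whether a conversion is missing.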

Your changes should look like this:

"protocol: #accessing"

VRPaper >> type
	^ self data at: 'Type'

"protocol: #testing"

VRPaper >> isDesignStudy
	^ self type = 'Design Study'

"protocol: #examples"

VRDatasetExamples >> designStudies
	<gtExample>
	| papers |
	papers := self allPapers select: #isDesignStudy.
	self assert: papers size equals: 65.
	^ papers

"protocol: #accessing"

VRPaper >> id
	^ (self data at: 'ID') asNumber

To sync to this point in the tutorial (throwing away any other changes) evaluate:

VRTutorialExamples new fileIn3DesignStudyPapers

Molding the print view — Implementing printOn:

Inspecting the list of design study papers is not very enlightening. Each paper is simply displayed as a VRPaper.
⇒ Inspect the list of papers below, and dive into one of the papers. Check that the Print view just displays a VRPaper. Dive into the Raw view to locate the paper's title.

VRDatasetExamples new designStudies

We can do much better. Let's first extract the title of a paper.
⇒ Extract an accessor for the title of a paper. Call it title.

aPaper := VRDatasetExamples new designStudies first.

aPaper data at: 'Title'

Now let's reimplement the method responsible for “printing” a paper.
⇒ Inspect aPaper. It should have a Raw view, a Print view and a Meta view.

aPaper

⇒ Go to the paper's Meta view and select + to add a new method.
⇒ Enter a printOn: method that prints the paper's title to a stream. It should match the printOn: method listed with the changes at the end of this section.
⇒ Accept the change and set the new method's category to printing.
⇒ Go back to the Print view and verify that the paper's title is now displayed.
⇒ Now inspect the list of design study papers again. Verify that the papers' titles are now displayed.

VRDatasetExamples new designStudies

Although this is much better, it hard-codes what we see when we inspect a list of papers. Later we will see how we can mold these views more flexibly.

Here are the last changes:

"protocol: #accessing"

VRPaper >> title
	^ self data at: 'Title'

"protocol: #printing"

VRPaper >> printOn: aStream
	aStream nextPutAll: self title

To sync to this point in the tutorial (throwing away any other changes) evaluate:

VRTutorialExamples new fileIn4PrintOn

Adding a forwarding contextual view

The reason we wrapped the publications data into VRPaper domain objects is that this enables us to associate tiny, contextual tools with these objects, answering questions that we have about them. The most common of these kinds of tools is a Contextual View, a dedicated inspector view that is associated with that kind of object. Such views can be as simple or elaborate as you like, but most views are cases of a Simple View. The simplest of these is a forwarding view, which repurposes a view that already exists for another class. Let's see how we can create one of these.

Let's have a closer look at one of the papers:

VRDatasetExamples new allPapers detect: [ :p | p id = 128 ]

⇒ Extract this paper as an example paper128.
⇒ Add assertions for the paper id and type.

VRDatasetExamples new paper128

If we inspect this paper we see its title in the Print view, but the rest of the useful information is buried in the Raw view. If we explore the data slot, we find an Items view that improves on this somewhat. Let's repurpose that view for our paper by creating a forwarding view from the paper to its data.
⇒ Evaluate and inspect the paper.
⇒ Open an inspector playground (lift the handle at the bottom of the inspector to reveal a Pharo playground snippet).
⇒ Enter and inspect the code: self data.
⇒ Secondary-click (OPT-click or ALT-click) on the Items tab to reveal the source code of this view. Note that the view is defined in a method called gtItemsFor: of the Dictionary class. We would like to repurpose this code without copy-pasting it.
⇒ Primary-click in the playground with self data, and select the option Create <gtView> forward to Items. Before accepting this code, change the name of the generated method from gtViewFor: to gtJsonFor:. Change the title: from 'Items' to 'JSON'. Accept the method.
⇒ Go to the new JSON tab of the paper, and verify that it contains the repurposed view.
⇒ Secondary-click on the tab to see the source code of the new view method. The generated code, belonging to the VRPaper class, should match the gtJsonFor: method listed with the changes at the end of this section.

That is all the code needed to reuse the dictionary's Items view! Note that the method takes aView as an argument, it has a <gtView> pragma (i.e., annotation), and it returns the result of sending aView a number of messages, in this case the first being forward. All view methods look like this. Furthermore, the name of the method is gtJsonFor:. View methods are usually named gtSomethingFor:, but this is only a convention. The first message sent to aView is the factory message forward, specifying the kind of view that we want. (See Inspector views for an overview.) Then we set the tab title for the view, the object whose view we want to repurpose, and finally the view message to send. That's it!

Below the gtJsonFor: method you should also see the code of the original method, Dictionary>>#gtItemsFor:. It's a bit more complicated, being a columnedTree view, but still just over 20 lines of code:

Dictionary >> gtItemsFor: aView
	<gtView>
	^ aView columnedTree
		title: 'Items';
		priority: 1;
		items: [ self associations sort: (#key collatedBy: #asString) ];
		children: [ :each |
			each value isDictionary
				ifTrue: [ each value associations sort: (#key collatedBy: #asString) ]
				ifFalse: [ (each value isArray and: [ each value allSatisfy: #isDictionary ])
						ifTrue: [ each value collectWithIndex: [ :x :i | i -> x ] ]
						ifFalse: [ #() ] ] ];
		column: 'Key' text: [ :assoc | assoc key ];
		column: 'Value' text: [ :assoc | assoc value ] weight: 3;
		contextItemLabel: 'Inspect key'
			action: [ :anElement :anAssoc | anElement phlow spawnObject: anAssoc key ];
		contextItemLabel: 'Inspect value'
			action: [ :anElement :anAssoc | anElement phlow spawnObject: anAssoc value ];
		contextItemLabel: 'Inspect association'
			action: [ :anElement :anAssoc | anElement phlow spawnObject: anAssoc ];
		contextItemLabel: 'Remove key and value'
			action: [ :anElement :anAssoc |
				self removeKey: anAssoc key ifAbsent: [ "ignore" ].
				anElement phlow contextMenuUpdateViewContent ];
		send: [ :assoc | assoc value ].
	"Implementation note: association sorting uses #collatedBy: to avoid a 'Symbol DNU value: value: error'"

The new JSON view is a bit of an improvement over the Raw view of a paper, but we still have to dig for the information we want about a paper. We will now see how we can do better.

These were the only changes we made so far in this section:

"protocol: #accessing"

VRDatasetExamples >> paper128
	<gtExample>
	| paper |
	paper := self allPapers detect: [ :p | p id = 128 ].
	self assert: paper id equals: 128.
	self assert: paper isDesignStudy.
	^ paper

"protocol: #views"

VRPaper >> gtJsonFor: aView
	<gtView>
	^ aView forward
		title: 'JSON';
		object: [ self data ];
		view: #gtItemsFor:

To sync to this point in the tutorial (throwing away any other changes) evaluate:

VRTutorialExamples new fileIn5ForwardJSONView

Adding a contextual “summary” view

The new JSON view of a paper offers a slight improvement over the Raw view, but it is rather clumsy for answering basic questions about a paper:
- What is the title of the paper?
- What is the “type” of the paper? Is it a design study?
- How many authors contributed to the paper?
- How long is the paper?
- Was this paper selected for the in-depth analysis?

We can answer these questions more easily by providing a dedicated contextual view.

Consider the gtSummaryFor: view method listed with the changes at the end of this section. It is similar to the forwarding view we defined above, except that instead of repurposing an existing view, it creates a new, dedicated columned list view, consisting of a list of items with column headings. This view is supposed to provide a high-level summary of the most important details of a paper, so we call it gtSummaryFor: and give it the title Summary. We also set the priority to 10. This is a number (usually between 1 and 100) that tells the inspector how to order the views. Next come the items of the list. In this case we just list a number of key and value pairs, one for each piece of information we want to display (id, title, etc.). We then define two columns, one for the keys and one for the values. These are just the first and second elements of each item. NB: specifying the text of the key as #first is equivalent to writing the more verbose block [ :item | item first ]. Finally we add a button to explicitly refresh the view, in case we need it.
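The equivalence between a unary selector symbol and a one-argument block can be checked directly in a playground. The sketch below uses made-up key and value pairs rather than a real paper, and relies only on plain collection messages.

```smalltalk
| items keysViaSymbol keysViaBlock values |
"Hypothetical key/value pairs, shaped like the items of a summary view."
items := { {'ID'. 128}. {'Type'. 'Design Study'} }.

"A unary selector symbol can stand in for a one-argument block."
keysViaSymbol := items collect: #first.
keysViaBlock := items collect: [ :item | item first ].

"The second column reads the second element of each pair."
values := items collect: #second.
```

Both collect: expressions answer the same array of keys, which is why the shorter symbol form is commonly used for simple column definitions.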

Let's install this view.
⇒ Evaluate and inspect the paper example below.
⇒ Go to the Meta view. Click + to add a method, and copy-paste the gtSummaryFor: code. Change the category to views and accept the change.
⇒ Click on the new Summary tab to check that the new view does what it's supposed to.

VRDatasetExamples new paper128

⇒ Try adding a few more pieces of information to the summary, such as the year and the venue. Be sure to add dedicated accessors for those attributes.
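As a starting point for this exercise, the accessors might look like the sketch below. The dictionary keys 'Year' and 'Venue' are assumptions about the column names in the dataset, so check them in the JSON view of a paper before relying on them.

```smalltalk
"protocol: #accessing"

VRPaper >> year
	"Assumes the raw data has a 'Year' key; adjust to the actual column name."
	^ self data at: 'Year'

VRPaper >> venue
	"Assumes the raw data has a 'Venue' key; adjust to the actual column name."
	^ self data at: 'Venue'
```

With accessors like these in place, further pairs such as {'Year'. self year} and {'Venue'. self venue} can be added to the items: block of gtSummaryFor:.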

The new changes:

"protocol: #views"

VRPaper >> gtSummaryFor: aView
	<gtView>
	^ aView columnedList
		title: 'Summary';
		priority: 10;
		items: [ {{'ID'.
					self id}.
				{'Title'.
					self title}.
				{'Type'.
					self type}.
				{'# Authors'.
					self data at: '# Authors'}.
				{'# Pages'.
					self data at: '# Pages'}.
				{'Is selected'.
					self isDesignStudy}} ];
		column: 'Key'
			text: #first
			width: 100;
		column: 'Value' text: #second;
		actionUpdateButton

To sync to this point in the tutorial (throwing away any other changes) evaluate:

VRTutorialExamples new fileIn6SummaryView

Wrapping the paper collection

Although we can mold an individual paper, we can't mold the whole collection, or a subset of papers, to show more properties or more views, because it is a generic Array:

VRDatasetExamples new allPapers

For example, we would like to see not just the title of each paper, but also the type, and the number of authors. Cramming this information into the Print view of papers is possible, but awkward and not very flexible. The solution is to introduce a dedicated Moldable Collection Wrapper. We introduce a new class VRPaperGroup using the trait TGtGroupWithItems, which provides an items slot and duck-typed collection methods that just delegate their behavior to that slot. This simple wrapper introduces a new object that we can now mold.

⇒ In the snippet below, create the missing class VRPaperGroup as a Model class in the VisualisationReview package, adding the trait TGtGroupWithItems in the template.

VRPaperGroup withAll: VRDatasetExamples new allPapers

If you inspect the result now, you will just see a Raw view of the new paper group, which is arguably worse than the Array Items view we had before, but we'll fix that right away. Of course we could introduce a forwarding view to the array, but we want something better.

⇒ In the Meta view of the paper group, add the view method listed below. Be sure to set the category to views.
⇒ Have a look at the new Papers view. We have created a new columnedList view, which is similar to the array's Items view we had before, but now we have an additional Type column.

⇒ Experiment with adding additional columns to the Papers view, for example, the number of authors or the year.

Note that by introducing a dedicated collection wrapper for papers, we not only can improve on the default Items view we had before, but we are free to add other views as well that are specific to the domain. For example, we could have a dedicated Authors view listing the authors of just this paper group, or we could have a Summary view that lists information such as the number of papers, number of authors, or statistics of the types of papers.

⇒ Extract an example called allPapersWrapped for the wrapped papers.
⇒ Add an assertion on the size of the result.

VRPaperGroup withAll: VRDatasetExamples new allPapers

The result should look like this:

VRDatasetExamples new allPapersWrapped
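For the earlier experiment with additional columns and group-level views, one possible direction is sketched below. The first method is a variant of the Papers view with an extra column that reads the '# Authors' entry directly from the raw data, as the paper's Summary view does; a dedicated accessor would be a cleaner choice. The second method, gtSummaryFor: on VRPaperGroup, is a hypothetical addition that is not part of the tutorial's code, and its priority value is arbitrary. Both use only view messages that already appear in this part.

```smalltalk
"protocol: #views"

VRPaperGroup >> gtItemsFor: aView
	<gtView>
	^ aView columnedList
		title: 'Papers';
		items: [ self items ];
		column: 'Title' text: [ :each | each title ];
		column: 'Type' text: [ :each | each type ];
		"Extra column: the number of authors, read from the raw data."
		column: '# Authors'
			text: [ :each | each data at: '# Authors' ]
			width: 80

VRPaperGroup >> gtSummaryFor: aView
	"Hypothetical group-level summary; not part of the tutorial's changes."
	<gtView>
	^ aView columnedList
		title: 'Summary';
		priority: 20;
		items: [ {{'# Papers'.
					self items size}.
				{'# Design studies'.
					(self items select: #isDesignStudy) size}} ];
		column: 'Key'
			text: #first
			width: 150;
		column: 'Value' text: #second
```

Because the views live on the wrapper rather than on Array, they apply to any VRPaperGroup, including one built from a subset of the papers.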

Now that we have a dedicated collection wrapper for papers, we can also leverage it for returning the results of queries. But first we need some more domain entities.

Object << #VRPaperGroup
	traits: {TGtGroupWithItems};
	slots: {};
	tag: 'Model';
	package: 'VisualisationReview'.

Object class << VRPaperGroup class
	traits: {TGtGroupWithItems classTrait}

"protocol: #views"

VRPaperGroup >> gtItemsFor: aView
	<gtView>
	^ aView columnedList
		title: 'Papers';
		items: [ self items ];
		column: 'Index'
			text: [ :eachItem :eachIndex | eachIndex asRopedText foreground: Color gray ]
			width: 45;
		column: 'Title' text: [ :each | each title ];
		column: 'Type' text: [ :each | each type ];
		actionUpdateButton

"protocol: #examples"

VRDatasetExamples >> allPapersWrapped
	<gtExample>
	| papers |
	papers := VRPaperGroup withAll: self allPapers.
	self assert: papers size equals: 346.
	^ papers

To sync to this point in the tutorial (throwing away any other changes) evaluate:

VRTutorialExamples new fileIn7PaperGroup

Discussion

Modeling papers is an example of the most fundamental of the patterns, namely Moldable Object. To make a system explainable, we need live, moldable objects which we can augment with contextual tools.

Next: Part 3. Molding CSV data — modeling other entities