Molding CSV data — modeling domain objects
This is Part 2 of Tutorial: molding CSV data into an explainable system. In this part we take the JSON data that we extracted from the raw CSV datasets of the research project that we are exploring, and we wrap these data into domain entities that model papers (i.e., visualization publications). We consider some specific questions we would like to answer about the data, and we then mold the paper objects to answer these questions. We see in particular how we can design simple contextual views of these objects to answer questions by making interesting information easily accessible.
If you are jumping into the tutorial at this point, and have some previously saved work not already in this image, you should file it in now. Otherwise, you can file in a sample snapshot reflecting the work completed at the end of the previous part:
VRTutorialExamples new fileIn1DatasetExamples
So far we have only transformed the datasets into a more convenient representation, and packaged them as examples. To do more we need to turn these data into domain entities, namely papers, authors, venues etc.
This is an example of the most fundamental of the patterns, namely Moldable Object. To make a system explainable, we need live, moldable objects which we can augment with contextual tools. We will now start to introduce such objects.
Browsing the two datasets, we see that the first (...-all) dataset contains general information about all the papers, such as the authors, the venue in which each paper was presented, the type of paper, and so on, while the second (...-selected) dataset contains more detailed information about design studies, such as the paper's abstract and the kind of visualization task considered.
We should therefore start by extracting more general domain entities from the first dataset, before diving into the details in the second dataset.
There are several domain entities that we might like to recover. Domain entities are full-fledged objects (e.g., paper) rather than attributes of objects (e.g., title of a paper). The properties of these domain entities may consist of both attributes (data values) and relationships to other domain entities. It may happen that something we consider as just data (e.g., the type of a paper) could turn out to be a domain entity on its own. We'll discover these as we explore.
Here are a few of the most obvious domain entities to extract:
— paper: the publication, which has properties title (string), ID, authors (persons), venue
— person: an author or co-author of one or more papers
— venue: a conference (SOFTVIS or VISSOFT) with a date, location etc.
There may be others, but let's start with paper and then add more.
Consider the JSON data we have extracted. It consists of an Array of Dictionary instances, each of which describes a paper.
VRDatasetExamples new allPapersJSON
We cannot do much with the raw dictionaries, because a dictionary has nothing to do with the visualization review domain per se. We need to turn this raw data into a moldable object by wrapping it. This is an example of another, closely related pattern: Moldable Data Wrapper. Not all moldable objects are wrappers, but when we are creating an explainable system from existing data, this is the most common way to do it.
Let's introduce a new class VRPaper. Every instance will wrap (or encapsulate) the raw data dictionary of a paper, and provide a more domain-friendly interface.
This is how we would like to instantiate papers from the raw data:
thePapers := VRDatasetExamples new allPapersJSON collect: [ :data | VRPaper for: data ]
As before, this code won't work yet, but we can create it by fixing it. We want to:
— create the new class VRPaper in the same VisualisationReview package, but tagged as a Model class
— add a data slot to hold the dictionary
— add accessors data and data: to get and set the data slot
— add a class-side instance creation method for: that instantiates an object and sets the data
Try these steps yourself. You can always browse the proposed changes below and accept them later.
⇒
Create VRPaper using the fixit dialog, as before.
⇒
Add the missing for: method using the fixit dialog. Change the proposed data argument to aDictionary; add the body ^ self new data: aDictionary; change the default method category from “accessing” to “instance creation”; accept the change.
⇒
In the for: method fix the data: accessor, adding the body data := aDictionary
⇒
In the data: setter method, add the missing data slot using the fixit dialog, change the method category to “initialization” and accept the change
⇒
Add a data getter. This can be done in several ways. One way is to click on the class name from one of the methods you have created, opening a class coder view. Click on the grey bar below the class name to open the class specification. Primary-click on the data slot to reveal the Create accessors menu item, select it, and accept the refactoring.
Your changes should be similar to the proposed changes below. You can also simply accept those:
Object << #VRPaper
    slots: { #data };
    tag: 'Model';
    package: 'VisualisationReview'.

Object class << VRPaper class
    slots: {}

"protocol: #accessing"
VRPaper >> data
    ^ data

"protocol: #initialization"
VRPaper >> data: aDictionary
    data := aDictionary

"protocol: #'instance creation'"
VRPaper class >> for: aDictionary
    ^ self new
        data: aDictionary;
        yourself
Now we have all the missing pieces and we can create our collection of VRPaper instances.
⇒
Inspect the result. Dive into any particular paper, and inspect its data slot.
VRDatasetExamples new allPapersJSON collect: [ :data | VRPaper for: data ]
⇒
Extract the above snippet as an example in the VRDatasetExamples class as allPapers.
Add an assertion to check the number of papers created.
After doing this, the following snippet should work:
VRDatasetExamples new allPapers
Your changes should look something like this:
"protocol: #examples"
VRDatasetExamples >> allPapers
    <gtExample>
    | papers |
    papers := self allPapersJSON collect: [ :data | VRPaper for: data ].
    self assert: papers size equals: 346.
    ^ papers
So far the domain objects are not much of an improvement over our raw dictionaries. In fact they seem worse, because we have to navigate to get to the interesting data. But now the fun starts as we begin to mold these objects, adding contextual tools to make them more useful.
To sync to this point in the tutorial (throwing away any other changes) evaluate:
VRTutorialExamples new fileIn2PaperWrapper
Moldable development should be driven by questions you have about the system that needs explaining. Let's start with an easy question: which are the selected design study papers? We know the answer (the design study papers are the selected ones), but the important subquestion is whether the data are consistent: are the papers flagged as design study papers in the first dataset precisely those listed in the second dataset? To answer this question, we need to be able to query papers for their id, their type and whether they are selected or not.
A good way to add new behavior to domain objects is to prototype the behavior first in a playground, and then extract a method. We can practice this in the snippets below:
aPaper := VRDatasetExamples new allPapers first.
aPaper data at: 'Type'
⇒
Evaluate and inspect the two snippets above to verify that we can get a paper's type this way.
⇒
Select the code in the second snippet and primary-click to Extract method. Call the new method type.
After the refactoring, the second snippet should look like this:
aPaper type
Let's repeat the exercise by creating a Boolean testing method:
⇒
Extract a method called isDesignStudy from the snippet below.
⇒
Change the method category to testing.
(Click the grey triangle to edit the method.)
aPaper type = 'Design Study'
The result should look like this:
aPaper isDesignStudy
Now let's select all the design study papers and make that an example.
⇒
Evaluate the snippet below, and extract it as a new designStudies example.
⇒
Add an assertion over the expected size of the result.
⇒
Change the method category to examples.
VRDatasetExamples new allPapers select: #isDesignStudy
We should get 65 papers:
VRDatasetExamples new designStudies
To check whether the datasets are consistent, we still need a paper's id.
⇒
Extract a method to access a paper's id.
aPaper data at: 'ID'
⇒
Change the id accessor to return a Number instead of a ByteString (i.e., send asNumber before returning the result).
Now we can get the ids of all design study papers:
designStudyIDs := VRDatasetExamples new designStudies collect: #id
We haven't created any domain objects from the second data set, so we'll have to extract them from the raw data:
selectedPaperIDs := VRDatasetExamples new selectedPapersJSON collect: [ :d | (d at: 'ID') asNumber ]
If we compare the two lists we get an error:
self assert: designStudyIDs equals: selectedPaperIDs
That's because the lists are not in the same order. If we convert them to sets, everything should be ok.
self assert: designStudyIDs asSet equals: selectedPaperIDs asSet
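The difference between the two comparisons can be tried out in a playground with plain arrays. This is only a sketch with made-up IDs (the variable names idsA, idsB and onlyInA are hypothetical, not part of the tutorial code); it also shows one way to list the IDs present on one side only, should the sets ever differ.

```smalltalk
"Two collections with the same elements in a different order (made-up IDs)."
idsA := #(3 1 2).
idsB := #(1 2 3).

idsA = idsB.                "false: Arrays are compared element by element, in order"
idsA asSet = idsB asSet.    "true: Sets ignore order"

"If the sets ever differ, collect the IDs that appear in idsA only."
onlyInA := idsA asSet reject: [ :each | idsB includes: each ].
onlyInA isEmpty.            "true here, since both collections hold the same IDs"
```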
Your changes should look like this:
"protocol: #accessing"
VRPaper >> type
    ^ self data at: 'Type'

"protocol: #testing"
VRPaper >> isDesignStudy
    ^ self type = 'Design Study'

"protocol: #examples"
VRDatasetExamples >> designStudies
    <gtExample>
    | papers |
    papers := self allPapers select: #isDesignStudy.
    self assert: papers size equals: 65.
    ^ papers

"protocol: #accessing"
VRPaper >> id
    ^ (self data at: 'ID') asNumber
To sync to this point in the tutorial (throwing away any other changes) evaluate:
VRTutorialExamples new fileIn3DesignStudyPapers
Inspecting the list of design study papers is not very enlightening. Each paper is simply displayed as a VRPaper.
⇒
Inspect the list of papers below, and dive into one of the papers. Check that the Print view just displays a VRPaper. Dive into the Raw view to locate the paper's title.
VRDatasetExamples new designStudies
We can do much better. Let's first extract the title of a paper.
⇒
Extract an accessor for the title of a paper. Call it title.
aPaper := VRDatasetExamples new designStudies first.
aPaper data at: 'Title'
Now let's reimplement the method responsible for “printing” a paper.
⇒
Inspect aPaper. It should have a Raw view, a Print view and a Meta view.
aPaper
⇒
Go to the paper's Meta view and select + to add a new method.
⇒
Enter a printOn: method that prints the paper's title to a stream. It should look like this:
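A minimal version, consistent with the changes listed at the end of this section, simply writes the title to the stream. It assumes the title accessor extracted in the previous step is already in place.

```smalltalk
"protocol: #printing"
VRPaper >> printOn: aStream
    "Print the paper's title wherever the paper is displayed as text"
    aStream nextPutAll: self title
```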
⇒
Accept the change and set the new method's category to printing.
⇒
Go back to the Print view and verify that the paper's title is now displayed.
⇒
Now inspect the list of design study papers again. Verify that the papers' titles are now displayed.
VRDatasetExamples new designStudies
Although this is much better, it hard-codes what we see when we inspect a list of papers. Later we will see how we can mold these views more flexibly.
Here are the last changes:
"protocol: #accessing"
VRPaper >> title
    ^ self data at: 'Title'

"protocol: #printing"
VRPaper >> printOn: aStream
    aStream nextPutAll: self title
To sync to this point in the tutorial (throwing away any other changes) evaluate:
VRTutorialExamples new fileIn4PrintOn
The reason we wrapped the publication data into VRPaper domain objects is that this lets us attach tiny, contextual tools to these objects that answer questions we have about them. The most common of these kinds of tools is a Contextual View, a dedicated inspector view that is associated with that kind of object. Such views can be as simple or as elaborate as you like, but most views are cases of a Simple View. The simplest of these is a forwarding view, which repurposes a view that already exists for another class. Let's see how we can create one of these.
Let's have a closer look at one of the papers:
VRDatasetExamples new allPapers detect: [ :p | p id = 128 ]
⇒
Extract this paper as an example paper128
⇒
Add assertions for the paper id and type.
VRDatasetExamples new paper128
If we inspect this paper we see its title in the Print view, but the rest of the useful information is buried in the Raw view. If we explore the data slot, we find an Items view that improves on this somewhat. Let's repurpose that view for our paper by creating a forwarding view from the paper to its data.
⇒
Evaluate and inspect the paper.
⇒
Open an inspector playground (lift the handle at the bottom of the inspector to reveal a Pharo playground snippet).
⇒
Enter and inspect the code: self data.
⇒
Secondary-click (OPT-click or ALT-click) on the Items tab to reveal the source code of this view.
Note that the view is defined in a method called gtItemsFor: of the Dictionary class. We would like to repurpose this code without copy-pasting it.
⇒
Primary-click in the playground with self data, and select the option Create <gtView> forward to Items. Before accepting this code, change the name of the generated method from gtViewFor: to gtJsonFor:. Change the title: from 'Items' to 'JSON'. Accept the method.
⇒
Go to the new JSON tab of the paper, and verify that it contains the repurposed view.
⇒
Secondary-click on the tab to see the source code of the new view method.
The generated code, belonging to the VRPaper class, should look like this:
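The following listing matches the gtJsonFor: method recorded in this section's changes; the inline comments are added here only as a reading aid.

```smalltalk
"protocol: #views"
VRPaper >> gtJsonFor: aView
    <gtView>
    ^ aView forward
        title: 'JSON';            "label of the inspector tab"
        object: [ self data ];    "the object whose view is reused"
        view: #gtItemsFor:        "the view method to forward to"
```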
This is all the code that is needed to reuse the dictionary's Items view!
Note that the method takes aView as an argument, it has a <gtView> pragma (i.e., annotation), and it returns the result of sending aView a number of messages, in this case the first being forward. All view methods look like this. Furthermore, the name of the method is gtJsonFor:. By convention, view methods are named gtSomethingFor:, but this is only a convention.
The first message sent to aView is the factory message forward, specifying the kind of view that we want. (See Inspector views for an overview.) Then we set the tab title for the view, the object whose view we want to repurpose, and finally the view message to send. That's it!
Below the gtJsonFor: method you should also see the code of the original method, Dictionary>>#gtItemsFor:. It's a bit more complicated, being a columnedTree view, but still just over 20 lines of code.
The new JSON view is a bit of an improvement over the Raw view of a paper, but we still have to dig for the information we want about a paper. We will now see how we can do better.
These were the only changes we made so far in this section:
"protocol: #accessing"
VRDatasetExamples >> paper128
    <gtExample>
    | paper |
    paper := self allPapers detect: [ :p | p id = 128 ].
    self assert: paper id equals: 128.
    self assert: paper isDesignStudy.
    ^ paper

"protocol: #views"
VRPaper >> gtJsonFor: aView
    <gtView>
    ^ aView forward
        title: 'JSON';
        object: [ self data ];
        view: #gtItemsFor:
To sync to this point in the tutorial (throwing away any other changes) evaluate:
VRTutorialExamples new fileIn5ForwardJSONView
The new JSON view of a paper offers a slight improvement over the Raw view, but it is rather clumsy for answering basic questions about a paper:
— What is the title of the paper?
— What is the “type” of the paper? Is it a design study?
— How many authors contributed to the paper?
— How long is the paper?
— Was this paper selected for the in-depth analysis?
We can answer these questions more easily by providing a dedicated contextual view.
Consider the view code below. It is similar to the forwarding view we defined above, except that instead of repurposing an existing view, it creates a new, dedicated columned list view, consisting of a list of items with column headings. This view is supposed to provide a high-level summary of the most important details of a paper, so we call it gtSummaryFor: and give it the title Summary.
We also set the priority to 10. This is a number (usually between 1 and 100) that tells the inspector how to order all the views.
Next come the items of the list. In this case we just list a bunch of Key and Value pairs corresponding to each piece of information we want to display (id, title etc.).
We then define two columns, one for the Keys and one for the Values. These are just the first and second elements of each item.
NB: specifying the text of the key as #first is equivalent to writing the more verbose lambda block, [ :item | item first ].
Finally we add a button to explicitly refresh the view, in case we need it.
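The equivalence between #first and the explicit block noted above can be checked in a playground with plain arrays. This is a small sketch with made-up key/value pairs shaped like the items of the Summary view; the variable names are hypothetical.

```smalltalk
"Made-up key/value pairs, shaped like the items of the Summary view."
pairs := { { 'ID'. 128 }. { 'Type'. 'Design Study' } }.

"A symbol can stand in for a one-argument block that sends that message."
keysViaSymbol := pairs collect: #first.
keysViaBlock := pairs collect: [ :item | item first ].

keysViaSymbol = keysViaBlock.   "true: both are #('ID' 'Type')"
```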
Let's install this view.
⇒
Evaluate and inspect the paper example below.
⇒
Go to the Meta view. Click + to add a method, and copy-paste the gtSummaryFor: code (it is listed with the changes for this section). Change the category to views and accept the change.
⇒
Click on the new Summary tab to check that the new view does what it's supposed to.
VRDatasetExamples new paper128
⇒
Try adding a few more pieces of information to the summary, such as the year and the venue. Be sure to add dedicated accessors for those attributes.
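One possible shape for these accessors is sketched below. The dictionary keys 'Year' and 'Venue' are assumptions about the column names in the dataset; check the JSON view of a paper for the actual keys before using them.

```smalltalk
"protocol: #accessing"
VRPaper >> year
    "Assumes the raw data has a 'Year' key"
    ^ self data at: 'Year'

"protocol: #accessing"
VRPaper >> venue
    "Assumes the raw data has a 'Venue' key"
    ^ self data at: 'Venue'
```

With such accessors in place, the items of the Summary view can be extended with two more pairs, {'Year'. self year} and {'Venue'. self venue}.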
The new changes:
"protocol: #views"
VRPaper >> gtSummaryFor: aView
    <gtView>
    ^ aView columnedList
        title: 'Summary';
        priority: 10;
        items: [ {
            { 'ID'. self id }.
            { 'Title'. self title }.
            { 'Type'. self type }.
            { '# Authors'. self data at: '# Authors' }.
            { '# Pages'. self data at: '# Pages' }.
            { 'Is selected'. self isDesignStudy } } ];
        column: 'Key' text: #first width: 100;
        column: 'Value' text: #second;
        actionUpdateButton
To sync to this point in the tutorial (throwing away any other changes) evaluate:
VRTutorialExamples new fileIn6SummaryView
Although we can mold an individual paper, we can't mold the whole collection, or a subset of papers, to show more properties or more views, because it is a generic Array:
VRDatasetExamples new allPapers
For example, we would like to see not just the title of each paper, but also the type, and the number of authors. Cramming this information into the Print view of papers is possible, but awkward and not very flexible.
The solution is to introduce a dedicated Moldable Collection Wrapper.
We introduce a new class VRPaperGroup using the trait TGtGroupWithItems, which provides an items slot and duck-typed collection methods that just delegate their behavior to that slot.
This simple wrapper introduces a new object that we can now mold.
⇒
In the snippet below, create the missing class VRPaperGroup as a Model class in the VisualisationReview package, adding the trait TGtGroupWithItems in the template.
VRPaperGroup withAll: VRDatasetExamples new allPapers
If you inspect the result now, you will just see a Raw view of the new paper group, which is arguably worse than the Array Items view we had before, but we'll fix that right away. Of course we could introduce a forwarding view to the array, but we want something better.
⇒
In the Meta view of the paper group, add the view method listed below. Be sure to set the category to views.
⇒
Have a look at the new Papers view.
We have created a new columnedList view, which is similar to the array's Items view we had before, but now we have an additional Type column.
⇒
Experiment with adding additional columns to the Papers view, for example, the number of authors or the year.
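As a starting point, here is a sketch of the gtItemsFor: method from this section's changes with one extra column. It reuses the '# Authors' key that the paper's Summary view already reads; the column title 'Authors' and the width of 80 are arbitrary choices.

```smalltalk
"protocol: #views"
VRPaperGroup >> gtItemsFor: aView
    <gtView>
    ^ aView columnedList
        title: 'Papers';
        items: [ self items ];
        column: 'Index'
            text: [ :eachItem :eachIndex | eachIndex asRopedText foreground: Color gray ]
            width: 45;
        column: 'Title' text: [ :each | each title ];
        column: 'Type' text: [ :each | each type ];
        "Extra column: number of authors, read from the wrapped dictionary"
        column: 'Authors' text: [ :each | each data at: '# Authors' ] width: 80;
        actionUpdateButton
```

Once a paper has a dedicated accessor for the number of authors, the block can send that message instead of reaching into the data dictionary.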
Note that by introducing a dedicated collection wrapper for papers, we not only can improve on the default Items view we had before, but we are free to add other views as well that are specific to the domain. For example, we could have a dedicated Authors view listing the authors of just this paper group, or we could have a Summary view that lists information such as the number of papers, number of authors, or statistics of the types of papers.
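The statistics such a Summary view would show are easy to prototype in a playground first. The sketch below counts paper types with a Bag; the literal array stands in for the result of collecting type over the items of a paper group, and the type names other than 'Design Study' are made up.

```smalltalk
"Made-up paper types, standing in for: aPaperGroup items collect: #type"
types := #('Design Study' 'Technique' 'Design Study' 'Evaluation') asBag.

types occurrencesOf: 'Design Study'.   "2"
types asSet size.                      "3 distinct types"
```

A Bag keeps one count per distinct element, so the same two messages work unchanged on the real collection of types.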
⇒
Extract an example called allPapersWrapped for the wrapped papers.
⇒
Add an assertion on the size of the result.
VRPaperGroup withAll: VRDatasetExamples new allPapers
The result should look like this:
VRDatasetExamples new allPapersWrapped
Now that we have a dedicated collection wrapper for papers, we can also leverage it for returning the results of queries. But first we need some more domain entities.
Object << #VRPaperGroup
    traits: {TGtGroupWithItems};
    slots: {};
    tag: 'Model';
    package: 'VisualisationReview'.

Object class << VRPaperGroup class
    traits: {TGtGroupWithItems classTrait}

"protocol: #views"
VRPaperGroup >> gtItemsFor: aView
    <gtView>
    ^ aView columnedList
        title: 'Papers';
        items: [ self items ];
        column: 'Index'
            text: [ :eachItem :eachIndex | eachIndex asRopedText foreground: Color gray ]
            width: 45;
        column: 'Title' text: [ :each | each title ];
        column: 'Type' text: [ :each | each type ];
        actionUpdateButton

"protocol: #examples"
VRDatasetExamples >> allPapersWrapped
    <gtExample>
    | papers |
    papers := VRPaperGroup withAll: self allPapers.
    self assert: papers size equals: 346.
    ^ papers
To sync to this point in the tutorial (throwing away any other changes) evaluate:
VRTutorialExamples new fileIn7PaperGroup
Next: Part 3. Molding CSV data — modeling other entities