Tutorial: molding CSV data into an explainable system

This tutorial is intended as a first step-by-step, hands-on introduction to moldable development, using a CSV dataset from a research study as a case study. This tutorial is part of the learning module Learn the basics of Glamorous Toolkit.

At this stage you should have been introduced to Glamorous Toolkit and seen several examples in Glamorous Toolkit as a case study of Moldable Development. The purpose of this tutorial is to show you, step by step, how to apply moldable development to create an Explainable System.

Moldable development can be applied to any kind of software system: a “greenfield” project developed within GT itself, existing code in GT, legacy code in some other language, a web API, a running system with a language bridge to GT, or static data or logs from an existing project. To keep things simple, in this tutorial we start with datasets from a research project, provided as CSV dumps of spreadsheets, and see how to turn this data into an explainable system.

The key point to remember about the case study selected for this tutorial is that it does not matter what kind of software you wish to make explainable: the process is essentially the same. We have picked a simple example of understanding some CSV data, but it could be pretty much anything.

Basically the process is as follows:
- Establish some questions you want answered.
- Start with some objects of interest. Create the objects if they don't exist.
- Ask questions of these objects by interacting with them (i.e., through code).
- Turn these interactions into tiny, contextual tools that answer your questions (e.g., views, actions, searches); see the sketch below.
- Create new objects if you need them to answer your questions.
- Repeat.
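
To make this concrete, here is a minimal sketch of what one turn of that loop might look like, assuming a hypothetical RsPaper class that wraps one row of the CSV data and a hypothetical authors accessor; the <gtView> pragma and the columnedList builder are the Phlow API that GT provides for contextual views:

    "A hypothetical domain object wrapping one row of the CSV data about a paper."
    Object subclass: #RsPaper
        instanceVariableNames: 'rawData'
        classVariableNames: ''
        package: 'ReviewStudy'

    "A tiny contextual view answering the question: who wrote this paper?"
    RsPaper >> gtAuthorsFor: aView
        <gtView>
        ^ aView columnedList
            title: 'Authors';
            priority: 10;
            items: [ self authors ];
            column: 'Name' text: [ :eachAuthor | eachAuthor ]

Inspecting an RsPaper would then show an Authors tab alongside the default views; each such tab is one small tool answering one question.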

In the process, we will see a number of Moldable Development patterns come into play. We won't look at all the patterns at the start, but rather highlight a few of the most basic ones as we need them.

You'll have to write some Smalltalk code in this tutorial. If you're comfortable with that, just proceed. If not, you might want to first consult A gentle introduction to Pharo Smalltalk.

The first time through this tutorial, you should step through each section, writing and testing code snippets as you go along. At the end of each section you will find a changes snippet listing the changes to the code base so far. You can consult this, or evaluate it if you want to see what code might have been written.

At the end of each section you will also find a snippet that throws away all changes so far and files in a snapshot of the code as it should (could) be at that point in the tutorial. NB: If you want to continue working from a given section, you can simply run the file-in snapshot at the end of the previous section to get the code into the right state.

It is also a good idea to go through this tutorial again as a “kata” at a faster pace, writing the code yourself, without looking at the detailed instructions.

If you want to save your work at any point, you can file out your changes so far, and file in those changes at a later stage. (See How to file out changes.)

The tutorial consists of five parts that should be followed in sequence. At any time you can interrupt and continue later. You can either continue with your own changes, or load in the demo version of the work done up to that point.

1. Molding CSV data — preparing the data — this part introduces the software visualization review project and its datasets. We see how to parse and convert the raw CSV data into JSON, which will be more convenient for wrapping the data into domain objects. (A small sketch of this step follows the five parts below.)

2. Molding CSV data — modeling domain objects — here we start to turn the static CSV data into live domain objects that we can then mold with contextual tools to answer questions we have about the domain. We start by modeling a paper (publication) as an object that wraps the raw data from the dataset about that paper. We then add a couple of contextual views so we can see more interesting information about the papers when we inspect them.

3. Molding CSV data — modeling other entities — we continue by modeling other useful domain entities, in particular, datasets and authors, and we mold them as we did the papers.

4. Molding CSV data — actions, queries and visualizations — in addition to adding contextual views to domain entities, we can also add actions and queries, and we can have richer views, such as simple graph visualizations. We see how easy these are to add to the objects we already have. (A sketch of a contextual action also follows below.)

5. Molding CSV data — continuing the exploration — in this last part you are left on your own to apply what you have seen to continue to model and mold further domain entities, namely venues and design studies.
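
As a taste of the first part, the conversion from CSV to JSON could be sketched roughly as below. The file names are placeholders and the actual tutorial may organize the records differently; NeoCSVReader and NeoJSONWriter come from the NeoCSV and NeoJSON libraries available in the Pharo ecosystem that GT builds on:

    "Read the raw CSV into records keyed by the header columns, then write them out as JSON."
    rows := 'papers.csv' asFileReference readStreamDo: [ :aStream |
        (NeoCSVReader on: aStream) upToEnd ].
    header := rows first.
    records := rows allButFirst collect: [ :eachRow |
        (header with: eachRow collect: [ :key :value | key -> value ]) asDictionary ].
    'papers.json' asFileReference writeStreamDo: [ :aStream |
        aStream nextPutAll: (NeoJSONWriter toStringPretty: records) ]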

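As a preview of the fourth part, a contextual action is added in much the same way as a view: a method with a pragma. In this sketch the RsPaper class and its url accessor are again hypothetical, the icon is only a placeholder, and WebBrowser is Pharo's helper for opening the system web browser:

    "A contextual action that opens the paper's web page from any inspector on an RsPaper."
    RsPaper >> gtOpenUrlActionFor: anAction
        <gtAction>
        ^ anAction button
            icon: BrGlamorousVectorIcons refresh;
            tooltip: 'Open paper web page';
            action: [ :aButton | WebBrowser openOn: self url ]
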
Did you make it here? Great! Now here's another exercise you can do: throw out all the code you wrote (or back it up), and start again. Only this time, just start with the raw CSV files, and work on your own to model and mold the domain entities you find interesting. Drive your development from the questions you have about the data. You might end up with pretty much the same code, or perhaps you will find yourself going in a different direction. Either way, what is important is that you hone your skills in modeling domain entities and molding them with tiny tools to create an explainable system.

You can go back to Learn the basics of Glamorous Toolkit and continue from there.