Exploring IMDB movie lists
Motivation
Many projects start with existing data. This exercise explores how to build an explorable domain model for a movie list from IMDB.
Tasks
Download a movie list
Find an IMDB movie list you are interested in and export it as a CSV file (the export option is under the three dots).
You can also start from this list, which is already in the image:
csv := (FileLocator gtResource / 'feenkcom' / 'gtoolkit-demos' / 'data' / 'imdb' / 'Movies.csv') contents.
We can't do much with this except open it as a spreadsheet.
Parse the data
You can use the existing CSVParser to parse the data, and an existing visitor to produce a JSON representation:
(CSV2JSON for: csv) json.
We see that there is one record per movie, and each movie has a Title, Directors, Genres, a link to the IMDB page, and a few other attributes.
An alternative is to use the raw Array of Dictionaries representation:
(CSV2JSON for: csv) jsonObject.
This is less pretty to inspect, but easier to access from code.
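As a small illustration of that access, this playground sketch reads the title of the first movie and collects all titles. It assumes the CSV header names the column Title, as in the list described above.

```smalltalk
"Parse the CSV into an Array of Dictionaries, one per movie."
movies := (CSV2JSON for: csv) jsonObject.

"Each record is a Dictionary keyed by the CSV column names."
firstTitle := movies first at: 'Title'.

"Collect the titles of all movies."
titles := movies collect: [ :each | each at: 'Title' ].
```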
Wrap the data
We don't have many domain concepts, but there are enough to build a simple model: Film Collection, Film, Director, and possibly Genre.
Let's wrap the data as a FilmCollection.
library := FilmCollection new data: (CSV2JSON for: csv) jsonObject.
Use the fixits to create the class in a new package with the category Model. Store the data in a new slot.
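After applying the fixits, the result should be roughly equivalent to the following sketch in the classic Pharo definition syntax. The package name IMDBExplorer is hypothetical; the Model part becomes the package tag. The data: setter stores the records and implicitly returns the receiver, so the expression above assigns the new FilmCollection to library.

```smalltalk
"Class definition; the package name 'IMDBExplorer' is a placeholder."
Object subclass: #FilmCollection
	instanceVariableNames: 'data'
	classVariableNames: ''
	package: 'IMDBExplorer-Model'.

FilmCollection >> data: aCollection
	"Store the Array of Dictionaries produced by CSV2JSON."
	data := aCollection
```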
Inspect data asJson in the Contextual Playground of the FilmCollection instance. Then create a forwarding view from the library instance to the JSON Object view.
Go to the Meta view and add a method like this:
gtJsonFor: aView
	<gtView>
	^ aView forward
		object: [ data asJson ];
		view: #gtJsonObjectFor:context:
Add a suitable title and priority.
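With a title and priority added, the method might look like this. The values 'JSON' and 50 are only suggestions; a lower priority number places the tab further to the left.

```smalltalk
FilmCollection >> gtJsonFor: aView
	"Forward to the JSON Object view of the wrapped data."
	<gtView>
	^ aView forward
		title: 'JSON';
		priority: 50;
		object: [ data asJson ];
		view: #gtJsonObjectFor:context:
```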
Create a Film entity
We want to create a Film entity for each item in data. Write an expression that does this in the contextual playground, use the fixits to make it run, and then extract the code as a method films.
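A minimal version of the extracted method could look like this. It assumes the fixits gave Film a data slot with a data: setter, mirroring FilmCollection.

```smalltalk
FilmCollection >> films
	"Wrap each raw record (a Dictionary) in a Film."
	^ data collect: [ :each | Film new data: each ]
```

Note that each call creates fresh Film objects; caching the result in a slot is an option if stable identities matter.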
Inspect a film and extract its Title from the data. Extract that code as a title method.
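The extracted accessor is a one-liner; the key 'Title' is the column name from the CSV header.

```smalltalk
Film >> title
	"Look up the Title column of this film's record."
	^ data at: 'Title'
```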
Override the printOn: method to show the film's title, so that when we inspect the list of films in a collection we see the film titles.
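One way to write it, as a sketch that simply prints the title instead of the default "a Film":

```smalltalk
Film >> printOn: aStream
	"Print the film's title so lists of films are readable."
	aStream nextPutAll: self title
```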
Add a JSON view to an individual film. (Copy the similar method from the collection, or consider putting the shared code in a trait.)
Add a Films list view
Add accessors for the year and the directors of each film.
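A sketch of the two accessors, assuming the export has Year and Directors columns. At this stage directors still returns the raw field; the Director section below replaces it.

```smalltalk
Film >> year
	"Answer nil if the column is missing."
	^ data at: 'Year' ifAbsent: [ nil ]

Film >> directors
	"The raw Directors field, for now."
	^ data at: 'Directors' ifAbsent: [ '' ]
```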
Create a columnedList view for the collection, showing the title, year and directors of each film.
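Such a view might look like the following sketch. The selector gtFilmsFor: and the priority are arbitrary choices; asString guards against non-string values such as a nil year.

```smalltalk
FilmCollection >> gtFilmsFor: aView
	<gtView>
	^ aView columnedList
		title: 'Films';
		priority: 10;
		items: [ self films ];
		column: 'Title' text: [ :film | film title ];
		column: 'Year' text: [ :film | film year asString ];
		column: 'Directors' text: [ :film | film directors asString ]
```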
Create a Director entity
Note how the Directors field of a film is formatted. The field may be empty, a single director, or a comma-separated list in double quotes. Figure out how to parse the Directors field and return a list of individual director names.
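One possible parser, as a sketch: a CSV parser normally strips the surrounding double quotes of a quoted field, but the method removes any that remain, just in case. It then splits on commas and trims each name. The selector directorNames is a hypothetical choice that keeps the directors selector free for the next step.

```smalltalk
Film >> directorNames
	"Split the raw Directors field into individual names.
	An empty or missing field yields an empty Array."
	| raw |
	raw := (data at: 'Directors' ifAbsent: [ '' ]) ifNil: [ '' ].
	^ ((raw copyWithout: $") substrings: ',')
		collect: [ :each | each trimBoth ]
```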
Create a small example film collection with three films, having zero, one, and several directors, to serve as test cases for the parsing.
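Such an example could be a GT example method. The class FilmCollectionExamples, the titles, and the names below are made up for illustration, and the method assumes the parsing code from the previous step is available as directorNames.

```smalltalk
FilmCollectionExamples >> threeFilms
	"Three made-up films with zero, one and two directors."
	<gtExample>
	| collection counts |
	collection := FilmCollection new data: {
		{ 'Title' -> 'No Director'. 'Directors' -> '' } asDictionary.
		{ 'Title' -> 'One Director'. 'Directors' -> 'Jane Doe' } asDictionary.
		{ 'Title' -> 'Two Directors'. 'Directors' -> 'Jane Doe, John Roe' } asDictionary }.
	counts := collection films collect: [ :film | film directorNames size ].
	self assert: counts asArray = #(0 1 2).
	^ collection
```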
Rewrite the directors accessor to return instances of a new Director class instead of strings.
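A minimal sketch, assuming the parsed names are available from a method called directorNames (a hypothetical name for the parsing step above) and reusing the placeholder package name:

```smalltalk
Object subclass: #Director
	instanceVariableNames: 'name'
	classVariableNames: ''
	package: 'IMDBExplorer-Model'.

Director >> name: aString
	name := aString

Director >> name
	^ name

Director >> printOn: aStream
	aStream nextPutAll: name

Film >> directors
	"Wrap each parsed name in a Director."
	^ self directorNames collect: [ :each | Director new name: each ]
```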
We now see that films and directors need access to the film collection, so that we can find all the films of a given director. Update both classes with a slot called collection, initialized whenever film and director instances are created.
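A sketch of the wiring, assuming a collection slot has been added to both Film and Director. The collection passes itself to each film, and each film passes its collection on to its directors. As before, directorNames stands for the parsing method.

```smalltalk
Film >> collection: aFilmCollection
	collection := aFilmCollection

Director >> collection: aFilmCollection
	collection := aFilmCollection

FilmCollection >> films
	"Each film now knows the collection it belongs to."
	^ data collect: [ :each |
		Film new
			data: each;
			collection: self;
			yourself ]

Film >> directors
	"Directors share the collection of the film they come from."
	^ self directorNames collect: [ :each |
		Director new
			name: each;
			collection: collection;
			yourself ]
```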
Explore how a Director instance can return the list of films for which that instance is one of the directors.
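One straightforward approach, sketched below, compares names rather than Director objects. This matters because directors creates fresh Director instances on every call, so identity comparison would never match.

```smalltalk
Director >> films
	"All films in the collection that list this director by name."
	^ collection films select: [ :film | film directorNames includes: name ]
```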
Add an action to open the IMDB web page
Add the action to the Film entity.
For inspiration, see, for example, AbstractFileReference>>#gtActionWebBrowseFor: and AWebPage>>#gtActionWebBrowseFor:.
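A sketch modelled on those methods. It assumes the export's URL column holds the link to the IMDB page, and that the icon and builder messages match your GT version; check them against the two methods above.

```smalltalk
Film >> url
	"Assumes the IMDB link is stored in the URL column."
	^ data at: 'URL'

Film >> gtActionWebBrowseFor: anAction
	<gtAction>
	^ anAction button
		icon: BrGlamorousVectorIcons go;
		tooltip: 'Open IMDB page';
		action: [ WebBrowser openOn: self url ]
```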
Add search facilities
Add Spotter searches to the domain classes, starting with the FilmCollection.
For inspiration, see how Spotter searches are implemented for file references:
#gtSearch gtPragmas & AbstractFileReference gtMethodsInClass
A Spotter search on a collection should return matching films or directors.
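A possible pair of searches, sketched after the file-reference searches listed by the query above; compare with those for the exact builder messages in your GT version. FilmCollection>>directors is a hypothetical helper that builds one Director per distinct name, and directorNames again stands for the parsing method.

```smalltalk
FilmCollection >> directors
	"One Director per distinct name, sorted alphabetically."
	| names |
	names := (self films flatCollect: [ :film | film directorNames ]) asSet.
	^ names asSortedCollection asArray collect: [ :each |
		Director new
			name: each;
			collection: self;
			yourself ]

FilmCollection >> gtSpotterForFilmsFor: aSearch
	<gtSearch>
	^ aSearch list
		title: 'Films';
		priority: 10;
		items: [ self films ];
		itemName: [ :film | film title ];
		filterBySubstring

FilmCollection >> gtSpotterForDirectorsFor: aSearch
	<gtSearch>
	^ aSearch list
		title: 'Directors';
		priority: 20;
		items: [ self directors ];
		itemName: [ :director | director name ];
		filterBySubstring
```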
Introduce collection wrappers for searches
The results of a search should also be moldable, so make each of them a moldable collection wrapper: lists of films or directors should be moldable objects with their own custom views. Use the TGtGroupWithItems trait to give the wrapper its collection methods.
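A sketch of such a wrapper for films. FilmGroup is a hypothetical name, and the sketch assumes TGtGroupWithItems supplies the items slot and the collection protocol (such as do: and size); browse the trait to confirm what it provides in your version.

```smalltalk
Object subclass: #FilmGroup
	uses: TGtGroupWithItems
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'IMDBExplorer-Model'.

FilmGroup >> gtFilmsFor: aView
	"The same kind of list view as on FilmCollection."
	<gtView>
	^ aView columnedList
		title: 'Films';
		priority: 10;
		items: [ self items ];
		column: 'Title' text: [ :film | film title ];
		column: 'Year' text: [ :film | film year asString ]
```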
For inspiration, look at the MarkdownWebsite class hierarchy. A website holds a collection of pages, and so does a web page group; both use the shared trait TWebPageGroup. This way, when a Spotter search returns a list of pages, that list has the same custom views as a website. We can do something similar so that the films of a director, or the result of any query, form a list of films with the same views as a FilmCollection.
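For example, a director's films could be returned as a group instead of a plain Array. This sketch assumes a hypothetical FilmGroup class that uses TGtGroupWithItems, that the trait's class side offers a withAll: constructor (check this in your version), and that directorNames is the parsing method from the Director section.

```smalltalk
Director >> films
	"Answer a moldable group, so a director's films get custom views."
	^ FilmGroup withAll: (collection films select: [ :film |
		film directorNames includes: name ])
```

To share views between FilmCollection and FilmGroup, as TWebPageGroup does for websites, the common view methods can move into a trait that both classes use.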
Further exploration
The data exported from IMDB lacks a lot of important information about movies. There is an IMDB API for programmers, but it requires an AWS account, among other things.
There are several other movie APIs that you might want to explore to enrich the bare-bones model we have now.