Exploring IMDB movie lists


Many projects start with existing data. This exercise will explore how to build up an explorable domain model for a movie list from IMDB.


Download a movie list

Find an IMDB Movie list you are interested in, and export it as a CSV file. (Click on the three dots.)

You can also start from this list which is already in the image:

csv := (FileLocator gtResource / 'feenkcom' / 'gtoolkit-demos' / 'data' / 'imdb' / 'Movies.csv') contents.

We can't do much with this except open it as a spreadsheet.

Parse the data

You can use the existing CSVParser class (a SmaCCParser subclass in the package 'SmaCC_CSV_Parser') to parse the data, and an existing visitor to produce a JSON representation.

(CSV2JSON for: csv) json. 

We see that there is one record per movie, and each movie has a Title, Directors, Genres, a link to the IMDB page, and a few other attributes.

An alternative is to use the raw Array of Dictionaries representation:

(CSV2JSON for: csv) jsonObject.

This is less pretty to inspect but is easier to access.

Wrap the data

We don't have too many domain concepts, but there are enough to build a simple model. We have: Film Collection, Film, Director, and possibly Genre.

Let's wrap the data as a FilmCollection.

library := FilmCollection new data: (CSV2JSON for: csv) jsonObject.

Use the fixits to create the class in a new package with the category Model. Store the data in a new slot.

Inspect data asJson in the Contextual Playground of the FilmCollection instance. Create a forwarding view to the JSON Object view from the library instance.

Go to the Meta view and add a method that looks like this:

gtJsonFor: aView
	<gtView>
	^ aView forward
		object: [ data asJson ];
		view: #gtJsonObjectFor:context:

Add a suitable title and priority.

Create a Film entity

We want to create a Film entity for each item in data. Write and run a snippet that does this in the contextual playground, then extract the code as a method films.
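One possible shape for the extracted method is sketched below. It assumes that Film stores its record through a data: setter, mirroring FilmCollection; that setter is hypothetical and has to be created first (the fixits can do this).

```smalltalk
films
	"Wrap each raw record (a Dictionary) in a Film instance.
	Film>>data: is an assumed setter, not something given in the text."
	^ data collect: [ :each | Film new data: each ]
```

Evaluating the body of the method in the contextual playground of the FilmCollection instance should yield one Film per record.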

Inspect a film. Extract its Title from the data. Extract that as a title method.

Change the printOn: method to show the film's title, so when we inspect the list of films in a collection we see the film titles.
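As a rough sketch, the two Film methods could look like this. It assumes the record uses the key 'Title', as suggested by the attribute names seen in the JSON view; check the actual key in your data.

```smalltalk
title
	"Answer the title stored in the raw record. The key 'Title' is an assumption."
	^ data at: 'Title'

printOn: aStream
	"Show only the film's title when the film is printed or listed."
	aStream nextPutAll: self title
```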

Add a Json view to an individual film. (Copy-paste the similar method from the collection. Consider putting the shared code in a trait.)

Add a Films list view

Add accessors for the year and directors for each film.

Create a columnedList view for the collection, showing the title, year and directors of each film.
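A minimal sketch of such a view on FilmCollection follows. The selector gtFilmsFor:, the title and the priority are arbitrary choices, and the accessors title, year and directors are the ones you defined in the previous steps.

```smalltalk
gtFilmsFor: aView
	<gtView>
	^ aView columnedList
		title: 'Films';
		priority: 10;
		items: [ self films ];
		column: 'Title' text: [ :each | each title ];
		column: 'Year' text: [ :each | each year asString ];
		column: 'Directors' text: [ :each | each directors asString ]
```

The asString sends only ensure that each column block answers a string, whatever the accessors return at this stage.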

Create a Director entity

Note how the Directors field of a film is formatted. The field may be empty, a single director, or a comma-separated list in double quotes. Figure out how to parse the Directors field and return a list of individual director names.

Create a small example of a film collection with three films with 0, 1 and more directors as test cases for the parsing.
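One way to approach the parsing is sketched below, not as the intended solution. It assumes the raw field is stored under the key 'Directors', and it simply drops any double quotes before splitting on commas. The selector directorNames is hypothetical.

```smalltalk
directorNames
	"Answer the individual director names as trimmed strings.
	An empty or missing field yields an empty collection."
	| raw |
	raw := (data at: 'Directors' ifAbsent: [ nil ]) ifNil: [ '' ].
	^ ((raw copyWithout: $") substrings: ',')
		collect: [ :each | each trimBoth ]
```

The three test films with zero, one and several directors should exercise the empty, single-name and comma-separated cases respectively.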

Rewrite the directors accessor to return instances of a new Director class instead of strings.

We now see that Films and Directors should have access to the film collection, so we can find all the films of a given director. Update these classes to have a slot called collection, which is initialized when we create film and director instances.

Explore how a Director instance can return the list of films for which that instance is one of the directors.
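A possible starting point for Director is sketched here. It assumes that Director has a name accessor (hypothetical) and that the collection slot holds the FilmCollection.

```smalltalk
films
	"Answer the films in the shared collection that list this director.
	Directors are compared by name, since each film creates its own Director instances."
	^ collection films select: [ :film |
		film directors anySatisfy: [ :director | director name = self name ] ]
```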

Add an action to open the IMDB webpage

Add the action to the Film entity.

For inspiration, see, for example, AbstractFileReference>>#gtActionWebBrowseFor:

gtActionWebBrowseFor: anAction
	<gtAction>
	self exists ifFalse: [ ^ anAction noAction ].
	^ anAction button
		icon: BrGlamorousIcons go;
		tooltip: 'Open in OS';
		action: [ WebBrowser openOn: self fullName ]

and AWebPage>>#gtActionWebBrowseFor:

gtActionWebBrowseFor: anAction
	<gtAction>
	^ anAction button
		tooltip: 'Open in browser';
		priority: 10;
		icon: BrGlamorousIcons go;
		action: [ WebBrowser openOn: self url ]
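Adapted to Film, the action might look like the sketch below. It assumes the link to the IMDB page is stored in the record under the key 'URL'; check the actual key in your data.

```smalltalk
gtActionWebBrowseFor: anAction
	<gtAction>
	^ anAction button
		tooltip: 'Open in IMDB';
		priority: 10;
		icon: BrGlamorousIcons go;
		"The key 'URL' is an assumption about the exported CSV."
		action: [ WebBrowser openOn: (data at: 'URL') ]
```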

Add search facilities

Add Spotter searches to the domain classes, starting with the FilmCollection.

For inspiration, see how Spotter searches are implemented for file references.

#gtSearch gtPragmas & AbstractFileReference gtMethodsInClass

A Spotter search for a collection should return matching films or directors.
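A minimal sketch of a film search on FilmCollection follows, modeled on the searches found by the query above. The selector, title and priority are arbitrary choices; a search for directors would follow the same pattern.

```smalltalk
gtSpotterForFilmsFor: aSearch
	<gtSearch>
	^ aSearch list
		title: 'Films';
		priority: 10;
		items: [ self films ];
		itemName: #title;
		filterBySubstring
```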

Introduce collection wrappers for searches

The result of a search should also be moldable, so make each of them a Moldable Collection Wrapper. Lists of films or directors should be moldable objects with custom views. Use the TGtGroupWithItems trait (which uses TGtGroup + TGtGroupItems, in the package 'GToolkit-Utility-System') to give the wrapper collection methods.

For inspiration, look at the MarkdownWebsite class hierarchy (MarkdownWebsite is an AWebsite subclass in the package 'GToolkit-Demo-WebsiteExplorer-Model'). A website holds a collection of pages, but so does a web page group. Both use the shared trait TWebPageGroup from the same package. In this way, when the result of a Spotter search is a list of pages, it has the same custom views as a website. We can do something similar, so that the set of films of a director, or the result of any query, is a list of films with the same views as a FilmCollection.

Further exploration

The data exported from IMDB is missing lots of important information about movies. There exists an IMDB programmer API but it requires an AWS account and more.

There exist several other movie APIs that you might want to explore to enrich the bare-bones model we have now.