Exploring a publications database

Motivation

This exercise is similar to Exploring IMDB movie lists, but with a richer domain model of bibliographic data for academic publications.

Tasks

The tasks are similar to those of the previous exercise, so we only outline them here.

Get the data and parse it

You can get BibTeX files from various sources, such as DBLP.

The GT image includes a small BibTeX file, which has also been translated to JSON. Both files are in this directory:

FileLocator gtResource / 'feenkcom' / 'gtoolkit-demos' / 'data' / 'bibtex'
  

You can either parse the BibTeX file with the (experimental) BibtexPParser parser:

BibtexPParser new optimize
	parse: (FileLocator gtResource / 'feenkcom' / 'gtoolkit-demos' / 'data' / 'bibtex'
			/ 'scg-pub.bib') contents.
  

or you can parse the JSON version with STON or NeoJSONReader:

STON
	fromString: (FileLocator gtResource / 'feenkcom' / 'gtoolkit-demos' / 'data' / 'bibtex'
			/ 'scg-pub.json') contents.
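For comparison, here is a sketch of the same step using NeoJSONReader. As far as we know, both STON and NeoJSONReader answer plain Dictionaries and Arrays for JSON input by default. Either result can therefore be explored in the same way.

```smalltalk
"Alternative: read the same JSON file with NeoJSONReader instead of STON."
| file entries |
file := FileLocator gtResource / 'feenkcom' / 'gtoolkit-demos' / 'data' / 'bibtex'
	/ 'scg-pub.json'.
entries := NeoJSONReader fromString: file contents.
entries
```

Inspecting the result makes it easy to compare the two parses side by side.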
  

Have a look and decide which version you prefer to work with.

Wrap the data

Wrap the data in an instance of a new BibliographyCollection class, and introduce a JSON view of the raw data.
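One minimal way to set this up is sketched below. Apart from BibliographyCollection itself, the package name and all selectors (fromJsonFile:, entries:, entries, gtJsonFor:) are our own assumptions. The class definition can be evaluated in a Playground. The methods are listed browser-style, each preceded by a comment naming its class. The JSON view simply pretty-prints the raw entries with STON.

```smalltalk
"A minimal wrapper around the entries read from the JSON file."
Object subclass: #BibliographyCollection
	instanceVariableNames: 'entries'
	classVariableNames: ''
	package: 'Bibliography-Exercise'.

"BibliographyCollection class >> fromJsonFile:"
fromJsonFile: aFileReference
	^ self new entries: (STON fromString: aFileReference contents)

"BibliographyCollection >> entries:"
entries: aCollection
	entries := aCollection

"BibliographyCollection >> entries"
entries
	^ entries

"BibliographyCollection >> gtJsonFor:
 Shows the raw data as pretty-printed JSON text."
gtJsonFor: aView
	<gtView>
	^ aView textEditor
		title: 'JSON';
		priority: 50;
		text: [ (STON toJsonStringPretty: entries) asRopedText ]
```

Inspecting the result of sending fromJsonFile: with the scg-pub.json file reference should then show a JSON tab. Further views for publications and authors can be added next to it.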

Create domain entities

Explore the data, and start to introduce domain entities for Publications and Authors, and possibly others, such as Publishers and Keywords.

Add views that allow you to navigate from a publication to its authors, from authors to their co-authors and publications, and so on. Note that each entity will have to be able to navigate back to the source bibliography collection.
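The back-pointer idea can be sketched as follows. Publication, Author and every selector here are hypothetical. Author>>#publications also assumes a publications method on BibliographyCollection. Parsing author names assumes the JSON keeps the BibTeX convention of separating names with ' and '; check this against the actual data. Methods are listed browser-style.

```smalltalk
"Hypothetical entity classes; each keeps a back-pointer to its collection."
Object subclass: #Publication
	instanceVariableNames: 'collection entry'
	classVariableNames: ''
	package: 'Bibliography-Exercise'.

Object subclass: #Author
	instanceVariableNames: 'collection name'
	classVariableNames: ''
	package: 'Bibliography-Exercise'.

"Publication >> collection:entry:"
collection: aBibliographyCollection entry: aDictionary
	collection := aBibliographyCollection.
	entry := aDictionary

"Publication >> title"
title
	^ entry at: 'title' ifAbsent: [ '' ]

"Publication >> authorNames
 Assumes a BibTeX-style field such as 'Alice Smith and Bob Jones'."
authorNames
	^ (((entry at: 'author' ifAbsent: [ '' ]) splitOn: ' and ')
		collect: [ :each | each trimBoth ])
		reject: [ :each | each isEmpty ]

"Publication >> authors"
authors
	^ self authorNames
		collect: [ :each | Author new collection: collection name: each ]

"Author >> collection:name:"
collection: aBibliographyCollection name: aString
	collection := aBibliographyCollection.
	name := aString

"Author >> name"
name
	^ name

"Author >> publications (navigates back via the collection)"
publications
	^ collection publications
		select: [ :each | each authorNames includes: name ]

"Author >> coauthors"
coauthors
	| names |
	names := Set new.
	self publications do: [ :each | names addAll: each authorNames ].
	names remove: name ifAbsent: [ ].
	^ names asSortedCollection asArray
		collect: [ :each | Author new collection: collection name: each ]

"Author >> gtPublicationsFor:"
gtPublicationsFor: aView
	<gtView>
	^ aView list
		title: 'Publications';
		items: [ self publications ];
		itemText: [ :each | each title ]
```

Each call to authors creates fresh Author instances. You will probably want Author to define = and hash in terms of name, or to cache authors in the collection, before building sets or graphs of authors.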

Add custom actions to download the PDF

For publications with URL download links, add a custom action that opens the link in a WebBrowser. See FileReference>>#gtOpenWebBrowserActionFor: for an example of an optional action. It answers noAction when it does not apply, and your action should do the same for publications without a url field.
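Here is a sketch of such an action, modelled on the structure of FileReference>>#gtOpenWebBrowserActionFor:. It assumes a hypothetical Publication class that stores its raw JSON entry in an entry instance variable, and that the field is called 'url'.

```smalltalk
"Publication >> url (assumes the raw entry may contain a 'url' field)"
url
	^ entry at: 'url' ifAbsent: [ nil ]

"Publication >> gtOpenUrlActionFor:
 Only offers the button when there is a url to open."
gtOpenUrlActionFor: anAction
	<gtAction>
	self url isEmptyOrNil ifTrue: [ ^ anAction noAction ].
	^ anAction button
		tooltip: 'Open in web browser';
		icon: BrGlamorousVectorIcons link;
		action: [ :aButton :aTab | WebBrowser openOn: self url asZnUrl ]
```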

Consider adding an HTTP check to indicate whether a publication's link is valid. Have a look at WebLink>>#checkHttpStatus to see how to check a link.
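The following sketch adapts the idea of WebLink>>#checkHttpStatus as a method on a hypothetical Publication class with a url accessor. It sends an HTTP HEAD request with Zinc's ZnClient and answers either 'OK' or a description of the error. The five-second timeout is an arbitrary choice.

```smalltalk
"Publication >> urlStatus
 Answers 'OK' if a HEAD request to the url succeeds, otherwise why it failed."
urlStatus
	| client |
	self url isEmptyOrNil ifTrue: [ ^ 'no url' ].
	client := ZnClient new
		enforceHttpSuccess: true;
		timeout: 5;
		yourself.
	^ [ [ client head: self url.
		'OK' ]
			on: Error
			do: [ :ex | ex description ] ]
		ensure: [ client close ]
```

Some servers reject HEAD requests, so a failure here does not necessarily mean the link is broken. The status could be shown as an extra column in a publications view.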

Add search facilities

Add Spotter searches to the BibliographyCollection to search for Publications matching some string.
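Here is a possible Spotter search. It assumes a hypothetical publications method on BibliographyCollection whose elements understand title. The search API shown reflects our understanding of recent GT versions, so compare it with existing <gtSearch> methods in your image.

```smalltalk
"BibliographyCollection >> gtSearchPublicationsFor:
 Lists publications whose titles contain the search string."
gtSearchPublicationsFor: aSearch
	<gtSearch>
	^ aSearch list
		title: 'Publications';
		priority: 10;
		items: [ self publications ];
		itemName: [ :each | each title ];
		filterBySubstring
```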

Introduce collection wrappers

Collections of publications, authors, etc. should also support custom views. Wrap each of these collections in a Moldable Collection Wrapper so that they, too, can be molded.
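A minimal hand-rolled wrapper for a group of publications might look like this. PublicationGroup, its selectors, and the year and title accessors it relies on are all hypothetical.

```smalltalk
"Hypothetical wrapper so that a group of publications can have its own views."
Object subclass: #PublicationGroup
	instanceVariableNames: 'items'
	classVariableNames: ''
	package: 'Bibliography-Exercise'.

"PublicationGroup class >> withAll:"
withAll: aCollection
	^ self new setItems: aCollection

"PublicationGroup >> setItems:"
setItems: aCollection
	items := aCollection asOrderedCollection

"PublicationGroup >> items"
items
	^ items

"PublicationGroup >> size"
size
	^ items size

"PublicationGroup >> do:"
do: aBlock
	items do: aBlock

"PublicationGroup >> gtPublicationsFor:"
gtPublicationsFor: aView
	<gtView>
	^ aView columnedList
		title: 'Publications';
		priority: 10;
		items: [ items ];
		column: 'Year' text: [ :each | each year ];
		column: 'Title' text: [ :each | each title ]
```

Methods such as Author>>#publications can then answer a PublicationGroup instead of a plain collection, so the group's views appear wherever it is inspected. An analogous wrapper works for authors.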

Add some visualizations

Consider which visualizations might be interesting, such as the co-author graph, perhaps with the size of an author node reflecting the number of publications.
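A co-author graph could be sketched as a Mondrian view like the one below. It assumes hypothetical authors, publications and coauthors methods. It also assumes that Author defines = and hash by name, so edges attach to the right nodes. Mondrian's edge and layout builders may differ between GT versions, so treat the selectors as a starting point to check against the Mondrian examples in your image.

```smalltalk
"BibliographyCollection >> gtCoauthorGraphFor:
 One node per author, sized by publication count; edges link co-authors."
gtCoauthorGraphFor: aView
	<gtView>
	^ aView mondrian
		title: 'Co-authors';
		priority: 60;
		painting: [ :mondrian |
			mondrian nodes
				stencil: [ :author |
					| side |
					"The node grows linearly with the number of publications."
					side := 6 + (2 * author publications size).
					BlElement new
						size: side @ side;
						background: Color gray ];
				with: self authors.
			mondrian edges connectToAll: [ :author | author coauthors ].
			mondrian layout force ]
```

If a few prolific authors dominate the picture, a square-root scaling of the node size may be easier to read than the linear one used here.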

Have a look at the Map view of the MarkdownWebsiteExamples>>#onWebsiteOct23 example for inspiration.

Further exploration

Try exploring the Zotero Web API for bibliographic data from within GT.

Draw inspiration from Working with a REST API: the GitHub case study to see how you would do this.
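As a first step, the Zotero Web API (version 3) can be queried with ZnClient. The sketch below fetches items from a public group library. The group id is a placeholder you must replace. Private libraries also require an API key, sent in a Zotero-API-Key header.

```smalltalk
"Fetch up to ten items from a public Zotero group library as JSON."
| groupId client json items |
groupId := 'YOUR-GROUP-ID'. "placeholder: use a real numeric group id"
client := ZnClient new
	url: 'https://api.zotero.org/groups/' , groupId , '/items';
	queryAt: 'format' put: 'json';
	queryAt: 'limit' put: '10';
	headerAt: 'Zotero-API-Version' put: '3';
	enforceHttpSuccess: true;
	yourself.
json := [ client get ] ensure: [ client close ].
items := STON fromString: json.
items
```

The parsed items can then be wrapped and given views, just like the BibTeX data above.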