Using PetitParser to build an AST

TL;DR

We now have a PetitParser class SPLGrammar PP2CompositeNode subclass: #SPLGrammar instanceVariableNames: 'keyword identifier boolean integer float number string varDecl primary unary factor term comparison equality logicAnd logicOr assignment expression exprStmt printStmt ifStmt whileStmt block statement declaration program parenthesizedExpression negatedUnary assignmentExpression comment ignorable' classVariableNames: '' package: 'GToolkit-Demo-SPL-PetitParser' that recognizes SPL programs as part of our PetitParser SPL case study. We now extend the grammar rules with actions to build an AST that we can then use to implement an interpreter.

Adding PetitParser actions in a subclass

Although we could directly modify the SPLGrammar PP2CompositeNode subclass: #SPLGrammar instanceVariableNames: 'keyword identifier boolean integer float number string varDecl primary unary factor term comparison equality logicAnd logicOr assignment expression exprStmt printStmt ifStmt whileStmt block statement declaration program parenthesizedExpression negatedUnary assignmentExpression comment ignorable' classVariableNames: '' package: 'GToolkit-Demo-SPL-PetitParser' productions to add actions, for a non-trivial grammar it can be a good idea to add actions instead in a subclass. This will (1) clearly separate the concerns of recognizing the grammar from building the AST, and (2) allow us later to add other subclasses that perform different actions, such as pretty-printing, or generating a different intermediate representation for SPL programs.

We accordingly define SPLParser SPLGrammar subclass: #SPLParser instanceVariableNames: '' classVariableNames: '' package: 'GToolkit-Demo-SPL-PetitParser' as a subclass, and we also introduce SPLParserExamples SPLGrammarExamples subclass: #SPLParserExamples instanceVariableNames: '' classVariableNames: '' package: 'GToolkit-Demo-SPL-Examples' as a subclass of SPLGrammarExamples PP2CompositeNodeExamples subclass: #SPLGrammarExamples instanceVariableNames: '' classVariableNames: '' package: 'GToolkit-Demo-SPL-Examples' , so that we can reuse and extend the examples and add new kinds of assertions for the AST generation rules.

PetitParser actions are added to a parser (or any sub-parser) using the ==> operation. The right-hand side is a one-argument block that transforms the result of the parser into another kind of object. Since we inherit all the productions from the superclass, pretty much all the methods of SPLParser SPLGrammar subclass: #SPLParser instanceVariableNames: '' classVariableNames: '' package: 'GToolkit-Demo-SPL-PetitParser' look like this example: SPLParser>>#boolean boolean ^ super boolean ==> [ :node | SPLBoolean for: node = 'true' ]

We inherit the boolean production from the superclass, and adorn it with an action that returns an SPLBoolean SPLValue subclass: #SPLBoolean instanceVariableNames: '' classVariableNames: '' package: 'GToolkit-Demo-SPL-Interpreter' AST node for the (Smalltalk) Boolean value true or false. (Note that the expression node = 'true' will evaluate to an actual Smalltalk Boolean value.)

Sometimes we must extract some information of interest from the argument to the action block. For example, for the print statement, we are only interested in the second part recognized by the parser: SPLParser>>#printStmt printStmt ^ super printStmt ==> [ :node | SPLPrintStatement for: node second ]

If you expand the inherited production (click on the grey triangle to the right of super printStmt), you will see that node second corresponds to the expression of the print statement. We can safely discard the print keyword and the semi-colon.

Some productions require no action: SPLParser>>#statement statement ^ super statement

All the actions are pretty much the same. The only complicated one is for if statements, which have an option else part: SPLParser>>#ifStmt ifStmt "NB: the optional else part is the sixth element" ^ super ifStmt ==> [ :node | node sixth ifNil: [ SPLIfStatement if: node third then: node fifth ] ifNotNil: [ SPLIfElseStatement if: node third then: node fifth else: node sixth second ] ]

The SPL AST hierarchy

The AST nodes form a hierarchy.

Each node has a dedicated constructor, which may be inherited from an abstract superclass.

For example, all SPLValue SPLExpression subclass: #SPLValue instanceVariableNames: 'value' classVariableNames: '' package: 'GToolkit-Demo-SPL-Interpreter' subclasses share the constructor SPLValue>>#for: for: aValue ^ self new value: aValue

Pretty printing

In this phase we just build the nodes, providing accessors for their state, and we implement pretty printing.

Statements have a SPLStatement>>#printOn:indentBy: printOn: aStream indentBy: indentlevel self subclassResponsibility method that will print the statement correctly indented.

For example, a print statement just prints the right number of indents, followed by print, the expression pretty-printed, and a semi-colon. SPLPrintStatement>>#printOn:indentBy: printOn: aStream indentBy: indentlevel aStream nextPutAll: (self indent: indentlevel); nextPutAll: 'print '; print: self expression; nextPut: $;

Testing

There is not much yet to test, except that the right values are stored in the AST nodes, and that pretty printing works correctly. We implement tests for all the productions that focus on pretty printing.

Here's an example:

whileProgram
	<gtExample>
	| result |
	result := super whileProgram.
	self assert: result printString equals: 'var a = 1;
while (a<10) {
  print a;
  a = (a+1);
}
'.
	^ result