Debugging SPL grammar rules

TL;DR

We continue to script the grammar rules of SPL in the PetitParser SPL case study. We show how to debug the rules with simple test cases.

Debugging expressions

We incrementally implement the rules of the SPL grammar, continuing with expressions.

We start with incomplete rules, so we can test them bit by bit.

integer := #digit asPParser plus , $. asPParser not.
float := $- asPParser optional , #digit asPParser plus , $. asPParser
		, #digit asPParser plus.
number := integer / float.
boolean := 'true' asPParser / 'false' asPParser.

primary := boolean / number.	"incomplete"
unary := primary.	"incomplete"
factor := unary , ($/ asPParser / $* asPParser , unary) star.

We can (manually) test the factor rule so far as follows:

factor end parse: '2*6/12'.

Note that we apply end to the factor parser so we test that all the input is consumed.

When we later extract a class from our parser script, we can turn our test scripts into proper regression tests using example methods.

Suppose we try the following:

factor end parse: '2*3+6'.

This fails, because the factor rule so far only handles the * and / operators. To see exactly where, inspect the Execution Traces view and walk through it. The Stream State view of each Debug Result object in the execution trace shows how far the parse got in the input. (The parse stops at the + character, which is not expected.)
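The same outcome can be checked in a script, without the inspector views. The following is a minimal sketch that repeats the rules above so that it runs on its own. It assumes the isPetit2Failure test that PetitParser2 provides on parse results, and it also illustrates why end matters.

```smalltalk
| integer float number boolean primary unary factor |
integer := #digit asPParser plus , $. asPParser not.
float := $- asPParser optional , #digit asPParser plus , $. asPParser
		, #digit asPParser plus.
number := integer / float.
boolean := 'true' asPParser / 'false' asPParser.
primary := boolean / number.
unary := primary.
factor := unary , ($/ asPParser / $* asPParser , unary) star.

"Without end, the parser stops after the valid prefix '2*3' and still succeeds."
(factor parse: '2*3+6') isPetit2Failure. "expected: false"

"With end, the leftover input '+6' turns the result into a failure."
(factor end parse: '2*3+6') isPetit2Failure. "expected: true"
```

Inspecting either expression instead of just evaluating it gives access to the views described above.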

Recursive rules

After a few iterations we will need to define recursive rules. As shown in Parsing with PetitParser2, this can be done in a script by first defining a rule as an instance of PP2UnresolvedNode, and then redefining it with def:, as follows:

integer := #digit asPParser plus , $. asPParser not.
float := $- asPParser optional , #digit asPParser plus , $. asPParser
		, #digit asPParser plus.
number := integer / float.
boolean := 'true' asPParser / 'false' asPParser.

primary := boolean / number.	"incomplete"
unary := PP2UnresolvedNode new.	"recursive"
negatedUnary := $! asPParser / $- asPParser , unary.
unary def: negatedUnary / primary.

Note how we first defined unary as an unresolved parser node, and then redefined it recursively after it was used to define negatedUnary.

unary end parse: '-3.14'.
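To see what the recursion buys us, we can stack several prefix operators. This is a small sketch that repeats the rules above so that it is self-contained. The test inputs are our own illustrations, and isPetit2Failure is assumed to be the PetitParser2 test for a failed parse.

```smalltalk
| integer float number boolean primary unary negatedUnary |
integer := #digit asPParser plus , $. asPParser not.
float := $- asPParser optional , #digit asPParser plus , $. asPParser
		, #digit asPParser plus.
number := integer / float.
boolean := 'true' asPParser / 'false' asPParser.
primary := boolean / number.
unary := PP2UnresolvedNode new.
negatedUnary := $! asPParser / $- asPParser , unary.
unary def: negatedUnary / primary.

"Each prefix operator re-enters unary, so operators can be nested."
(unary end parse: '!!true') isPetit2Failure. "expected: false"
(unary end parse: '-!-42') isPetit2Failure. "expected: false"

"A lone operator has no operand, so the parse should fail."
(unary end parse: '-') isPetit2Failure. "expected: true"
```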

Trimming whitespace

So far so good, but we cannot yet parse input that contains whitespace. For example, this fails because of the trailing space:

unary end parse: '-3.14 '.

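We can apply trim to each parser that may be surrounded by whitespace. Since trim is a unary message, it binds more tightly than the , operator, so we parenthesize each sequence that should be trimmed as a whole.

integer := (#digit asPParser plus , $. asPParser not) trim.
float := ($- asPParser optional , #digit asPParser plus , $. asPParser
		, #digit asPParser plus) trim.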
number := integer / float.
boolean := 'true' asPParser trim / 'false' asPParser trim.

primary := boolean / number.
unary := PP2UnresolvedNode new.
negatedUnary := $! asPParser / $- asPParser , unary.
unary def: negatedUnary / primary.

Now the test case passes.

unary end parse: '-3.14 '.
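Where trim is placed matters, because unary messages bind more tightly than the binary , operator. The sketch below compares two hypothetical parsers, loose and tight, that are not part of the SPL grammar. It assumes that trim skips whitespace on both sides of its receiver and that isPetit2Failure tests for a failed parse.

```smalltalk
| digits loose tight |
digits := #digit asPParser plus.

"Here trim applies only to the last operand, the fraction digits."
loose := digits , $. asPParser , digits trim.

"Here trim applies to the whole sequence."
tight := (digits , $. asPParser , digits) trim.

"loose accepts whitespace inside the number, which is probably not intended."
(loose end parse: '3. 14') isPetit2Failure. "expected: false"
(tight end parse: '3. 14') isPetit2Failure. "expected: true"

"Only tight accepts leading whitespace."
(loose end parse: ' 3.14 ') isPetit2Failure. "expected: true"
(tight end parse: ' 3.14 ') isPetit2Failure. "expected: false"
```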

Trimming comments

We would also like to consider // ... style comments to be whitespace. In order to handle this, we should define our own whitespace rule:

comment := '//' asPParser , #newline asPParser negate star.
ignorable := (comment / #space asPParser) star.

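And now, instead of applying trim, we apply trim: ignorable wherever we allow comments or whitespace to be ignored. Note that trim: is a keyword message, so it has lower precedence than the binary operators and applies to the whole sequence on its left. We need parentheses only where it should apply to a single alternative, as in the boolean rule.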

comment := '//' asPParser , #newline asPParser negate star.
ignorable := (comment / #space asPParser) star.
integer := #digit asPParser plus , $. asPParser not trim: ignorable.
float := $- asPParser optional , #digit asPParser plus , $. asPParser
		, #digit asPParser plus trim: ignorable.
number := integer / float.
boolean := ('true' asPParser trim: ignorable)
		/ ('false' asPParser trim: ignorable).

primary := boolean / number.
unary := PP2UnresolvedNode new.
negatedUnary := $! asPParser / $- asPParser , unary.
unary def: negatedUnary / primary.

Now we can handle comments in expressions:

unary end parse: '-3.14 // a negative pi'.
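The ignorable rule can also be tried on its own. The following sketch uses a hypothetical flag parser, which is not part of the grammar, to show what trim: ignorable skips and what it does not. As before, isPetit2Failure is assumed to be the PetitParser2 test for a failed parse.

```smalltalk
| comment ignorable flag |
comment := '//' asPParser , #newline asPParser negate star.
ignorable := (comment / #space asPParser) star.
flag := 'true' asPParser trim: ignorable.

"A trailing line comment is skipped together with the whitespace before it."
(flag end parse: 'true // yes') isPetit2Failure. "expected: false"

"A comment line before the token is skipped too, including its line end."
(flag end parse: '// leading comment' , String cr , 'true') isPetit2Failure. "expected: false"

"Block comments are not covered by ignorable, so this should fail."
(flag end parse: 'true /* no */') isPetit2Failure. "expected: true"
```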