Parsing SPL tokens


We illustrate how to define very basic parsers with PetitParser by defining parsers for the tokens of SPL, as part of the PetitParser SPL case study. We also see most of the operators available for modifying and composing parsers.

Booleans (choice)

Boolean values are just true and false. We can recognize these with the following parsers:

'true' asPParser.
'false' asPParser.

Each of these expressions returns a parser that will recognize just a single string.

We can combine them into a single Boolean parser using the choice operator.

boolean := 'true' asPParser / 'false' asPParser.

This parser recognizes either value, trying 'true' first and falling back to 'false' only if that fails.

NB: The order of the alternatives in a choice is important. The first alternative to succeed wins.

We can validate that this actually parses boolean values.

boolean parse: 'true'.
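
The following sketch shows both the success and failure cases, and why the order of alternatives matters. It assumes the PetitParser2 convention that a failed parse answers a failure object responding true to isPetit2Failure. The shortFirst and longFirst parsers are hypothetical and only serve to illustrate ordered choice.

```smalltalk
| boolean shortFirst longFirst |
boolean := 'true' asPParser / 'false' asPParser.

"Both literals are accepted; any other input yields a failure object."
(boolean parse: 'false') isPetit2Failure. "false"
(boolean parse: 'maybe') isPetit2Failure. "true"

"Ordered choice: the first alternative that succeeds wins,
 even if a later alternative would consume more input."
shortFirst := ('a' asPParser / 'ab' asPParser) end.
longFirst := ('ab' asPParser / 'a' asPParser) end.
(shortFirst parse: 'ab') isPetit2Failure. "true: 'a' wins, then end sees the leftover 'b'"
(longFirst parse: 'ab') isPetit2Failure. "false"
```

When one alternative is a prefix of another, listing the longer one first is usually the safer choice.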

Integers (sequence, optional, plus, not)

We can recognize integers by recognizing, in sequence, an optional minus sign followed by one or more digits. We use the comma (,) operator to compose parsers in sequence.

integer := $- asPParser optional , #digit asPParser plus , $. asPParser not.

An integer must not be followed by a period, so we apply the not operator to the $. parser. Note that not is a lookahead: it succeeds only if its parser fails, and it never consumes any input.

integer parse: '42'.
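
As a quick check of the integer rule, the sketch below tries a few inputs, including ones that should be rejected. It again assumes isPetit2Failure for detecting failures, and uses flatten, which in PetitParser2 is expected to answer the consumed text rather than the nested result of the sequence.

```smalltalk
| integer |
integer := $- asPParser optional , #digit asPParser plus , $. asPParser not.

(integer parse: '-7') isPetit2Failure.   "false"
(integer parse: '3.14') isPetit2Failure. "true: the digits are followed by a period"
(integer parse: 'abc') isPetit2Failure.  "true: plus needs at least one digit"

"flatten is expected to answer the matched substring instead of a nested collection"
integer flatten parse: '-7'.
```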


Floats are similar, but have a period between the digits.

float := $- asPParser optional , #digit asPParser plus , $. asPParser
		, #digit asPParser plus.


Finally, numbers are a choice between integers and floats.

number := integer / float.
number parse: '-3.14'.
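
The sketch below exercises the combined rule on a few edge cases. It appends end so that a parse only succeeds when the whole input is consumed, and assumes isPetit2Failure for detecting failures.

```smalltalk
| integer float number |
integer := $- asPParser optional , #digit asPParser plus , $. asPParser not.
float := $- asPParser optional , #digit asPParser plus , $. asPParser
		, #digit asPParser plus.
number := integer / float.

(number end parse: '42') isPetit2Failure.    "false: matched by integer"
(number end parse: '-3.14') isPetit2Failure. "false: integer fails at the period, float matches"
(number end parse: '3.') isPetit2Failure.    "true: float needs digits after the period"
(number end parse: '.5') isPetit2Failure.    "true: both rules need a leading digit"
```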


Note that the rules for integer and float are disjoint due to the use of $. asPParser not at the end of the integer rule, so there is no ambiguity. However, there will be some backtracking when we parse a float: the integer rule will match up to the period, at which point it will fail, and then the float rule will be tried from the start of the input.


Strings are straightforward, as SPL does not specify any escape sequences.

string := $" asPParser , $" asPParser negate plus , $" asPParser.
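
A few sample inputs make the behavior of this rule concrete. The sketch assumes isPetit2Failure for detecting failures. Note that, as written with plus, the rule requires at least one character between the quotes, so an empty string literal is rejected; whether SPL is meant to allow empty strings is not stated here.

```smalltalk
| string |
string := $" asPParser , $" asPParser negate plus , $" asPParser.

(string parse: '"hello world"') isPetit2Failure. "false"
(string parse: '"unterminated') isPetit2Failure. "true: the closing quote is missing"
(string parse: '""') isPetit2Failure.            "true: plus requires at least one character"
```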

Keywords vs identifiers

We will want to recognize keywords, so we can distinguish them from identifiers for variable names.

keyword := ('var' asPParser , #letter asPParser not)
		/ ('if' asPParser , #letter asPParser not)
		/ ('else' asPParser , #letter asPParser not)
		/ ('while' asPParser , #letter asPParser not)
		/ ('true' asPParser , #letter asPParser not)
		/ ('false' asPParser , #letter asPParser not)
		/ ('and' asPParser , #letter asPParser not)
		/ ('or' asPParser , #letter asPParser not).
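
Since every alternative has the same shape, the same choice can also be built from a list of words. This is only a sketch of an equivalent construction, not part of the case study itself; the words variable is hypothetical.

```smalltalk
| words keyword |
words := #('var' 'if' 'else' 'while' 'true' 'false' 'and' 'or').

"Fold the words into one ordered choice, each guarded by the same lookahead."
keyword := words allButFirst
	inject: (words first asPParser , #letter asPParser not)
	into: [ :choice :word | choice / (word asPParser , #letter asPParser not) ].

(keyword parse: 'while') isPetit2Failure. "false"
(keyword parse: 'andy') isPetit2Failure.  "true: 'and' is followed by a letter"
```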

Then we can specify a rule for identifiers as follows:

identifier := keyword not , #letter asPParser , #word asPParser star.

Now andy should be recognized as an identifier, not as the keyword and followed by y. We use the end operator to ensure that the rule consumes all of its input.

identifier end parse: 'andy'.
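
To check the interplay between the two rules, the sketch below tries a keyword, some identifiers that merely start with a keyword, and an input that starts with a digit. The keyword rule is rebuilt from a hypothetical words list so that the snippet is self-contained, and isPetit2Failure is assumed for detecting failures.

```smalltalk
| words keyword identifier |
words := #('var' 'if' 'else' 'while' 'true' 'false' 'and' 'or').
keyword := words allButFirst
	inject: (words first asPParser , #letter asPParser not)
	into: [ :choice :word | choice / (word asPParser , #letter asPParser not) ].
identifier := keyword not , #letter asPParser , #word asPParser star.

(identifier end parse: 'andy') isPetit2Failure. "false"
(identifier end parse: 'iffy') isPetit2Failure. "false"
(identifier end parse: 'and') isPetit2Failure.  "true: a bare keyword is not an identifier"
(identifier end parse: '1x') isPetit2Failure.   "true: an identifier must start with a letter"
```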