Querying SmaCC abstract syntax trees (ASTs)

SmaCC is a parsing engine. It comes out of the box with various parsers (see How to parse sources in different languages)..

One use of it is for code analysis purposes.

The result of parsing is an abstract syntax tree AST Let's consider this JavaScript source code (on the left) and its associated AST (on the right). Click around to navigate between the two perspectives.

The AST is, like its name says, a tree. In SmaCC, all nodes inherit from SmaCCParseNode Object subclass: #SmaCCParseNode instanceVariableNames: 'parent attributes' classVariableNames: '' package: 'SmaCC_Runtime' . This class provides an interesting query API.

For example, let's assume you want to find all the defined functions from the AST. You can just use SmaCCParseNode>>#// // aName ^ self asQueryResult // aName :

jsFunctions
	<gtExample>
	| functions |
	functions := self jsAST // 'function'.
	self assert: functions size = 2.
	self assert: functions notEmpty.
	self assert: functions isEmpty not.
	self
		assert: (functions allSatisfy: [ :each | each isKindOf: JSFunctionNode ]).
	^ functions
    

The above queries looks for nodes in the tree that have the function in their name. In this case, we will get instances of JSFunctionNode JSExpressionNode subclass: #JSFunctionNode instanceVariableNames: 'functionToken left restParameter parameters commas right leftBrace statements rightBrace name colonToken type typeParameters' classVariableNames: '' package: 'SmaCC_Javascript_Parser' . But sometimes we want more elaborate ways to search.

How about this: find all function calls that have as parameter a literal whose value equals 10. The example below does exactly that:

jsCallsWith10AsArg
	<gtExample>
	| calls |
	calls := self jsAST // 'literal' @ 'value' <- '10' \\ 'callexpression'.
	self assert: calls size = 2.
	^ calls
    

Let's unwrap what happens here.

First, we look for literals.

jsLiterals
	<gtExample>
	| calls |
	calls := self jsAST // 'literal'.
	self assert: calls size equals: 9.
	^ calls
    

Then we look at literals with value 10. We do this by composing the previous query with SmaCCParseNode>>#@ @ aName ^ self asQueryResult @ aName for getting the value attribute which is then combined with GtSmaCCAttributeQuery>>#<- <- aValue | results | results := GtSmaCCNodeQuery new. query do: [ :each | each tokenVariables do: [ :selector | (self does: attributeName match: selector) ifTrue: [ (each perform: selector) ifNotNil: [ :token | (self does: aValue match: token source) ifTrue: [ results addNode: each ] ] ] ]. each compositeTokenVariables do: [ :selector | (self does: attributeName match: selector) ifTrue: [ (each perform: selector) ifNotNil: [ :tokens | tokens do: [ :token | (self does: aValue match: token source) ifTrue: [ results addNode: each ] ] ] ] ] ]. ^ results which checks for specific values.

jsLiteralsWithValue10
	<gtExample>
	| calls |
	calls := self jsAST // 'literal' @ 'value' <- '10'.
	self assert: calls size = 2.
	^ calls
    

This gives us two interesting literals, but we want the calls, not the literals. So, at the end of the query we traverse parents using SmaCCParseNode>>#\ \ anObject ^ self asQueryResult \ anObject .