Rewriting Pharo code by example

Pharo code can be transformed using the RBParseTreeRewriter class. In its basic form, you specify a search expression and its replacement using RBParseTreeRewriter>>#replace:with: and then use the RBParseTreeRewriter>>#executeTree: method to transform the tree. executeTree: returns true when a transformation has been applied and false otherwise. Note that only the tree is modified; the method is not recompiled. In the following example, we transform all occurrences of collection at: 2 to be collection second.

rewriteToSecond
	<gtExample>
	| before after rewriter |
	before := 'dist: collection
					^((collection first * collection first) + ((collection at: 2) * (collection at: 2))) sqrt'.
	after := 'dist: collection
					^((collection first * collection first) + (collection second * collection second)) sqrt'.
	rewriter := RBParseTreeRewriter new.
	rewriter replace: 'collection at: 2' with: 'collection second'.
	self assert: (rewriter executeTree: (RBParser parseMethod: before)).
	self assert: rewriter tree equals: (RBParser parseMethod: after).
	^ rewriter tree
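
Because executeTree: only changes the tree, it can help to look at the rewritten source before deciding what to do with it. The following is a minimal playground sketch, not part of the original examples: the method source is made up for illustration, and it assumes formattedCode is available on the parse tree node.

```smalltalk
"Playground sketch; the method source below is made up for illustration."
| rewriter tree changed |
rewriter := RBParseTreeRewriter new.
rewriter replace: 'collection at: 2' with: 'collection second'.
tree := RBParser parseMethod: 'secondOf: collection
	^ collection at: 2'.
"executeTree: answers true only when at least one rule matched."
changed := rewriter executeTree: tree.
"Nothing is compiled here; we only ask the resulting tree for its source."
changed
	ifTrue: [ rewriter tree formattedCode ]
	ifFalse: [ tree formattedCode ]
```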
    

The real power of the rewriter becomes apparent when you start using patterns to match code. Patterns start with a backquote character and contain a name. In the following example, we extend the previous example to match any variable node instead of only collection.

rewriteVariableAtTwoToSecond
	<gtExample>
	| before after rewriter |
	before := 'sum: firstCollection and: secondCollection
					^{firstCollection first + secondCollection first. (firstCollection at: 2) + (secondCollection at: 2)}'.
	after := 'sum: firstCollection and: secondCollection
					^{firstCollection first + secondCollection first. firstCollection second + secondCollection second}'.
	rewriter := RBParseTreeRewriter new.
	rewriter replace: '`col at: 2' with: '`col second'.
	self assert: (rewriter executeTree: (RBParser parseMethod: before)).
	self assert: rewriter tree equals: (RBParser parseMethod: after).
	^ rewriter tree
    

Matching any variable is good, but sometimes we want to match any expression. For these cases, we can add an @ character to our pattern node. In the following example, we modify the previous rewrite to take any expression as the receiver of the at: 2 message:

rewriteAtTwoToSecond
	<gtExample>
	| before after rewriter |
	before := 'sum: aCollection
					^aCollection first + (aCollection at: 2)'.
	after := 'sum: aCollection
					^aCollection first + aCollection second'.
	rewriter := RBParseTreeRewriter new.
	rewriter replace: '`@col at: 2' with: '`@col second'.
	self assert: (rewriter executeTree: (RBParser parseMethod: before)).
	self assert: rewriter tree equals: (RBParser parseMethod: after).
	^ rewriter tree
    

When a match is found, the rewriter creates a new AST based on the replacement pattern AST. When a pattern node is used in the replacement, it copies the node that was matched by the pattern node in the search pattern. One question that arises when the replacement occurs is what should happen if the AST matching a pattern node contains other matches. Should we continue replacements in these AST nodes, or should we only replace at the top level? By default, the rewriter only replaces nodes at the top level. Therefore, if you were to run the rewriter from the previous example on an expression like (collection at: 2) at: 2, you would get (collection at: 2) second. The inner at: 2 message would not be rewritten. Adding a second backquote character to the pattern node makes the rewriter recursively look for more matches inside the pattern node being copied:

rewriteAllAtTwoToSecond
	<gtExample>
	| before after rewriter |
	before := 'sum: aCollection
					^aCollection first first + ((aCollection at: 2) at: 2)'.
	after := 'sum: aCollection
					^aCollection first first + aCollection second second'.
	rewriter := RBParseTreeRewriter new.
	rewriter replace: '``@col at: 2' with: '``@col second'.
	self assert: (rewriter executeTree: (RBParser parseMethod: before)).
	self assert: rewriter tree equals: (RBParser parseMethod: after).
	^ rewriter tree
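
For contrast, here is a sketch of the non-recursive case described above, written in the same style as the other examples. The method name is hypothetical, and the expected result is the one stated in the text: with a single backquote, only the outer at: 2 is rewritten.

```smalltalk
rewriteNestedAtTwoTopLevelOnly
	<gtExample>
	| before after rewriter |
	before := 'nested: aCollection
					^(aCollection at: 2) at: 2'.
	"The inner at: 2 is copied as-is because `@col does not recurse."
	after := 'nested: aCollection
					^(aCollection at: 2) second'.
	rewriter := RBParseTreeRewriter new.
	rewriter replace: '`@col at: 2' with: '`@col second'.
	self assert: (rewriter executeTree: (RBParser parseMethod: before)).
	self assert: rewriter tree equals: (RBParser parseMethod: after).
	^ rewriter tree
```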
    

In addition to matching variables and arbitrary expressions, we can also match just literal nodes by using the hash (#) character. In the following example, we add a printString message send to each top-level literal node. If we used a double backquote in the literal pattern, we would also search inside the literal array to find more matches.

literalMatches
	<gtExample>
	| before after rewriter |
	before := 'someMethod
					^{} class someInstance odd ifTrue: [1] ifFalse: [#(1)]'.
	after := 'someMethod
					^{} class someInstance odd ifTrue: [1 printString] ifFalse: [#(1) printString]'.
	rewriter := RBParseTreeRewriter new.
	rewriter replace: '`#l' with: '`#l printString'.	"Adding the # character matches only literal nodes."
	self assert: (rewriter executeTree: (RBParser parseMethod: before)).
	self assert: rewriter tree equals: (RBParser parseMethod: after).
	^ rewriter tree
    

Using patterns, we can also match message sends where we don't know the selector or the number of arguments. If we use the backquote and @ with a selector and its argument, we can match any message send. For example, suppose we wish to perform a rewrite where messages sent to a variable (collection) go through an accessor method. If the variable is not the receiver of a message, then we should not convert it to use the accessor. In this case, we can match any message sent to the variable and only rewrite those matches.

rewriteToAccessorWhenReceiverOfMessage
	<gtExample>
	| before after rewriter |
	before := 'someMethod
					self doSomethingWith: collection.
					collection do: [:each | Transcript print: each; cr].
					^ collection'.
	after := 'someMethod
					self doSomethingWith: collection.
					self myCollection do: [:each | Transcript print: each; cr].
					^ collection'.
	rewriter := RBParseTreeRewriter new.
	rewriter
		replace: 'collection `@selector: `@args'
		with: 'self myCollection `@selector: `@args'.
	self assert: (rewriter executeTree: (RBParser parseMethod: before)).
	self assert: rewriter tree equals: (RBParser parseMethod: after).
	^ rewriter tree
    

Cascade messages are another type of node that has a custom rewrite syntax. Adding a ; character to a pattern message in a cascade makes it match any message. If it is combined with an @ character, then it matches 0 or more messages that occur in the cascade. In the following example, we create a rewrite that swaps two add: messages in a cascade.

swapAddMessages
	<gtExample>
	| before after rewriter |
	before := 'createSet
			^ PluggableSet new
				hashBlock: [:each | each hash hashMultiply];
				add: 1;
				add: 2;
				yourself'.
	after := 'createSet
			^ PluggableSet new
				hashBlock: [:each | each hash hashMultiply];
				add: 2;
				add: 1;
				yourself'.
	rewriter := RBParseTreeRewriter new.
	rewriter
		replace: '`@set `@;messagesBefore; add: `@first; add: `@second; `@;messagesAfter'
		with: '`@set `@;messagesBefore; add: `@second; add: `@first; `@;messagesAfter'.
	self assert: (rewriter executeTree: (RBParser parseMethod: before)).
	self assert: rewriter tree equals: (RBParser parseMethod: after).
	^ rewriter tree
    

Sometimes just matching the AST structure isn't enough, and we want to do a little more testing before deciding to perform a rewrite. The rewriter allows one to use `{} pattern code blocks to further limit what matches. On the searching side, each pattern code block takes two arguments. The first argument is the node that is being matched, and the second is a dictionary that contains the pattern matches that have been made so far. The pattern block returns true if it matches, and false otherwise. On the replacement side, the pattern block takes a single dictionary argument and returns a program node object. In the following code, we only rewrite "isOpaque" messages to "isTranslucent not" if they are sent to something that looks like a color (i.e., has color in the source):

convertColorIsOpaque
	<gtExample>
	| before after rewriter |
	before := 'isOpaque
				^ element isOpaque | color isOpaque | self myColor isOpaque'.
	after := 'isOpaque
				^ element isOpaque | color isTranslucent not | self myColor isTranslucent not'.
	rewriter := RBParseTreeRewriter new.
	rewriter
		replace: '`{:node :dict | dict at: ''receiver'' put: node. ''*color*'' match: node sourceCode} isOpaque'
		with: '`{:dict | (dict at: ''receiver'') copy} isTranslucent not'.
	self assert: (rewriter executeTree: (RBParser parseMethod: before)).
	self assert: rewriter tree equals: (RBParser parseMethod: after).
	^ rewriter tree
    

The search pattern block nodes can also be used like a message sent to a node. In these cases, the pattern block node argument is the node that was matched to the receiver. As an example, consider this alternative for the previous example that uses a pattern message block and makes the rewrite expression simpler:

convertColorIsOpaqueAlternative
	<gtExample>
	| before after rewriter |
	before := 'isOpaque
				^ element isOpaque | color isOpaque | self myColor isOpaque'.
	after := 'isOpaque
				^ element isOpaque | color isTranslucent not | self myColor isTranslucent not'.
	rewriter := RBParseTreeRewriter new.
	rewriter
		replace: '``@node `{:node :dict | ''*color*'' match: node sourceCode} isOpaque'
		with: '``@node isTranslucent not'.
	self assert: (rewriter executeTree: (RBParser parseMethod: before)).
	self assert: rewriter tree equals: (RBParser parseMethod: after).
	^ rewriter tree
    

So far we have only looked at replacing expressions, but sometimes we need to work across multiple statements in the search or replace expressions. In these cases, we need to match a sequence node, since sequence nodes are the only AST nodes that contain statements. To match sequence nodes we need to specify the temporaries as well as the statements. To match the temporaries, we can use pattern variable nodes and actual temporary variable nodes. A pattern variable node will match a single temporary, but adding an @ character to the pattern variable makes it match 0 or more temporaries. Similarly, we can use patterns to match statement nodes. If we add a . character to our pattern expression, it makes the pattern match any statement. Additionally, if we add an @ character to the . character, then we can match 0 or more statements. In the following example, we convert two identical statements into a 2 timesRepeat: statement:

twoTimesRepeat
	<gtExample>
	| before after rewriter |
	before := 'method
				| oc |
				oc := OrderedCollection new.
				oc add: 1.
				oc add: 1.
				^oc'.
	after := 'method
				| oc |
				oc := OrderedCollection new.
				2 timesRepeat: [oc add: 1].
				^oc'.
	rewriter := RBParseTreeRewriter new.
	rewriter
		replace: '| `@temps | 
					`@.StatementsBefore.
					`@Expression.
					`@Expression.
					`@.StatementsAfter'
		with: '| `@temps | 
					`@.StatementsBefore.
					2 timesRepeat: [`@Expression].
					`@.StatementsAfter'.
	self assert: (rewriter executeTree: (RBParser parseMethod: before)).
	self assert: rewriter tree equals: (RBParser parseMethod: after).
	^ rewriter tree
    

Finally, we can match blocks and methods using patterns. For a block, we specify the arguments using actual variable names or pattern variables. If the pattern variable contains an @ character, then it matches 0 or more argument variables. For methods, we need to specify the method name. We can use either actual message names or pattern keywords like we did for pattern messages above. Inside the block or method, we need to specify the sequence node to match the whole block or method. The following example converts an at:put: method into a put:at: method by switching the arguments. To set up the rewriter we need to use the RBParseTreeRewriter>>#replaceMethod:with: message. This lets the parser know that the pattern provided is a method and not an expression.

convertAtPutMethodToPutAt
	<gtExample>
	| before after rewriter |
	before := 'at: anIndex put: anObject
				^collection at: anIndex put: anObject'.
	after := 'put: anObject at: anIndex
				^collection at: anIndex put: anObject'.
	rewriter := RBParseTreeRewriter new.
	rewriter
		replaceMethod: 'at: `arg1 put: `arg2 | `@temps | `@.Stmts'
		with: 'put: `arg2 at: `arg1 | `@temps | `@.Stmts'.
	self assert: (rewriter executeTree: (RBParser parseMethod: before)).
	self assert: rewriter tree equals: (RBParser parseMethod: after).
	^ rewriter tree