SmaCC scopes

There are three scopes: default, one, and two. The default scope always exists and is the one used when a token has no scope specified. Scope one is declared with %scopes, so it behaves like default: it includes every token that has no scope specified, plus the tokens marked one. Scope two is declared with %excludes, so it includes only the tokens explicitly marked two. Keywords (tokens written as quoted strings in the grammar, such as "d") are included in all scopes:

%scopes default one;
%excludes two;
<a> : a ;
one <b> : b ;
two <c> : c ;
Start 
    : "0" Abcd
    | SwitchToOne "1" Abcd
    | SwitchToTwo "2" Abcd
    ;
Abcd
    : (<a> | <b> | <c> | "d")+
    ;
SwitchToTwo
    : [self scope: #two]
    ;
SwitchToOne
    : [self scope: #one]
    ;

Given that definition, we can parse strings such as 0ad, 1abd, and 2cd. An important part of the grammar is the code that switches scopes. Because the parser uses one token of lookahead, an action placed after "1" can only run once the token following "1" is already the lookahead, which means that token has already been scanned in the old scope. The switch therefore has to come before the token that signals it: SwitchToOne must appear before "1", not after.
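To make the scope rules and the lookahead timing concrete, here is a small Python model of the grammar above. It is not SmaCC and uses no SmaCC API; `scan`, `parse`, and the `switch_before` flag are hypothetical names, and the parser is hand-written for this one grammar rather than generated. Setting `switch_before=False` simulates writing `"1" SwitchToOne Abcd`, where the token after the digit is scanned before the switch runs.

```python
# Hypothetical Python model of the SmaCC scope example (not SmaCC itself).

# Scanner tokens and the scopes they were declared in (None = no scope given).
TOKENS = {
    "a": None,       # <a> : a ;
    "b": {"one"},    # one <b> : b ;
    "c": {"two"},    # two <c> : c ;
}
KEYWORDS = {"0", "1", "2", "d"}  # quoted keywords are active in every scope
INCLUSIVE = {"default", "one"}   # %scopes default one;


def scan(ch, scope):
    """Return the token for one character in `scope`, or None on a scan error."""
    if ch in KEYWORDS:
        return ch
    if ch in TOKENS:
        declared = TOKENS[ch]
        if declared is None:
            # Unscoped tokens belong to inclusive scopes only, not to %excludes ones.
            return ch if scope in INCLUSIVE else None
        return ch if scope in declared else None
    return None


def parse(text, switch_before=True):
    """Recognize Start for `text`; the scanner always runs one token ahead."""
    scope = "default"
    pos = 0

    def next_token():
        nonlocal pos
        if pos == len(text):
            return "$"  # end of input
        tok = scan(text[pos], scope)  # scanned in whatever scope is active now
        pos += 1
        return tok

    lookahead = next_token()  # the first token is read in the default scope
    if lookahead not in ("0", "1", "2"):
        return False
    target = {"0": "default", "1": "one", "2": "two"}[lookahead]
    if switch_before:
        # SwitchToX "1" ...: the switch is reduced while the digit is the
        # lookahead, so the token after the digit is scanned in the new scope.
        scope = target
        lookahead = next_token()
    else:
        # "1" SwitchToX ...: the empty SwitchToX can only be reduced once the
        # next token is the lookahead, so that token is scanned in the old scope.
        lookahead = next_token()
        scope = target
    count = 0
    while lookahead in ("a", "b", "c", "d"):  # Abcd : (<a> | <b> | <c> | "d")+
        count += 1
        lookahead = next_token()
    return lookahead == "$" and count > 0
```

With the switch placed first, parse("1bd") succeeds; with it placed after the digit, the b is scanned in the default scope, where <b> is not defined, and the parse fails. The model also rejects 0bd and 2ad, because <b> exists only in one and <a> is excluded from two.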

Scopes by themselves can't solve the problem of parentheses inside URLs. There, the URL is terminated by ), so when the scanner sees a ) it can't tell whether it is part of the URL or the character marking the end of the URL. You might want the URL to accept balanced parentheses, but the scanner can't recognize that, because it only handles regular expressions and balanced parentheses cannot be described by a regular expression. We could accept balanced parentheses in the parser instead, but that makes things more complicated and would itself require scopes: while inside a URL, we would need a scope with one token for the valid characters other than ( and ), plus separate tokens for ( and ).
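The extra state the scanner lacks is a nesting depth. The sketch below is a hypothetical Python helper, not SmaCC code: it mirrors the split into ordinary characters, (, and ), but uses a counter where the SmaCC version would use grammar rules. It stops at the first ) that does not close an inner (. Since the depth can grow without bound, no finite automaton, and therefore no regular expression, can track it for arbitrary nesting.

```python
def url_end(text, start=0):
    """Return the index of the ')' that ends a URL starting at `start`.

    Balanced '(' ... ')' pairs inside the URL are kept as part of it.
    Returns -1 if the URL is never closed.
    """
    depth = 0  # number of '(' inside the URL that are not yet closed
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "(":
            depth += 1      # opens a nested pair: part of the URL
        elif ch == ")":
            if depth == 0:
                return i    # unmatched ')' marks the end of the URL
            depth -= 1      # closes a nested pair: still part of the URL
        # any other character is an ordinary URL character
    return -1
```

For example, with s = "https://example.com/a_(b)) rest", s[:url_end(s)] is "https://example.com/a_(b)": the first ) closes the inner pair and the second ends the URL.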

One option is to use a space as the character that ends the URL, so that a ) inside the URL is just an ordinary character.
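With whitespace as the terminator, the URL token is an ordinary regular expression again, so the scanner alone can recognize it. The sketch below uses Python's `re` module to stand in for the scanner; the pattern `[^\s]+` and the `scan_url` helper are illustrative assumptions, not the SmaCC token definition.

```python
import re

# Hypothetical URL token: a run of non-whitespace characters.
# '(' and ')' are ordinary characters here; only whitespace ends the match.
URL_TOKEN = re.compile(r"[^\s]+")


def scan_url(text, start=0):
    """Return the URL that begins at `start`, or None if none starts there."""
    m = URL_TOKEN.match(text, start)
    return m.group() if m else None
```

Here scan_url("https://example.com/a_(b) rest") returns "https://example.com/a_(b)". The trade-off is that the URL must be followed by whitespace: a ) written directly after it, as in x(y)z), becomes part of the URL.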