The Anubis Project
The Anubis Parser Maker
Copyright (c) Alain Prouté 2006.
Author: Alain Prouté

From a grammar, APM (the 'Anubis Parser Maker') generates an Anubis source file containing a program (called a 'parser') able to recognize sentences of the corresponding language. APM is very similar to the well known UNIX tool 'YACC' (or its GNU equivalent 'BISON').

------------------------------------------- Contents ----------------------------------

*** (1) Grammars and languages.
*** (1.1) In theory.
*** (1.2) In APM source files.
*** (1.3) Some standard tools.
*** (2) Reading APM source files.
*** (2.1) Reading characters.
*** (2.2) Reading meta-tokens.
*** (2.3) Reading precedence and association rules.
*** (2.4) Reading grammar rules.
*** (2.5) Finding non terminals.
*** (2.6) Gathering the information read.
*** (2.7) Processing the whole source file.
*** (3) Making the parser automaton.
*** (3.1) Computing 'First'.
*** (3.2) Scenarii.
*** (3.3) States.
*** (3.4) Testing for similarity.
*** (3.5) Saturating states.
*** (3.6) The initial state.
*** (3.7) Transitions.
*** (3.8) Generating the states.
*** (3.9) Making the automaton.
*** (4) Reworking the automaton.
*** (4.1) Numbering states and adding transitions lists.
*** (4.2) Removing unneeded lookaheads, and separating scenarii.
*** (4.3) Making decisions.
*** (4.4) Reporting conflicts.
*** (4.5) Making a trace file.
*** (5) Making the output file.
*** (5.1) Printing tools.
*** (5.2) Performing reductions.
*** (5.3) States as functions.
*** (6) Putting it all together.

(this is still under construction)

---------------------------------------------------------------------------------------

*** (1) Grammars and languages.

*** (1.1) In theory.

We have two finite (and disjoint) sets of symbols: 'tokens' (also called 'terminals') and 'non terminals'. Here are our notational conventions (used in these explanations only, not in APM source files):

   a, b, c,...  represent tokens,
   A, B, C,...  represent non terminals,
   X, Y, Z,...  represent arbitrary grammar symbols (tokens and non terminals),
   u, v, w,...  represent arbitrary sequences of grammar symbols,
   e            represents the empty sequence of grammar symbols,
   $            is the end marker (a special additional token).

A 'grammar rule' (or 'production') has the form:

   A -> u

(this one is called an 'A-production'). In other words, it has a non terminal on the left of the arrow, and a (possibly empty) sequence of grammar symbols on the right of the arrow. Its meaning is that we can produce an expression 'of type' 'A' by concatenating expressions of types X_1...X_k, where u = X_1...X_k. In this interpretation, tokens represent themselves.

A 'grammar' is a finite set of grammar rules, together with a distinguished non terminal (denoted 'S' in these explanations), called the 'axiom'. The 'language' associated to the grammar is the set of all sequences of tokens which may produce 'S' (we also say that they are 'instances' of 'S').

For our convenience, we assume that there is one and only one S-production, and that it has the form: S -> A. Furthermore, S cannot appear in the right hand member of a production. It is trivial to replace a given grammar by a grammar fulfilling these conditions: add a new non terminal S and the single new rule S -> A, where A is the axiom of the original grammar. This operation does not change the corresponding language. It is realized below by the function 'add_S_rule'.

*** (1.2) In APM source files.

Of course, we need to read grammars from a source file (an APM source file). The denotation for grammars in APM source files is somewhat more complicated, because we must take the values of grammar symbols into account. Indeed, in practice, terminals and non terminals may have values. Hence, we have an Anubis type (the type of syntactical entities) whose alternatives describe the required values (for both terminals and non terminals).
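As an illustration only (APM itself is written in Anubis; the Python names below are invented for this sketch), the augmentation performed by 'add_S_rule' can be pictured with a grammar represented as a list of (head, body) pairs:

```python
def add_start_rule(rules, axiom, start="S'"):
    """Return a new rule list with a fresh start symbol.

    'rules' is a list of (head, body) pairs, where 'body' is a list of
    grammar symbol names.  We prepend the single production start -> axiom;
    since 'start' is fresh, it appears in no right hand side, so the
    generated language is unchanged.
    """
    assert all(start != head and start not in body for head, body in rules)
    return [(start, [axiom])] + rules

# The ambiguous example grammar used later in this file: A -> , A -> a, A -> AA.
rules = [("A", []), ("A", ["a"]), ("A", ["A", "A"])]
augmented = add_start_rule(rules, "A")
```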
When the ALG lexer returns a token, this token has already received a value. When the parser reduces a sequence X_1...X_k of grammar symbols, using the production A -> X_1...X_k, it computes the value of A from the values of X_1...X_k. Hence, the denotation for productions should allow the description of this computation. In YACC and BISON, this computation is described (in the language C) within so-called 'actions', which are post-fixed to grammar rules. In APM it is somewhat different. Since APM grammar symbols may also be names of alternatives, they may have operands, and the right hand side X_1...X_k of a production will be written, for example, as:

   X_1(x,y) X_2(z) X_3 X_4(u,v,w)

assuming in this example that the grammar symbol X_1 has two operands, X_2 one operand, X_3 no operand and X_4 three operands. In this denotation, x, y, z, u, v and w must be symbols. In the automaton produced by APM, they will become resurgent symbols. Now, the complete production A -> X_1...X_k will be denoted (assuming the same example):

   A(t): X_1(x,y) X_2(z) X_3 X_4(u,v,w).

where t is a term (or several terms separated by commas), which may make use of the symbols x, y, z, u, v and w. Of course, t will be used to compute the value of A when the reduction via this production occurs. The above rule is something like a case in a conditional, except that A(t), which plays the role of the body of the case, is written on the left hand side.

Hence, an APM grammar rule is described by the following self-explanatory 'meta-grammar' (the symbol between square brackets is a precedence level):

   GrammarRule   -> Head : Body .
                  | Head : Body [ Symbol ] .
   Head          -> NonTerminal
                  | NonTerminal ( Term )
   Body          -> /* empty */
                  | GrammarSymbol Body
   GrammarSymbol -> Symbol
                  | Symbol ( Symbols_1 )
   Symbols_1     -> Symbol
                  | Symbol , Symbols_1

In a 'Head', APM does not read the 'Term', but just keeps track of matching parentheses (not contained within strings).
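To make the role of the operand names x, y, z,... concrete, here is a small Python sketch (names invented for illustration, not APM's actual machinery) of what happens at reduction time: the parser pops one value per grammar symbol of the body, binds them, and computes the value of the head from them, exactly as the term t does:

```python
def reduce_rule(stack, arity, action):
    """Pop 'arity' values (one per body symbol) and push the head value.

    'action' plays the role of the term t in 'A(t): X_1(x,y) ...': it
    receives the popped values and returns the value attached to A.
    """
    if arity == 0:
        args = []
    else:
        args = stack[-arity:]
        del stack[-arity:]
    stack.append(action(*args))
    return stack

# A hypothetical rule Sum(x+y): Expr(x) plus Expr(y) — pop two values, add them.
stack = [1, 2]
reduce_rule(stack, 2, lambda x, y: x + y)  # stack becomes [3]
```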
Now, an APM source file has the following format:

   preamble (Anubis text)
   #
   #
   #
   postamble (Anubis text)

Both tokens and non terminals should be acceptable Anubis symbols. Indeed, they must also be names of alternatives in the type of syntactical entities. The name of this type is formed by the concatenation of 'SyntaxTree_' and the name of the parser. Normally, it is defined by the user in the preamble. Reading APM grammars is simple enough so that we do not need to use either ALG or APM.

*** (1.3) Some standard tools.

We record here some standard tools, used in this file.

define Int32 length ( List($T) l ) =
  if l is { [ ] then 0,
            [h . t] then length(t)+1 }.

define Bool member ( $T x, List($T) l ) =
  if l is { [ ] then false,
            [h . t] then if h = x then true else member(x,t) }.

define List($T) merge ( List($T) l1, List($T) l2 ) =
  if l1 is { [ ] then l2,
             [h . t] then if member(h,l2) then merge(t,l2)
                          else merge(t,[h . l2]) }.

*** (2) Reading APM source files.

Below are the functions which enable APM to read source files. There is also some kind of a lexer. Its state is stored into a datum of type 'APM_LexerState'. This lexer keeps track of line numbers, eliminates blank characters, and tokenizes the input into a sequence of 'meta-tokens'. The meta-tokens we need to recognize in APM source files are the following:

   symbols
   terms (delimited by parentheses)
   : (separating head from body)
   . (marking the end of a rule)
   [symbol]. (end of rule with precedence level)
   # (the separator)
   error (corresponding to an illegal character)
   premature end of file (the legal end of file will be found by the function copying the postamble)

They are defined as the alternatives of the type 'MetaToken'. Then, assembling tokens into precedence rules or grammar rules is rather easy.

*** (2.1) Reading characters.

We must read characters in an extended sense, to take the end of file into account.

type ExChar:
  char(Int8),   // normal character
  end_of_file.
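The 'end_of_file' alternative spares the caller a separate test on the file. In Python (a sketch with invented names, not APM's actual code), the same idea, together with the line counting and the one-character 'unread' slot used below, looks like this:

```python
EOF = object()  # plays the role of the 'end_of_file' alternative

class LexerState:
    """Reader with a line counter and a one-character pushback slot."""
    def __init__(self, text):
        self.chars = iter(text)
        self.line = 1
        self.unread = None  # plays the role of Maybe(Int8)

    def read_char(self):
        if self.unread is not None:
            # the character has been reread; its line was already counted
            c, self.unread = self.unread, None
            return c
        c = next(self.chars, EOF)
        if c == "\n":
            self.line += 1
        return c

    def unread_char(self, c):
        self.unread = c

ls = LexerState("ab")
```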
Like any other lexer, the APM lexer needs to work with a state.

type APM_LexerState:
  lexer_state(RAddr(Int8) file,     // the APM source file
              Int32 line,           // current line number
              Maybe(Int8) unread).  // character possibly 'unread'

Here is how we read a character (returning both the new state of the lexer and the extended character).

define (APM_LexerState,ExChar) read_char ( APM_LexerState ls ) =
  if ls is lexer_state(file,line,mbunread) then
  // if a character has been 'unread', it must be used.
  if mbunread is
    { failure then
        // if not, get one from the file
        if *file is
          { failure then (ls,end_of_file),
            success(c) then (lexer_state(file,
                                         // don't forget to count lines
                                         if c = '\n' then line+1 else line,
                                         // no unread character
                                         failure),
                             char(c)) },
      success(c) then (lexer_state(file,
                                   // the character has been reread; its line
                                   // was already counted when it was first read
                                   line,
                                   failure),
                       char(c)) }.

Note: 'unreading' a character is done 'by hand' by functions which need to do that. They can do it because they hold the lexer state.

*** (2.2) Reading meta-tokens.

While reading grammar rules, we need to recognize several kinds of meta-tokens:

type MetaToken:
  symbol(String),        // a regular Anubis symbol
  term(String),          // terms (like 't' above, or 'x') read in as strings
  colon,                 // : (used to separate head from body)
  dot,                   // . (used to mark the end of a grammar rule)
  prec_level(String),    // [ name ]. (precedence level for a rule; dot included)
  separator,             // #
  error(Int8),           // any misplaced character
  premature_end_of_file. // self explanatory

Note: (t), (x,y), (z), etc... are seen as 'term(String)' meta-tokens. This is why parentheses do not appear in the above definition of meta-tokens.

Here is a simple useful test for detecting the beginning of a symbol.

define Bool may_begin_symbol ( Int8 c ) =
  if c = '_' then true
  else with c = int8_to_int32(c),
       if 'a' =< c then c =< 'z' else false.

Another test for subsequent letters in a symbol.
define Bool is_symbol_letter ( Int8 c ) =
  if c = '_' then true
  else with c = int8_to_int32(c),
       if 'a' =< c then c =< 'z'
       else if 'A' =< c then c =< 'Z'
       else if '0' =< c then c =< '9'
       else false.

We also need to recognize blank characters.

define Bool is_blank ( Int8 c ) =
  if c = ' ' then true
  else if c = '\t' then true
  else if c = '\n' then true
  else false.

The function below reads a symbol whose first characters (at least one) have already been read, and are given in reverse order.

define (APM_LexerState,MetaToken) read_symbol ( APM_LexerState ls,
                                                List(Int8) firsts // in reverse order
                                              ) =
  if read_char(ls) is (ls,ec) then
  if ec is
    { char(c) then
        if is_symbol_letter(c) then read_symbol(ls,[c . firsts])
        else if ls is lexer_state(file,line,_) then
             // unread c, which is not part of the symbol
             (lexer_state(file,line,success(c)),
              symbol(implode(reverse(firsts)))),
      end_of_file then (ls,premature_end_of_file) }.

The function 'read_string_in_term' is called while reading a string within a term (itself delimited by parentheses). The beginning of the term has already been read. We need to declare 'read_term', because the two functions are mutually recursive. In fact, 'read_term' calls (terminally) 'read_string_in_term' when the beginning of a string is detected. Similarly, 'read_string_in_term' calls (terminally) 'read_term' when the end of that string is found.

define (APM_LexerState,MetaToken) read_term ( APM_LexerState ls,
                                              List(Int8) read_so_far, // in reverse order
                                              Int32 depth             // depth in parentheses
                                            ).

define (APM_LexerState,MetaToken) read_string_in_term ( APM_LexerState ls,
                                                        List(Int8) read_so_far,
                                                        Int32 depth // not used but need to give it back to 'read_term'
                                                      ) =
  if read_char(ls) is (ls,ec) then
  if ec is
    { char(c) then
        if c = '\"' then
          // end of string found
          read_term(ls,[c . read_so_far],depth)
        else if c = '\\' then
          ( if read_char(ls) is (ls,ec1) then
            if ec1 is
              { char(d) then read_string_in_term(ls,[d, c . read_so_far],depth),
                end_of_file then (ls,premature_end_of_file) } )
        else read_string_in_term(ls,[c . read_so_far],depth),
      end_of_file then (ls,premature_end_of_file) }.

The function below reads anything placed between balanced parentheses. The opening parenthesis has already been read.

define (APM_LexerState,MetaToken) read_term ( APM_LexerState ls,
                                              List(Int8) read_so_far, // in reverse order
                                              Int32 depth             // depth in parentheses
                                            ) =
  if read_char(ls) is (ls,ec) then
  if ec is
    { char(c) then
        if c = ')' then
          if depth = 1 then (ls,term(implode(reverse(read_so_far))))
          else read_term(ls,[c . read_so_far],depth-1)
        else if c = '(' then read_term(ls,[c . read_so_far],depth+1)
        else if c = '\"' then read_string_in_term(ls,[c . read_so_far],depth)
        else read_term(ls,[c . read_so_far],depth),
      end_of_file then (ls,premature_end_of_file) }.

The next function tries to read the mandatory dot after '[ name ]'.

define (APM_LexerState,MetaToken) read_after_prec_level ( APM_LexerState ls, String name ) =
  if read_char(ls) is (ls,ec) then
  if ec is
    { char(c) then if c = '.' then (ls,prec_level(name)) else (ls,error(c)),
      end_of_file then (ls,premature_end_of_file) }.

The next function reads 'name' in '[ name ]'.

define (APM_LexerState,MetaToken) read_prec_level_name ( APM_LexerState ls, List(Int8) read_so_far ) =
  if read_char(ls) is (ls,ec) then
  if ec is
    { char(c) then
        if is_symbol_letter(c) then read_prec_level_name(ls,[c . read_so_far])
        else if c = ']' then read_after_prec_level(ls,implode(reverse(read_so_far)))
        else (ls,error(c)),
      end_of_file then (ls,premature_end_of_file) }.

The next function reads the first character of the name in '[ name ]'.

define (APM_LexerState,MetaToken) read_prec_level ( APM_LexerState ls ) =
  if read_char(ls) is (ls,ec) then
  if ec is
    { char(c) then
        if may_begin_symbol(c) then read_prec_level_name(ls,[c])
        else (ls,error(c)),
      end_of_file then (ls,premature_end_of_file) }.

The next function reads the next meta-token from the source file, whatever this meta-token is.
define (APM_LexerState,MetaToken) read_meta_token ( APM_LexerState ls ) =
  if read_char(ls) is (ls,ec) then
  if ec is
    { char(c) then
        if may_begin_symbol(c) then read_symbol(ls,[c])
        else if c = '(' then read_term(ls,[],1)
        else if c = ':' then (ls,colon)
        else if c = '.' then (ls,dot)
        else if c = '[' then read_prec_level(ls)
        else if c = '#' then (ls,separator)
        else if is_blank(c) then read_meta_token(ls)
        else (ls,error(c)),
      end_of_file then (ls,premature_end_of_file) }.

*** (2.3) Reading precedence and association rules.

Each token may be assigned a precedence level. A precedence level is an integer, but it is implicit in the APM source file. Only the order of declarations makes sense. Each declaration has one of the forms:

   right ... .
   left ... .
   non_assoc ... .

Each declaration defines a precedence level. The first one receives the lowest precedence level (0).

type PrecRule:
  right (List(String) names),
  left (List(String) names),
  non_assoc (List(String) names).

type ReadPrecRuleResult:
  ok(PrecRule),
  separator,
  syntax_error,
  lexical_error(Int8),
  premature_end_of_file.

The next function reads (maybe) a sequence of symbols, right delimited by a dot.

define (APM_LexerState,Maybe(List(String))) read_symbols ( APM_LexerState ls, List(String) read_so_far ) =
  if read_meta_token(ls) is (ls,tok) then
  if tok is symbol(s) then read_symbols(ls,[s . read_so_far])
  else if tok is dot then (ls,success(read_so_far))
  else (ls,failure).

Note: names are stored in reverse order, but it doesn't matter.

Now, we read a precedence rule whose keyword has already been successfully read and recognized (and replaced by the corresponding constructor for type 'PrecRule').

define (APM_LexerState,ReadPrecRuleResult) read_prec_names ( APM_LexerState ls,
                                                             (List(String)) -> PrecRule keyword ) =
  if read_symbols(ls,[]) is (ls,mbtoks) then
  if mbtoks is { failure then (ls,syntax_error),
                 success(toks) then (ls,ok(keyword(toks))) }.
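The two tables built from these declarations can be sketched in Python (illustration only, not APM's representation): levels are assigned by declaration order, the first declaration getting level 0.

```python
def make_tables(prec_rules):
    """Build the precedence and association tables.

    'prec_rules' is a list of (mode, names) pairs, mode being one of
    'left', 'right', 'non_assoc', in source order.  Returns
    (precedence, assoc), where 'precedence' maps each token name to its
    level and 'assoc' maps each level to its mode.
    """
    precedence, assoc = {}, {}
    for level, (mode, names) in enumerate(prec_rules):
        assoc[level] = mode
        for name in names:
            precedence[name] = level  # a redeclaration would be an error
    return precedence, assoc

precedence, assoc = make_tables([("left", ["plus", "minus"]),
                                 ("left", ["times"]),
                                 ("right", ["power"])])
```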
Here, we read a precedence rule, whose keyword has been read but not yet recognized (it is only a character string at that point). define (APM_LexerState,ReadPrecRuleResult) read_after_prec_keyword ( APM_LexerState ls, String keyword ) = if keyword = "right" then read_prec_names(ls,right) else if keyword = "left" then read_prec_names(ls,left) else if keyword = "non_assoc" then read_prec_names(ls,non_assoc) else (ls,syntax_error). Finally, we read a complete precedence rule. define (APM_LexerState,ReadPrecRuleResult) read_prec_rule ( APM_LexerState ls ) = if read_meta_token(ls) is (ls,tok) then if tok is { symbol(s) then read_after_prec_keyword(ls,s), term(s) then (ls,syntax_error), colon then (ls,syntax_error), dot then (ls,syntax_error), prec_level(s) then (ls,syntax_error), separator then (ls,separator), error(c) then (ls,lexical_error(c)), premature_end_of_file then (ls,premature_end_of_file) }. Now, we must be able to read a sequence of precedence rules. This is achieved by the following function, which reads precedence rules until a separator (#) is found. type ReadPrecRulesResult: ok(List(PrecRule)), syntax_error, lexical_error(Int8), premature_end_of_file. define (APM_LexerState,ReadPrecRulesResult) read_prec_rules ( APM_LexerState ls, List(PrecRule) read_so_far ) = if read_prec_rule(ls) is (ls,result) then if result is { ok(pr) then read_prec_rules(ls,[pr . read_so_far]), separator then (ls,ok(reverse(read_so_far))), syntax_error then (ls,syntax_error), lexical_error(c) then (ls,lexical_error(c)), premature_end_of_file then (ls,premature_end_of_file) }. Now, we can construct precedence tables. The first one gives the precedence level for each token name. The second one gives the association mode for each precedence level. They are lists of the following respective types: List((String,Int32)) List((Int32,AssocMode)) type AssocMode: left, right, non_assoc. The next function constructs the table of association modes from the list of precedence rules. 
define List((Int32,AssocMode)) make_assoc_table ( List(PrecRule) l, Int32 level ) = if l is { [ ] then [ ], [h . t] then if h is { right(names) then [(level,right) . make_assoc_table(t,level+1)], left(names) then [(level,left) . make_assoc_table(t,level+1)], non_assoc(names) then [(level,non_assoc) . make_assoc_table(t,level+1)] } }. define List((Int32,AssocMode)) make_assoc_table ( List(PrecRule) l ) = make_assoc_table(l,0). The next function constructs the list of entries in the precedence table for just one level. define List((String,Int32)) make_precedence_entries ( List(String) names, // names for this Int32 level // level ) = if names is { [ ] then [ ], [h . t] then [(h,level) . make_precedence_entries(t,level)] }. The next function constructs the table of precedence levels from the list of precedence rules. define List((String,Int32)) make_precedence_table ( List(PrecRule) l, Int32 level ) = if l is { [ ] then [ ], [h . t] then with ns = names(h), reverse_append(make_precedence_entries(ns,level), make_precedence_table(t,level+1)) }. The next function gives the mode for a given precedence level (using the association table). define AssocMode mode ( Int32 level, List((Int32,AssocMode)) modes ) = if modes is { [ ] then alert, // all levels have modes [h . t] then if h is (n,m) then if level = n then m else mode(level,t) }. The next function checks the precedence table. It consists in verifying that the same name is not present two times, and that no non terminal has an entry in the table (we will see later how to construct the list of names of non terminals). type CheckPrecResult: ok, non_terminal_seen(String), token_redeclared(String). define Bool is_key_of ( List(($Key,$Value)) table, $Key key ) = if table is { [ ] then false, [h . t] then if h is (k,v) then if k = key then true else is_key_of(t,key) }. define CheckPrecResult check_precedence_table ( List((String,Int32)) table, List(String) non_terminals ) = if table is { [ ] then ok, [h . 
t] then if h is (name,level) then if member(name,non_terminals) then non_terminal_seen(name) else if is_key_of(t,name) then token_redeclared(name) else check_precedence_table(t,non_terminals) }. The next function gives the precedence level (if it exists) for a given token name. define Maybe(Int32) prec ( String name, List((String,Int32)) prec_table ) = if prec_table is { [ ] then failure, [h . t] then if h is (u,n) then if name = u then success(n) else prec(name,t) }. The same one, but for a possibly missing name. define Maybe(Int32) prec ( Maybe(String) mbname, List((String,Int32)) prec_table ) = if mbname is { failure then failure, success(name) then prec(name,prec_table) }. *** (2.4) Reading grammar rules. Grammar symbols are defined below. type Symbol: dollar, // the special end marker token: $ token(String name), // any token with its name non_terminal(String name). // any non terminal with its name Grammar rules A(t) -> u [p] (where p is a possible precedence level: actually, the name of a token) are stored as data of the following type: type GrammarRule: grammar_rule(String head, // A String term, // t List((Symbol,String)) body, // u Maybe(Int32) prec). // precedence level of p Note: in the pair (Symbol,String), the second element represents the value of the symbol (if no value is given, it is the empty string). Below is a function which reads the right hand side of a grammar rule. We need a type to handle the result of such a reading. type RightHandResult: ok(List((Symbol,String)), // a correct right hand side has been read Maybe(String)), // maybe with a precedence level syntax_error, // a syntax error has been found lexical_error(Int8), // an error has been found by the lexer premature_end_of_file. 
// end of file found while reading // the right hand side of a grammar rule define (APM_LexerState,RightHandResult) read_right_hand ( APM_LexerState ls, List((Symbol,String)) read_so_far, // in reverse order Maybe(Symbol) unread_symbol ) = if unread_symbol is { failure then if read_meta_token(ls) is (ls,tok) then if tok is dot then (ls,ok(reverse(read_so_far),failure)) else if tok is prec_level(name) then (ls,ok(reverse(read_so_far),success(name))) else if tok is symbol(name) then read_right_hand(ls,read_so_far,success(token(name))) else (ls,syntax_error), success(sym) then if read_meta_token(ls) is (ls,tok) then if tok is { symbol(s) then read_right_hand(ls,[(sym,"") . read_so_far],success(token(s))), term(s) then read_right_hand(ls,[(sym,s) . read_so_far],failure), colon then (ls,syntax_error), dot then (ls,ok(reverse([(sym,"") . read_so_far]),failure)), prec_level(s) then (ls,ok(reverse([(sym,"") . read_so_far]),success(s))), separator then (ls,syntax_error), error(c) then (ls,lexical_error(c)), premature_end_of_file then (ls,premature_end_of_file) } }. We also need a special type to handle all possible situations in the result of reading a grammar rule. type ReadGrammarRuleResult: ok(GrammarRule), // a grammar rule has been read successfully separator, // a separator (#) has been read syntax_error, // a syntax error has been detected lexical_error(Int8), // an error has been detected by the lexer premature_end_of_file. // end of file has been found while // reading a parser section Below is a function which reads a grammar rule whose head (including the colon) has been already read. 
define (APM_LexerState,ReadGrammarRuleResult) read_after_colon ( APM_LexerState ls, String head_name, String term, List((String,Int32)) prec_table ) = if read_right_hand(ls,[],failure) is (ls,rh) then if rh is { ok(rh,p) then (ls,ok(grammar_rule(head_name,term,rh,prec(p,prec_table)))), syntax_error then (ls,syntax_error), lexical_error(c) then (ls,lexical_error(c)), premature_end_of_file then (ls,premature_end_of_file) }. Below is a function which reads a grammar rule whose head has already been read (not including the colon). define (APM_LexerState,ReadGrammarRuleResult) read_after_head ( APM_LexerState ls, String head_name, String term, List((String,Int32)) prec_table ) = if read_meta_token(ls) is (ls,tok) then if tok is colon then read_after_colon(ls,head_name,term,prec_table) else (ls,syntax_error). Below is a function which reads a grammar rule whose head name has already been read. define (APM_LexerState,ReadGrammarRuleResult) read_after_head_name ( APM_LexerState ls, String head_name, List((String,Int32)) prec_table ) = if read_meta_token(ls) is (ls,tok) then if tok is { symbol(_) then (ls,syntax_error), term(t) then read_after_head(ls,head_name,t,prec_table), colon then read_after_colon(ls,head_name,"",prec_table), dot then (ls,syntax_error), prec_level(_) then (ls,syntax_error), separator then (ls,syntax_error), error(c) then (ls,lexical_error(c)), premature_end_of_file then (ls,premature_end_of_file) }. Below is a function, which reads a complete grammar rule from a file. define (APM_LexerState,ReadGrammarRuleResult) read_grammar_rule ( APM_LexerState ls, List((String,Int32)) prec_table ) = if read_meta_token(ls) is (ls,tok) then if tok is symbol(name) then read_after_head_name(ls,name,prec_table) else if tok is separator then (ls,separator) else (ls,syntax_error). type ReadGrammarRulesResult: ok(List(GrammarRule)), syntax_error, lexical_error(Int8), premature_end_of_file. 
define (APM_LexerState,ReadGrammarRulesResult) read_grammar_rules ( APM_LexerState ls,
                                                                    List(GrammarRule) read_so_far, // in reverse order
                                                                    List((String,Int32)) prec_table ) =
  if read_grammar_rule(ls,prec_table) is (ls,result) then
  if result is
    { ok(gr) then read_grammar_rules(ls,[gr . read_so_far],prec_table),
      separator then (ls,ok(read_so_far)),
      syntax_error then (ls,syntax_error),
      lexical_error(c) then (ls,lexical_error(c)),
      premature_end_of_file then (ls,premature_end_of_file) }.

*** (2.5) Finding non terminals.

So far, the grammar has been read, but all symbols have been stored as terminals. We must establish the list of names of all non terminals (they are simply the symbols which appear at the head of grammar rules), and change, in grammar rules, any symbol whose name matches one of these into a non terminal.

define List(String) find_non_terminals ( List(GrammarRule) l, List(String) found_so_far ) =
  if l is
    { [ ] then found_so_far,
      [h . t] then
        if h is grammar_rule(head,term,body,mbprec) then
        if member(head,found_so_far) then find_non_terminals(t,found_so_far)
        else find_non_terminals(t,[head . found_so_far]) }.

define List((Symbol,String)) put_non_terminals ( List((Symbol,String)) l,
                                                 List(String) names // of non terminals
                                               ) =
  if l is
    { [ ] then [ ],
      [h . t] then
        if h is (sym,term) then
        if sym is
          { dollar then alert,
            token(s) then
              if member(s,names) then [(non_terminal(s),term) . put_non_terminals(t,names)]
              else [h . put_non_terminals(t,names)],
            non_terminal(s) then alert } }.

define GrammarRule put_non_terminals ( GrammarRule r,
                                       List(String) names // of non terminals
                                     ) =
  if r is grammar_rule(head,term,body,mbprec) then
  grammar_rule(head,term,put_non_terminals(body,names),mbprec).

define List(GrammarRule) put_non_terminals ( List(GrammarRule) l,
                                             List(String) names // of non terminals
                                           ) =
  if l is
    { [ ] then [ ],
      [h . t] then [put_non_terminals(h,names) . put_non_terminals(t,names)] }.
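The two passes of this section (collect the rule heads, then reclassify body symbols) can be sketched in Python (illustration only, with an invented (kind, name) representation of symbols):

```python
def find_non_terminals(rules):
    """The non terminals are exactly the heads of the grammar rules."""
    return {head for head, body in rules}

def put_non_terminals(rules):
    """Retag every body symbol whose name is a rule head.

    Symbols are (kind, name) pairs; after reading, everything is a
    'token', and this pass turns rule heads into 'non_terminal's.
    """
    names = find_non_terminals(rules)
    return [(head,
             [("non_terminal" if name in names else "token", name)
              for kind, name in body])
            for head, body in rules]

rules = [("A", [("token", "A"), ("token", "a")])]
retagged = put_non_terminals(rules)
```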
define List(GrammarRule) put_non_terminals ( List(GrammarRule) l ) =
  put_non_terminals(l,find_non_terminals(l,[])).

*** (2.6) Gathering the information read.

Recall that APM source files are organized as follows:

   preamble (Anubis text)
   #
   #
   #
   postamble (Anubis text)

define Maybe(One) syntax_error ( APM_LexerState ls ) =
  print("Syntax error at line "+integer_to_string(line(ls))+".\n");
  failure.

define Maybe(One) lexical_error ( APM_LexerState ls, Int8 c ) =
  print("Illegal character: "+implode([c])+" at line "+integer_to_string(line(ls))+".\n");
  failure.

define Maybe(One) premature_end_of_file ( APM_LexerState ls ) =
  print("Premature end of file at line "+integer_to_string(line(ls))+".\n");
  failure.

The next function reads from the first separator to the third (last) one. It also calls the functions which will construct the automaton and dump it into the output file and the log file. Here is what it does:

   - read the name of the parser,
   - read precedence rules,
   - read grammar rules,
   - construct a datum of type 'Grammar',
   - call 'make_parser'.

It returns failure in case of a problem, and success(unique) otherwise.

type Grammar:
  grammar(String parser_name,
          List(String) non_terminals,
          List((String,Int32)) prec_table,
          List((Int32,AssocMode)) assoc_table,
          List(GrammarRule) grammar_rules).

'make_parser' is defined in the next chapter.

define Maybe(One) make_parser ( Grammar grammar, WAddr(Int8) output, WAddr(Int8) log_file ).
define Maybe(One) proceed_file_body ( APM_LexerState ls, WAddr(Int8) output, WAddr(Int8) log_file ) =
  if read_meta_token(ls) is (ls,mtok) then
  if mtok is symbol(parser_name) then
    ( if read_prec_rules(ls,[]) is (ls,prec_rules) then
      if prec_rules is
        { ok(prec_rules) then
            with prec_table = make_precedence_table(prec_rules,0),
            if read_grammar_rules(ls,[],prec_table) is (ls,grammar_rules) then
            if grammar_rules is
              { ok(grammar_rules) then
                  make_parser(grammar(parser_name,
                                      prec_table,
                                      make_assoc_table(prec_rules),
                                      put_non_terminals(grammar_rules)),
                              output,
                              log_file),
                syntax_error then syntax_error(ls),
                lexical_error(c) then lexical_error(ls,c),
                premature_end_of_file then premature_end_of_file(ls) },
          syntax_error then syntax_error(ls),
          lexical_error(c) then lexical_error(ls,c),
          premature_end_of_file then premature_end_of_file(ls) } )
  else print("At line "+integer_to_string(line(ls))+": parser name not found.\n");
       failure.

*** (2.7) Processing the whole source file.

The next function dumps the content of the input file into the output file, until the first separator is found. In other words, it copies the preamble to the output. It does not use the lexer, and must update the line number itself.

define Maybe(Int32) copy_preambule ( RAddr(Int8) input, WAddr(Int8) output, Int32 line ) =
  if *input is
    { failure then print("Cannot read from input file.\n"); failure,
      success(c) then
        if c = '#' then success(line)
        else if output <- c is
          { failure then print("Cannot write to output file.\n"); failure,
            success(_) then copy_preambule(input,output,
                                           if c = '\n' then line+1 else line) } }.

The next function copies the postamble to the output. It does not need to count line numbers.

define One copy_postambule ( RAddr(Int8) input, WAddr(Int8) output ) =
  if *input is
    { failure then unique,
      success(c) then
        if output <- c is
          { failure then print("Cannot dump postambule.\n"),
            success(_) then copy_postambule(input,output) } }.
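The overall layout of this processing (copy the preamble up to the first '#', hand the middle part over to the body reader, then copy the postamble) can be sketched in Python; 'skip_body' below is an invented stand-in for the real body reader:

```python
import io

def proceed_file(input_file, output_file, process_body):
    """Copy preamble, process the grammar sections, copy postamble."""
    line = 1
    # copy the preamble, counting lines, up to the first separator
    while True:
        c = input_file.read(1)
        if c == "":
            return False  # premature end of file
        if c == "#":
            break
        if c == "\n":
            line += 1
        output_file.write(c)
    if not process_body(input_file, output_file, line):
        return False
    # copy the postamble verbatim
    output_file.write(input_file.read())
    return True

def skip_body(inp, out, line):
    """Stand-in for the body reader: consume up to the third separator."""
    seen = 0
    while seen < 2:
        c = inp.read(1)
        if c == "":
            return False
        if c == "#":
            seen += 1
    return True

src = io.StringIO("pre\n#body#mid#post")
out = io.StringIO()
ok = proceed_file(src, out, skip_body)
```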
The next function receives the three files (input, output and the log file), reads the grammar and make the automaton. It proceeds in three steps: - copy the preambule to the output, - create a lexer state, read the precedence rules, the grammar rules, produce the automaton and dump it into the target file, - copy the postambule to the output. type Option: verbose. define One proceed_file ( List(Option) options, RAddr(Int8) input, WAddr(Int8) output, WAddr(Int8) log_file ) = if copy_preambule(input,output,0) is { failure then unique, // message already sent success(line) then if proceed_file_body(lexer_state(input,line,failure), output, log_file) is { failure then unique, // message already sent success(_) then copy_postambule(input,output) } }. define Maybe(Option) identify_option ( String s ) = if s = "-verbose" then success(verbose) else failure. The next function takes the arguments of the command line and separates options from the source file name. define Maybe((String,List(Option))) separate_options ( List(String) args, List(Option) options_so_far, Maybe(String) file_name_so_far ) = if args is { [ ] then if file_name_so_far is { failure then print("No file name on command line.\n"); failure, success(name) then success((name,options_so_far)) }, [h . t] then if nth(0,h) is { failure then alert, success(c) then if c = '-' then if identify_option(h) is { failure then failure, success(opt) then separate_options(t,[opt . options_so_far],file_name_so_far) } else if file_name_so_far is { failure then separate_options(t,options_so_far,success(h)), success(_) then print("Two file names on command line.\n"); failure } } }. Finally, here is the function which is made global. It performs the following tasks: - separate options from the source file name (by calling 'separate_options'), - open the source file, - open the target file, - open the log file, - call 'proceed_file'. 
global define One parser_makerss ( List(String) args ) = if separate_options(args,[],failure) is { failure then unique, // message already sent success(p) then if p is (source_file_name,options) then if (Maybe(RAddr(Int8))) connect to file source_file_name is { failure then print("Cannot open file '"+source_file_name+"'.\n"), success(input) then if (Maybe(WAddr(Int8))) connect to file "apg.out" is { failure then print("Cannot open file 'apg.out'.\n"), success(output) then if (Maybe(WAddr(Int8))) connect to file "apg.log" is { failure then print("Cannot open file 'apg.log'.\n"), success(log_file) then proceed_file(options,input,output,log_file) } } } }. *** (3) Making the parser automaton. In order to exemplify our discussion we will refer in the sequel to the following (ambiguous) 'example grammar': S -> A A -> A -> a A -> AA Notice that this grammar produces all sequences of a's, including the empty sequence. It is ambiguous since for example the sequence aaa may 'reduce' to S (or 'be derived' from S) in at least two ways: S -> A -> AA -> AAA -> AAa -> Aaa -> aaa S -> A -> AA -> Aa -> AAa -> Aaa -> aaa even if we use only 'rightmost' derivations, which means that when we follow the arrows, the non terminal which is replaced is always the rightmost one. It is the case above, as one may easily check. In the first case the tree structure of our sequence is a(aa), while in the second case, it is (aa)a. The automaton will realize the first of our two derivations above as follows (the dot represents the current position of reading from the input): .aaa shift a.aa reduce using rule A -> a A.aa shift Aa.a reduce using rule A -> a AA.a shift AAa. reduce using rule A -> a AAA. reduce using rule A -> AA AA. reduce using rule A -> AA A. reduce using rule S -> A (accept) S. The second one would be realized as follows: .aaa shift a.aa reduce using rule A -> a A.aa shift Aa.a reduce using rule A -> a AA.a reduce using rule A -> AA A.a shift Aa. reduce using rule A -> a AA. 
reduce using rule A -> AA
A.     reduce using rule S -> A (accept)
S.

The ambiguity is realized here by the choice we have in the situation:

   AA.a

We may either reduce using rule A -> AA, or shift. However, this grammar is much more ambiguous than that. We could for example have the following sequence:

   AA.a     reduce using rule A ->
   AAA.a    reduce using rule A -> AA
   AA.a

which is obviously undesirable. In other words, our grammar has not only a shift/reduce conflict, but at least one reduce/reduce conflict. If we want to produce the same language (all the sequences of a's) with a non-ambiguous grammar, we should use this one:

   S -> A
   A ->
   A -> aA

or this one:

   S -> A
   A ->
   A -> Aa

*** (3.1) Computing 'First'.

Any symbol in a grammar represents a set of sequences of tokens, namely all sequences of tokens which reduce to this symbol. We also say that such a sequence is derived from the symbol, or that it is an 'instance' of the symbol. To any symbol we associate a finite set of 'extended tokens'. Here, an extended token is either 'e' (representing the absence of a token), a normal token, or the end marker '$'. By definition, 'First(X)' is the set of all tokens which may come first in an instance of 'X', plus 'e' if the empty sequence is an instance of 'X'. For our example grammar, we have:

   First($) = ($)
   First(a) = (a)
   First(S) = (a,e)
   First(A) = (a,e)

The following type describes 'extended tokens'.

type ExToken:
  token(String), // a normal token whose name is 's'
  empty,         // no token at all
  dollar.        // the end marker

However, computing 'First' in general is not so easy. It is a saturation process. The main work is to compute 'First' for non terminals, since it is trivial for tokens. Here is how we can do this.

(1) to each non terminal associate the empty list, i.e. put First(A) = [ ].
(2) do the following until no more element can be added to any of the previous lists:
    - if A -> au is a production, add 'a' to First(A),
    - if A -> is a production, add 'e' to First(A),
    - if A -> Bu is a production, then
        - if 'e' is in First(B), add the production A -> u to the grammar and
          add all of First(B)-[e] to First(A),
        - if 'e' is not in First(B), add all of First(B) to First(A).

Of course, productions are added to the grammar only for computing 'First', not for any other computation. We also need to compute 'First(X_1...X_k)' for any sequence of symbols. This is done by induction on k:

   First() = [e]
   First(X_1...X_k) =
     - if 'e' is in First(X_1), then First(X_1)-[e] union First(X_2...X_k),
     - else First(X_1).

In practice, we compute only what we call a 'first function', which is an association list:

   [ (A,[...]), (B,[...]), ... ]

of type List((String,List(ExToken))), where 'A', 'B',... are the non terminals, and [...] the list of extended tokens which may come first in an instance of the corresponding non terminal. The next function computes (l1 - [e]) union l2. However, 'e' may belong to l2, and in that case it will belong to the result.

define List(ExToken) merge_except_empty ( List(ExToken) l1, List(ExToken) l2 ) =
  if l1 is
  {
    [ ]     then l2,
    [h . t] then if h = empty
                 then merge_except_empty(t,l2)
                 else if member(h,l2)
                 then merge_except_empty(t,l2)
                 else merge_except_empty(t,[h . l2])
  }.

We will need to convert an extended token to a grammar symbol. 'e' should never be converted.

define Symbol to_symbol ( ExToken t ) =
  if t is
  {
    token(s) then token(s),
    empty    then alert,
    dollar   then dollar
  }.

The function below constructs the initial stage of our 'first function'. In this stage, all lists of tokens are empty.

define List((String,List(ExToken))) initial_stage ( List(String) non_terminals ) =
  if non_terminals is
  {
    [ ]     then [ ],
    [h . t] then [(h,[]) . initial_stage(t)]
  }.

We will also need to find the value of a non terminal (given by its name) in our 'first function'.
This search should always be successful. define List(ExToken) first ( String name, List((String,List(ExToken))) f ) = if f is { [ ] then alert, [h . t] then if h is (n,l) then if n = name then l else first(name,t) }. The same, but for an arbitrary grammar symbol 'X'. define List(ExToken) first ( Symbol _X, List((String,List(ExToken))) f ) = if _X is { dollar then [dollar], token(s) then [(ExToken)token(s)], non_terminal(s) then first(s,f) }. Finally, we may compute 'First(u)' for any sequence of grammar symbols 'u'. define List(ExToken) first ( List(Symbol) u, List((String,List(ExToken))) f ) = if u is { [ ] then [ empty ], [_X . v] then with first_X = first(_X,f), if member(empty,first_X) then merge_except_empty(first_X,first(v,f)) else first_X }. The following function adds a token to a set of tokens in a 'first function'. It is given the extended token 'x' to be added, the name of the non terminal under which it should be added, and the 'first function' into which this operation should be performed. The grammar is not used, but must be transmitted via terminal calls. define (List((String,List(ExToken))),List(GrammarRule)) add ( ExToken x, String head_name, List((String,List(ExToken))) f, List(GrammarRule) l ) = if f is { [ ] then alert, // the name should have been found [h . t] then if h is (name,toks) then if name = head_name then if member(x,toks) then (f,l) else ([(name,[x . toks]) . t],l) else if add(x,head_name,t,l) is (f,l) then ([h . f],l) }. The next function tests if a given non terminal may represent the empty sequence. define Bool may_be_empty ( String name, // name of non terminal List((String,List(ExToken))) f // 'first function' ) = member(empty,first(name,f)). The following function adds all elements of a set of extended tokens to a 'first list' in a given 'first function'. 
define (List((String,List(ExToken))),List(GrammarRule)) add_all_of ( List(ExToken) new, // elements to be added String head_name, // name of non terminal List((String,List(ExToken))) f, // 'first function' List(GrammarRule) l // just to be transmitted ) = if f is { [ ] then alert, [h . t] then if h is (name,toks) then if name = head_name then ([(name,merge(new,toks)) . t],l) else if add_all_of(new,head_name,t,l) is (f,l) then ([h . f],l) }. The same one, but not adding 'e'. define (List((String,List(ExToken))),List(GrammarRule)) add_all_of_except_empty ( List(ExToken) new, // elements to be added String head_name, // name of non terminal List((String,List(ExToken))) f, // 'first function' List(GrammarRule) l // just to be transmitted ) = if f is { [ ] then alert, [h . t] then if h is (name,toks) then if name = head_name then ([(name,merge_except_empty(new,toks)) . t],l) else if add_all_of_except_empty(new,head_name,t,l) is (f,l) then ([h . f],l) }. The following function works out one grammar rule for the addition of elements to 'First lists'. define (List((String,List(ExToken))),List(GrammarRule)) first_work_rule ( List((String,List(ExToken))) f, List(GrammarRule) l, // the complete grammar at that point GrammarRule r ) = if r is grammar_rule(head_name,term,body,mbprec) then if body is { [ ] then add(empty,head_name,f,l), [a . u] then if a is (sym,args) then if sym is { dollar then alert, token(str) then add(token(str),head_name,f,l), non_terminal(str) then if may_be_empty(str,f) then add_all_of_except_empty(first(str,f),head_name,f, merge([grammar_rule(head_name,term,u,failure)],l)) else add_all_of(first(str,f),head_name,f,l) } }. The next function makes one step of completion of first sets (only for non terminals), making one action for each rule in the grammar. We need to return both the 'first function' 'f' and the grammar, because they change during the process. 
define (List((String,List(ExToken))),List(GrammarRule)) first_one_step ( List((String,List(ExToken))) f, List(GrammarRule) l, List(GrammarRule) todo ) = if todo is { [ ] then (f,l), [r1 . others] then if first_work_rule(f,l,r1) is (f,l) then first_one_step(f,l,others) }. The next function saturates a 'first function'. define (List((String,List(ExToken))),List(GrammarRule),Int32) saturate_first ( List((String,List(ExToken))) f, List(GrammarRule) l, Int32 count ) = if first_one_step(f,l,l) is (f_new,l_new) then if f = f_new then if l = l_new then (f,l,count) else saturate_first(f_new,l_new,count+1) else saturate_first(f_new,l_new,count+1). We need to extract the list of all non terminals from the grammar. define List(String) non_terminals ( List(GrammarRule) l, List(String) found_so_far ) = if l is { [ ] then found_so_far, [h . t] then if h is grammar_rule(name,term,body,mbprec) then if member(name,found_so_far) then non_terminals(t,found_so_far) else non_terminals(t,[name . found_so_far]) }. The next function is an interface to the previous one. define List(String) non_terminals ( List(GrammarRule) l ) = non_terminals(l,[]). Here is the function which computes the 'first function' associated to a given grammar. define (List((String,List(ExToken))),Int32) first_function ( List(GrammarRule) l ) = if saturate_first(initial_stage(non_terminals(l)),l,0) is (f,l,n) then (f,n). *** (3.2) Scenarii. As we saw previously, reductions using a grammar rule, occur only on top of stack. If the stack (as far as grammar symbols are concerned) is: ... u i.e. if it ends by u (a sequence of grammar symbols), and if there is a production of the form: A -> uv then it is possible that after having read an instance of v, we reduce using that rule. Furthermore, the automaton is able to look at the next token to be read (it has one token of 'lookahead'). This helps to make decisions, as we will see later, using precedence and association rules. 
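As a cross-check on the 'First' computation of section (3.1), here is a minimal sketch of the saturation process, written in Python rather than Anubis (all names — EMPTY, first_of_seq, first_sets — are illustrative, not part of APM):

```python
# Sketch of the 'First' saturation for the example grammar
# S -> A, A -> , A -> a, A -> AA.  'e' marks the empty sequence.
EMPTY = "e"
grammar = [("S", ["A"]), ("A", []), ("A", ["a"]), ("A", ["A", "A"])]
nonterminals = {lhs for lhs, _ in grammar}

def first_of_seq(seq, first):
    """First(X_1...X_k), computed by induction on k as in the text."""
    result = set()
    for sym in seq:
        fs = first[sym] if sym in nonterminals else {sym}
        result |= fs - {EMPTY}
        if EMPTY not in fs:
            return result
    return result | {EMPTY}   # every X_i may be empty

def first_sets(grammar):
    first = {a: set() for a in nonterminals}
    changed = True
    while changed:            # saturate: loop until nothing can be added
        changed = False
        for lhs, rhs in grammar:
            new = first_of_seq(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first
```

For the example grammar this yields First(S) = First(A) = {a, e}: 'a' may come first, and the empty sequence is an instance of both symbols.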
In particular, the automaton knows which tokens are allowed as the lookahead for a given reduction. Hence, we introduce the notion of a scenario. A 'scenario' is a pair, denoted (in these explanations):

   (A -> u.v , (a_1,...,a_k))

where A -> uv is a production (whose right hand side has been split into two parts u and v, separated by a dot, where u and/or v may be empty), and where (a_1,...,a_k) is a non empty set of tokens. In the case of our example grammar, here are all the possible left parts of scenarii:

   S -> .A
   S -> A.
   A -> .
   A -> .a
   A -> a.
   A -> .AA
   A -> A.A
   A -> AA.

That a scenario (A -> u.v , E) is 'possible' in some state s means that the top of stack is described by u (one slot for one symbol), and that reduction using the given grammar rule may occur if the lookahead token (at the time the reduction takes place) belongs to 'E'. It is clear that, the grammar being given as a finite set of rules (and a finite set of tokens), there is only a finite number of scenarii. Two scenarii:

   (A -> u.v , E)
   (B -> w.t , F)

are called 'compatible' if either u is a postfix of w, or w is a postfix of u. This simply means that there exists a stack for which the two scenarii are possible. That stack must have the longest of u and w on its top. Two scenarii:

   (A -> u.v , E)
   (A -> u.v , F)

are called 'similar' if they have the same left part (same production split at the same place). They differ only by the sets of tokens E and F. Two such scenarii may be joined together into the unique scenario:

   (A -> u.v , G)

where G is the union of E and F. Below is our representation of scenarii (A -> u.v , E):

type Scenario:
  scenario(String,        // A
           List(Symbol),  // u in reverse order
           List(Symbol),  // v in natural order
           List(ExToken), // E
           Maybe(Int32)). // precedence level of grammar rule

'u' is stored in reverse order, because the most common operation is to take the head of 'v' and put it in front of 'u', so that the dot in the scenario advances past one grammar symbol.
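To make the reversed-'u' convention concrete, here is a small Python sketch (the names Scenario and advance_dot are illustrative, not APM's) of the representation and of the dot-advancing operation:

```python
from typing import NamedTuple, Optional

class Scenario(NamedTuple):
    head: str              # A
    before: tuple          # u, stored in REVERSE order
    after: tuple           # v, in natural order
    lookaheads: frozenset  # E
    prec: Optional[int]    # precedence level of the rule

def advance_dot(s: Scenario) -> Scenario:
    """Move the dot past one symbol: (A -> u.Xv , E) becomes (A -> uX.v , E).
    Because 'before' is reversed, this only touches the heads of both lists."""
    x, *v = s.after
    return s._replace(before=(x,) + s.before, after=tuple(v))
```

For example, advancing the dot in (A -> A.A , (a,$)) gives (A -> AA. , (a,$)), with E unchanged.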
*** (3.3) States.

A state of our automaton is a finite set of pairwise compatible scenarii, which does not contain any two similar scenarii. Intuitively, the scenarii in a state are simply those which are still possible in this state. The 'core' of a state is what remains if we ignore lookaheads. States which do not differ by their core are called 'similar'. Couldn't we consider similar states as equivalent ? The answer is no in theory, but the difference in behavior of the automaton in similar states is negligible in practice. This is the reason why we will identify similar states (merging the lists of lookaheads of similar scenarii). But let us see what the difference really is. Clearly, since similar states differ only by the lookaheads, the same shifts and/or reduces may arise. The difference is only in the decision to make in case of a conflict. However, since the user has plenty of tools to influence such decisions, there is no need to make any distinction between similar states. Of course we represent states (up to a certain point) using the type 'List(Scenario)'.

*** (3.4) Testing for similarity.

The next function tests if two scenarii are similar.

define Bool similar ( Scenario s1, Scenario s2 ) =
  if s1 is scenario(n1,u1,v1,_,_) then
  if s2 is scenario(n2,u2,v2,_,_) then
  (n1,u1,v1) = (n2,u2,v2).

The next function takes a scenario 's' and a state, and returns this state from which a scenario similar to 's', if present, has been dropped.

define Maybe(List(Scenario)) drop_similar ( Scenario h, List(Scenario) s ) =
  if s is
  {
    [ ]     then failure,
    [u . v] then if similar(h,u)
                 then success(v)
                 else if drop_similar(h,v) is
                 {
                   failure    then failure,
                   success(w) then success([u . w])
                 }
  }.

The next function tests for similar states.

define Bool similar ( List(Scenario) s1, List(Scenario) s2 ) =
  if s1 is
  {
    [ ]     then s2 = [ ],
    [h . t] then if drop_similar(h,s2) is
                 {
                   failure      then false,
                   success(s2a) then similar(t,s2a)
                 }
  }.
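The state-similarity test is a multiset comparison of scenario cores. A Python sketch of the same drop-and-recurse idea (hypothetical names; a scenario is modelled as a (head, before, after, lookaheads) tuple):

```python
def core(scenario):
    """The 'core' of a scenario: everything except the lookahead set."""
    head, before, after, _lookaheads = scenario
    return (head, before, after)

def similar_states(s1, s2):
    """Two states are similar when their scenario cores match one-to-one."""
    remaining = list(s2)
    for sc in s1:
        for i, other in enumerate(remaining):
            if core(sc) == core(other):   # drop the first similar scenario
                del remaining[i]
                break
        else:
            return False                  # no similar scenario found
    return not remaining                  # both lists exhausted together
```

As in the Anubis version, the lookahead sets play no role: only the cores are compared.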
The next function tests if a list of scenarii contains only scenarii with the splitting dot at the left end (i.e. in front of the right member of the rule).

define Bool has_only_front_dots ( List(Scenario) s ) =
  if s is
  {
    [ ]     then true,
    [h . t] then if h is scenario(_,u,_,_,_) then
                 if u is
                 {
                   [ ]   then has_only_front_dots(t),
                   [_._] then false
                 }
  }.

The next function tests if a given non saturated state has a saturated version similar to some saturated state. It does this without saturating the first state.

define Bool saturated_is_similar
  (
    List(Scenario) s1, // non saturated
    List(Scenario) s2  // saturated
  ) =
  if s1 is
  {
    [ ]     then has_only_front_dots(s2),
    [h . t] then if drop_similar(h,s2) is
                 {
                   failure      then false,
                   success(s2a) then saturated_is_similar(t,s2a)
                 }
  }.

*** (3.5) Saturating states.

Remark that if some state contains the scenario:

   (A -> u.Bv , E)

(where B is a non terminal), it is possible that the next sequence of tokens to be read matches B. This means that, if B -> w is any B-production, the scenario (B -> .w , ?) should also be possible in the same state. Now, what are the acceptable lookaheads for this scenario ? They are obviously all the tokens which may begin an instance of va, for some a in E. This remark provides a procedure for 'saturating' states. A state is 'saturated' if, whenever it contains:

   (A -> u.Bv , (a_1,...,a_k))

it also contains:

   (B -> .w , union over i of First(v a_i))

for all B-productions B -> w. In the sequel, we will compute saturated states, but states are often more conveniently represented by their non saturated version. Below is a function which computes the union over i of First(v a_i):

define List(ExToken) union_first
  (
    List(ExToken) _E, // a_1 ... a_k
    List(Symbol)  v,
    List((String,List(ExToken))) f
  ) =
  if _E is
  {
    [ ]           then [ ],
    [a1 . others] then merge(first(append(v,[to_symbol(a1)]),f),
                             union_first(others,v,f))
  }.

The next function tests if a given state is similar to some state in a given list of states.
This is needed for our saturation process, because we must not add to a state a scenario which already belongs (maybe in a similar form) to that state. Otherwise, our process would never end. define Bool already_present ( List(Scenario) s, List(List(Scenario)) l ) = if l is { [ ] then false, [h . t] then if similar(s,h) then true else already_present(s,t) }. The next function is given a (new) scenario to be inserted into a list of scenarii. If this list contains a similar scenario, the new scenario is just merged to that one. Otherwise, it is simply added to the list. define List(Scenario) insert_scenario ( Scenario s, List(Scenario) l ) = if l is { [ ] then [s], [s1 . others] then if similar(s,s1) then (if s is scenario(_A,u,v,_E,mbprec) then if s1 is scenario(_, _,_,_F,_) then [scenario(_A,u,v,merge(_E,_F),mbprec) . others]) else [s1 . insert_scenario(s,others)] }. The next function extracts the symbols from the right hand side of a grammar rule (dropping the 'term' part). define List(Symbol) symbols ( List((Symbol,String)) l ) = if l is { [ ] then [ ], [h . t] then if h is (s,u) then [s . symbols(t)] }. The following function adds to a given state 's', all the scenarii of the form (B -> .w , F), for all B-productions. The set of lookaheads F is given. define List(Scenario) add_scenarii ( String _B, // B List(ExToken) lookaheads, // F List(Scenario) s, // s List(GrammarRule) g // the grammar ) = // by induction on the list 'g' of all grammar rules if g is { [ ] then s, // no more grammar rule to try out [r1 . others_rules] then if r1 is grammar_rule(rule_name,term,body,mbprec) then if _B = rule_name // do it only for B-productions // this is a B-production. 
The new scenario is: then with new_scenario = scenario(_B,[],symbols(body),lookaheads,mbprec), // first insert the new scenario, and continue with next // grammar rule add_scenarii(_B, lookaheads, insert_scenario(new_scenario,s), others_rules) // else this was not a B-production else add_scenarii(_B,lookaheads,s,others_rules) }. The next function performs one step in the saturation of a state. This step consists in a loop on all scenarii in the state. The list l is the list of scenarii which have not yet been used for saturation, while 'all' is the set of all known scenarii in the state at any time. For each scenario ('sc1' below), of the form (A -> u.v , E), we first check the form of 'v'. If 'v' is empty the scenario does not participate to saturation, and we just re-enter the loop with the tail of 'l' instead of 'l'. If 'v' is not empty, it has a first symbol ('_B' below). This _B cannot be a $. If it is a token, the scenario does not participate to saturation, like above. Now, if _B is a non terminal, we add to 'all' all the scenarii derived by the previous function from B-productions, and we continue our loop. define List(Scenario) saturate_state_one_step ( List(Scenario) all, // all scenarii in the state List(Scenario) l, // scenarii not yet used for saturation List(GrammarRule) g, // the grammar List((String,List(ExToken))) f // the 'first function' ) = if l is { [ ] then all, // saturation step finished [sc1 . others] then if sc1 is scenario(_A,u,v,_E,_) then if v is { [ ] then saturate_state_one_step(all,others,g,f), [_B . w] then if _B is { dollar then alert, // the right side of a rule // cannot contain a '$' token(_) then saturate_state_one_step(all,others,g,f), non_terminal(name) then saturate_state_one_step( add_scenarii(name, // add a scenario for each B-production union_first(_E,w,f), // lookaheads all, g), others,g,f) } } }. Now, saturating a state is just performing saturation steps until a step does not change the state any more. 
define List(Scenario) saturate_state ( List(Scenario) s, List(GrammarRule) g, List((String,List(ExToken))) f ) = with s1 = saturate_state_one_step(s,s,g,f), if s1 = s then s else saturate_state(s1,g,f). *** (3.6) The initial state. The non terminal S represents the totality of what we want to read from the input. More precisely, if the input is correct, it is an instance of S. Hence, since there is only one S-production S -> A, our reading (if successful) will end by a reduction via this rule, and it will be correct if and only if the lookahead token is the end marker: $. Hence, at the beginning, there is obviously one and only one wanted scenario, which is: (S -> .A , ($)) This scenario (which will be called the 'initial scenario') needs to belong to the initial state. In fact, the initial state is simply the smallest saturated state which contains this scenario. In the case of our example, this saturated state will be (after two steps of saturation): (S -> .A , ($)) (A -> . , (a,$)) (A -> .a , (a,$)) (A -> .AA , (a,$)) Note that the rule S -> A appears only one time in the initial state since the state saturation process cannot produce a scenario using this rule. Now the state generation process will produce a state with the scenario (S -> A. , ($)). Obviously, we cannot have other scenarii using this rule. The state which contains the scenario (S -> A. , ($)) is our 'accepting state'. Indeed, the input has been read entirely only when we are on the point to reduce using this scenario. In that case the next token to be read is the end marker, and we 'accept' the input. However, we may have a reduce/reduce conflict with this scenario. It is the case in our example grammar. Indeed, in state 2 (see below), and if the next token to be read is the end marker, we may either reduce using the scenario (S -> A. , ($)) or the scenario (A -> . , (a,$)). Notice that it is not possible to have a shift/reduce conflict with scenario (S -> A. 
,($)), because the token '$' cannot be shifted (it cannot appear in the right member of a rule). Of course the user cannot choose between these two reductions, because he doesn't know about the existence of rule S -> A. Nevertheless, in that case, we avoid the conflict by systematically reducing using rule (S -> A. , ($)). This may be justified as follows. The initial state contains the initial scenario, and scenarii obtained by saturation, i.e. with the dot in front of the right member. Hence the accepting state may only contain the accepting scenario, scenarii of the form (? -> A.? , ?) (because we make a transition on A between the two states), and scenarii with the dot in front of the right member. Hence all scenarii in the accepting state have at most one symbol on the left of the dot. This means that if a reduce/reduce conflict arises between the accepting scenario and another scenario, this other scenario is either of the form:

   (B -> . , ($ ...))

or of the form:

   (B -> A. , ($ ...))

In the first case, ???

The following function constructs the non saturated initial state for a given grammar. It simply looks for the unique S-production, and constructs state 0 containing the unique initial scenario.

define List(Scenario) initial_state ( List(GrammarRule) g ) =
  if g is
  {
    [ ]     then alert,
    [h . t] then if h is grammar_rule(name,term,body,mbprec) then
                 if name = "#S"
                 then [scenario(name,[],symbols(body),[dollar],mbprec)]
                 else alert
  }.

*** (3.7) Transitions.

Of course our automaton has transitions. It has two kinds of transitions: those which result from the reading of a token, and those which result from a reduction via a rule, after a sequence of tokens has been read which is an instance of the right side of this rule. The first ones are labelled by tokens, while the others are labelled by non terminals.
If in some state we have the scenario:

   (A -> u.av , E)

(where 'a' is a token) then, if the next token to be read is 'a', it is clear that the transition will be performed to a state containing the scenario:

   (A -> ua.v , E)

Notice that E is unchanged. Now, if in some state we have the scenario:

   (A -> u.Bv , E)

and if, after reading some tokens, we reduce via a B-production and return to this state, we will have to make a transition to a state containing:

   (A -> uB.v , E)

(E again unchanged). All our transitions will occur in one of these two situations.

*** (3.8) Generating the states.

Which states do we need ? We need the initial state, and all the states which are reachable from it via one of the two above kinds of transitions. This gives the method for generating states:

(1) when creating a new state, saturate it,
(2) for each symbol for which there are scenarii in the state with this symbol after the dot, construct the state needed for the corresponding transition,
(3) do that until no more state may be created.

Example. Consider our example grammar:

   S -> A
   A ->
   A -> a
   A -> AA

and remember that First(A) is (a,e), so that for example First(A$) = (a,$).

state 0            saturation step 1      saturation step 2
-----------------------------------------------------------
(S -> .A  , ($))   (S -> .A  , ($))       (S -> .A  , ($))
                   (A -> .   , ($))       (A -> .   , (a,$))
                   (A -> .a  , ($))       (A -> .a  , (a,$))
                   (A -> .AA , ($))       (A -> .AA , (a,$))

Reading 'a' from state 0:

state 1
------------------
(A -> a. , (a,$))

Reading an instance of 'A' from state 0:

state 2              saturation
-------------------------------------------
(S -> A.  , ($))     (S -> A.  , ($))
(A -> A.A , (a,$))   (A -> A.A , (a,$))
                     (A -> .   , (a,$))
                     (A -> .a  , (a,$))
                     (A -> .AA , (a,$))

Reading 'a' from state 2: --> state 1 again

Reading an instance of 'A' from state 2:

state 3              saturation
-------------------------------------------
(A -> AA. , (a,$))   (A -> AA. , (a,$))
(A -> A.A , (a,$))   (A -> A.A , (a,$))
                     (A -> .   , (a,$))
                     (A -> .a  , (a,$))
                     (A -> .AA , (a,$))

Reading 'a' from state 3: --> state 1
Reading an instance of 'A' from state 3: --> state 3

That's all !

*** (3.9) Making the automaton.

The following function takes a scenario (A -> u.Xv , E), where X is any grammar symbol, and a list of lists of scenarii of the form:

   [ [ (? -> ?Y.? , ?)  (? -> ?Y.? , ?) ... ], ... ]

i.e. such that in each list (called a 'class'), the scenarii (? -> u.? , ?) have the same symbol as the last one in 'u' (i.e. the first one in our representation, since 'u' is stored in reverse order). The class above is said to 'correspond to Y'. The function looks for a class corresponding to X. If it exists, the scenario is added to this class, after its dot has been put past X. Otherwise, it makes a new class. If the scenario has no symbol after the dot, it is not classified at all.

define List(List(Scenario)) classify ( Scenario s, List(List(Scenario)) l ) =
  if s is scenario(_A,u,v,_E,mbp) then
  if v is
  {
    [ ]       then l, // s not classified
    [_X . v1] then // s is (A -> u.Xv1 , E)
      if l is
      {
        [ ] then // no class yet: create a new class
            [[scenario(_A,[_X . u],v1,_E,mbp)]],
        [_C1 . other_classes] then // look at first class
          if _C1 is
          {
            [ ]      then alert, // no class should be empty
            [s1 . _] then
              if s1 is scenario(_,u1,_,_,_) then // get the symbol Y for that class
              if u1 is
              {
                [ ]      then alert, // u1 should end (begin) with a Y
                [_Y . _] then
                  if _X = _Y
                  // put scenario in class C1
                  then [insert_scenario(scenario(_A,[_X . u],v1,_E,mbp),_C1)
                        . other_classes]
                  // try other classes
                  else [_C1 . classify(s,other_classes)]
              }
          }
      }
  }.

The function 'next_states' takes a state 'state', and produces the list of all states which may be reached from 'state' via a single transition (either on shifting a token or after reduction to a non terminal). It works as follows. It partitions 'state' so that each element of the partition has scenarii with the same symbol after the dot. Then the dot is put past this symbol.
For example, if 'state' is:

   [ (A -> u.av , E)  (B -> w.at , F)  (C -> z.By , G) ]

it will produce:

   [ [ (A -> ua.v , E)  (B -> wa.t , F) ],  [ (C -> zB.y , G) ] ]

The next function takes a (non saturated) state, and computes the list of all (non saturated) states which may be the target of a transition (either on a token or on a non terminal) from that state. It transforms a state into a set of classes like the above.

define List(List(Scenario)) next_states ( List(Scenario) l ) =
  if l is
  {
    [ ]           then [ ],
    [s1 . others] then with part = next_states(others),
                       classify(s1,part)
  }.

Now, in order to compute our automaton (of type 'List(List(Scenario))'), we must start with the initial non saturated state and add 'next' states until no more state may be added. Of course, we add states only if they are not already present in the automaton. More precisely, if there is a similar state in the automaton, we must merge those two states. Here is how we merge states.

define Scenario get_similar ( Scenario s, List(Scenario) l ) =
  if s is scenario(_A,u,v,_E,mbp) then
  // a scenario similar to 's' is assumed to be in 'l'
  if l is
  {
    [ ]           then alert,
    [s1 . others] then if s1 is scenario(_B,w,t,_F,_) then
                       if (_A = _B) & (u = w) & (v = t)
                       then s1
                       else get_similar(s,others)
  }.

define List(Scenario) merge_states ( List(Scenario) l1, List(Scenario) l2 ) =
  // each element of l1 has a similar one in l2.
  if l1 is
  {
    [ ]       then [ ],
    [s1 . o1] then with s2 = get_similar(s1,l2),
                   if s1 is scenario(_A,u,v,_E,mbp) then
                   if s2 is scenario(_, _,_,_F,_) then
                   [scenario(_A,u,v,merge(_E,_F),mbp) . merge_states(o1,l2)]
  }.

The next function inserts a new state into a list of states.

define List(List(Scenario)) insert_state ( List(Scenario) state, List(List(Scenario)) l ) =
  if l is
  {
    [ ]           then [state],
    [s1 . others] then if similar(state,s1)
                       then [merge_states(state,s1) . others]
                       else [s1 . insert_state(state,others)]
  }.
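The partitioning performed by 'classify' and 'next_states' can be sketched in Python (hypothetical names; scenarios are again (head, before, after, lookaheads) tuples with 'before' reversed, and the merging of similar scenarii done by insert_scenario is omitted):

```python
# Sketch (not APM's code): partition a state by the symbol after the dot,
# advancing the dot in each resulting class.
def next_states(state):
    classes = {}      # symbol after the dot -> class of advanced scenarios
    for head, before, after, looks in state:
        if not after:                 # dot at the end: not classified
            continue
        x, rest = after[0], after[1:]
        classes.setdefault(x, []).append((head, (x,) + before, rest, looks))
    return list(classes.values())
```

Run on a concrete version of the example state above, this produces one class for the transition on 'a' (two scenarios) and one for the transition on 'B' (one scenario).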
At each step of the construction of our automaton, we have two lists:
- the list 'have_next' of those states for which the next states have already been constructed,
- the list 'have_no_next' of those states for which the next states have not yet been constructed.

define List(List(Scenario)) make_states
  (
    List(List(Scenario)) have_next,
    List(List(Scenario)) have_no_next,
    List(GrammarRule) g,
    List((String,List(ExToken))) f
  ) =
  if have_no_next is
  {
    [ ] then have_next, // the automaton is finished
    [state . others] then
      with state = saturate_state(state,g,f),
      if already_present(state,have_next)
      then make_states(insert_state(state,have_next), others, g, f)
      else make_states(insert_state(state,have_next),
                       reverse_append(others,next_states(state)), g, f)
  }.

define List(GrammarRule) add_S_rule ( List(GrammarRule) l ) =
  if l is
  {
    [ ]     then alert,
    [h . t] then if h is grammar_rule(_A,_,_,_) then
                 [grammar_rule("#S","",[(non_terminal(_A),"")],failure) . l]
  }.

Finally, we can make the whole automaton from the grammar alone.

define List(List(Scenario)) make_states ( List(GrammarRule) g ) =
  with g = add_S_rule(g),
  if first_function(g) is (f,n) then
  make_states([],[initial_state(g)],g,f).

*** (4) Reworking the automaton.

*** (4.1) Numbering states and adding transitions lists.

Now that our states are established, we need to rework them. Here are the operations performed:
- put an identifying number on each state (beginning at 0),
- attach a transition A-list to each state (each key is a symbol or $).

type IntermediateState:
  i_state(Int32 id,
          List(Scenario) scenarii,
          List((Symbol,Int32)) transitions).

The next function just adds the numbers identifying states.

define List(IntermediateState) number ( List(List(Scenario)) l, Int32 n ) =
  if l is
  {
    [ ]     then [ ],
    [h . t] then [i_state(n,h,[]) . number(t,n+1)]
  }.

The next function gives the number identifying a non saturated state in a list of intermediate states.
define Int32 find_id ( List(Scenario) non_saturated_state, List(IntermediateState) all ) =
  if all is
  {
    [ ]     then alert,
    [h . t] then if h is i_state(id,scnri,_) then
                 if saturated_is_similar(non_saturated_state,scnri)
                 then id
                 else find_id(non_saturated_state,t)
  }.

The next function takes a class (a list of scenarii with the same grammar symbol Y before the dot) and an automaton in the form of a list of intermediate states, and returns the pair (Y,n), where Y is the previous grammar symbol and n the integer identifying that class in the automaton.

define (Symbol,Int32) make_transition ( List(Scenario) class, List(IntermediateState) all ) =
  if class is
  {
    [ ]     then alert,
    [s . o] then if s is scenario(_,u,_,_,_) then
                 if u is
                 {
                   [ ]      then alert,
                   [_Y . _] then (_Y,find_id(class,all))
                 }
  }.

The following function takes a partition of a state (in the form of a list of classes) and an automaton (in the form of a list of intermediate states), and returns a list of pairs (X,n) saying ``if the transition is on X, then go to state n''.

define List((Symbol,Int32)) make_transitions
  (
    List(List(Scenario)) part,   // partitioned saturated state
    List(IntermediateState) all, // all states
    List((Symbol,Int32)) computed_so_far
  ) =
  if part is
  {
    [ ]        then computed_so_far,
    [scs1 . o] then make_transitions(o, all,
                      [make_transition(scs1,all) . computed_so_far])
  }.

The next function adds transitions to all intermediate states in our automaton.

define List(IntermediateState) add_transitions
  (
    List(IntermediateState) all, // all states
    List(IntermediateState) l    // current list of states to complete
  ) =
  if l is
  {
    [ ]     then [ ],
    [h . t] then if h is i_state(id,scnri,_) then
                 [i_state(id,scnri,make_transitions(next_states(scnri),all,[]))
                  . add_transitions(all,t)]
  }.

Finally, we transform our automaton.

define List(IntermediateState) add_numbers_and_transitions ( List(List(Scenario)) automaton ) =
  with new = number(automaton,0),
  add_transitions(new,new).
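The rework of section (4.1) can be sketched in Python on the example automaton of section (3.8). This is a sketch with hypothetical names: states are reduced to opaque labels and the transition relation is given directly, instead of being recovered through find_id:

```python
# Sketch: number the states 0,1,2,... and attach to each one an association
# list (symbol -> target state id), as add_numbers_and_transitions does.
def number_and_add_transitions(states, goto):
    """states: list of state labels; goto: dict (label, symbol) -> label.
    Returns intermediate states (id, label, [(symbol, target_id), ...])."""
    ids = {label: n for n, label in enumerate(states)}   # numbering from 0
    return [
        (ids[s], s,
         [(x, ids[t]) for (src, x), t in goto.items() if src == s])
        for s in states
    ]

# The four states and the transitions found for the example grammar.
states = ["state0", "state1", "state2", "state3"]
goto = {
    ("state0", "a"): "state1", ("state0", "A"): "state2",
    ("state2", "a"): "state1", ("state2", "A"): "state3",
    ("state3", "a"): "state1", ("state3", "A"): "state3",
}
automaton = number_and_add_transitions(states, goto)
```

State 1 ends with an empty transition list: in the example grammar, no shift or goto leaves it, only reductions.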
*** (4.2) Removing unneeded lookaheads, and separating scenarii.

If a scenario in a state has the form ( A -> u.v , E) and if v is not empty, E is no longer needed. Such a scenario is called a 'shifting' scenario, because it will cause the shifting of either a token or an instance of a non terminal. On the contrary, scenarii of the form (A -> u. , E) are called 'reducing' scenarii, because they call for a reduction.

type NonEmptyList($T): [$T . List($T)].

type ShiftingScenario:
  shifting_scenario(String name, List(Symbol) before_dot,
                    NonEmptyList(Symbol) after_dot).

type ReducingScenario:
  reducing_scenario(String name, List(Symbol) right_member,
                    List(ExToken) lookaheads, Maybe(Int32) prec).

type Conflict:
  reduce_reduce(ExToken token, ReducingScenario first, ReducingScenario second),
  shift_reduce(ExToken token, ShiftingScenario first, ReducingScenario second).

type NewState:
  state(Int32 id, List(ReducingScenario) reducing_scenarii,
        List(ShiftingScenario) shifting_scenarii,
        List((Symbol,Int32)) transitions, List(Conflict) conflicts).

Given an automaton in the form of a list of intermediate states, we transform it into an automaton in the form of a list of new states. This is a state by state operation. The next function checks whether a precedence level may be deduced from the right member of the rule.

define Maybe(Int32) get_prec_from
  ( List(Symbol) u, // right member of rule in reverse order
    List((String,Int32)) prec_table ) =
  if u is
  { [ ] then failure,
    [h . t] then
      if h is
      { dollar then get_prec_from(t,prec_table),
        token(s) then
          if prec(s,prec_table) is
          { failure then get_prec_from(t,prec_table),
            success(n) then success(n) },
        non_terminal(s) then get_prec_from(t,prec_table) } }.

define Maybe(Int32) get_prec_from
  ( Maybe(Int32) prec, List(Symbol) u, List((String,Int32)) prec_table ) =
  if prec is
  { failure then get_prec_from(reverse(u),prec_table),
    success(_) then prec }.

For each state, we just need to separate the list of scenarii, and slightly rearrange each of them.
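Note in passing that the two 'get_prec_from' functions above follow the usual YACC convention: a rule that carries no explicit precedence inherits the precedence of a terminal of its right-hand member, the lookup skipping non terminals and the end marker. A hedged Python sketch of that convention, scanning from the rightmost symbol as YACC does (the pair encoding of symbols is an assumption made for this sketch only):

```python
def rule_precedence(explicit, right_member, prec_table):
    """Precedence level of a grammar rule, YACC-style.

    'explicit' is the precedence declared on the rule itself (None when
    absent); otherwise the precedence of the rightmost terminal of
    'right_member' is used.  Symbols are modelled as ('token', name) or
    ('non_terminal', name) pairs.
    """
    if explicit is not None:
        return explicit
    for kind, name in reversed(right_member):   # rightmost symbol first
        if kind == 'token' and name in prec_table:
            return prec_table[name]             # first terminal with a level
    return None                                 # no precedence can be deduced
```

For a rule E -> E + E with prec('+') = 1, this yields 1; a rule whose right-hand member contains no terminal with a declared precedence gets none at all.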
define (List(ReducingScenario),List(ShiftingScenario)) separate
  ( List(Scenario) l, List((String,Int32)) prec_table ) =
  if l is
  { [ ] then ([ ],[ ]),
    [h . t] then
      if separate(t,prec_table) is (rs,ss)
      then if h is scenario(_A,u,v,_E,mbp)
           then if v is
                { [ ] then
                    ([reducing_scenario(_A,u,_E,get_prec_from(mbp,u,prec_table)) . rs],ss),
                  [_B . w] then (rs, [shifting_scenario(_A,u,[_B . w]) . ss]) } }.

The next function establishes the list of conflicts in a given state, from the two lists of reducing scenarii and shifting scenarii.

define List($T) intersect ( List($T) l1, List($T) l2 ) =
  if l1 is
  { [ ] then [ ],
    [h . t] then
      if member(h,l2) then [h . intersect(t,l2)] else intersect(t,l2) }.

define List(Conflict) rr_conflicts
  ( List(ExToken) common, ReducingScenario rs1, ReducingScenario rs2 ) =
  if common is
  { [ ] then [ ],
    [h . t] then [reduce_reduce(h,rs1,rs2) . rr_conflicts(t,rs1,rs2)] }.

define List(Conflict) rr_conflicts ( ReducingScenario rs1, ReducingScenario rs2 ) =
  if rs1 is reducing_scenario(_,_,_E,_)
  then if rs2 is reducing_scenario(_,_,_F,_)
       then rr_conflicts(intersect(_E,_F),rs1,rs2).

define List(Conflict) rr_conflicts
  ( ReducingScenario rs, List(ReducingScenario) l ) =
  if l is
  { [ ] then [ ],
    [rs1 . rso] then reverse_append(rr_conflicts(rs,rs1),rr_conflicts(rs,rso)) }.

define List(Conflict) sr_conflicts
  ( ReducingScenario rs, List(ShiftingScenario) ss ) =
  if ss is
  { [ ] then [ ],
    [ss1 . sso] then
      if rs is reducing_scenario(_,_,_E,_)
      then if ss1 is shifting_scenario(_,_,v)
           then if v is [a . v1]
                then if a is
                     { dollar then alert,
                       token(s) then
                         with a1 = (ExToken)token(s),
                         if member(a1,_E)
                         then [shift_reduce(a1,ss1,rs) . sr_conflicts(rs,sso)]
                         else sr_conflicts(rs,sso),
                       non_terminal(s) then sr_conflicts(rs,sso) } }.

define List(Conflict) conflicts
  ( List(ReducingScenario) rs, List(ShiftingScenario) ss ) =
  if rs is
  { [ ] then [ ],
    [rs1 . rso] then
      reverse_append( reverse_append(rr_conflicts(rs1,rso),sr_conflicts(rs1,ss)),
                      conflicts(rso,ss)) }.
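In summary, a state has a shift/reduce conflict on a token 'a' when some reducing scenario has 'a' among its lookaheads while some shifting scenario has 'a' right after the dot, and a reduce/reduce conflict on 'a' when two reducing scenarii share the lookahead 'a'. A simplified Python sketch of this detection (the flat scenario representation is invented for the sketch; APM works on the richer types defined above):

```python
def conflicts(reducing, shifting):
    """Detect the conflicts of one state.

    'reducing' is a list of (rule, lookaheads) pairs; 'shifting' is a list
    of (rule, next_token) pairs, where next_token is the token right after
    the dot (None when a non terminal follows the dot instead).
    """
    found = []
    for i, (r1, la1) in enumerate(reducing):
        # reduce/reduce: two reducing scenarii share a lookahead token
        for r2, la2 in reducing[i + 1:]:
            for tok in la1:
                if tok in la2:
                    found.append(('reduce/reduce', tok, r1, r2))
        # shift/reduce: a token that can be shifted also allows a reduction
        for s, tok in shifting:
            if tok is not None and tok in la1:
                found.append(('shift/reduce', tok, s, r1))
    return found
```

On the ambiguous example grammar used below, the state containing the reducing scenario (A -> . , {a,$}) together with the shifting scenario A -> .a reports a shift/reduce conflict on 'a'.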
Now, we can transform our automaton.

define List(NewState) separate
  ( List(IntermediateState) l, List((String,Int32)) prec_table ) =
  if l is
  { [ ] then [ ],
    [i_s . others] then
      if i_s is i_state(id,scnri,trs)
      then if separate(scnri,prec_table) is (rs,ss)
           then [state(id,rs,ss,trs,conflicts(rs,ss)) . separate(others,prec_table)] }.

define Int32 count_conflicts ( List(NewState) l ) =
  if l is
  { [ ] then 0,
    [h . t] then if h is state(_,_,_,_,cfls) then length(cfls)+count_conflicts(t) }.

*** (4.3) Making decisions.

We will now examine our states to decide what to do in the presence of a given lookahead. In other words, we must construct our 'action' function. We continue with the same example. We record all possibilities in the following table:

      |  a          $
    --+-------------------------
    0 |  s1/r2      r2
    1 |  r3         r3
    2 |  s1/r2      r1/r2
    3 |  s1/r2/r4   r2/r4

Indeed, in state 0, if we see an 'a' we may either shift and go to state 1, or reduce using rule 2 (A -> ). If we see a '$' we can only reduce using rule 2. In state 1, we can only reduce using rule 3 (A -> a). In state 2, if we see 'a', we can shift and go to state 1, or reduce using rule 2 (A -> ). If we see a '$' we can reduce using either rule 1 (S -> A) or rule 2 (A -> ). In state 3, if we see 'a', we can shift and go to state 1, or reduce using either rule 2 (A -> ) or rule 4 (A -> AA). If we see '$', we can reduce using either rule 2 or rule 4. Hence, as expected, the example grammar is highly ambiguous.

*** (4.4) Reporting conflicts.

In a given saturated state, we have two sorts of scenarii:

- 'reducing' scenarii with the dot at the end,
- 'shifting' scenarii with the dot not at the end.

Scenarii with the dot at the end call for reductions.

(1) If there is no reducing scenario, no conflict may arise in that state.

(2) If there is a reducing scenario, this scenario has a list 'E' of lookaheads: (A -> u.
, E)

(2.1) Consider a shifting scenario, with the token 'a' after the dot: (B -> w.at)

(2.1.1) If 'a' belongs to 'E', we may either reduce or shift in the presence of 'a'. If the rule A -> u has a precedence level and if 'a' also has a precedence level, the conflict is resolved as follows:

    prec(a) < prec(A -> u)  then reduce
    prec(a) > prec(A -> u)  then shift
    prec(a) = prec(A -> u)  then if this level associates:
                                   - on the left, then reduce
                                   - on the right, then shift
                                   - not at all, then generate an error

If either the rule A -> u or 'a' has no precedence level, then there is actually a shift/reduce conflict.

(2.1.2) If 'a' does not belong to 'E', the reducing scenario does not generate a conflict.

(2.2) Consider another reducing scenario.

(2.2.1) If they do not share any lookahead, there is no conflict.

(2.2.2) If they share a lookahead 'a', there is a reduce/reduce conflict on 'a'.

*** (4.5) Making a trace file.

*** (5) Making the output file.

*** (5.2) Performing reductions.

*** (5.3) States as functions.

*** (6) Putting it all together.

Finally, here is a tool to print a 'first function'. We begin with a function printing a list of extended tokens.

define One print ( List(ExToken) l ) =
  if l is
  { [ ] then unique,
    [h . t] then
      if h is
      { token(name) then print(name),
        empty then print("'empty'"),
        dollar then print("$") };
      print(" "); print(t) }.

Now, we can print a 'first function'.

define One print ( List((String,List(ExToken))) f ) =
  if f is
  { [ ] then unique,
    [h . t] then
      if h is (name,toks)
      then print("First("+name+") = [ "); print(toks); print("]\n"); print(t) }.

Here are some tools for printing an automaton.

define One print ( Symbol s ) =
  if s is
  { dollar then print("$"),
    token(s) then print(s),
    non_terminal(s) then print(s) }.

define One print ( List(Symbol) l ) =
  if l is
  { [ ] then unique,
    [h . t] then print(h); print(" "); print(t) }.

define One print ( Scenario s ) =
  if s is scenario(name,u,v,_E,mbprec)
  then print("("+name+" -> "); print(reverse(u)); print(". ");
       print(v); print(" , [ "); print(_E); print("])\n").

define One print ( List(Scenario) s ) =
  if s is
  { [ ] then unique,
    [h . t] then print(h); print(t) }.

Print an automaton, numbering the states at the same time.

define One print ( List(List(Scenario)) l, Int32 n ) =
  if l is
  { [ ] then unique,
    [h . t] then
      print("\n-- state "+integer_to_string(n)+" --\n"); print(h); print(t,n+1) }.

Here is a tool for printing a new automaton.

define One map ( $T -> One f, List($T) l ) =
  if l is
  { [ ] then unique,
    [h . t] then f(h); map(f,t) }.

define One map ( $T -> One f, NonEmptyList($T) l ) =
  if l is { [h . t] then f(h); map(f,t) }.

define One print ( NonEmptyList(Symbol) l ) =
  if l is { [h . t] then print(h); print(" "); print(t) }.

define One print ( ReducingScenario rs ) =
  if rs is reducing_scenario(n,rh,lh,prec)
  then print(" "+n+" -> "); print(reverse(rh));
       if prec is
       { failure then print("."),
         success(n) then print(". ["+integer_to_string(n)+"]") };
       print("\n").

define One print ( ShiftingScenario rs ) =
  if rs is shifting_scenario(n,bd,ad)
  then print(" "+n+" -> "); print(reverse(bd)); print(". "); print(ad); print("\n").

define Int32 max ( Int32 n, Int32 m ) = if n < m then m else n.

define String right_pad ( String s, Int32 n ) =
  s + constant_string(max(0,n-length(s)),' ').

define One print_term_transition ( (Symbol,Int32) tr ) =
  if tr is (s,n)
  then if s is
       { dollar then alert,
         token(s) then
           print(" "+right_pad(s,20)+" shift and goto state ");
           print(integer_to_string(n)+"\n"),
         non_terminal(s) then unique }.

define One print_non_term_transition ( (Symbol,Int32) tr ) =
  if tr is (s,n)
  then if s is
       { dollar then alert,
         token(s) then unique,
         non_terminal(s) then
           print(" "+right_pad(s,20)+" goto state ");
           print(integer_to_string(n)+"\n") }.

define One print_reductions
  ( String _A, List(Symbol) right_hand, List(ExToken) lookaheads ) =
  if lookaheads is
  { [ ] then unique,
    [h . t] then
      with tok = if h is { token(s) then s, empty then alert, dollar then "$" },
      ( if nth(0,_A) = success('#')
        then print(" "+right_pad(tok,20)+" accept\n")
        else print(" "+right_pad(tok,20)+" reduce using rule "+_A+" -> ");
             print(reverse(right_hand)); print("\n") );
      print_reductions(_A,right_hand,t) }.

define One print_reductions ( ReducingScenario rs ) =
  if rs is reducing_scenario(n,rh,lh,prec) then print_reductions(n,rh,lh).

define One print ( ExToken ec ) =
  if ec is
  { token(s) then print(s),
    empty then alert,
    dollar then print("$") }.

define One print ( ExToken ec, Int32 n ) =
  if ec is
  { token(s) then print(right_pad(s,n)),
    empty then alert,
    dollar then print(right_pad("$",n)) }.

define One print ( Conflict c ) =
  if c is
  { reduce_reduce(tok,rs1,rs2) then
      print(" "); print(tok,21); print("reduce/reduce\n"),
    shift_reduce(tok,rs,ss) then
      print(" "); print(tok,21); print("shift/reduce\n") }.

define One print ( NewState s ) =
  if s is state(id,rs,ss,tr,cfls)
  then print("\n\nstate "+integer_to_string(id)+":\n\n");
       map(print,rs); map(print,ss); print("\n");
       map(print_term_transition,tr); map(print_reductions,rs); print("\n");
       map(print_non_term_transition,tr);
       if cfls = [ ]
       then unique
       else (print("\n -- conflicts --\n"); map(print,cfls)).

define One print_conflicts ( List(NewState) l ) =
  if l is
  { [ ] then unique,
    [h . t] then
      if h is state(_,_,_,_,cfls) then map(print,cfls); print_conflicts(t) }.

define One print ( List(NewState) auto ) = map(print,auto).

define One print ( List((Symbol,Int32)) l ) =
  if l is
  { [ ] then unique,
    [h . t] then
      if h is (s,n)
      then print(" "); print(s);
           if s is
           { dollar then print(" shift and go to state "),
             token(_) then print(" shift and go to state "),
             non_terminal(_) then print(" goto state ") };
           print(n); print("\n"); print(t) }.

define One print ( IntermediateState s ) =
  if s is i_state(id,scnri,trans)
  then print("\n[state "+integer_to_string(id)+"]\n"); print(scnri); print(trans).
define One print ( List(IntermediateState) l ) =
  if l is
  { [ ] then unique,
    [h . t] then print(h); print(t) }.

*** (5.1) Printing tools.

define One print ( WAddr(Int8) file, String s, Int32 n ) =
  if nth(n,s) is
  { failure then unique,
    success(c) then
      if file <- c is
      { failure then print("Cannot write to output.\n"),
        success(_) then print(file,s,n+1) } }.

define One trace_body ( List((Symbol,String)) body, WAddr(Int8) output ) =
  if body is
  { [ ] then unique,
    [h . t] then
      if h is (n,x)
      then with name = if n is token(s) then s
                       else if n is non_terminal(s) then "_"+s
                       else alert,
           print(output,name+"["+x+"] "); trace_body(t,output) }.

define One trace_rule ( GrammarRule r, WAddr(Int8) output ) =
  if r is grammar_rule(head_name,term,body,mbprec)
  then print(output,"_"+head_name+"["+term+"] -> ");
       trace_body(body,output);
       if mbprec is
       { failure then print(output,".\n"),
         success(n) then print(output," ["+integer_to_string(n)+"].\n") }.

define One print ( (Symbol,String) p ) =
  if p is (s,t) then print(s); print("("); print(t); print(")").

define One print ( GrammarRule r ) =
  if r is grammar_rule(head_name,term,body,mbp)
  then print(" "+head_name+"["+term+"] -> "); map(print,body);
       if mbp is
       { failure then print("\n"),
         success(n) then print(" ["+integer_to_string(n)+"]\n") }.

define One trace_rules ( List(GrammarRule) rules, WAddr(Int8) output ) =
  if rules is
  { [ ] then unique,
    [r1 . others] then trace_rule(r1,output); trace_rules(others,output) }.

read trace_apg.anubis

The function 'make_parser' receives the grammar read from the source file (together with its name, its precedence and association rules), and also the two output files.

define Maybe(One) make_parser
  ( Grammar g, WAddr(Int8) output, WAddr(Int8) log_file ) =
  if g is grammar(parser_name,prec_table,assoc_table,rules)
  then if first_function(rules) is (ff,n)
       then print(ff);
            with stts = make_states(reverse(rules)),
            map(print,stts);
            with auto = separate(add_numbers_and_transitions(stts),prec_table),
            print(auto);
            failure.
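To recapitulate the chain of phases that 'make_parser' performs, here is the shape of the computation as a Python sketch, with each phase injected as a callable. The names mirror the APM functions defined earlier in this file; the dictionary-based injection and the simplified signatures are artifacts of the sketch, not APM's actual interface.

```python
def make_parser(rules, prec_table, phases):
    """Outline of the APM pipeline: each entry of 'phases' stands for the
    APM function of the same name defined earlier in this file."""
    phases['first_function'](rules)                       # (3.1) First sets
    states = phases['make_states'](rules)                 # (3.3)-(3.9) the automaton
    auto = phases['add_numbers_and_transitions'](states)  # (4.1) ids and transitions
    return phases['separate'](auto, prec_table)           # (4.2) scenarii and conflicts
```

The sketch only fixes the order of the phases; the real function additionally prints the intermediate results to the log file, and the decision-making and output-generation phases of sections (4.3) to (5) are still to be plugged in.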