The Anubis Project Reading an APG grammar. Copyright (c) Alain Proute' 2006. Author: Alain Proute' Created: March 2006 Revised: October 2014 (added extended BNF) January 2016 (declarations are now in a separate generated file) January 2016 (automatic generation of functions from token properties) The tool described here is part of the Anubis Parser Generator. It reads a 'grammar' in text format, and transforms it into an abstract grammar (of type `APG_Grammar`; see `common.anubis`). The format of APG source files is the following: ignored section #APG public preambule (some Anubis source text) goes to declarations file # private preambule (some Anubis source text) goes to body file # # # postambule (some Anubis source text) goes to body file must be an Anubis symbol not beginning by an upper case letter. Within the declaration section, we may have two sorts of declarations: * precedence-association declarations, of the following form: left . right . non_assoc . where are token names separated by blank characters. The trailing dot is mandatory. * type declarations, of the following form: type () . is the same as for precedence-association declarations. is an Anubis expression representing the type of the symbols (either tokens or non terminals) represented by . The 'extra' declaration allows to transmit data to the parser: extra () . The first operand (enclosed into an extra pair of parentheses) is just a type (the type of data to be transmitted) and the seond operand the name of these data. This name may be used within expressions that you put after the head of grammar rules. The grammar rules section contains grammar rules. Each grammar rule has one of the forms: : . : []. is a grammar symbol possibly followed by an Anubis expression within a pair of parentheses. is a (possibly empty) sequence of expressions similar to or of the form {}. is the name of a token whose precedence level will be used for that grammar rule. Comments of the form: // this is a comment up to end of line are allowed between declarations and rules. *** Extended Backus-Naur. Since October 2014, this syntax is extended as follows. The of a grammar rule is a (possibly empty) sequence of 'grammar rule tail items' ('items' for short). Each item has one of the following forms: () (these first two possibilities were already available) { ... }+(,) { ... }*(,) { ... }?(,) () (commande immediate) { item_1 ... item_k }+(s,t) represents the sequence of items between braces repeated one or more times. The symbol 's' represents the value of this repeated sequence i.e. of the whole item. It is of type 'List(T)' for some type 'T'. The expression 't' is a term computing the value of a single element of this list from the values of item_1, ..., item_k. It is of course of type 'T'. What the APG compiler does is just to replace the above item by 'NAME(s)', where 'NAME' is generated automatically, and to introduce two new rules in the grammar: NAME([t]) : item_1 ... item_k . NAME([t . ___]) : item_1 ... item_k NAME(___) . The other possibilities work similarly with the only difference that '*' means zero, one or more repetitions, and '?' means zero or one (and 'Maybe' is used instead of 'List'). Notice that the braces contain arbirary items, so that these constructs can be nested. For example, the rule: T(k(l)): {a(v) { c(w) { d(y) }?(f,g(y)) }*(z,q(w,f)) b(u)}+(l,p(u,z,v)). produces: (1) _T(k(l)) : a6(l) . (2) a2(failure) : . (3) a2(success(g(y))) : _d(y) . (4) a4([ ]) : . (5) a4([q(w,f) . ___]) : _c(w) a2(f) a4(___) . (6) a6([p(u,z,v)]) : _a(v) a4(z) _b(u) . (7) a6([p(u,z,v) . ___]) : _a(v) a4(z) _b(u) a6(___) . An example of such an APG source file may be found in the file `calculator_example.apg` in the same directory. See the Anubis documentation for a more detailed description of the APG format, and what it means. *** Automatically generating functions from token properties. (added January 2016) APG produces the type Token_Value_parser_name, which has one alternative per token. It is often the case that we have to write down functions taking a token as its unique argument. In case the grammar is reorganized, these functions must be rewritten. The tool provided here simplifies this kind of maintainance. In the declaration section, we can write: properties (Type_1) (val_1) name_1 ... (Type_n) (val_k) name_n. This declaration will produce n functions, named respectively name_1,...,name_n, taking an argument of type Token_Value_parser_name and returning a datum of type Type_1,...,Type_n respectively. By default, function name_i returns the default value val_i for all tokens. In order to override the default for a given token, you can add the declaration: prop name=(val) ... name=(val). which gives the values the named functions must return for this token. Example: properties (Bool) (false) bu (String) ("none") zo. prop gaga bu=(true). will produce: public define Bool bu ( Token_Value_parser_name tok ) = if tok is { eof(_0) then false, ... _gaga(_0) then true, ... }. public define String zo ( Token_Value_parser_name tok ) = if tok is { eof(_0) then "none", ... _gaga(_0) then "none", ... }. In the declaration 'prop name=(val) ..., each value can have a free occurrence of _0 (whose type depends on the particular token). Alternatively, we can declare: func func_name (value) token ... token. in order to assign 'value' to all the tokens of the above enumeration in the function 'func_name'. The declarations of these functions are put in the generated declaration file, and their definitions in the generated body file. read tools/streams.anubis read common.anubis *Name* read_APG_grammar *Description* This function reads a grammar in APG format from some text stream, and returns the (abstract) grammar just read. The type `APG_Error` is defined in `common.anubis`. public define Result(APG_Error,APG_Grammar) read_APG_grammar ( Stream s, List(APG_Option) options ). --- That's all for the public part ! -------------------------------------------------- --------------------------------- Table of Contents ----------------------------------- *** [1] Some tools. *** [1.1] The Kleisli composition. *** [1.1] Skipping blanks and comments. *** [2] Reading the grammar. *** [2.1] Reading the preambule. *** [2.2] Reading the name of the parser. *** [2.3] Reading the declarations. *** [2.4] Reading the grammar rules. *** [2.5] Reading the postambule. *** [2.6] Reading the whole grammar stream. *** [3] Testing the grammar reader. *** [3.1] An example APG source text. *** [3.2] The test. --------------------------------------------------------------------------------------- read tools/basis.anubis define Word32 current_line(Stream s) = truncate_to_Word32(current_line(s)). *** [1] Some tools. *** [1.1] The Kleisli composition. This is the so-called 'Kleisli composition' 'g*f' of the two functions 'f' and 'g' (from the Theory of Monads): f g +------> $E ---> +-----> $E / / $U ---+--------> $V ---+-------> $W This is just some elegant way of handling errors. The function 'f' may produce either a 'normal' result (of type $V) or an error (of type $E). If 'f' produces an error, 'g' is not applied and g*f returns that error. If 'f' produces a normal result, 'g' is applied to that result, producing either a 'normal' result (of type $W) or an error (of type $E). define $U -> Result($E,$W) $V -> Result($E,$W) g * $U -> Result($E,$V) f // the Kleisli composition g*f = ($U x) |-> if f(x) is { error(msg) then error(msg), ok(y) then g(y) }. The same one for 'Maybe': define $U -> Maybe($W) $V -> Maybe($W) g * $U -> Maybe($V) f = ($U x) |-> if f(x) is { failure then failure, success(y) then g(y) }. *** [1.1] Skipping to next leftmost non blank character. define One skip_to_leftmost ( Stream s ) = if read_byte(s) is { failure then unique, success(c) then if member([' ','\t','\n','\r'],c) then skip_to_leftmost(s) else if current_column(s) = 1 then unput_byte(c,s) else skip_to_leftmost(s) }. *** [1.1] Skipping blanks and comments. Comments of the form: // this is a comment up to end of line are allowed in APG source files. The function `skip_comments` skips until the end of line. It is just an auxiliary function for `skip_blanks_and_comments`. `skip_blanks_and_comments` is used to reach the next significant item. No character of this item is read by this function. define One skip_comments(Stream s). // forward declaration define One skip_blanks_and_comments ( Stream s ) = if read_byte(s) is { failure then unique, success(c) then if member([' ','\t','\n','\r'],c) then skip_blanks_and_comments(s) else if c = '/' then if read_byte(s) is { failure then unique, success(d) then if d = '/' then skip_comments(s) else skip_blanks_and_comments(s) } else unput_byte(c,s) }. define One skip_comments // the double slash is already read ( Stream s ) = if read_byte(s) is { failure then unique, success(c) then if c = '\n' // this is for Linux and Windows then skip_blanks_and_comments(s) else if c = '\r' // this is for Mac then skip_blanks_and_comments(s) else skip_comments(s) }. *** [2] Reading the grammar. *** [2.1] Reading the preambule. An APG source file begins by a preambule (an Anubis source text). The beginning of the preambule is marked by '#APG' found in the leftmost column. The end of the public preambule is marked by `#` in the leftmost column. Then comes the private preambule until the next '#' in the leftmost column. define Result(APG_Error,(String,String)) read_private_preambule ( Stream s, String public_preambule, List(Word8) so_far ) = if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(c) then if c = '#' then if current_column(s) = 1 then ( unput_byte(c,s); with private_preambule = implode(reverse(so_far)), ok((public_preambule,private_preambule)) ) else read_private_preambule(s,public_preambule,[c . so_far]) else read_private_preambule(s,public_preambule,[c . so_far]) }. define Result(APG_Error,(String,String)) read_preambules // auxiliary function ( Stream s, List(Word8) so_far // bytes read so far (in reverse order) ) = if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(c) then if c = '#' then if current_column(s) = 1 then read_private_preambule(s,implode(reverse(so_far)),[]) else read_preambules(s,[c . so_far]) else read_preambules(s,[c . so_far]) }. define Result(APG_Error,(String,String)) // returns the two preambules read_preambules ( Stream s ) = if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(c) then if c = '#' then if current_column(s) = 1 then (with must_read_byte = (Word8 a) |-> (One u) |-> if read_byte(s) is { failure then failure, success(b) then if b = a then success(unique) else failure }, if (must_read_byte('G') * must_read_byte('P') * must_read_byte('A'))(unique) is { failure then read_preambules(s), success(_) then read_preambules(s,[]) }) else read_preambules(s) else read_preambules(s) }. *** [2.2] Reading the name of the parser. When the preambule has been read, we have to read the name of the parser. We skip the '#' and possible blanks and comments, and read the name of the parser as a symbol. define Result(APG_Error,String) read_parser_name ( Stream s ) = forget(read_byte(s)); // skip the '#' skip_blanks_and_comments(s); // skip blanks if read_symbol(s) is // read a symbol { failure then error(no_parser_name(current_line(s))), success(parser_name) then ok(parser_name) }. Reading a file name. define String read_filename ( Stream s, List(Word8) so_far ) = if read_byte(s) is { failure then implode(reverse(so_far)), success(c) then if c +=< ' ' then implode(reverse(so_far)) else read_filename(s,[c . so_far]) }. define Result(APG_Error,String) read_filename ( Stream s ) = skip_blanks(s); with t = read_filename(s,[]), if t = "" then error(missing_filename(current_line(s))) else ok(t). *** [2.3] Reading the declarations. After the name of the parser we find all declarations. They are of two sorts: * precedence-association declaration * type declaration precedence-associationn declaration begin by one of the keywords `left`, `right` or `non_assoc`. Type declarations begin by the keyword `type`. The function `next_dec_item` defined below reads the next declaration. If the second '#' is encountered, it returns this separator. Actually, it returns a datum of type type Dec_Item: separator, // marks the end of the declaration section prec_dec (APG_Precedence_Dec), // have read a precedence declaration type_dec (APG_Type_Dec), // have read a type declaration properties (APG_Properties_Dec), // have read the token properties declaration prop_dec (APG_Prop_Dec), // have read properties values for a token func_dec (List(APG_Prop_Dec)), // have read a func declaration extra_dec (String type, String name), // have read the extra dseclaration include (String filepath). // have read an include The function `read_prec_dec` read a precedence delaration. The keyword has already been read and is passed to this function in the form of the functional argument `sort`. This argument is the corresponding constructor of type `APG_Precedence_Dec` (see `common.anubis`). The precedence level is provided. All symbol names read are prefixed by "_". After having read the names of all symbols, we must check and read the mandatory trailing dot. define Stream -> Result(APG_Error,APG_Precedence_Dec) read_prec_dec ( Word32 prec_level, (Word32, List(String)) -> APG_Precedence_Dec sort, ) = (Stream s) |-> with names = map((String str) |-> "_"+str,read_several(s,read_symbol)), skip_blanks_and_comments(s); if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(c) then if c = '.' then ok(sort(prec_level,names)) else error(bad_end_of_precedence_declaration(current_line(s))) }. When we are reading a type declaration (or a grammar rule), we have to read Anubis expressions within a pair of parentheses. The function `read_within_parentheses` reads such an expression. The opening parenthese has already been read. define Maybe(List(Word8)) read_string ( Stream s, List(Word8) so_far ) = if read_byte(s) is { failure then failure, success(c) then if c = '\\' then if read_byte(s) is { failure then failure, success(d) then read_string(s,[d,c . so_far]) } else if c = '\"' then success([c . so_far]) else read_string(s,[c . so_far]) }. define Maybe(String) read_within_parentheses // the opening parenthese is already read. ( Stream s, Int level, // we must remember the current level of parentheses List(Word8) so_far // bytes read so far (not including the opening parenthese) ) = if read_byte(s) is { failure then failure, success(c) then if c = ')' then if level =< 1 then success(implode(reverse(so_far))) // finished else read_within_parentheses(s,level-1,[c . so_far]) // count level and continue else if c = '(' then read_within_parentheses(s,level+1,[c . so_far]) // count level and continue else if c = '\"' then if read_string(s,[c . so_far]) is { failure then failure, success(l) then read_within_parentheses(s,level,l) } else read_within_parentheses(s,level,[c . so_far]) // continue }. Reading a type declaration, after the keyword `type` has been read, amounts to read the type itself as an Anubis expression within parentheses (this is mandatory), and read a sequence of symbols right delimited by a dot. The function `read_type_of_dec` reads just the type expression. define Maybe(String) read_type_of_dec ( Stream s ) = skip_blanks_and_comments(s); if read_byte(s) is { failure then failure, success(c) then if c = '(' then read_within_parentheses(s,1,[]) else failure }. Now, we read a complete type declaration. The keyword `type` is already read (this is why we know it is a type declaration). We read the Anubis type expression, the symbols and the trailing dot. The names of the symbols are prefixed by "_". define Result(APG_Error,APG_Type_Dec) read_type_dec ( Stream s ) = if read_type_of_dec(s) is { failure then error(type_description_expected(current_line(s))), success(type_desc) then with names = map((String str) |-> "_"+str,read_several(s,read_symbol)), skip_blanks_and_comments(s); if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(c) then if c = '.' then ok(type_dec(type_desc,names)) else error(bad_end_of_type_declaration(current_line(s))) } }. Reading the properties declaration. define Result(APG_Error,List(APG_Type_Val_Name)) read_properties_items ( Stream s ) = skip_blanks_and_comments(s); if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(c) then if c = '.' then ok([]) else if c = '(' then if read_within_parentheses(s,1,[]) is { failure then error(incorrect_type_description(current_line(s))), success(type) then skip_blanks_and_comments(s); if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(d) then if d = '(' then if read_within_parentheses(s,1,[]) is { failure then error(incorrect_value(current_line(s))), success(value) then skip_blanks_and_comments(s); if read_symbol(s) is { failure then error(symbol_expected(current_line(s))), success(name) then if read_properties_items(s) is { error(msg) then error(msg), ok(others) then ok([type_val_name(type,value,name) . others]) } } } else error(left_par_expected(current_line(s))) } } else error(left_par_or_dot_expected(current_line(s))) }. define Result(APG_Error,List((String,String))) read_token_prop_values ( Stream s ) = skip_blanks_and_comments(s); if read_symbol(s) is { failure then if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(c) then if c = '.' then ok([]) else error(bad_end_of_property_declaration(current_line(s))) } success(name) then skip_blanks_and_comments(s); if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(c) then if c = '=' then ( skip_blanks_and_comments(s); if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(d) then if d = '(' then if read_within_parentheses(s,1,[]) is { failure then error(incorrect_value(current_line(s))), success(val) then if read_token_prop_values(s) is { error(msg) then error(msg), ok(others) then ok([(name,val) . others]) } } else error(left_par_expected(current_line(s))) } ) else error(equal_sign_expected(current_line(s))) } }. define Result(APG_Error,(String,List((String,String)))) read_token_prop_dec ( Stream s ) = skip_blanks_and_comments(s); if read_symbol(s) is { failure then error(symbol_expected(current_line(s))), success(token_name) then if read_token_prop_values(s) is { error(msg) then error(msg), ok(nvs) then ok((token_name,nvs)) } }. define Result(APG_Error,(String,String,List(String))) read_func_prop_dec ( Stream s ) = // func [already read] func_name (value) token ... token. skip_blanks_and_comments(s); if read_symbol(s) is { failure then error(symbol_expected(current_line(s))), success(func_name) then skip_blanks_and_comments(s); if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(c) then if c = '(' then if read_within_parentheses(s,1,[]) is { failure then error(incorrect_value(current_line(s))), success(value) then with tokens = read_several(s,read_symbol), ok((func_name,value,tokens)) } else error(left_par_expected(current_line(s))), } }. Reading an 'extra' declaration. define Result(APG_Error,Dec_Item) read_extra_dec ( Stream s ) = if read_type_of_dec(s) is { failure then error(type_description_expected(current_line(s))), success(type_desc) then if read_symbol(s) is { failure then error(symbol_expected(current_line(s))), success(name) then skip_blanks_and_comments(s); if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(c) then if c = '.' then ok(extra_dec(type_desc,name)) else error(bad_end_of_extra_declaration(current_line(s))) } } }. While we are still in the declaration section, we may read either a precedence declaration, a type declaration, an extra declaration, an include command, the separator '#', or the end of input (in case of an 'included' file). Blanks and comments are allowed almost everywhere and skiped. Anything else is an error. define Result(APG_Error,Dec_Item) next_dec_item ( Stream s, Word32 prec_level, Bool included // true if we are currently reading an 'included' file ) = skip_to_leftmost(s); //skip_blanks_and_comments(s); // skip until next declaration or the separator if read_byte(s) is { failure then if included then ok(separator) // the end of the included file acts as a separator else error(unexpected_end_of_input(current_line(s))), success(byte) then if byte = '#' // end of declaration section then ok(separator) else unput_byte(byte,s); if read_symbol(s) is { failure then error(keyword_was_expected(current_line(s))), success(keyword) then if keyword = "type" then (((APG_Type_Dec d) |-> ok(type_dec(d))) * read_type_dec)(s) else if keyword = "include" then (((String n) |-> (Result(APG_Error,Dec_Item))ok(include(n))) * read_filename)(s) else if keyword = "properties" then if read_properties_items(s) is { error(msg) then error(msg), ok(tvns) then ok(properties(properties(tvns))) } else if keyword = "prop" then if read_token_prop_dec(s) is { error(msg) then error(msg), ok(t_nvs) then since t_nvs is (token_name,nvs), ok(prop_dec(prop_dec("_"+token_name,nvs))) } else if keyword = "func" then if read_func_prop_dec(s) is { error(msg) then error(msg), ok(n_v_ts) then since n_v_ts is (func_name,value,tokens), ok(func_dec(map((String tok) |-> prop_dec("_"+tok,[(func_name,value)]), tokens))) } else with g = (APG_Precedence_Dec d) |-> (Result(APG_Error,Dec_Item))ok(prec_dec(d)), if keyword = "left" then (g * read_prec_dec(prec_level,left))(s) else if keyword = "right" then (g * read_prec_dec(prec_level,right))(s) else if keyword = "non_assoc" then (g * read_prec_dec(prec_level,non_assoc))(s) else if keyword = "extra" then (((Dec_Item d) |-> ok(d)) * read_extra_dec)(s) else error(unknown_declaration_keyword(current_line(s),keyword)) } }. Now, we can read all declarations. This is just a loop (terminal recursion) from which we escape when the separator is read. define Result(APG_Error,(List(APG_Precedence_Dec), List(APG_Type_Dec), Maybe(APG_Properties_Dec), List(APG_Prop_Dec), Maybe(Extra))) read_declarations ( Stream s, List(APG_Precedence_Dec) prec_so_far, // precedence declarations read so far List(APG_Type_Dec) type_so_far, // type declarations read so far Maybe(APG_Properties_Dec) properties_so_far, // token properties declaration List(APG_Prop_Dec) prop_so_far, // properties values for tokens Maybe(Extra) extra_so_far, // extra declaration (maybe) Word32 prec_level, // we generate precedence levels here Bool included // true if we are currently reading an 'included' file ) = if next_dec_item(s,prec_level,included) is { error(msg) then error(msg), ok(item) then if item is { separator then ok((reverse(prec_so_far),reverse(type_so_far),properties_so_far,prop_so_far,extra_so_far)), prec_dec(dec) then read_declarations(s,[dec . prec_so_far],type_so_far,properties_so_far,prop_so_far,extra_so_far,prec_level+1,included), type_dec(dec) then read_declarations(s,prec_so_far,[dec . type_so_far],properties_so_far,prop_so_far,extra_so_far,prec_level,included), properties(dec) then read_declarations(s,prec_so_far,type_so_far,success(dec),prop_so_far,extra_so_far,prec_level,included), prop_dec(dec) then read_declarations(s,prec_so_far,type_so_far,properties_so_far,[dec . prop_so_far],extra_so_far,prec_level,included), func_dec(l) then read_declarations(s,prec_so_far,type_so_far,properties_so_far,l+prop_so_far,extra_so_far,prec_level,included), extra_dec(t,n) then read_declarations(s,prec_so_far,type_so_far,properties_so_far,prop_so_far,success(extra(t,n)),prec_level,included), include(path) then if file(path,read) is { failure then error(no_include_file(current_line(s),path)), success(f) then if read_declarations(make_stream(f),[],[],failure,[],failure,prec_level,true) is { error(msg) then error(msg), ok(more) then if more is (precs,types,mb_properties,props,mb_extra) then read_declarations(s,precs+prec_so_far, types+type_so_far, if mb_properties is failure then properties_so_far else mb_properties, props+prop_so_far, if mb_extra is failure then extra_so_far else mb_extra, prec_level+truncate_to_Word32(length(precs)), included) } } } }. Making the list of symbols with a declared type. Given the declarations we can establish the list of those symbols which have a declared type. define List(String) get_typed_symbols_list ( List(APG_Type_Dec) decs ) = if decs is { [ ] then [ ], [dec1 . others] then if dec1 is type_dec(type,symbol_names) then merge(symbol_names, get_typed_symbols_list(others)) }. *** [2.4] Reading the grammar rules. Reading a symbol which may be followed by a value between parentheses. We first skip blanks and comments. Then we try to read an opening parenthese. If what we read is not a parenthese, it is unput. Otherwise, the value of the symbol is read. define Result(APG_Error,String) read_value ( Stream s, String symbol, List(String) typed_symbols, List(APG_Option) options ) = with sym = "_"+symbol, is_typed = member(typed_symbols,sym), check_types = member(options,types), skip_blanks_and_comments(s); if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(byte) then if byte = '(' then if read_within_parentheses(s,1,[]) is { failure then error(incorrect_symbol_value(current_line(s))), success(term) then if (~check_types | is_typed) then ok(term) else error(symbol_has_no_type(current_line(s),symbol)) } else unput_byte(byte,s); if (check_types & is_typed) then error(symbol_should_have_value(current_line(s),symbol)) else ok("unique") }. define Result(APG_Error,(String,String)) read_double_value ( Stream s, String symbol, List(String) typed_symbols, List(APG_Option) options, String sort // one of "+", "*", "?" ) = with sym = "_"+symbol, is_typed = member(typed_symbols,sym), check_types = member(options,types), skip_blanks_and_comments(s); if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(byte) then if byte = '(' then if read_symbol(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(symb) then skip_blanks_and_comments(s); if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(e) then if e = ',' then if read_within_parentheses(s,1,[]) is { failure then error(incorrect_symbol_value(current_line(s))), success(term) then if (~check_types | is_typed) then ok((symb,term)) else error(symbol_has_no_type(current_line(s),symbol)) } else error(comma_expected(sort,current_line(s))) } } else unput_byte(byte,s); if (check_types & is_typed) then error(symbol_should_have_value(current_line(s),symbol)) else ok(("_","unique")) }. define Result(APG_Error,APG_Symbol_Value) read_symbol_and_value ( Stream s, List(String) typed_symbols, List(APG_Option) options ) = skip_blanks_and_comments(s); if read_symbol(s) is { failure then error(symbol_expected(current_line(s))), success(symbol) then if read_value(s,symbol,typed_symbols,options) is { error(msg) then error(msg), ok(val) then ok(symbol_value("_"+symbol,val)) } }. define Result(APG_Error,TailItem) read_symbol_and_value ( Stream s, List(String) typed_symbols, List(APG_Option) options ) = skip_blanks_and_comments(s); if read_symbol(s) is { failure then error(symbol_expected(current_line(s))), success(symbol) then if read_value(s,symbol,typed_symbols,options) is { error(msg) then error(msg), ok(val) then ok(symbol_value("_"+symbol,val)) } }. define Result(APG_Error,List(TailItem)) read_tail_items ( Stream s, List(String) typed_symbols, List(APG_Option) options, List(TailItem) so_far ). define Result(APG_Error,TailItem) read_tail_item ( Stream s, List(String) typed_symbols, List(APG_Option) options ) = skip_blanks_and_comments(s); if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(c) then if c = '{' then if read_tail_items(s,typed_symbols,options,[]) is { error(msg) then error(msg), ok(items) then if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(d) then if d = '+' then if read_double_value(s,"",typed_symbols,options,"+") is { error(msg) then error(msg), ok(dval) then if dval is (sym,v) then ok(plus(items,sym,v)) } else if d = '*' then if read_double_value(s,"",typed_symbols,options,"*") is { error(msg) then error(msg), ok(dval) then if dval is (sym,v) then ok(star(items,sym,v)) } else if d = '?' then if read_double_value(s,"",typed_symbols,options,"?") is { error(msg) then error(msg), ok(dval) then if dval is (sym,v) then ok(maybe(items,sym,v)) } else error(unexpected_char_after_closing_brace(current_line(s))) } } else unput_byte(c,s); read_symbol_and_value(s,typed_symbols,options) }. define Result(APG_Error,List(TailItem)) read_tail_items ( Stream s, List(String) typed_symbols, List(APG_Option) options, List(TailItem) so_far ) = skip_blanks_and_comments(s); if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(c) then if c = '}' then ok(reverse(so_far)) else unput_byte(c,s); if read_tail_item(s,typed_symbols,options) is { error(msg) then error(msg), ok(i) then read_tail_items(s,typed_symbols,options,[i . so_far]) } }. When we have read the body of a grammar rule, we may have a non mandatory precedence between square brackets. define Result(APG_Error,APG_Extended_Grammar_Rule) read_rule_precedence ( Stream s, APG_Symbol_Value _A, // head of rule (already read) Word32 rule_id, // rule identifier List(TailItem) body // of grammar rule ) = skip_blanks_and_comments(s); if read_symbol(s) is { failure then error(incorrect_rule_precedence(current_line(s))), success(precedence) then skip_blanks_and_comments(s); if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(c) then if c = ']' then skip_blanks_and_comments(s); if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(d) then if d = '.' then ok(grammar_rule(rule_id, _A, body, success("_"+precedence))) else error(incorrect_end_of_rule(current_line(s))) } else error(misclosed_rule_precedence(current_line(s))) } }. Reading an immediate command: $() The $ is already read. We have to keep track of parentheses levels. define Result(APG_Error,TailItem) read_command_term // auxiliary function ( Stream s, Int level, List(Word8) so_far ) = if read_byte(s) is { failure then error(unexpected_end_of_file_in_command(current_line(s))), success(b) then if b = ')' then if level = 0 then ok(immcom(implode(reverse(so_far)))) else read_command_term(s,level-1,[b . so_far]) else if b = '(' then read_command_term(s,level+1,[b . so_far]) else read_command_term(s,level,[b . so_far]) }. define Result(APG_Error,TailItem) read_command_term ( Stream s ) = if read_byte(s) is { failure then error(cannot_read_after_dollar(current_line(s))), success(c) then if c = '(' then read_command_term(s,0,[]) else error(dollar_not_followed_by_paren(current_line(s))) }. After having read the head of a grammar rule (including the colon), we have to read the body of the rule. This body is right delimited either by a dot or an opening square bracket if there is a precedence level for that rule. define Result(APG_Error,APG_Extended_Grammar_Rule) read_grammar_rule_tail ( Stream s, APG_Symbol_Value _A, // head of rule (already read) Word32 rule_id, // rule identifier List(TailItem) so_far, // symbols and values List(String) typed_symbols, List(APG_Option) options ) = skip_blanks_and_comments(s); if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(byte) then if byte = '.' // end of rule then ok(grammar_rule(rule_id,_A,reverse(so_far),failure)) else if byte = '$' // immediate action then if read_command_term(s) is { error(msg) then error(msg), ok(ct) then if so_far is { [ ] then read_grammar_rule_tail(s,_A,rule_id,[ct],typed_symbols,options), [h . t] then if h is immcom(_) then error(consecutive_commands(current_line(s))) else read_grammar_rule_tail(s,_A,rule_id,[ct . so_far],typed_symbols,options) } } else if byte = '[' // precedence then read_rule_precedence(s,_A,rule_id,reverse(so_far)) else unput_byte(byte,s); if read_tail_item(s,typed_symbols,options) is { error(msg) then error(msg), ok(i) then read_grammar_rule_tail(s,_A,rule_id,[i . so_far], typed_symbols,options) } }. We are now reading a complete grammar rule. Here we know that the separator '#' has not yet been seen. Hence, an actual grammar rule should be present. define Result(APG_Error,APG_Extended_Grammar_Rule) read_actual_grammar_rule ( Stream s, String symbol, // head symbol already read Word32 rule_id, // the id for this rule is given. List(String) typed_symbols, List(APG_Option) options ) = if read_value(s,symbol,typed_symbols,options) is // read the head of rule { error(msg) then error(msg), ok(value) then skip_blanks_and_comments(s); if read_byte(s) is // read the mandatory colon { failure then error(unexpected_end_of_input(current_line(s))), success(byte) then if byte = ':' then read_grammar_rule_tail(s,symbol_value("_"+symbol,value),rule_id,[],typed_symbols,options) else error(colon_expected(current_line(s))) } }. Here, we read either a grammar rule or the separator '#'. type ReadRule: rule(APG_Extended_Grammar_Rule), // a rule include(String path), // an 'include' command separator. // end of grammar section (or of file for included files) define Result(APG_Error,ReadRule) read_grammar_rule ( Stream s, Word32 rule_id, List(String) typed_symbols, List(APG_Option) options, Bool included ) = skip_to_leftmost(s); //skip_blanks_and_comments(s); if read_byte(s) is { failure then if included then ok(separator) else error(unexpected_end_of_input(current_line(s))), success(byte) then if byte = '#' then ok(separator) else unput_byte(byte,s); if read_symbol(s) is { failure then ok(separator) success(sym) then skip_blanks_and_comments(s); if read_byte(s) is { failure then error(unexpected_end_of_input(current_line(s))), success(b) then if member(['(',':'],b) then unput_byte(b,s); if read_actual_grammar_rule(s,sym,rule_id,typed_symbols,options) is { error(msg) then error(msg), ok(r) then ok(rule(r)) } else unput_byte(b,s); if sym = "include" then if read_filename(s) is { error(msg) then error(msg), ok(name) then ok(include(name)) } else error(colon_expected(current_line(s))) } } }. Now, we are ready for reading all grammar rules in a loop. At the same type we generate ids (an increasing sequence of numbers) for the grammar rules. define Result(APG_Error,List(APG_Extended_Grammar_Rule)) read_grammar_rules ( Stream s, List(APG_Extended_Grammar_Rule) so_far, Word32 current_rule_id, List(String) typed_symbols, List(APG_Option) options, Bool included // true if reading an 'included' file ) = if read_grammar_rule(s,current_rule_id,typed_symbols,options,included) is { error(msg) then error(msg), ok(rrule) then if rrule is { rule(r) then //print(show_rule(r)); read_grammar_rules(s,[r . so_far], current_rule_id+1,typed_symbols,options,included) include(path) then if file(path,read) is { failure then error(no_include_file(current_line(s),path)), success(f) then if read_grammar_rules(make_stream(f),so_far,current_rule_id,typed_symbols,options,true) is { error(msg) then error(msg), ok(rs) then //print("length(rs) = "+to_decimal(length(rs))+"\n"); read_grammar_rules(s, rs, 1+truncate_to_Word32(length(rs)), typed_symbols, options, included) } } separator then if so_far is { [ ] then error(no_grammar_rule(current_line(s))), [_ . _] then ok(if included then so_far else reverse(so_far)) // we have read the separator (or end of file // if in an 'included' file) } } }. *** [2.5] Reading the postambule. Reading the postambule amounts to read everything until we find the end of the input stream. We use the tool `read_line` to minimize the overhead. define String read_postambule ( Stream s, String result, ) = if read_line(s) is { failure then result success(l) then read_postambule(s,result+l) }. *** [2.6] Checking that explicit rule precedences are defined define Bool defined ( String name, List(APG_Precedence_Dec) decs ) = if decs is { [ ] then false, [h . t] then if name:symbol_names(h) then true else defined(name,t) }. define Result(APG_Error,One) check_def_rule_precs ( List(APG_Precedence_Dec) decs, List(APG_Extended_Grammar_Rule) grammar_rules ) = if grammar_rules is { [ ] then ok(unique), [r1 . others] then since r1 is grammar_rule(id,head,body,prec), if prec is { failure then check_def_rule_precs(decs,others), success(name) then if defined(name,decs) then check_def_rule_precs(decs,others) else error(undefined_explicit_prec_level(name)) } }. *** [2.6] Reading the whole grammar stream. This amounts to read successively: - the preambule, - the parser name, - the declarations, - the grammar rules, - the postambule. public define Result(APG_Error,APG_Extended_Grammar) read_APG_grammar ( Stream s, List(APG_Option) options ) = (if member(options,verbose) then print("Reading the grammar ..."); forget(flush(stdout)) else unique); with result = if read_preambules(s) is { error(msg) then error(msg), ok(preambules) then since preambules is (public_preambule,private_preambule), if read_parser_name(s) is { error(msg) then error(msg), ok(parser_name) then if read_declarations(s,[],[],failure,[],failure,0,false) is { error(msg) then error(msg), ok(decs) then if decs is (prec_decs,type_decs,mb_properties,props,mb_extra) then if read_grammar_rules(s,[],1,get_typed_symbols_list(type_decs),options,false) is { error(msg) then error(msg), ok(grammar_rules) then if check_def_rule_precs(prec_decs,grammar_rules) is { error(msg) then error(msg), ok(_) then with postambule = read_postambule(s,""), ok(grammar(public_preambule, private_preambule, parser_name, prec_decs, type_decs, mb_properties, props, grammar_rules, mb_extra, postambule)) } } } } }, (if member(options,verbose) then print("Done. ") else unique); result. *** [3] Expanding the extended grammar. define (List(APG_Grammar_Rule),List(APG_Tail_Symbol_Value),Word32) expand ( List(TailItem) body, Word32 count ) = if body is { [ ] then ([],[],count), [h . t] then if expand(t,count) is (t_rules,t_body,new_count) then if h is { symbol_value(name,value) then (t_rules,[symbol_value(name,value) . t_body],new_count), plus(items,sym,val) then if expand(items,new_count) is (more_rules,extra_body,count2) then with rname = "a"+to_decimal(count2), ([grammar_rule(count2+1, symbol_value(rname,"["+val+" . ___]"), append(extra_body,[symbol_value(rname,"___")]), failure), grammar_rule(count2, symbol_value(rname,"["+val+"]"), extra_body, failure) . append(more_rules,t_rules)], [symbol_value(rname,sym) . t_body], count2+2), star(items,sym,val) then if expand(items,new_count) is (more_rules,extra_body,count2) then with rname = "a"+to_decimal(count2), ([grammar_rule(count2+1, symbol_value(rname,"["+val+" . ___]"), append(extra_body,[symbol_value(rname,"___")]), failure), grammar_rule(count2, symbol_value(rname,"[ ]"), [], failure) . append(more_rules,t_rules)], [symbol_value(rname,sym) . t_body], count2+2), maybe(items,sym,val) then if expand(items,new_count) is (more_rules,extra_body,count2) then with rname = "a"+to_decimal(count2), ([grammar_rule(count2+1, symbol_value(rname,"success("+val+")"), extra_body, failure), grammar_rule(count2, symbol_value(rname,"failure"), [], failure) . append(more_rules,t_rules)], [symbol_value(rname,sym) . t_body], count2+2), immcom(text) then (t_rules,[immcom(text) . t_body],new_count) } }. define List(APG_Grammar_Rule) expand ( List(APG_Extended_Grammar_Rule) l, List(APG_Grammar_Rule) so_far, Word32 count ) = if l is { [ ] then reverse(so_far), [h . t] then if h is grammar_rule(id,head,body,prec) then if expand(body,count) is (new_rules,new_body,new_count) then expand(t,append(new_rules,[grammar_rule(id,head,new_body,prec) . so_far]),new_count) }. public define APG_Grammar expand ( APG_Extended_Grammar g ) = if g is grammar(pubpre,privpre,pname,prec,types,mb_properties,props,rules,extra,post) then grammar(pubpre,privpre,pname,prec,types,mb_properties,props,expand(rules,[],truncate_to_Word32(length(rules))+1),extra,post).