Commit b61efc885dd9bb72a0d38d082f52a7de021d9079

Authored by Alain Prouté
1 parent b328018b

*** empty log message ***

Showing 1 changed file with 415 additions and 530 deletions
anubis_dev/library/syntactic_analysis/parser_maker.anubis
... ... @@ -3,7 +3,7 @@
3 3  
4 4 The Anubis Project
5 5  
6   - The Anubis Parser Generator
  6 + The Anubis Parser Maker
7 7  
8 8 Copyright (c) Alain Prouté 2006.
9 9  
... ... @@ -11,11 +11,10 @@
11 11  
12 12  
13 13  
14   - This is the source file for the Anubis Parser Generator.
15   -
16   - From a grammar, APM generates an Anubis source file containing a program (called a
17   - parser) able to recognize sentences of the corresponding language. APM is very similar
18   - to the well known UNIX tool 'YACC' (or its GNU equivalent 'BISON').
  14 + From a grammar, APM (the 'Anubis Parser Maker') generates an Anubis source file
  15 + containing a program (called a 'parser') able to recognize sentences of the
  16 + corresponding language. APM is very similar to the well known UNIX tool 'YACC' (or its
  17 + GNU equivalent 'BISON').
19 18  
20 19  
21 20  
... ... @@ -60,9 +59,9 @@
60 59 *** (5.3) States as functions.
61 60  
62 61 *** (6) Putting it all together.
  62 + (this is still under construction)
63 63  
64   -
65   - ------------------------------------
  64 + ---------------------------------------------------------------------------------------
66 65  
67 66  
68 67  
... ... @@ -75,10 +74,9 @@
75 74  
76 75 *** (1.1) In theory.
77 76  
78   - We have two finite (and disjoint) sets of symbols: 'tokens' (also
79   - called 'terminals') and 'non terminals'. Here are our notational
80   - conventions (used in these explanations only, not in APM source
81   - files):
  77 + We have two finite (and disjoint) sets of symbols: 'tokens' (also called 'terminals')
  78 + and 'non terminals'. Here are our notational conventions (used in these explanations
  79 + only, not in APM source files):
82 80  
83 81 a, b, c,... represent tokens
84 82 A, B, C,... represent non terminals
... ... @@ -88,80 +86,68 @@
88 86 e represent the empty sequence of grammar symbols
89 87 $ is the end marker (a special additional token)
90 88  
91   - A 'grammar rule' (or 'production') has the form: A -> u (this one
92   - is called an 'A-production'). In other words, it has a non terminal
93   - on the left of the arrow, and a (possibly empty) sequence of
94   - grammar symbols on the right of the arrow. Its meaning is that we
95   - can produce an expression 'of type' 'A', by concatenating expressions
96   - of types X_1...X_k, where u = X_1...X_k. In this interpretation,
97   - tokens represent themselves.
  89 + A 'grammar rule' (or 'production') has the form: A -> u (this one is called an
  90 + 'A-production'). In other words, it has a non terminal on the left of the arrow, and a
  91 + (possibly empty) sequence of grammar symbols on the right of the arrow. Its meaning is
  92 + that we can produce an expression 'of type' 'A', by concatenating expressions of types
  93 + X_1...X_k, where u = X_1...X_k. In this interpretation, tokens represent themselves.
98 94  
99   - A 'grammar' is a finite set of grammar rules, together with a
100   - distinguished non terminal (denoted 'S' in these explanations),
101   - called the 'axiom'. The 'language' associated to the grammar is the
102   - set of all sequences of tokens which may produce 'S' (we also say
103   - that they are 'instances' of 'S').
  95 + A 'grammar' is a finite set of grammar rules, together with a distinguished non
  96 + terminal (denoted 'S' in these explanations), called the 'axiom'. The 'language'
  97 + associated to the grammar is the set of all sequences of tokens which may produce 'S'
  98 + (we also say that they are 'instances' of 'S').
104 99  
105   - For our convenience, we assume that there is one and only one
106   - S-production, and that it has the form: S -> A. Furthermore, S
107   - cannot appear in the right hand member of a production. It is
108   - trivial to replace a given grammar by a grammar fulfilling these
109   - conditions, by adding a new non terminal S, and the single new rule
110   - S -> A, where A is the axiom of the original grammar. This
111   - operation does not change the corresponding language. It is
112   - realized below by the function 'add_S_rule'.
  100 + For our convenience, we assume that there is one and only one S-production, and that it
  101 + has the form: S -> A. Furthermore, S cannot appear in the right hand member of a
  102 + production. It is trivial to replace a given grammar by a grammar fulfilling these
  103 + conditions, by adding a new non terminal S, and the single new rule S -> A, where A is
  104 + the axiom of the original grammar. This operation does not change the corresponding
  105 + language. It is realized below by the function 'add_S_rule'.
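The normalization just described can be sketched in a few lines. This is a hedged Python illustration (the function name 'add_start_rule' is ours; the actual Anubis function is 'add_S_rule', shown later in this file):

```python
# Illustration only: normalize a grammar (a list of (head, body) productions)
# by adding a fresh start symbol with the single production  start -> axiom.

def add_start_rule(rules, axiom, start="S"):
    """rules: list of (head, body) pairs; returns (new_rules, new_start)."""
    # Pick a start name that does not already occur in the grammar, so that
    # the start symbol cannot appear in the right hand member of a production.
    used = {h for h, _ in rules} | {x for _, body in rules for x in body}
    while start in used:
        start += "'"
    return [(start, [axiom])] + rules, start

rules = [("A", []), ("A", ["a"]), ("A", ["A", "A"])]
new_rules, s = add_start_rule(rules, "A")
```

As the text notes, this operation does not change the language generated by the grammar.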
113 106  
114 107  
115 108  
116 109  
117 110 *** (1.2) In APM source files.
118 111  
119   - Of course, we need to read grammars from a source file (an APM
120   - source file). The denotation for grammars in APM source files is
121   - somewhat more complicated, because we must take the values of
122   - grammar symbols into account.
123   -
124   - Indeed, in practice, terminals and non terminals may have
125   - values. Hence, we have an Anubis type (the type of syntactical
126   - entities) whose alternatives describe the required values (for both
127   - terminals and non terminals).
128   -
129   - When the ALG lexer returns a token, this token already has received a
130   - value. When the parser reduces a sequence X_1...X_k of grammar
131   - symbols, using the production A -> X_1...X_k, it computes the value
132   - of A from the values of X_1...X_k. Hence, the denotation for
133   - productions should allow the description of this computation. In
134   - YACC and BISON, this computation is described (in the language C)
135   - within so-called 'actions', which are post-fixed to grammar
136   - rules. In APM it is somewhat different.
  112 + Of course, we need to read grammars from a source file (an APM source file). The
  113 + denotation for grammars in APM source files is somewhat more complicated, because we
  114 + must take the values of grammar symbols into account.
  115 +
  116 + Indeed, in practice, terminals and non terminals may have values. Hence, we have an
  117 + Anubis type (the type of syntactical entities) whose alternatives describe the required
  118 + values (for both terminals and non terminals).
  119 +
  120 + When the ALG lexer returns a token, this token has already received a value. When the
  121 + parser reduces a sequence X_1...X_k of grammar symbols, using the production A ->
  122 + X_1...X_k, it computes the value of A from the values of X_1...X_k. Hence, the
  123 + denotation for productions should allow the description of this computation. In YACC
  124 + and BISON, this computation is described (in the language C) within so-called
  125 + 'actions', which are post-fixed to grammar rules. In APM it is somewhat different.
137 126  
138   - Since APM grammar symbols may be also names of alternatives, they
139   - may have operands, and the right hand side X_1...X_k of a
140   - production, will be written for example as:
  127 + Since APM grammar symbols may also be names of alternatives, they may have operands,
  128 + and the right hand side X_1...X_k of a production will be written, for example, as:
141 129  
142 130 X_1(x,y) X_2(z) X_3 X_4(u,v,w)
143 131  
144   - assuming in this example that the grammar symbol X_1 has two
145   - operands, X_2 one operand, X_3 no operand and X_4 three operands.
  132 + assuming in this example that the grammar symbol X_1 has two operands, X_2 one operand,
  133 + X_3 no operand and X_4 three operands.
146 134  
147   - In this denotation, x, y, z, u, v and w must be symbols. In the
148   - automaton produced by APM, they will become resurgent symbols.
  135 + In this denotation, x, y, z, u, v and w must be symbols. In the automaton produced by
  136 + APM, they will become resurgent symbols.
149 137  
150   - Now, the complete production A -> X_1...X_k will be denoted
151   - (assuming the same example):
  138 + Now, the complete production A -> X_1...X_k will be denoted (assuming the same
  139 + example):
152 140  
153 141 A(t): X_1(x,y) X_2(z) X_3 X_4(u,v,w).
154 142  
155   - where t is a term (or several terms separated by commas), which may
156   - make use of the symbols x, y, z, u, v and w. Of course t will be
157   - used to compute the value of A when the reduction via this
158   - production will occur. The above rule is something like a case in a
159   - conditional, except that A(t) which plays the role of the body of
160   - case, is written on the left hand side.
  143 + where t is a term (or several terms separated by commas), which may make use of the
  144 + symbols x, y, z, u, v and w. Of course t will be used to compute the value of A when
  145 + the reduction via this production occurs. The above rule is something like a case in
  146 + a conditional, except that A(t), which plays the role of the body of the case, is
  147 + written on the left hand side.
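The value computation performed at reduction time can be mimicked as follows (a Python sketch under assumed names, not the APM machinery): the operands of the popped body symbols are bound to x, y, z, u, v, w, and the term t is evaluated in that environment.

```python
# Illustration: during a reduction via  A(t): X_1(x,y) X_2(z) X_3 X_4(u,v,w),
# the operands of the popped symbols become the arguments of the term t.

def reduce_rule(popped_operands, term):
    """popped_operands: one tuple of operand values per body symbol, in order."""
    # Flatten the operands into a single environment, mimicking x, y, z, ...
    env = [v for operands in popped_operands for v in operands]
    return term(*env)

# Hypothetical rule  A(x + z) : X_1(x,y) X_2(z).
value = reduce_rule([(1, 2), (10,)], lambda x, y, z: x + z)
```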
161 148  
162   - Hence, an APM grammar rule is described by the following
163   - self-explanatory 'meta-grammar' (the symbol between square
164   - brackets is a precedence level):
  149 + Hence, an APM grammar rule is described by the following self-explanatory
  150 + 'meta-grammar' (the symbol between square brackets is a precedence level):
165 151  
166 152 GrammarRule -> Head : Body .
167 153 | Head : Body [ Symbol ] .
... ... @@ -178,8 +164,8 @@
178 164 Symbols_1 -> Symbol
179 165 | Symbol , Symbols_1
180 166  
181   - In a 'Head', APM does not read the 'Term', but just keeps track of
182   - matching parentheses (not contained within strings).
  167 + In a 'Head', APM does not read the 'Term', but just keeps track of matching parentheses
  168 + (not contained within strings).
183 169  
184 170 Now, an APM source file has the following format:
185 171  
... ... @@ -193,14 +179,13 @@
193 179 postambule (Anubis text)
194 180  
195 181  
196   - Both tokens and nonterminals should be acceptable Anubis
197   - symbols. Indeed, they must also be names of alternatives in the
198   - type of syntactical entities. The name of this type is formed by
199   - the concatenation of 'SyntaxTree_' and the name of the
200   - parser. Normally it is defined by the user in the preambule.
  182 + Both tokens and nonterminals should be acceptable Anubis symbols. Indeed, they must
  183 + also be names of alternatives in the type of syntactical entities. The name of this
  184 + type is formed by the concatenation of 'SyntaxTree_' and the name of the
  185 + parser. Normally it is defined by the user in the preambule.
201 186  
202   - Reading APM grammars is simple enough so that we do not need to use
203   - neither ALG nor APM.
  187 + Reading APM grammars is simple enough that we do not need to use either ALG or
  188 + APM.
204 189  
205 190  
206 191  
... ... @@ -264,14 +249,12 @@ define List($T)
264 249  
265 250 *** (2) Reading APM source files.
266 251  
267   - Below are the functions which enable APM to read source
268   - files. There is also some kind of a lexer. Its state is stored into
269   - a datum of type 'APM_LexerState'. This lexer keeps track of line
270   - numbers, eliminates blank characters, and tokenizes the input into
271   - a sequence of 'meta-tokens'.
  252 + Below are the functions which enable APM to read source files. There is also a kind
  253 + of lexer. Its state is stored in a datum of type 'APM_LexerState'. This lexer keeps
  254 + track of line numbers, eliminates blank characters, and tokenizes the input into a
  255 + sequence of 'meta-tokens'.
272 256  
273   - The meta-tokens we need to recognize in APM source files are the
274   - following:
  257 + The meta-tokens we need to recognize in APM source files are the following:
275 258  
276 259 symbols
277 260 terms (delimited by parentheses)
... ... @@ -283,16 +266,14 @@ define List($T)
283 266 premature end of file (the legal end of file will be found by
284 267 the function copying the postambule)
285 268  
286   - They are defined as the alternatives of the type 'MetaToken'. Then,
287   - assembling tokens into precedence rules or grammar rules is rather
288   - easy.
  269 + They are defined as the alternatives of the type 'MetaToken'. Then, assembling tokens
  270 + into precedence rules or grammar rules is rather easy.
289 271  
290 272  
291 273  
292 274 *** (2.1) Reading characters.
293 275  
294   - We must read characters in an extended sens, to take the end of
295   - file into account.
  276 + We must read characters in an extended sense, to take the end of file into account.
296 277  
297 278 type ExChar:
298 279 char(Int8), // normal character
... ... @@ -307,8 +288,8 @@ type APM_LexerState:
307 288 Maybe(Int8) unread). // character possibly 'unread'
308 289  
309 290  
310   - Here is how we read a character (returning both the new state of
311   - the lexer and the extended character).
  291 + Here is how we read a character (returning both the new state of the lexer and the
  292 + extended character).
312 293  
313 294 define (APM_LexerState,ExChar)
314 295 read_char
... ... @@ -345,16 +326,15 @@ define (APM_LexerState,ExChar)
345 326 char(c))
346 327 }.
347 328  
348   - Note: 'unreading' a character is done 'by hand' by functions which
349   - need to do that. They can do it because they hold the lexer state.
  329 + Note: 'unreading' a character is done 'by hand' by functions which need to do
  330 + that. They can do it because they hold the lexer state.
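The state-threading and pushback mechanism described above can be sketched in Python (an illustration under assumed names, not the Anubis 'read_char'): the lexer state carries the line number and at most one 'unread' character, and reading returns both the new state and an extended character (a char, or None at end of file).

```python
# Illustration of APM_LexerState-style reading with a one-character pushback.

class LexerState:
    def __init__(self, text, pos=0, line=1, unread=None):
        self.text, self.pos, self.line, self.unread = text, pos, line, unread

def read_char(ls):
    if ls.unread is not None:                     # serve the pushed-back char first
        return LexerState(ls.text, ls.pos, ls.line, None), ls.unread
    if ls.pos >= len(ls.text):                    # extended char: end of file
        return ls, None
    c = ls.text[ls.pos]
    line = ls.line + 1 if c == "\n" else ls.line  # keep track of line numbers
    return LexerState(ls.text, ls.pos + 1, line, None), c

def unread_char(ls, c):
    # 'unreading' is done by hand by the caller, which holds the lexer state
    return LexerState(ls.text, ls.pos, ls.line, c)
```

Because the state is an explicit value, any function holding it can push a character back, exactly as the note above says.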
350 331  
351 332  
352 333  
353 334  
354 335 *** (2.2) Reading meta-tokens.
355 336  
356   - While reading grammar rules, we need to recognize several kinds of
357   - meta-tokens:
  337 + While reading grammar rules, we need to recognize several kinds of meta-tokens:
358 338  
359 339 type MetaToken:
360 340 symbol(String), // a regular Anubis symbol
... ... @@ -366,12 +346,11 @@ type MetaToken:
366 346 error(Int8), // any misplaced character
367 347 premature_end_of_file. // self explanatory
368 348  
369   - Note: (t), (x,y), (z), etc... are seen as 'term(String)'
370   - meta-tokens. This is why parentheses do not appear in the above
371   - definition of meta-tokens.
  349 + Note: (t), (x,y), (z), etc... are seen as 'term(String)' meta-tokens. This is why
  350 + parentheses do not appear in the above definition of meta-tokens.
372 351  
373 352  
374   - Here is a simple useful test for detecting the beginning of a symbol.
  353 + Here is a simple useful test for detecting the beginning of a symbol.
375 354  
376 355 define Bool
377 356 may_begin_symbol
... ... @@ -412,8 +391,8 @@ define Bool
412 391 false.
413 392  
414 393  
415   - The function below reads a symbol whose first characters (at least
416   - one) have already been read, and are given in reverse order.
  394 + The function below reads a symbol whose first characters (at least one) have already
  395 + been read, and are given in reverse order.
417 396  
418 397 define (APM_LexerState,MetaToken)
419 398 read_symbol
... ... @@ -437,14 +416,12 @@ define (APM_LexerState,MetaToken)
437 416 }.
438 417  
439 418  
440   - The function 'read_string_within_term' is called while reading a
441   - string within a term (itself delimited by parentheses). The
442   - beginning of the term has already been read. We need to declare
443   - 'read_term', because the two functions are mutually recursive. In
444   - fact 'read_term' calls (terminally) 'read_string_in_term' when the
445   - beginning of a string is detected. Similarly, 'read_string_in_term'
446   - calls (terminally) 'read_term' when the end of that string is
447   - found.
  419 + The function 'read_string_within_term' is called while reading a string within a term
  420 + (itself delimited by parentheses). The beginning of the term has already been read. We
  421 + need to declare 'read_term', because the two functions are mutually recursive. In fact
  422 + 'read_term' calls (terminally) 'read_string_in_term' when the beginning of a string is
  423 + detected. Similarly, 'read_string_in_term' calls (terminally) 'read_term' when the end
  424 + of that string is found.
448 425  
449 426 define (APM_LexerState,MetaToken)
450 427 read_term
... ... @@ -486,8 +463,8 @@ define (APM_LexerState,MetaToken)
486 463 }.
487 464  
488 465  
489   - The function below reads anything placed between balanced parentheses. The
490   - opening parenthese has already been read.
  466 + The function below reads anything placed between balanced parentheses. The opening
  467 + parenthesis has already been read.
491 468  
492 469 define (APM_LexerState,MetaToken)
493 470 read_term
... ... @@ -582,8 +559,8 @@ define (APM_LexerState,MetaToken)
582 559  
583 560  
584 561  
585   - The next function reads the next meta-token from the source file,
586   - whatever this meta-token is.
  562 + The next function reads the next meta-token from the source file, whatever this
  563 + meta-token is.
587 564  
588 565 define (APM_LexerState,MetaToken)
589 566 read_meta_token
... ... @@ -617,9 +594,8 @@ define (APM_LexerState,MetaToken)
617 594  
618 595 *** (2.3) Reading precedence and association rules.
619 596  
620   - Each token may be assigned a precedence level. A precedence level
621   - is an integer, but it is implicit in the APM source file. Only the
622   - order of declarations makes sens.
  597 + Each token may be assigned a precedence level. A precedence level is an integer, but it
  598 + is implicit in the APM source file. Only the order of declarations makes sense.
623 599  
624 600 Each declaration has one of the forms:
625 601  
... ... @@ -643,8 +619,7 @@ type ReadPrecRuleResult:
643 619 premature_end_of_file.
644 620  
645 621  
646   - The next function reads (maybe) a sequence of symbols, right
647   - delimited by a dot.
  622 + The next function reads (maybe) a sequence of symbols, right delimited by a dot.
648 623  
649 624 define (APM_LexerState,Maybe(List(String)))
650 625 read_symbols
... ... @@ -660,9 +635,8 @@ define (APM_LexerState,Maybe(List(String)))
660 636 Note: names are stored in reverse order, but it doesn't matter.
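The behaviour of 'read_symbols' can be illustrated as follows (a Python sketch; a plain list of meta-tokens stands in for the threaded lexer state, and the tags 'symbol'/'dot' are our stand-ins for the MetaToken alternatives):

```python
# Illustration: accumulate symbol names until a dot; anything else is a failure.

def read_symbols(meta_tokens):
    names = []
    for kind, text in meta_tokens:
        if kind == "symbol":
            names = [text] + names   # stored in reverse order, but it doesn't matter
        elif kind == "dot":
            return names
        else:
            return None              # misplaced meta-token
    return None                      # premature end of file
```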
661 636  
662 637  
663   - Now, we read a precedence rule whose keyword has already been
664   - successfully read and recognized (and replaced by the corresponding
665   - constructor for type 'PrecRule').
  638 + Now, we read a precedence rule whose keyword has already been successfully read and
  639 + recognized (and replaced by the corresponding constructor for type 'PrecRule').
666 640  
667 641 define (APM_LexerState,ReadPrecRuleResult)
668 642 read_prec_names
... ... @@ -680,8 +654,8 @@ define (APM_LexerState,ReadPrecRuleResult)
680 654 }.
681 655  
682 656  
683   - Here, we read a precedence rule, whose keyword has been read but
684   - not yet recognized (it is only a character string at that point).
  657 + Here, we read a precedence rule, whose keyword has been read but not yet recognized (it
  658 + is only a character string at that point).
685 659  
686 660 define (APM_LexerState,ReadPrecRuleResult)
687 661 read_after_prec_keyword
... ... @@ -716,9 +690,8 @@ define (APM_LexerState,ReadPrecRuleResult)
716 690 }.
717 691  
718 692  
719   - Now, we must be able to read a sequence of precedence rules. This
720   - is achieved by the following function, which reads precedence rules
721   - until a separator (#) is found.
  693 + Now, we must be able to read a sequence of precedence rules. This is achieved by the
  694 + following function, which reads precedence rules until a separator (#) is found.
722 695  
723 696 type ReadPrecRulesResult:
724 697 ok(List(PrecRule)),
... ... @@ -743,10 +716,9 @@ define (APM_LexerState,ReadPrecRulesResult)
743 716 }.
744 717  
745 718  
746   - Now, we can construct precedence tables. The first one gives the
747   - precedence level for each token name. The second one gives the
748   - association mode for each precedence level. They are lists of
749   - the following respective types:
  719 + Now, we can construct precedence tables. The first one gives the precedence level for
  720 + each token name. The second one gives the association mode for each precedence
  721 + level. They are lists of the following respective types:
750 722  
751 723 List((String,Int32))
752 724 List((Int32,AssocMode))
... ... @@ -757,8 +729,8 @@ type AssocMode:
757 729 non_assoc.
758 730  
759 731  
760   - The next function constructs the table of association modes from
761   - the list of precedence rules.
  732 + The next function constructs the table of association modes from the list of precedence
  733 + rules.
762 734  
763 735 define List((Int32,AssocMode))
764 736 make_assoc_table
... ... @@ -786,8 +758,8 @@ define List((Int32,AssocMode))
786 758 make_assoc_table(l,0).
787 759  
788 760  
789   - The next function constructs the list of entries in the precedence
790   - table for just one level.
  761 + The next function constructs the list of entries in the precedence table for just one
  762 + level.
791 763  
792 764 define List((String,Int32))
793 765 make_precedence_entries
... ... @@ -803,8 +775,8 @@ define List((String,Int32))
803 775 }.
804 776  
805 777  
806   - The next function constructs the table of precedence levels from
807   - the list of precedence rules.
  778 + The next function constructs the table of precedence levels from the list of precedence
  779 + rules.
808 780  
809 781 define List((String,Int32))
810 782 make_precedence_table
... ... @@ -822,8 +794,8 @@ define List((String,Int32))
822 794 }.
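Since precedence levels are implicit in the declaration order, both tables amount to an enumeration of the rule list. A hedged Python sketch (our names; 'left'/'right'/'non_assoc' strings stand in for the AssocMode alternatives, and we assume levels are numbered from 0):

```python
# Illustration of the two precedence tables built from the ordered rule list.

def make_assoc_table(prec_rules):
    """prec_rules: list of (mode, [token names]) in declaration order."""
    return [(level, mode) for level, (mode, _names) in enumerate(prec_rules)]

def make_precedence_table(prec_rules):
    """Map each token name to its (implicit) precedence level."""
    return [(name, level)
            for level, (_mode, names) in enumerate(prec_rules)
            for name in names]

def mode(level, assoc_table, default="non_assoc"):
    """Look up the association mode of a level (default if absent)."""
    for lvl, m in assoc_table:
        if lvl == level:
            return m
    return default
```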
823 795  
824 796  
825   - The next function gives the mode for a given precedence level
826   - (using the association table).
  797 + The next function gives the mode for a given precedence level (using the association
  798 + table).
827 799  
828 800 define AssocMode
829 801 mode
... ... @@ -843,10 +815,9 @@ define AssocMode
843 815 }.
844 816  
845 817  
846   - The next function checks the precedence table. It consists in
847   - verifying that the same name is not present two times, and that no
848   - non terminal has an entry in the table (we will see later how to
849   - construct the list of names of non terminals).
  818 + The next function checks the precedence table. It verifies that the same name is not
  819 + present twice, and that no non terminal has an entry in the table (we will see later
  820 + how to construct the list of names of non terminals).
850 821  
851 822 type CheckPrecResult:
852 823 ok,
... ... @@ -890,8 +861,7 @@ define CheckPrecResult
890 861 }.
891 862  
892 863  
893   - The next function gives the precedence level (if it exists) for a given
894   - token name.
  864 + The next function gives the precedence level (if it exists) for a given token name.
895 865  
896 866 define Maybe(Int32)
897 867 prec
... ... @@ -911,7 +881,7 @@ define Maybe(Int32)
911 881 }.
912 882  
913 883  
914   - The same one, but for a possibly missing name.
  884 + The same one, but for a possibly missing name.
915 885  
916 886 define Maybe(Int32)
917 887 prec
... ... @@ -946,9 +916,8 @@ type Symbol:
946 916 non_terminal(String name). // any non terminal with its name
947 917  
948 918  
949   - Grammar rules A(t) -> u [p] (where p is a possible precedence
950   - level: actually, the name of a token) are stored as data of the
951   - following type:
  919 + Grammar rules A(t) -> u [p] (where p is a possible precedence level: actually, the name
  920 + of a token) are stored as data of the following type:
952 921  
953 922 type GrammarRule:
954 923 grammar_rule(String head, // A
... ... @@ -956,11 +925,11 @@ type GrammarRule:
956 925 List((Symbol,String)) body, // u
957 926 Maybe(Int32) prec). // precedence level of p
958 927  
959   - Note: in the pair (Symbol,String), the second element represents the
960   - value of the symbol (if no value is given, it is the empty string).
  928 + Note: in the pair (Symbol,String), the second element represents the value of the
  929 + symbol (if no value is given, it is the empty string).
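For readers more at home in mainstream notation, the Symbol and GrammarRule representations above translate directly into record types. A Python analogue (shapes mirror the Anubis types; this is an illustration, not the actual code):

```python
# Illustration of the Symbol / GrammarRule storage described above.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Symbol:
    name: str
    is_terminal: bool = True      # 'token' vs 'non_terminal' alternative

@dataclass
class GrammarRule:
    head: str                                          # A
    head_value: str = ""                               # t, kept as an unparsed string
    body: List[Tuple[Symbol, str]] = field(default_factory=list)  # u, with values
    prec: Optional[int] = None                         # precedence level of p
```

As the note says, the second element of each body pair is the symbol's value string (empty when no value is given).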
961 930  
962   - Below is a function which reads the right hand side of a grammar
963   - rule. We need a type to handle the result of such a reading.
  931 + Below is a function which reads the right hand side of a grammar rule. We need a type
  932 + to handle the result of such a reading.
964 933  
965 934 type RightHandResult:
966 935 ok(List((Symbol,String)), // a correct right hand side has been read
... ... @@ -1023,8 +992,8 @@ define (APM_LexerState,RightHandResult)
1023 992 }.
1024 993  
1025 994  
1026   - We also need a special type to handle all possible situations in the
1027   - result of reading a grammar rule.
  995 + We also need a special type to handle all possible situations in the result of reading
  996 + a grammar rule.
1028 997  
1029 998 type ReadGrammarRuleResult:
1030 999 ok(GrammarRule), // a grammar rule has been read successfully
... ... @@ -1035,8 +1004,8 @@ type ReadGrammarRuleResult:
1035 1004 // reading a parser section
1036 1005  
1037 1006  
1038   - Below is a function which reads a grammar rule whose head
1039   - (including the colon) has been already read.
  1007 + Below is a function which reads a grammar rule whose head (including the colon) has
  1008 + already been read.
1040 1009  
1041 1010 define (APM_LexerState,ReadGrammarRuleResult)
1042 1011 read_after_colon
... ... @@ -1056,8 +1025,8 @@ define (APM_LexerState,ReadGrammarRuleResult)
1056 1025 }.
1057 1026  
1058 1027  
1059   - Below is a function which reads a grammar rule whose head has
1060   - already been read (not including the colon).
  1028 + Below is a function which reads a grammar rule whose head has already been read (not
  1029 + including the colon).
1061 1030  
1062 1031 define (APM_LexerState,ReadGrammarRuleResult)
1063 1032 read_after_head
... ... @@ -1073,8 +1042,7 @@ define (APM_LexerState,ReadGrammarRuleResult)
1073 1042 else (ls,syntax_error).
1074 1043  
1075 1044  
1076   - Below is a function which reads a grammar rule whose head name has
1077   - already been read.
  1045 + Below is a function which reads a grammar rule whose head name has already been read.
1078 1046  
1079 1047 define (APM_LexerState,ReadGrammarRuleResult)
1080 1048 read_after_head_name
... ... @@ -1098,7 +1066,7 @@ define (APM_LexerState,ReadGrammarRuleResult)
1098 1066  
1099 1067  
1100 1068  
1101   - Below is a function, which reads a complete grammar rule from a file.
  1069 + Below is a function which reads a complete grammar rule from a file.
1102 1070  
1103 1071 define (APM_LexerState,ReadGrammarRuleResult)
1104 1072 read_grammar_rule
... ... @@ -1145,11 +1113,10 @@ define (APM_LexerState,ReadGrammarRulesResult)
1145 1113  
1146 1114 *** (2.5) Finding non terminals.
1147 1115  
1148   - So far, the grammar has been read, but all symbols have been stored
1149   - as terminals. We must establish the list of names of all non
1150   - terminals (they simply appear at the head of grammar rules, and
1151   - change in grammar rules any symbol whose name matches one of these,
1152   - to a non terminal.
  1116 + So far, the grammar has been read, but all symbols have been stored as terminals. We
  1117 + must establish the list of names of all non terminals (they simply appear at the head
  1118 + of grammar rules), and change, in grammar rules, any symbol whose name matches one of
  1119 + these into a non terminal.
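This reclassification pass is straightforward; a Python sketch under assumed names (rules as (head, body) pairs, with True marking a non terminal):

```python
# Illustration: heads of rules are the non terminals; every body symbol whose
# name is a head is reclassified as a non terminal.

def non_terminal_names(rules):
    """rules: list of (head, body) pairs, body a list of symbol names."""
    return {head for head, _ in rules}

def classify(rules):
    nts = non_terminal_names(rules)
    # Pair each body symbol with True when it is a non terminal.
    return [(head, [(x, x in nts) for x in body]) for head, body in rules]
```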
1153 1120  
1154 1121  
1155 1122 define List(String)
... ... @@ -1274,10 +1241,9 @@ define Maybe(One)
1274 1241  
1275 1242  
1276 1243  
1277   - The next function reads from the first separator to the third
1278   - (last) one. It also calls the functions which will construct the
1279   - automaton and dump it into the output file and the log file. Here
1280   - is what it does:
  1244 + The next function reads from the first separator to the third (last) one. It also calls
  1245 + the functions which will construct the automaton and dump it into the output file and
  1246 + the log file. Here is what it does:
1281 1247  
1282 1248 - read the name of the parser,
1283 1249 - read precedence rules,
... ... @@ -1285,8 +1251,7 @@ define Maybe(One)
1285 1251 - construct a datum of type 'Grammar',
1286 1252 - call 'make_parser'
1287 1253  
1288   - it returns failure in case of a problem, and success(unique)
1289   - otherwise.
  1254 + It returns failure in case of a problem, and success(unique) otherwise.
1290 1255  
1291 1256  
1292 1257 type Grammar:
... ... @@ -1356,10 +1321,9 @@ define Maybe(One)
1356 1321  
1357 1322  
1358 1323  
1359   - The next function dumps the content of the input file into the output
1360   - file, until the first separator is found. In other words, it copies
1361   - the preambule to the output. It does not use the lexer, and must
1362   - update the line number itself.
  1324 + The next function dumps the content of the input file into the output file, until the
  1325 + first separator is found. In other words, it copies the preambule to the output. It
  1326 + does not use the lexer, and must update the line number itself.
1363 1327  
1364 1328 define Maybe(Int32)
1365 1329 copy_preambule
... ... @@ -1390,8 +1354,8 @@ define Maybe(Int32)
1390 1354 }.
1391 1355  
1392 1356  
1393   - The next function copies the postambule to the output. It does not
1394   - need to count line numbers.
  1357 + The next function copies the postambule to the output. It does not need to count line
  1358 + numbers.
1395 1359  
1396 1360 define One
1397 1361 copy_postambule
... ... @@ -1413,9 +1377,8 @@ define One
1413 1377  
1414 1378  
1415 1379  
1416   - The next function receives the three files (input, output and the
1417   - log file), reads the grammar and make the automaton. It proceeds in
1418   - three steps:
  1380 + The next function receives the three files (input, output and the log file), reads the
  1381 + grammar and makes the automaton. It proceeds in three steps:
1419 1382  
1420 1383 - copy the preambule to the output,
1421 1384 - create a lexer state, read the precedence rules, the grammar
... ... @@ -1462,8 +1425,8 @@ define Maybe(Option)
1462 1425  
1463 1426  
1464 1427  
1465   - The next function takes the arguments of the command line and
1466   - separates options from the source file name.
  1428 + The next function takes the arguments of the command line and separates options from
  1429 + the source file name.
1467 1430  
1468 1431 define Maybe((String,List(Option)))
1469 1432 separate_options
... ... @@ -1508,8 +1471,7 @@ define Maybe((String,List(Option)))
1508 1471  
1509 1472  
1510 1473  
1511   - Finally, here is the function which is made global. It performs the
1512   - following tasks:
  1474 + Finally, here is the function which is made global. It performs the following tasks:
1513 1475  
1514 1476 - separate options from the source file name (by calling 'separate_options'),
1515 1477 - open the source file,
... ... @@ -1562,31 +1524,28 @@ global define One
1562 1524 *** (3) Making the parser automaton.
1563 1525  
1564 1526  
In order to exemplify our discussion we will refer in the sequel to the following
(ambiguous) 'example grammar':

    S -> A
    A ->
    A -> a
    A -> AA

Notice that this grammar produces all sequences of a's, including the empty
sequence. It is ambiguous since, for example, the sequence aaa may 'reduce' to S (or 'be
derived' from S) in at least two ways:

    S -> A -> AA -> AAA -> AAa -> Aaa -> aaa
    S -> A -> AA -> Aa -> AAa -> Aaa -> aaa

even if we use only 'rightmost' derivations, which means that when we follow the
arrows, the non terminal which is replaced is always the rightmost one. This is the case
above, as one may easily check. In the first case the tree structure of our sequence is
a(aa), while in the second case, it is (aa)a.

The automaton will realize the first of our two derivations above as follows (the dot
represents the current position of reading from the input):

    .aaa     shift
    a.aa     reduce using rule A -> a
    A.       reduce using rule S -> A (accept)
    S.

The ambiguity is realized here by the choice we have in the situation:

    AA.a

We may either reduce using rule A -> AA or shift.

However, this grammar is much more ambiguous than that. We could for example have the
following sequence:

    AA.a     reduce using rule A ->
    AAA.a    reduce using rule A -> AA
    AA.a

which is obviously undesirable. In other words, our grammar has not only a shift/reduce
conflict, but also at least one reduce/reduce conflict.

If we want to produce the same language (all the sequences of a's) with a non ambiguous
grammar, we should use this one:

    S -> A
    A ->

*** (3.1) Computing 'First'.

Any symbol in a grammar represents a set of sequences of tokens, namely all sequences
of tokens which reduce to this symbol. We also say that such a sequence is derived from
the symbol, or that it is an 'instance' of the symbol.

To any symbol we associate a finite set of 'extended tokens'. Here an extended token is
either 'e' (representing the absence of a token), or a normal token, or the end marker
'$'.

By definition, 'First(X)' is the set of all tokens which may come first in an instance
of 'X', plus 'e' if the empty sequence is an instance of 'X'.

For our example grammar, we have:

    dollar.    // the end marker


However, computing 'First' in general is not so easy. This is a saturation process. The
main work is to compute 'First' for non terminals, since it is trivial for tokens. Here
is how we can do this.

    (1) to each non terminal associate the empty list, i.e. put
        First(A) = [ ].
      - e is not in First(B), then add all of First(B)
        to First(A).

Of course, productions are added to the grammar only for computing 'First', not for any
other computation.

We also need to compute 'First(X_1...X_k)' for any sequence of symbols. This is done by
induction on k:

    First() = [e]
    First(X_1...X_k) =
      - if e is in First(X_1), then (First(X_1) - [e]) union First(X_2...X_k),
      - else First(X_1).

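As an illustration of the saturation process above, here is a small Python sketch run on
the example grammar. It is not the Anubis code, and it is simplified: it applies 'First'
to the whole right hand side of each rule directly instead of adding auxiliary
productions as in the steps above; 'e' marks the empty sequence.

```python
# Grammar of the running example; each rule is (non terminal, right hand side).
GRAMMAR = [("S", ["A"]), ("A", []), ("A", ["a"]), ("A", ["A", "A"])]
NON_TERMINALS = {"S", "A"}

def first_of_sequence(symbols, first):
    # First(X_1...X_k), by induction on k as in the text.
    if not symbols:
        return {"e"}
    head = symbols[0]
    head_first = first[head] if head in NON_TERMINALS else {head}
    if "e" in head_first:
        return (head_first - {"e"}) | first_of_sequence(symbols[1:], first)
    return set(head_first)

def first_function(grammar):
    # Saturation: repeat passes over the rules until no First set grows.
    first = {nt: set() for nt in NON_TERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            added = first_of_sequence(rhs, first) - first[lhs]
            if added:
                first[lhs] |= added
                changed = True
    return first
```

On the example grammar this yields First(S) = First(A) = [a,e], as stated above.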
1714 1668  
1715   - In practice, we compute only what we call a 'first function', which
1716   - is an association list:
  1669 + In practice, we compute only what we call a 'first function', which is an association
  1670 + list:
1717 1671  
1718 1672 [
1719 1673 (A,[...]),
      (B,[...]),
      ...
    ]

of type List((String,List(ExToken))), where 'A', 'B',... are the non terminals, and
[...] is the list of extended tokens which may come first in an instance of the
corresponding non terminal.

The next function computes (l1 - [e]) union l2. However, 'e' may belong to l2, and in
that case it will belong to the result.

define List(ExToken)
merge_except_empty
}.
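The set operation performed by 'merge_except_empty' can be sketched in Python as
follows (an illustration of the specification above, not the Anubis code):

```python
def merge_except_empty(l1, l2):
    # Computes (l1 - [e]) union l2; 'e' ends up in the result only
    # if it already belongs to l2.
    result = list(l2)
    for x in l1:
        if x != "e" and x not in result:
            result.append(x)
    return result
```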


We will need to convert an extended token to a grammar symbol. 'e' should never be
converted.

define Symbol
to_symbol
}.


The function below constructs the initial stage of our 'first function'. In this stage
all lists of tokens are empty.

define List((String,List(ExToken)))
initial_stage
}.


We will also need to find the value of a non terminal (given by its name) in our 'first
function'. This search should always be successful.

define List(ExToken)
first
}.


Finally, we may compute 'First(u)' for any sequence of grammar symbols 'u'.

define List(ExToken)
first



The following function adds a token to a set of tokens in a 'first function'. It is
given the extended token 'x' to be added, the name of the non terminal under which it
should be added, and the 'first function' into which this operation should be
performed. The grammar is not used, but must be transmitted via tail calls.

define (List((String,List(ExToken))),List(GrammarRule))
add
}.


The next function tests if a given non terminal may represent the empty sequence.

define Bool
may_be_empty
    member(empty,first(name,f)).


The following function adds all elements of a set of extended tokens to a 'first list'
in a given 'first function'.

define (List((String,List(ExToken))),List(GrammarRule))
add_all_of
}.


The following function works out one grammar rule for the addition of elements to
'First lists'.

define (List((String,List(ExToken))),List(GrammarRule))
first_work_rule
}.


The next function makes one step of completion of first sets (only for non terminals),
making one action for each rule in the grammar. We need to return both the 'first
function' 'f' and the grammar, because they change during the process.

define (List((String,List(ExToken))),List(GrammarRule))
first_one_step
}.


The next function saturates a 'first function'.

define (List((String,List(ExToken))),List(GrammarRule),Int32)
saturate_first
    else saturate_first(f_new,l_new,count+1).


We need to extract the list of all non terminals from the grammar.

define List(String)
non_terminals



Here is the function which computes the 'first function' associated to a given grammar.

define (List((String,List(ExToken))),Int32)
first_function

*** (3.2) Scenarii.

As we saw previously, reductions using a grammar rule occur only on top of the
stack. If the stack (as far as grammar symbols are concerned) is:

    ... u

i.e. if it ends with u (a sequence of grammar symbols), and if there is a production of
the form:

    A -> uv

then it is possible that after having read an instance of v, we reduce using that
rule. Furthermore, the automaton is able to look at the next token to be read (it has
one token of 'lookahead'). This helps to make decisions, as we will see later, using
precedence and association rules. In particular, the automaton knows which tokens are
allowed as the lookahead for a given reduction.

Hence, we introduce the notion of a scenario. A 'scenario' is a pair, denoted (in these
explanations):

    (A -> u.v , (a_1,...,a_k))

where A -> uv is a production (whose right hand side has been split into two parts u
and v, separated by a dot, where u and/or v may be empty), and where (a_1,...,a_k) is a
non empty set of tokens.

In the case of our example grammar, here are all the possible left parts of scenarii:

    S -> .A
    S -> A.
    A -> A.A
    A -> AA.

That a scenario (A -> u.v, E) is 'possible' in some state s means that the top of the
stack is described by u (one slot for one symbol), and that reduction using the given
grammar rule may occur if the lookahead token (at the time the reduction takes place)
belongs to 'E'.

It is clear that, the grammar being given as a finite set of rules (and a finite set
of tokens), there is only a finite number of scenarii.

Two scenarii:

    (A -> u.v , E)
    (B -> w.t , F)

are called 'compatible' if either u is a postfix of w, or w a postfix of u. This simply
means that there exists a stack for which the two scenarii are possible. That stack
must have the longer of u and w on its top.

Two scenarii:

    (A -> u.v , E)
    (A -> u.v , F)

are called 'similar' if they have the same left part (same production split at the
same place). They differ only by the sets of tokens E and F. Two such scenarii may be
joined together into the unique scenario:

    (A -> u.v , G)

    List(ExToken),    // E
    Maybe(Int32)).    // precedence level of grammar rule

'u' is stored in reverse order, because the most common operation is to take the head
of 'v' and put it in front of 'u', so that the dot in the scenario advances past one
grammar symbol.



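The reversed representation of 'u' can be sketched as follows (a hypothetical Python
illustration of the data structure, not the Anubis type): with 'u' reversed, advancing
the dot is a constant-time operation.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Scenario:
    lhs: str                 # A, the non terminal of the rule A -> uv
    u_rev: tuple             # symbols before the dot, in reverse order
    v: tuple                 # symbols after the dot, in order
    lookaheads: frozenset    # the set E of extended tokens

def advance_dot(sc):
    # Take the head of 'v' and put it in front of the reversed 'u',
    # so that the dot advances past one grammar symbol.
    head, *rest = sc.v
    return replace(sc, u_rev=(head,) + sc.u_rev, v=tuple(rest))

# (A -> A.A , (a,$)) becomes (A -> AA. , (a,$)):
sc = Scenario("A", ("A",), ("A",), frozenset({"a", "$"}))
sc2 = advance_dot(sc)
```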

*** (3.3) States.

A state of our automaton is a finite set of pairwise compatible scenarii, which does
not contain any two similar scenarii. Intuitively, the scenarii in a state are simply
those which are still possible in this state.

The 'core' of a state is what remains if we ignore lookaheads. States which do not
differ by the core are called 'similar'.

Couldn't we consider similar states as equivalent? The answer is no in theory. But the
difference in behavior of the automaton in similar states is negligible in
practice. This is the reason why we will identify similar states (merging lists of
lookaheads for similar scenarii).

But let's see what the difference really is. Clearly, since similar states differ only
by the lookaheads, the same shifts and/or reduces may arise. The difference is only in
the decision to make in case of a conflict. However, since the user has plenty of tools
to influence such decisions, there is no need to make any distinction between similar
states.

Of course we represent states (up to a certain point) using the type 'List(Scenario)'.



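The notions of core, similarity and identification of similar states can be sketched as
follows (a Python illustration under an assumed tuple representation of scenarii, not
the Anubis code):

```python
def core(state):
    # A state is a list of (lhs, u, v, lookaheads) tuples; the core
    # forgets the lookahead component of each scenario.
    return {(lhs, u, v) for (lhs, u, v, _) in state}

def similar(s1, s2):
    # Two states are 'similar' when they do not differ by the core.
    return core(s1) == core(s2)

def merge_similar_states(s1, s2):
    # Identify two similar states by merging the lookahead sets of
    # their similar scenarii.
    assert similar(s1, s2)
    las = {}
    for (lhs, u, v, la) in s1 + s2:
        las.setdefault((lhs, u, v), set()).update(la)
    return [(lhs, u, v, frozenset(la)) for (lhs, u, v), la in las.items()]
```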
    (n1,u1,v1) = (n2,u2,v2).


The next function takes a scenario 's' and a state, and returns this state from which
any scenario similar to 's' has been dropped.

define Maybe(List(Scenario))
drop_similar
}.
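The behavior of 'drop_similar' can be sketched as follows (a Python illustration under
an assumed tuple representation of scenarii, not the Anubis code; 'None' plays the role
of the 'Maybe' failure case):

```python
def drop_similar(s, state):
    # Scenarii are (lhs, u, v, lookaheads) tuples; similarity ignores
    # the lookahead component.
    for i, sc in enumerate(state):
        if sc[:3] == s[:3]:
            return state[:i] + state[i+1:]   # success: state without sc
    return None                              # failure: no similar scenario
```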


The next function tests if a list of scenarii contains only scenarii with the splitting
dot at the left end (i.e. in front of the right member of the rule).

define Bool
has_only_front_dots
}.


The next function tests if a given non saturated state has a saturated version similar
to some saturated state. It does this without saturating the first state.

define Bool
saturated_is_similar

    (A -> u.Bv , E)

(where B is a non terminal), it is possible that the next sequence of tokens to be read
matches B. This means that, if B -> w is any B-production, the scenario

    (B -> .w , ?)

should also be possible in the same state. Now, what are the acceptable lookaheads for
this scenario? They are obviously all the tokens which may begin an instance of va, for
any a in E.

This remark provides a procedure for 'saturating' states. A state is 'saturated' if
whenever it contains:

    (A -> u.Bv , (a_1,...,a_k))


for all B-productions B -> w.

In the sequel, we will compute saturated states, but states are often more conveniently
represented by their non saturated version.


Below is a function which computes the union of the First(va_i):
}.


The next function tests if a given state is similar to some state in a given list of
states. This is needed for our saturation process, because we must not add to a state a
scenario which already belongs (maybe in a similar form) to that state. Otherwise, our
process would never end.

define Bool
already_present
}.


The next function is given a (new) scenario to be inserted into a list of scenarii. If
this list contains a similar scenario, the new scenario is just merged with that
one. Otherwise, it is simply added to the list.

define List(Scenario)
insert_scenario
}.


The next function extracts the symbols from the right hand side of a grammar rule
(dropping the 'term' part).

define List(Symbol)
symbols
}.


The following function adds to a given state 's' all the scenarii of the form
(B -> .w , F), for all B-productions. The set of lookaheads F is given.

define List(Scenario)
add_scenarii



The next function performs one step in the saturation of a state. This step consists in
a loop on all scenarii in the state. The list 'l' is the list of scenarii which have not
yet been used for saturation, while 'all' is the set of all known scenarii in the state
at any time.

For each scenario ('sc1' below), of the form (A -> u.v , E), we first check the form of
'v'. If 'v' is empty, the scenario does not participate in saturation, and we just
re-enter the loop with the tail of 'l' instead of 'l'.

If 'v' is not empty, it has a first symbol ('_B' below). This _B cannot be a $. If it
is a token, the scenario does not participate in saturation, as above.

Now, if _B is a non terminal, we add to 'all' all the scenarii derived by the previous
function from B-productions, and we continue our loop.

define List(Scenario)
saturate_state_one_step
}.


Now, saturating a state is just performing saturation steps until a step does not
change the state any more.

define List(Scenario)
saturate_state

*** (3.6) The initial state.

The non terminal S represents the totality of what we want to read from the input. More
precisely, if the input is correct, it is an instance of S. Hence, since there is only
one S-production S -> A, our reading (if successful) will end with a reduction via this
rule, and it will be correct if and only if the lookahead token is the end marker: $.

Hence, at the beginning, there is obviously one and only one wanted scenario, which is:

    (S -> .A , ($))

This scenario (which will be called the 'initial scenario') needs to belong to the
initial state. In fact, the initial state is simply the smallest saturated state which
contains this scenario. In the case of our example, this saturated state will be (after
two steps of saturation):

    (S -> .A  , ($))
    (A -> .   , (a,$))
    (A -> .a  , (a,$))
    (A -> .AA , (a,$))

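This computation can be cross-checked with a small, self-contained Python sketch of the
saturation procedure (an illustration, not the Anubis code; 'e' and '$' are the empty
and end markers, and the First sets come from section (3.1)):

```python
# Example grammar: each non terminal maps to the right hand sides of its rules.
GRAMMAR = {"S": [("A",)], "A": [(), ("a",), ("A", "A")]}
FIRST = {"S": {"a", "e"}, "A": {"a", "e"}}

def first_of(seq, lookaheads):
    # Union of First(v a) for a in 'lookaheads', where v = seq.
    out = set()
    for sym in seq:
        f = FIRST.get(sym, {sym})
        out |= f - {"e"}
        if "e" not in f:
            return out
    return out | set(lookaheads)

def saturate(state):
    # 'state' maps a scenario core (lhs, u, v) to its set of lookaheads.
    state = {k: set(v) for k, v in state.items()}
    changed = True
    while changed:
        changed = False
        for (lhs, u, v), las in list(state.items()):
            if v and v[0] in GRAMMAR:        # dot in front of a non terminal B
                new_las = first_of(v[1:], las)
                for w in GRAMMAR[v[0]]:      # add (B -> .w , new_las)
                    old = state.setdefault((v[0], (), w), set())
                    if not new_las <= old:
                        old |= new_las
                        changed = True
    return state

initial = saturate({("S", (), ("A",)): {"$"}})
```

Running this reproduces exactly the four scenarii listed above.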
2537   - Note that the rule S -> A appears only one time in the
2538   - initial state since the state saturation process cannot produce a
2539   - scenario using this rule.
2540   -
2541   - Now the state generation process will produce a state with the
2542   - scenario (S -> A. , ($)). Obviously, we cannot have other scenarii
2543   - using this rule.
2544   -
2545   - The state which contains the scenario (S -> A. , ($)) is our
2546   - 'accepting state'. Indeed, the input has been read entirely only
2547   - when we are on the point to reduce using this scenario. In that
2548   - case the next token to be read is the end marker, and we 'accept'
2549   - the input.
2550   -
2551   - However, we may have a reduce/reduce conflict with this
2552   - scenario. It is the case in our example grammar. Indeed, in state
2553   - 2 (see below), and if the next token to be read is the end marker,
2554   - we may either reduce using the scenario (S -> A. , ($)) or the
2555   - scenario (A -> . , (a,$)). Notice that it is not possible to have a
2556   - shift/reduce conflict with scenario (S -> A. ,($)), because the
2557   - token '$' cannot be shifted (it cannot appear in the right member
2558   - of a rule).
2559   -
2560   - Of course the user cannot choose between these two reductions
2561   - because he does'nt know about the existence of rule S -> A.
2562   -
2563   - Nevertheless, in that case, we avoid the conflict by reducing
2564   - systematically using rule (S -> A. , ($)). This may be justified as
2565   - follows.
2566   -
2567   - The initial state contains the initial scenario, and scenarii
2568   - obtained by saturation, i.e. with the dot in front of the right
2569   - member. Hence the accepting state may only contain the accepting
2570   - scenario, scenarii of the form (? -> A.? , ?) (because we make a
2571   - transition on A between the two states), and scenarii with the
2572   - dot in front of the right member. Hence all scenarii in the
2573   - accepting state have at most one symbol on the left of the
2574   - dot. This means that if a reduce/reduce conflict arises between the
2575   - accepting scenario and another scenario, this other scenario is
2576   - either of the form:
  2456 + Note that the rule S -> A appears only once in the initial state, since the state
  2457 + saturation process cannot produce a scenario using this rule.
  2458 +
  2459 + Now the state generation process will produce a state with the scenario (S -> A. ,
  2460 + ($)). Obviously, we cannot have other scenarii using this rule.
  2461 +
  2462 + The state which contains the scenario (S -> A. , ($)) is our 'accepting state'. Indeed,
  2463 + the input has been read entirely only when we are on the point to reduce using this
  2464 + scenario. In that case the next token to be read is the end marker, and we 'accept' the
  2465 + input.
  2466 +
  2467 + However, we may have a reduce/reduce conflict with this scenario. This is the case in our
  2468 + example grammar. Indeed, in state 2 (see below), and if the next token to be read is
  2469 + the end marker, we may either reduce using the scenario (S -> A. , ($)) or the scenario
  2470 + (A -> . , (a,$)). Notice that it is not possible to have a shift/reduce conflict with
  2471 + scenario (S -> A. ,($)), because the token '$' cannot be shifted (it cannot appear in
  2472 + the right member of a rule).
  2473 +
  2474 + Of course the user cannot choose between these two reductions because he doesn't know
  2475 + about the existence of rule S -> A.
  2476 +
  2477 + Nevertheless, in that case, we avoid the conflict by systematically reducing with the
  2478 + scenario (S -> A. , ($)). This may be justified as follows.
  2479 +
  2480 + The initial state contains the initial scenario, and scenarii obtained by saturation,
  2481 + i.e. with the dot in front of the right member. Hence the accepting state may only
  2482 + contain the accepting scenario, scenarii of the form (? -> A.? , ?) (because we make a
  2483 + transition on A between the two states), and scenarii with the dot in front of the
  2484 + right member. Hence all scenarii in the accepting state have at most one symbol on the
  2485 + left of the dot. This means that if a reduce/reduce conflict arises between the
  2486 + accepting scenario and another scenario, this other scenario is either of the form:
2577 2487  
2578 2488 (B -> . , ($ ...))
2579 2489  
... ... @@ -2584,9 +2494,9 @@ define List(Scenario)
2584 2494 In the first case, ???
2585 2495  
2586 2496  
2587   - The following function constructs the non saturated initial state
2588   - for a given grammar. It simply looks for the unique S-production,
2589   - and constructs state 0 containing the unique initial scenario.
  2497 + The following function constructs the non saturated initial state for a given
  2498 + grammar. It simply looks for the unique S-production, and constructs state 0 containing
  2499 + the unique initial scenario.
2590 2500  
2591 2501 define List(Scenario)
2592 2502 initial_state
... ... @@ -2611,20 +2521,18 @@ define List(Scenario)
2611 2521  
2612 2522 *** (3.7) Transitions.
2613 2523  
2614   - Of course our automaton has transitions. It has two kinds of
2615   - transitions: those which result from the reading of a token, and
2616   - those which result from the reduction via a rule, after a sequence
2617   - of tokens has been read which is an instance of the right side of
2618   - this rule. The first ones are labelled by tokens, while the others
2619   - are labelled by non terminals.
  2524 + Of course our automaton has transitions. It has two kinds of transitions: those which
  2525 + result from the reading of a token, and those which result from the reduction via a
  2526 + rule, after a sequence of tokens has been read which is an instance of the right side
  2527 + of this rule. The first ones are labelled by tokens, while the others are labelled by
  2528 + non terminals.
2620 2529  
2621 2530 If in some state, we have the scenario:
2622 2531  
2623 2532 (A -> u.av , E)
2624 2533  
2625   - (where 'a' is a token) then, if the next token to be read is 'a',
2626   - it is clear that the transition will be performed to a state
2627   - containing the scenario:
  2534 + (where 'a' is a token) then, if the next token to be read is 'a', it is clear that the
  2535 + transition will be performed to a state containing the scenario:
2628 2536  
2629 2537 (A -> ua.v , E)
2630 2538  
... ... @@ -2634,15 +2542,14 @@ define List(Scenario)
2634 2542  
2635 2543 (A -> u.Bv , E)
2636 2544  
2637   - and if, after reading some tokens, we reduce via this B-production and
2638   - return to this state, we will have to make a transition to a state
2639   - containing:
  2545 + and if, after reading some tokens, we reduce via this B-production and return to this
  2546 + state, we will have to make a transition to a state containing:
2640 2547  
2641 2548 (A -> uB.v , E)
2642 2549  
2643 2550 (E again unchanged).
2644 2551  
2645   - All our transitions will occur in one of these two situations.
  2552 + All our transitions will occur in one of these two situations.
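In both situations the transition just moves the dot past one symbol, leaving E unchanged. Here is a minimal Python sketch, for illustration only, with a scenario encoded as a hypothetical (lhs, before_dot, after_dot, lookaheads) tuple and 'u' kept in left-to-right order rather than the reversed storage the Anubis code uses:

```python
# Illustrative only, not the Anubis implementation.
def advance(scenario):
    """(A -> u.Xv , E)  becomes  (A -> uX.v , E)."""
    lhs, before, after, looks = scenario
    return (lhs, before + after[:1], after[1:], looks)

# Shifting the token 'a' in (A -> .a , (a,$)):
a_shifted = advance(("A", (), ("a",), ("a", "$")))
# Transition on the non terminal A in (A -> .AA , (a,$)):
A_shifted = advance(("A", (), ("A", "A"), ("a", "$")))
```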
2646 2553  
2647 2554  
2648 2555  
... ... @@ -2654,11 +2561,12 @@ define List(Scenario)
2654 2561  
2655 2562 *** (3.8) Generating the states.
2656 2563  
2657   - Which states do we needs ? We need the initial state, and all the
2658   - states which are reachable from it via one of the two above kinds
2659   - of transitions. This gives the method for generating states.
  2564 + Which states do we need? We need the initial state, and all the states which are
  2565 + reachable from it via one of the two above kinds of transitions. This gives the method
  2566 + for generating states.
2660 2567  
2661 2568 (1) when creating a new state, saturate it,
  2569 +
2662 2570 (2) for each symbol for which there are scenarii in the state with
2663 2571 this symbol after the dot, construct the state needed for the
2664 2572 corresponding transition.
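Steps (1) and (2) amount to a standard worklist loop. A generic Python sketch, with hypothetical names, and ignoring the merging of similar states that the real code performs:

```python
def build_states(initial, saturate, next_states):
    """Keep saturating newly created states and queueing the targets of
    their transitions, until no new state appears."""
    done, todo = [], [saturate(initial)]
    while todo:
        state = todo.pop()
        if state in done:      # already processed; the real code would
            continue           # also merge similar (not only equal) states
        done.append(state)
        todo.extend(saturate(s) for s in next_states(state))
    return done

# Toy run on numbers standing in for states: successor modulo 3.
reachable = build_states(0, lambda s: s, lambda s: [(s + 1) % 3])
```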
... ... @@ -2728,8 +2636,8 @@ define List(Scenario)
2728 2636 *** (3.9) Making the automaton.
2729 2637  
2730 2638  
2731   - The following function takes a scenario (A -> u.Xv , E), where X is
2732   - any grammar symbol, and a list of lists of scenarii of the form:
  2639 + The following function takes a scenario (A -> u.Xv , E), where X is any grammar symbol,
  2640 + and a list of lists of scenarii of the form:
2733 2641  
2734 2642 [
2735 2643 [
... ... @@ -2740,17 +2648,14 @@ define List(Scenario)
2740 2648 ...
2741 2649 ]
2742 2650  
2743   - i.e. such that in each list (called a 'class'), the scenarii (? ->
2744   - u.? , ?) have the same symbol as the last one in 'u' (i.e. the
2745   - first one in our representation, since 'u' is stored in reverse
2746   - order). The class above is said ''corresponding to Y''.
  2651 + i.e. such that in each list (called a 'class'), the scenarii (? -> u.? , ?) have the
  2652 + same symbol as the last one in 'u' (i.e. the first one in our representation, since 'u'
  2653 + is stored in reverse order). The class above is said to ''correspond to Y''.
2747 2654  
2748   - The function looks for a class corresponding to X. If it exists the
2749   - scenario is added to this class, after its dot has been put past
2750   - X. Otherwise, it makes a new class.
  2655 + The function looks for a class corresponding to X. If it exists the scenario is added
  2656 + to this class, after its dot has been put past X. Otherwise, it makes a new class.
2751 2657  
2752   - If the scenario has no symbol after the dot, it is not classified
2753   - at all.
  2658 + If the scenario has no symbol after the dot, it is not classified at all.
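For illustration, with a scenario encoded as a hypothetical Python tuple (lhs, before_dot, after_dot, lookaheads), and 'before_dot' kept in left-to-right order (unlike the reversed storage of the Anubis code), the classification might read:

```python
from collections import defaultdict

def classify(state):
    """Group scenarios by the symbol after the dot, moving the dot past
    that symbol; scenarios with nothing after the dot are not kept."""
    classes = defaultdict(list)
    for lhs, before, after, looks in state:
        if after:
            classes[after[0]].append((lhs, before + after[:1], after[1:], looks))
    return dict(classes)

state0 = [("S", (), ("A",), ("$",)),
          ("A", (), (), ("a", "$")),
          ("A", (), ("a",), ("a", "$")),
          ("A", (), ("A", "A"), ("a", "$"))]
classes = classify(state0)    # one class for 'A', one class for 'a'
```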
2754 2659  
2755 2660 define List(List(Scenario))
2756 2661 classify
... ... @@ -2794,14 +2699,13 @@ define List(List(Scenario))
2794 2699  
2795 2700  
2796 2701  
2797   - The function 'next_states' takes a state 'state', and produces the
2798   - list of all states which may be reached from 'state' via a single
2799   - transition (either on shifting a token or after reduction to a non
2800   - terminal).
  2702 + The function 'next_states' takes a state 'state', and produces the list of all states
  2703 + which may be reached from 'state' via a single transition (either on shifting a token
  2704 + or after reduction to a non terminal).
2801 2705  
2802   - It works as follows. It partitions 'state' so that each element of
2803   - the partition has scenarii with the same symbol after the dot. Then
2804   - the dot is put past this symbol. For example, if 'state' is:
  2706 + It works as follows. It partitions 'state' so that each element of the partition has
  2707 + scenarii with the same symbol after the dot. Then the dot is put past this symbol. For
  2708 + example, if 'state' is:
2805 2709  
2806 2710 [
2807 2711 (A -> u.av , E)
... ... @@ -2822,10 +2726,10 @@ define List(List(Scenario))
2822 2726 ]
2823 2727  
2824 2728  
2825   - The next function takes a (non saturated) state, and computes the
2826   - list of all (non saturated) states which may be the target of a
2827   - transition (either on a token or on a non terminal) from that
2828   - state. It transforms a state into a set of classes like the above.
  2729 + The next function takes a (non saturated) state, and computes the list of all (non
  2730 + saturated) states which may be the target of a transition (either on a token or on a
  2731 + non terminal) from that state. It transforms a state into a set of classes like the
  2732 + above.
2829 2733  
2830 2734 define List(List(Scenario))
2831 2735 next_states
... ... @@ -2843,12 +2747,11 @@ define List(List(Scenario))
2843 2747  
2844 2748  
2845 2749  
2846   - Now, in order to compute our automaton (of type
2847   - 'List(List(Scenario))'), we must start with the initial non
2848   - saturated state and add 'next' states until no more state may be
2849   - added. Of course, we add states only if they are not already
2850   - present in the automaton. More presisely, if there is a similar
2851   - state in the automaton, we must merge those two states.
  2750 + Now, in order to compute our automaton (of type 'List(List(Scenario))'), we must start
  2751 + with the initial non saturated state and add 'next' states until no more state may be
  2752 + added. Of course, we add states only if they are not already present in the
  2753 + automaton. More precisely, if there is a similar state in the automaton, we must merge
  2754 + those two states.
2852 2755  
2853 2756 Here is how we merge states.
2854 2757  
... ... @@ -2909,8 +2812,7 @@ define List(List(Scenario))
2909 2812 }.
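As a rough Python illustration of this merging (both the encoding of a scenario as (lhs, before_dot, after_dot, lookaheads) and the reading of 'similar' as 'equal up to lookaheads' are assumptions, not taken from the Anubis code above):

```python
def merge(s1, s2):
    """Merge two similar states by taking, for each scenario, the union
    of its lookahead sets in the two states."""
    looks = {}
    for lhs, before, after, las in list(s1) + list(s2):
        key = (lhs, before, after)
        looks[key] = looks.get(key, frozenset()) | frozenset(las)
    return {key + (las,) for key, las in looks.items()}

merged = merge([("A", (), (), {"a"})], [("A", (), (), {"$"})])
```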
2910 2813  
2911 2814  
2912   - At each step of the construction of our automaton, we have two
2913   - lists:
  2815 + At each step of the construction of our automaton, we have two lists:
2914 2816  
2915 2817 - the list 'have_next' of those states for which next states
2916 2818 have been already constructed,
... ... @@ -2997,8 +2899,8 @@ define List(List(Scenario))
2997 2899  
2998 2900 *** (4.1) Numbering states and adding transitions lists.
2999 2901  
3000   - Now that our states are established, we need to rework them. Here
3001   - are the operations performed:
  2902 + Now that our states are established, we need to rework them. Here are the operations
  2903 + performed:
3002 2904  
3003 2905 - Put an identifying number on each state (beginning at 0)
3004 2906  
... ... @@ -3011,7 +2913,7 @@ type IntermediateState:
3011 2913 List((Symbol,Int32)) transitions).
3012 2914  
3013 2915  
3014   - The next function just add numbers identifying states.
  2916 + The next function just adds numbers identifying states.
3015 2917  
3016 2918 define List(IntermediateState)
3017 2919 number
... ... @@ -3027,8 +2929,8 @@ define List(IntermediateState)
3027 2929 }.
3028 2930  
3029 2931  
3030   - The next function gives the number identifying a non saturated
3031   - state in a list of intermediate states.
  2932 + The next function gives the number identifying a non saturated state in a list of
  2933 + intermediate states.
3032 2934  
3033 2935 define Int32
3034 2936 find_id
... ... @@ -3047,11 +2949,10 @@ define Int32
3047 2949 }.
3048 2950  
3049 2951  
3050   - The next function takes a class (a list of scenarii with the same
3051   - grammar symbol Y before the dot) and an automaton in the form os a
3052   - list of intermediate states, and returns the pair (Y,n), where Y is the
3053   - previous grammar symbol and n the integer identifying that class in
3054   - the automaton.
  2952 + The next function takes a class (a list of scenarii with the same grammar symbol Y
  2953 + before the dot) and an automaton in the form of a list of intermediate states, and
  2954 + returns the pair (Y,n), where Y is the previous grammar symbol and n the integer
  2955 + identifying that class in the automaton.
3055 2956  
3056 2957  
3057 2958 define (Symbol,Int32)
... ... @@ -3075,10 +2976,9 @@ define (Symbol,Int32)
3075 2976  
3076 2977  
3077 2978  
3078   - The following function takes a partition of a state (in the form of
3079   - a list of classes), an automaton (in the form of a list of
3080   - intermediate states), and returns a list of pairs (X,n) saying ``if
3081   - transition is on X, then go to state n''.
  2979 + The following function takes a partition of a state (in the form of a list of classes),
  2980 + an automaton (in the form of a list of intermediate states), and returns a list of
  2981 + pairs (X,n) saying ``if transition is on X, then go to state n''.
3082 2982  
3083 2983 define List((Symbol,Int32))
3084 2984 make_transitions
... ... @@ -3099,8 +2999,7 @@ define List((Symbol,Int32))
3099 2999  
3100 3000  
3101 3001  
3102   - The next function adds transitions to all intermediate states in
3103   - our automaton.
  3002 + The next function adds transitions to all intermediate states in our automaton.
3104 3003  
3105 3004 define List(IntermediateState)
3106 3005 add_transitions
... ... @@ -3143,15 +3042,15 @@ define List(IntermediateState)
3143 3042  
3144 3043 ( A-> u.v , E)
3145 3044  
3146   - and if v is not empty, E is no more needed. Such a scenario is
3147   - called a 'shifting' scenario, because it will cause the shifting of
3148   - either a token or of an instance of a non terminal.
  3045 + and if v is not empty, E is no longer needed. Such a scenario is called a 'shifting'
  3046 + scenario, because it will cause the shifting of either a token or of an instance of a
  3047 + non terminal.
3149 3048  
3150   - On the contrary, scenarii of the form
  3049 + By contrast, scenarii of the form
3151 3050  
3152 3051 (A -> u. , E)
3153 3052  
3154   - are called 'reducing' scenarii, because they call for a reduction.
  3053 + are called 'reducing' scenarii, because they call for a reduction.
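With a scenario encoded as a hypothetical Python tuple (lhs, before_dot, after_dot, lookaheads), this separation is a simple partition on whether anything follows the dot:

```python
# Illustrative only, not the Anubis implementation.
def separate(state):
    """Split a state into (reducing, shifting) scenarios."""
    reducing = [s for s in state if not s[2]]   # nothing after the dot
    shifting = [s for s in state if s[2]]       # a symbol after the dot
    return reducing, shifting

red, shift = separate([("A", (), (), ("a", "$")),
                       ("A", (), ("a",), ("a", "$"))])
```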
3155 3054  
3156 3055  
3157 3056 type NonEmptyList($T):
... ... @@ -3184,12 +3083,11 @@ type NewState:
3184 3083 List(Conflict) conflicts).
3185 3084  
3186 3085  
3187   - Given an automaton in the form of a list of intermediate states, we
3188   - transform it into an automaton in the form of a list of new
3189   - states. This is a state by state operation.
  3086 + Given an automaton in the form of a list of intermediate states, we transform it into
  3087 + an automaton in the form of a list of new states. This is a state-by-state operation.
3190 3088  
3191   - The next function checks if a precedence level may be deduced from
3192   - the right member of the rule.
  3089 + The next function checks if a precedence level may be deduced from the right member of
  3090 + the rule.
3193 3091  
3194 3092  
3195 3093 define Maybe(Int32)
... ... @@ -3235,8 +3133,8 @@ define Maybe(Int32)
3235 3133  
3236 3134  
3237 3135  
3238   - For each state, we just need to separate the list of scenarii, and
3239   - slightly rearrange each of them.
  3136 + For each state, we just need to separate the list of scenarii, and slightly rearrange
  3137 + each of them.
3240 3138  
3241 3139 define (List(ReducingScenario),List(ShiftingScenario))
3242 3140 separate
... ... @@ -3262,9 +3160,8 @@ define (List(ReducingScenario),List(ShiftingScenario))
3262 3160 }.
3263 3161  
3264 3162  
3265   - The next function establishes the list of conflict in a given
3266   - state, from the two lists of reducing scenarii and shifting
3267   - scenarii.
  3163 + The next function establishes the list of conflicts in a given state, from the two lists
  3164 + of reducing scenarii and shifting scenarii.
3268 3165  
3269 3166  
3270 3167 define List($T)
... ... @@ -3409,10 +3306,9 @@ define Int32
3409 3306 *** (4.3) Making decisions.
3410 3307  
3411 3308  
3412   - We will now examine our states to decide what to do in the presence
3413   - of a given lookahead. In other words, we must construct our
3414   - 'action' function. We continue with the same example. We record all
3415   - possibilities in the following table:
  3309 + We will now examine our states to decide what to do in the presence of a given
  3310 + lookahead. In other words, we must construct our 'action' function. We continue with
  3311 + the same example. We record all possibilities in the following table:
3416 3312  
3417 3313 | a $
3418 3314 --+-------------------------
... ... @@ -3421,15 +3317,13 @@ define Int32
3421 3317 2 | s1/r2 r1/r2
3422 3318 3 | s1/r2/r4 r2/r4
3423 3319  
3424   - Indeed, in state 0, if we see an 'a' we may either shift and go to
3425   - state 1, or reduce using rule 2 (A -> ). If we see a '$' we can
3426   - only reduce using rule 2. In state 1, we can only reduce using rule
3427   - 3 (A -> a). In state 2, if we see 'a', we ca shift and go to state
3428   - 1, or reduce using rule 2 (A -> ). If we see a '$' we can reduce
3429   - using either rule 1 (S -> A) or rule 2 (A -> ). In state 3, if we
3430   - see 'a', we can shift and go to state 1, or reduce using either
3431   - rule 2 (A -> ) or rule 4 (A -> AA). If we see '$', we can reduce
3432   - using either rule 2 or rule 4.
  3320 + Indeed, in state 0, if we see an 'a' we may either shift and go to state 1, or reduce
  3321 + using rule 2 (A -> ). If we see a '$' we can only reduce using rule 2. In state 1, we
  3322 + can only reduce using rule 3 (A -> a). In state 2, if we see 'a', we can shift and go to
  3323 + state 1, or reduce using rule 2 (A -> ). If we see a '$' we can reduce using either
  3324 + rule 1 (S -> A) or rule 2 (A -> ). In state 3, if we see 'a', we can shift and go to
  3325 + state 1, or reduce using either rule 2 (A -> ) or rule 4 (A -> AA). If we see '$', we
  3326 + can reduce using either rule 2 or rule 4.
3433 3327  
3434 3328 Hence, as expected, the example grammar is highly ambiguous.
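Transcribed into a small Python structure (the 'sN'/'rN' strings are just the shorthand of the table), the conflicting cells are exactly those offering more than one action:

```python
# state -> lookahead -> possible actions, copied from the table above
TABLE = {
    0: {"a": ["s1", "r2"], "$": ["r2"]},
    1: {"a": ["r3"], "$": ["r3"]},
    2: {"a": ["s1", "r2"], "$": ["r1", "r2"]},
    3: {"a": ["s1", "r2", "r4"], "$": ["r2", "r4"]},
}

def conflicts(table):
    """Cells with more than one possible action are conflicts."""
    return [(state, tok) for state, row in table.items()
            for tok, actions in row.items() if len(actions) > 1]

# Five of the eight cells are conflicting.
```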
3435 3329  
... ... @@ -3517,8 +3411,8 @@ define Int32
3517 3411  
3518 3412  
3519 3413  
3520   - Finally, here is a tool to print a 'first function'. We begin by a
3521   - function printing a list of extended tokens.
  3414 + Finally, here is a tool to print a 'first function'. We begin with a function printing a
  3415 + list of extended tokens.
3522 3416  
3523 3417 define One
3524 3418 print
... ... @@ -3923,14 +3817,6 @@ define One
3923 3817 }
3924 3818 }.
3925 3819  
3926   - define One
3927   - print
3928   - (
3929   - WAddr(Int8) file,
3930   - String s
3931   - ) =
3932   - print(file,s,0).
3933   -
3934 3820 define One
3935 3821 trace_body
3936 3822 (
... ... @@ -4011,9 +3897,8 @@ define One
4011 3897  
4012 3898 read trace_apg.anubis
4013 3899  
4014   - The function 'make_parser' receives the grammar read from the
4015   - source file (together with its name, its precedence and association
4016   - rules), and also the two output files.
  3900 + The function 'make_parser' receives the grammar read from the source file (together
  3901 + with its name, its precedence and association rules), and also the two output files.
4017 3902  
4018 3903 define Maybe(One)
4019 3904 make_parser
... ...