Commit b61efc885dd9bb72a0d38d082f52a7de021d9079
1 parent
b328018b
*** empty log message ***
Showing 1 changed file with 415 additions and 530 deletions
anubis_dev/library/syntactic_analysis/parser_maker.anubis
| ... | ... | @@ -3,7 +3,7 @@ |
| 3 | 3 | |
| 4 | 4 | The Anubis Project |
| 5 | 5 | |
| 6 | - The Anubis Parser Generator | |
| 6 | + The Anubis Parser Maker | |
| 7 | 7 | |
| 8 | 8 | Copyright (c) Alain Prouté 2006. |
| 9 | 9 | |
| ... | ... | @@ -11,11 +11,10 @@ |
| 11 | 11 | |
| 12 | 12 | |
| 13 | 13 | |
| 14 | - This is the source file for the Anubis Parser Generator. | |
| 15 | - | |
| 16 | - From a grammar, APM generates an Anubis source file containing a program (called a | |
| 17 | - parser) able to recognize sentences of the corresponding language. APM is very similar | |
| 18 | - to the well known UNIX tool 'YACC' (or its GNU equivalent 'BISON'). | |
| 14 | + From a grammar, APM (the 'Anubis Parser Maker') generates an Anubis source file | |
| 15 | + containing a program (called a 'parser') able to recognize sentences of the | |
| 16 | + corresponding language. APM is very similar to the well-known UNIX tool 'YACC' (or its | |
| 17 | + GNU equivalent 'BISON'). | |
| 19 | 18 | |
| 20 | 19 | |
| 21 | 20 | |
| ... | ... | @@ -60,9 +59,9 @@ |
| 60 | 59 | *** (5.3) States as functions. |
| 61 | 60 | |
| 62 | 61 | *** (6) Putting it all together. |
| 62 | + (this is still under construction) | |
| 63 | 63 | |
| 64 | - | |
| 65 | - ------------------------------------ | |
| 64 | + --------------------------------------------------------------------------------------- | |
| 66 | 65 | |
| 67 | 66 | |
| 68 | 67 | |
| ... | ... | @@ -75,10 +74,9 @@ |
| 75 | 74 | |
| 76 | 75 | *** (1.1) In theory. |
| 77 | 76 | |
| 78 | - We have two finite (and disjoint) sets of symbols: 'tokens' (also | |
| 79 | - called 'terminals') and 'non terminals'. Here are our notational | |
| 80 | - conventions (used in these explanations only, not in APM source | |
| 81 | - files): | |
| 77 | + We have two finite (and disjoint) sets of symbols: 'tokens' (also called 'terminals') | |
| 78 | + and 'non terminals'. Here are our notational conventions (used in these explanations | |
| 79 | + only, not in APM source files): | |
| 82 | 80 | |
| 83 | 81 | a, b, c,... represent tokens |
| 84 | 82 | A, B, C,... represent non terminals |
| ... | ... | @@ -88,80 +86,68 @@ |
| 88 | 86 | e represent the empty sequence of grammar symbols |
| 89 | 87 | $ is the end marker (a special additional token) |
| 90 | 88 | |
| 91 | - A 'grammar rule' (or 'production') has the form: A -> u (this one | |
| 92 | - is called an 'A-production'). In other words, it has a non terminal | |
| 93 | - on the left of the arrow, and a (possibly empty) sequence of | |
| 94 | - grammar symbols on the right of the arrow. Its meaning is that we | |
| 95 | - can produce an expression 'of type' 'A', by concatenating expressions | |
| 96 | - of types X_1...X_k, where u = X_1...X_k. In this interpretation, | |
| 97 | - tokens represent themselves. | |
| 89 | + A 'grammar rule' (or 'production') has the form: A -> u (this one is called an | |
| 90 | + 'A-production'). In other words, it has a non terminal on the left of the arrow, and a | |
| 91 | + (possibly empty) sequence of grammar symbols on the right of the arrow. Its meaning is | |
| 92 | + that we can produce an expression 'of type' 'A', by concatenating expressions of types | |
| 93 | + X_1...X_k, where u = X_1...X_k. In this interpretation, tokens represent themselves. | |
| 98 | 94 | |
| 99 | - A 'grammar' is a finite set of grammar rules, together with a | |
| 100 | - distinguished non terminal (denoted 'S' in these explanations), | |
| 101 | - called the 'axiom'. The 'language' associated to the grammar is the | |
| 102 | - set of all sequences of tokens which may produce 'S' (we also say | |
| 103 | - that they are 'instances' of 'S'). | |
| 95 | + A 'grammar' is a finite set of grammar rules, together with a distinguished non | |
| 96 | + terminal (denoted 'S' in these explanations), called the 'axiom'. The 'language' | |
| 97 | + associated to the grammar is the set of all sequences of tokens which may produce 'S' | |
| 98 | + (we also say that they are 'instances' of 'S'). | |
| 104 | 99 | |
| 105 | - For our convenience, we assume that there is one and only one | |
| 106 | - S-production, and that it has the form: S -> A. Furthermore, S | |
| 107 | - cannot appear in the right hand member of a production. It is | |
| 108 | - trivial to replace a given grammar by a grammar fulfilling these | |
| 109 | - conditions, by adding a new non terminal S, and the single new rule | |
| 110 | - S -> A, where A is the axiom of the original grammar. This | |
| 111 | - operation does not change the corresponding language. It is | |
| 112 | - realized below by the function 'add_S_rule'. | |
| 100 | + For our convenience, we assume that there is one and only one S-production, and that it | |
| 101 | + has the form: S -> A. Furthermore, S cannot appear in the right hand member of a | |
| 102 | + production. It is trivial to replace a given grammar by a grammar fulfilling these | |
| 103 | + conditions, by adding a new non terminal S, and the single new rule S -> A, where A is | |
| 104 | + the axiom of the original grammar. This operation does not change the corresponding | |
| 105 | + language. It is realized below by the function 'add_S_rule'. | |
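The augmentation step described above is easy to picture in code. Below is an illustrative Python analogue of the Anubis 'add_S_rule' function (a sketch, not the actual implementation: here a grammar is simply a list of (head, body) productions and symbols are plain strings):

```python
# A grammar is a list of (head, body) productions; the axiom is the
# distinguished non terminal. Augmentation adds a fresh start symbol
# S' and the single rule S' -> axiom, leaving the language unchanged.

def add_S_rule(rules, axiom):
    """Return (augmented rules, new start symbol)."""
    start = "S'"
    return [(start, (axiom,))] + rules, start

rules = [("A", ("A", "+", "a")),   # A -> A + a
         ("A", ("a",))]            # A -> a
augmented, start = add_S_rule(rules, "A")
```

The new start symbol never appears in any right hand side, which is exactly the condition required of S above.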
| 113 | 106 | |
| 114 | 107 | |
| 115 | 108 | |
| 116 | 109 | |
| 117 | 110 | *** (1.2) In APM source files. |
| 118 | 111 | |
| 119 | - Of course, we need to read grammars from a source file (an APM | |
| 120 | - source file). The denotation for grammars in APM source files is | |
| 121 | - somewhat more complicated, because we must take the values of | |
| 122 | - grammar symbols into account. | |
| 123 | - | |
| 124 | - Indeed, in practice, terminals and non terminals may have | |
| 125 | - values. Hence, we have an Anubis type (the type of syntactical | |
| 126 | - entities) whose alternatives describe the required values (for both | |
| 127 | - terminals and non terminals). | |
| 128 | - | |
| 129 | - When the ALG lexer returns a token, this token already has received a | |
| 130 | - value. When the parser reduces a sequence X_1...X_k of grammar | |
| 131 | - symbols, using the production A -> X_1...X_k, it computes the value | |
| 132 | - of A from the values of X_1...X_k. Hence, the denotation for | |
| 133 | - productions should allow the description of this computation. In | |
| 134 | - YACC and BISON, this computation is described (in the language C) | |
| 135 | - within so-called 'actions', which are post-fixed to grammar | |
| 136 | - rules. In APM it is somewhat different. | |
| 112 | + Of course, we need to read grammars from a source file (an APM source file). The | |
| 113 | + denotation for grammars in APM source files is somewhat more complicated, because we | |
| 114 | + must take the values of grammar symbols into account. | |
| 115 | + | |
| 116 | + Indeed, in practice, terminals and non terminals may have values. Hence, we have an | |
| 117 | + Anubis type (the type of syntactical entities) whose alternatives describe the required | |
| 118 | + values (for both terminals and non terminals). | |
| 119 | + | |
| 120 | + When the ALG lexer returns a token, this token has already received a value. When the | |
| 121 | + parser reduces a sequence X_1...X_k of grammar symbols, using the production A -> | |
| 122 | + X_1...X_k, it computes the value of A from the values of X_1...X_k. Hence, the | |
| 123 | + denotation for productions should allow the description of this computation. In YACC | |
| 124 | + and BISON, this computation is described (in the language C) within so-called | |
| 125 | + 'actions', which are post-fixed to grammar rules. In APM it is somewhat different. | |
| 137 | 126 | |
| 138 | - Since APM grammar symbols may be also names of alternatives, they | |
| 139 | - may have operands, and the right hand side X_1...X_k of a | |
| 140 | - production, will be written for example as: | |
| 127 | + Since APM grammar symbols may also be names of alternatives, they may have operands, | |
| 128 | + and the right hand side X_1...X_k of a production will be written, for example, as: | |
| 141 | 129 | |
| 142 | 130 | X_1(x,y) X_2(z) X_3 X_4(u,v,w) |
| 143 | 131 | |
| 144 | - assuming in this example that the grammar symbol X_1 has two | |
| 145 | - operands, X_2 one operand, X_3 no operand and X_4 three operands. | |
| 132 | + assuming in this example that the grammar symbol X_1 has two operands, X_2 one operand, | |
| 133 | + X_3 no operand and X_4 three operands. | |
| 146 | 134 | |
| 147 | - In this denotation, x, y, z, u, v and w must be symbols. In the | |
| 148 | - automaton produced by APM, they will become resurgent symbols. | |
| 135 | + In this denotation, x, y, z, u, v and w must be symbols. In the automaton produced by | |
| 136 | + APM, they will become resurgent symbols. | |
| 149 | 137 | |
| 150 | - Now, the complete production A -> X_1...X_k will be denoted | |
| 151 | - (assuming the same example): | |
| 138 | + Now, the complete production A -> X_1...X_k will be denoted (assuming the same | |
| 139 | + example): | |
| 152 | 140 | |
| 153 | 141 | A(t): X_1(x,y) X_2(z) X_3 X_4(u,v,w). |
| 154 | 142 | |
| 155 | - where t is a term (or several terms separated by commas), which may | |
| 156 | - make use of the symbols x, y, z, u, v and w. Of course t will be | |
| 157 | - used to compute the value of A when the reduction via this | |
| 158 | - production will occur. The above rule is something like a case in a | |
| 159 | - conditional, except that A(t) which plays the role of the body of | |
| 160 | - case, is written on the left hand side. | |
| 143 | + where t is a term (or several terms separated by commas), which may make use of the | |
| 144 | + symbols x, y, z, u, v and w. Of course t will be used to compute the value of A when | |
| 145 | + the reduction via this production occurs. The above rule is something like a case in a | |
| 146 | + conditional, except that A(t), which plays the role of the body of the case, is | |
| 147 | + written on the left hand side. | |
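The value computation performed at reduction time can be sketched as follows (an illustrative Python analogue; in APM the term t written in the head A(t) plays the role of the action function used here):

```python
# At a reduction by a rule written A(t): X_1(x) X_2(y), the parser pops
# the values of the body symbols and pushes the value computed by the
# head term t from the popped values.

def reduce(value_stack, arity, action):
    """Pop 'arity' values, apply the rule's action to them, push result."""
    args = value_stack[len(value_stack) - arity:]
    del value_stack[len(value_stack) - arity:]
    value_stack.append(action(*args))

stack = [2, 3]                         # values of X_1 and X_2
reduce(stack, 2, lambda x, y: x + y)   # head written A(x + y)
```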
| 161 | 148 | |
| 162 | - Hence, an APM grammar rule is described by the following | |
| 163 | - self-explanatory 'meta-grammar' (the symbol between square | |
| 164 | - brackets is a precedence level): | |
| 149 | + Hence, an APM grammar rule is described by the following self-explanatory | |
| 150 | + 'meta-grammar' (the symbol between square brackets is a precedence level): | |
| 165 | 151 | |
| 166 | 152 | GrammarRule -> Head : Body . |
| 167 | 153 | | Head : Body [ Symbol ] . |
| ... | ... | @@ -178,8 +164,8 @@ |
| 178 | 164 | Symbols_1 -> Symbol |
| 179 | 165 | | Symbol , Symbols_1 |
| 180 | 166 | |
| 181 | - In a 'Head', APM does not read the 'Term', but just keeps track of | |
| 182 | - matching parentheses (not contained within strings). | |
| 167 | + In a 'Head', APM does not read the 'Term', but just keeps track of matching parentheses | |
| 168 | + (not contained within strings). | |
| 183 | 169 | |
| 184 | 170 | Now, an APM source file has the following format: |
| 185 | 171 | |
| ... | ... | @@ -193,14 +179,13 @@ |
| 193 | 179 | postambule (Anubis text) |
| 194 | 180 | |
| 195 | 181 | |
| 196 | - Both tokens and nonterminals should be acceptable Anubis | |
| 197 | - symbols. Indeed, they must also be names of alternatives in the | |
| 198 | - type of syntactical entities. The name of this type is formed by | |
| 199 | - the concatenation of 'SyntaxTree_' and the name of the | |
| 200 | - parser. Normally it is defined by the user in the preambule. | |
| 182 | + Both tokens and nonterminals should be acceptable Anubis symbols. Indeed, they must | |
| 183 | + also be names of alternatives in the type of syntactical entities. The name of this | |
| 184 | + type is formed by the concatenation of 'SyntaxTree_' and the name of the | |
| 185 | + parser. Normally it is defined by the user in the preambule. | |
| 201 | 186 | |
| 202 | - Reading APM grammars is simple enough so that we do not need to use | |
| 203 | - neither ALG nor APM. | |
| 187 | + Reading APM grammars is simple enough that we need to use neither ALG nor | |
| 188 | + APM. | |
| 204 | 189 | |
| 205 | 190 | |
| 206 | 191 | |
| ... | ... | @@ -264,14 +249,12 @@ define List($T) |
| 264 | 249 | |
| 265 | 250 | *** (2) Reading APM source files. |
| 266 | 251 | |
| 267 | - Below are the functions which enable APM to read source | |
| 268 | - files. There is also some kind of a lexer. Its state is stored into | |
| 269 | - a datum of type 'APM_LexerState'. This lexer keeps track of line | |
| 270 | - numbers, eliminates blank characters, and tokenizes the input into | |
| 271 | - a sequence of 'meta-tokens'. | |
| 252 | + Below are the functions which enable APM to read source files. There is also some kind | |
| 253 | + of a lexer. Its state is stored into a datum of type 'APM_LexerState'. This lexer keeps | |
| 254 | + track of line numbers, eliminates blank characters, and tokenizes the input into a | |
| 255 | + sequence of 'meta-tokens'. | |
| 272 | 256 | |
| 273 | - The meta-tokens we need to recognize in APM source files are the | |
| 274 | - following: | |
| 257 | + The meta-tokens we need to recognize in APM source files are the following: | |
| 275 | 258 | |
| 276 | 259 | symbols |
| 277 | 260 | terms (delimited by parentheses) |
| ... | ... | @@ -283,16 +266,14 @@ define List($T) |
| 283 | 266 | premature end of file (the legal end of file will be found by |
| 284 | 267 | the function copying the postambule) |
| 285 | 268 | |
| 286 | - They are defined as the alternatives of the type 'MetaToken'. Then, | |
| 287 | - assembling tokens into precedence rules or grammar rules is rather | |
| 288 | - easy. | |
| 269 | + They are defined as the alternatives of the type 'MetaToken'. Then, assembling tokens | |
| 270 | + into precedence rules or grammar rules is rather easy. | |
| 289 | 271 | |
| 290 | 272 | |
| 291 | 273 | |
| 292 | 274 | *** (2.1) Reading characters. |
| 293 | 275 | |
| 294 | - We must read characters in an extended sens, to take the end of | |
| 295 | - file into account. | |
| 276 | + We must read characters in an extended sense, to take the end of file into account. | |
| 296 | 277 | |
| 297 | 278 | type ExChar: |
| 298 | 279 | char(Int8), // normal character |
| ... | ... | @@ -307,8 +288,8 @@ type APM_LexerState: |
| 307 | 288 | Maybe(Int8) unread). // character possibly 'unread' |
| 308 | 289 | |
| 309 | 290 | |
| 310 | - Here is how we read a character (returning both the new state of | |
| 311 | - the lexer and the extended character). | |
| 291 | + Here is how we read a character (returning both the new state of the lexer and the | |
| 292 | + extended character). | |
| 312 | 293 | |
| 313 | 294 | define (APM_LexerState,ExChar) |
| 314 | 295 | read_char |
| ... | ... | @@ -345,16 +326,15 @@ define (APM_LexerState,ExChar) |
| 345 | 326 | char(c)) |
| 346 | 327 | }. |
| 347 | 328 | |
| 348 | - Note: 'unreading' a character is done 'by hand' by functions which | |
| 349 | - need to do that. They can do it because they hold the lexer state. | |
| 329 | + Note: 'unreading' a character is done 'by hand' by functions which need to do | |
| 330 | + that. They can do it because they hold the lexer state. | |
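The state-passing pattern described here (the caller holds the lexer state, so it can 'unread' by hand) can be sketched in Python. This is an illustrative analogue of 'APM_LexerState' and 'read_char', not the Anubis code; the tuple layout is an assumption:

```python
# State: (input text, position, current line number, one-character
# 'unread' slot). read_char returns (new state, char), with None
# standing for the 'end_of_file' alternative of ExChar.

def read_char(state):
    text, pos, line, unread = state
    if unread is not None:
        return (text, pos, line, None), unread
    if pos == len(text):
        return state, None
    c = text[pos]
    return (text, pos + 1, line + (1 if c == "\n" else 0), None), c

def unread_char(state, c):
    text, pos, line, _ = state
    return (text, pos, line, c)       # 'unreading' done by hand

st = ("a\nb", 0, 1, None)
st, c1 = read_char(st)     # 'a'
st, c2 = read_char(st)     # '\n' (line counter advances)
st = unread_char(st, c2)
st, c3 = read_char(st)     # the unread '\n' is delivered again
```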
| 350 | 331 | |
| 351 | 332 | |
| 352 | 333 | |
| 353 | 334 | |
| 354 | 335 | *** (2.2) Reading meta-tokens. |
| 355 | 336 | |
| 356 | - While reading grammar rules, we need to recognize several kinds of | |
| 357 | - meta-tokens: | |
| 337 | + While reading grammar rules, we need to recognize several kinds of meta-tokens: | |
| 358 | 338 | |
| 359 | 339 | type MetaToken: |
| 360 | 340 | symbol(String), // a regular Anubis symbol |
| ... | ... | @@ -366,12 +346,11 @@ type MetaToken: |
| 366 | 346 | error(Int8), // any misplaced character |
| 367 | 347 | premature_end_of_file. // self explanatory |
| 368 | 348 | |
| 369 | - Note: (t), (x,y), (z), etc... are seen as 'term(String)' | |
| 370 | - meta-tokens. This is why parentheses do not appear in the above | |
| 371 | - definition of meta-tokens. | |
| 349 | + Note: (t), (x,y), (z), etc... are seen as 'term(String)' meta-tokens. This is why | |
| 350 | + parentheses do not appear in the above definition of meta-tokens. | |
| 372 | 351 | |
| 373 | 352 | |
| 374 | - Here is a simple useful test for detecting the beginning of a symbol. | |
| 353 | + Here is a simple useful test for detecting the beginning of a symbol. | |
| 375 | 354 | |
| 376 | 355 | define Bool |
| 377 | 356 | may_begin_symbol |
| ... | ... | @@ -412,8 +391,8 @@ define Bool |
| 412 | 391 | false. |
| 413 | 392 | |
| 414 | 393 | |
| 415 | - The function below reads a symbol whose first characters (at least | |
| 416 | - one) have already been read, and are given in reverse order. | |
| 394 | + The function below reads a symbol whose first characters (at least one) have already | |
| 395 | + been read, and are given in reverse order. | |
| 417 | 396 | |
| 418 | 397 | define (APM_LexerState,MetaToken) |
| 419 | 398 | read_symbol |
| ... | ... | @@ -437,14 +416,12 @@ define (APM_LexerState,MetaToken) |
| 437 | 416 | }. |
| 438 | 417 | |
| 439 | 418 | |
| 440 | - The function 'read_string_within_term' is called while reading a | |
| 441 | - string within a term (itself delimited by parentheses). The | |
| 442 | - beginning of the term has already been read. We need to declare | |
| 443 | - 'read_term', because the two functions are mutually recursive. In | |
| 444 | - fact 'read_term' calls (terminally) 'read_string_in_term' when the | |
| 445 | - beginning of a string is detected. Similarly, 'read_string_in_term' | |
| 446 | - calls (terminally) 'read_term' when the end of that string is | |
| 447 | - found. | |
| 419 | + The function 'read_string_within_term' is called while reading a string within a term | |
| 420 | + (itself delimited by parentheses). The beginning of the term has already been read. We | |
| 421 | + need to declare 'read_term', because the two functions are mutually recursive. In fact | |
| 422 | + 'read_term' calls (terminally) 'read_string_in_term' when the beginning of a string is | |
| 423 | + detected. Similarly, 'read_string_in_term' calls (terminally) 'read_term' when the end | |
| 424 | + of that string is found. | |
| 448 | 425 | |
| 449 | 426 | define (APM_LexerState,MetaToken) |
| 450 | 427 | read_term |
| ... | ... | @@ -486,8 +463,8 @@ define (APM_LexerState,MetaToken) |
| 486 | 463 | }. |
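The interplay between 'read_term' and 'read_string_within_term' amounts to tracking parenthesis depth while skipping over string contents. A compact iterative Python sketch of the same idea (the function name, escape convention, and return shape are assumptions):

```python
def read_term(src, pos):
    """Read up to the ')' matching an already-consumed '(', ignoring
    parentheses inside double-quoted strings. Returns the term text and
    the position after the closing ')', or None at premature EOF."""
    depth, in_string, start = 1, False, pos
    while pos < len(src):
        c = src[pos]
        if in_string:
            if c == "\\":
                pos += 1          # skip the escaped character
            elif c == '"':
                in_string = False
        elif c == '"':
            in_string = True
        elif c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth == 0:
                return src[start:pos], pos + 1
        pos += 1
    return None                    # premature end of file

term, rest = read_term('x, ")", f(y)) tail', 0)
```

Note how the ')' inside the string is ignored, while the nested parentheses of f(y) are balanced against each other.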
| 487 | 464 | |
| 488 | 465 | |
| 489 | - The function below reads anything placed between balanced parentheses. The | |
| 490 | - opening parenthese has already been read. | |
| 466 | + The function below reads anything placed between balanced parentheses. The opening | |
| 467 | + parenthesis has already been read. | |
| 491 | 468 | |
| 492 | 469 | define (APM_LexerState,MetaToken) |
| 493 | 470 | read_term |
| ... | ... | @@ -582,8 +559,8 @@ define (APM_LexerState,MetaToken) |
| 582 | 559 | |
| 583 | 560 | |
| 584 | 561 | |
| 585 | - The next function reads the next meta-token from the source file, | |
| 586 | - whatever this meta-token is. | |
| 562 | + The next function reads the next meta-token from the source file, whatever this | |
| 563 | + meta-token is. | |
| 587 | 564 | |
| 588 | 565 | define (APM_LexerState,MetaToken) |
| 589 | 566 | read_meta_token |
| ... | ... | @@ -617,9 +594,8 @@ define (APM_LexerState,MetaToken) |
| 617 | 594 | |
| 618 | 595 | *** (2.3) Reading precedence and association rules. |
| 619 | 596 | |
| 620 | - Each token may be assigned a precedence level. A precedence level | |
| 621 | - is an integer, but it is implicit in the APM source file. Only the | |
| 622 | - order of declarations makes sens. | |
| 597 | + Each token may be assigned a precedence level. A precedence level is an integer, but it | |
| 598 | + is implicit in the APM source file. Only the order of declarations makes sense. | |
| 623 | 599 | |
| 624 | 600 | Each declaration has one of the forms: |
| 625 | 601 | |
| ... | ... | @@ -643,8 +619,7 @@ type ReadPrecRuleResult: |
| 643 | 619 | premature_end_of_file. |
| 644 | 620 | |
| 645 | 621 | |
| 646 | - The next function reads (maybe) a sequence of symbols, right | |
| 647 | - delimited by a dot. | |
| 622 | + The next function reads (maybe) a sequence of symbols, right delimited by a dot. | |
| 648 | 623 | |
| 649 | 624 | define (APM_LexerState,Maybe(List(String))) |
| 650 | 625 | read_symbols |
| ... | ... | @@ -660,9 +635,8 @@ define (APM_LexerState,Maybe(List(String))) |
| 660 | 635 | Note: names are stored in reverse order, but it doesn't matter. |
| 661 | 636 | |
| 662 | 637 | |
| 663 | - Now, we read a precedence rule whose keyword has already been | |
| 664 | - successfully read and recognized (and replaced by the corresponding | |
| 665 | - constructor for type 'PrecRule'). | |
| 638 | + Now, we read a precedence rule whose keyword has already been successfully read and | |
| 639 | + recognized (and replaced by the corresponding constructor for type 'PrecRule'). | |
| 666 | 640 | |
| 667 | 641 | define (APM_LexerState,ReadPrecRuleResult) |
| 668 | 642 | read_prec_names |
| ... | ... | @@ -680,8 +654,8 @@ define (APM_LexerState,ReadPrecRuleResult) |
| 680 | 654 | }. |
| 681 | 655 | |
| 682 | 656 | |
| 683 | - Here, we read a precedence rule, whose keyword has been read but | |
| 684 | - not yet recognized (it is only a character string at that point). | |
| 657 | + Here, we read a precedence rule, whose keyword has been read but not yet recognized (it | |
| 658 | + is only a character string at that point). | |
| 685 | 659 | |
| 686 | 660 | define (APM_LexerState,ReadPrecRuleResult) |
| 687 | 661 | read_after_prec_keyword |
| ... | ... | @@ -716,9 +690,8 @@ define (APM_LexerState,ReadPrecRuleResult) |
| 716 | 690 | }. |
| 717 | 691 | |
| 718 | 692 | |
| 719 | - Now, we must be able to read a sequence of precedence rules. This | |
| 720 | - is achieved by the following function, which reads precedence rules | |
| 721 | - until a separator (#) is found. | |
| 693 | + Now, we must be able to read a sequence of precedence rules. This is achieved by the | |
| 694 | + following function, which reads precedence rules until a separator (#) is found. | |
| 722 | 695 | |
| 723 | 696 | type ReadPrecRulesResult: |
| 724 | 697 | ok(List(PrecRule)), |
| ... | ... | @@ -743,10 +716,9 @@ define (APM_LexerState,ReadPrecRulesResult) |
| 743 | 716 | }. |
| 744 | 717 | |
| 745 | 718 | |
| 746 | - Now, we can construct precedence tables. The first one gives the | |
| 747 | - precedence level for each token name. The second one gives the | |
| 748 | - association mode for each precedence level. They are lists of | |
| 749 | - the following respective types: | |
| 719 | + Now, we can construct precedence tables. The first one gives the precedence level for | |
| 720 | + each token name. The second one gives the association mode for each precedence | |
| 721 | + level. They are lists of the following respective types: | |
| 750 | 722 | |
| 751 | 723 | List((String,Int32)) |
| 752 | 724 | List((Int32,AssocMode)) |
| ... | ... | @@ -757,8 +729,8 @@ type AssocMode: |
| 757 | 729 | non_assoc. |
| 758 | 730 | |
| 759 | 731 | |
| 760 | - The next function constructs the table of association modes from | |
| 761 | - the list of precedence rules. | |
| 732 | + The next function constructs the table of association modes from the list of precedence | |
| 733 | + rules. | |
| 762 | 734 | |
| 763 | 735 | define List((Int32,AssocMode)) |
| 764 | 736 | make_assoc_table |
| ... | ... | @@ -786,8 +758,8 @@ define List((Int32,AssocMode)) |
| 786 | 758 | make_assoc_table(l,0). |
| 787 | 759 | |
| 788 | 760 | |
| 789 | - The next function constructs the list of entries in the precedence | |
| 790 | - table for just one level. | |
| 761 | + The next function constructs the list of entries in the precedence table for just one | |
| 762 | + level. | |
| 791 | 763 | |
| 792 | 764 | define List((String,Int32)) |
| 793 | 765 | make_precedence_entries |
| ... | ... | @@ -803,8 +775,8 @@ define List((String,Int32)) |
| 803 | 775 | }. |
| 804 | 776 | |
| 805 | 777 | |
| 806 | - The next function constructs the table of precedence levels from | |
| 807 | - the list of precedence rules. | |
| 778 | + The next function constructs the table of precedence levels from the list of precedence | |
| 779 | + rules. | |
| 808 | 780 | |
| 809 | 781 | define List((String,Int32)) |
| 810 | 782 | make_precedence_table |
| ... | ... | @@ -822,8 +794,8 @@ define List((String,Int32)) |
| 822 | 794 | }. |
| 823 | 795 | |
| 824 | 796 | |
| 825 | - The next function gives the mode for a given precedence level | |
| 826 | - (using the association table). | |
| 797 | + The next function gives the mode for a given precedence level (using the association | |
| 798 | + table). | |
| 827 | 799 | |
| 828 | 800 | define AssocMode |
| 829 | 801 | mode |
| ... | ... | @@ -843,10 +815,9 @@ define AssocMode |
| 843 | 815 | }. |
| 844 | 816 | |
| 845 | 817 | |
| 846 | - The next function checks the precedence table. It consists in | |
| 847 | - verifying that the same name is not present two times, and that no | |
| 848 | - non terminal has an entry in the table (we will see later how to | |
| 849 | - construct the list of names of non terminals). | |
| 818 | + The next function checks the precedence table. It consists of verifying that the same | |
| 819 | + name is not present twice, and that no non terminal has an entry in the table (we will | |
| 820 | + see later how to construct the list of names of non terminals). | |
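Both checks fit in one pass over the table. An illustrative Python sketch (the result values mirror the 'CheckPrecResult' alternatives only loosely):

```python
def check_prec(prec_table, non_terminals):
    """Report the first duplicated name or non terminal in the table,
    or "ok" when the precedence table passes both checks."""
    seen = set()
    for name, _level in prec_table:
        if name in seen:
            return ("duplicated", name)
        if name in non_terminals:
            return ("non_terminal_in_table", name)
        seen.add(name)
    return "ok"

status = check_prec([("PLUS", 0), ("TIMES", 1)], {"Expr"})
```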
| 850 | 821 | |
| 851 | 822 | type CheckPrecResult: |
| 852 | 823 | ok, |
| ... | ... | @@ -890,8 +861,7 @@ define CheckPrecResult |
| 890 | 861 | }. |
| 891 | 862 | |
| 892 | 863 | |
| 893 | - The next function gives the precedence level (if it exists) for a given | |
| 894 | - token name. | |
| 864 | + The next function gives the precedence level (if it exists) for a given token name. | |
| 895 | 865 | |
| 896 | 866 | define Maybe(Int32) |
| 897 | 867 | prec |
| ... | ... | @@ -911,7 +881,7 @@ define Maybe(Int32) |
| 911 | 881 | }. |
| 912 | 882 | |
| 913 | 883 | |
| 914 | - The same one, but for a possibly missing name. | |
| 884 | + The same one, but for a possibly missing name. | |
| 915 | 885 | |
| 916 | 886 | define Maybe(Int32) |
| 917 | 887 | prec |
| ... | ... | @@ -946,9 +916,8 @@ type Symbol: |
| 946 | 916 | non_terminal(String name). // any non terminal with its name |
| 947 | 917 | |
| 948 | 918 | |
| 949 | - Grammar rules A(t) -> u [p] (where p is a possible precedence | |
| 950 | - level: actually, the name of a token) are stored as data of the | |
| 951 | - following type: | |
| 919 | + Grammar rules A(t) -> u [p] (where p is a possible precedence level: actually, the name | |
| 920 | + of a token) are stored as data of the following type: | |
| 952 | 921 | |
| 953 | 922 | type GrammarRule: |
| 954 | 923 | grammar_rule(String head, // A |
| ... | ... | @@ -956,11 +925,11 @@ type GrammarRule: |
| 956 | 925 | List((Symbol,String)) body, // u |
| 957 | 926 | Maybe(Int32) prec). // precedence level of p |
| 958 | 927 | |
| 959 | - Note: in the pair (Symbol,String), the second element represents the | |
| 960 | - value of the symbol (if no value is given, it is the empty string). | |
| 928 | + Note: in the pair (Symbol,String), the second element represents the value of the | |
| 929 | + symbol (if no value is given, it is the empty string). | |
| 961 | 930 | |
| 962 | - Below is a function which reads the right hand side of a grammar | |
| 963 | - rule. We need a type to handle the result of such a reading. | |
| 931 | + Below is a function which reads the right hand side of a grammar rule. We need a type | |
| 932 | + to handle the result of such a reading. | |
| 964 | 933 | |
| 965 | 934 | type RightHandResult: |
| 966 | 935 | ok(List((Symbol,String)), // a correct right hand side has been read |
| ... | ... | @@ -1023,8 +992,8 @@ define (APM_LexerState,RightHandResult) |
| 1023 | 992 | }. |
| 1024 | 993 | |
| 1025 | 994 | |
| 1026 | - We also need a special type to handle all possible situations in the | |
| 1027 | - result of reading a grammar rule. | |
| 995 | + We also need a special type to handle all possible situations in the result of reading | |
| 996 | + a grammar rule. | |
| 1028 | 997 | |
| 1029 | 998 | type ReadGrammarRuleResult: |
| 1030 | 999 | ok(GrammarRule), // a grammar rule has been read successfully |
| ... | ... | @@ -1035,8 +1004,8 @@ type ReadGrammarRuleResult: |
| 1035 | 1004 | // reading a parser section |
| 1036 | 1005 | |
| 1037 | 1006 | |
| 1038 | - Below is a function which reads a grammar rule whose head | |
| 1039 | - (including the colon) has been already read. | |
| 1007 | + Below is a function which reads a grammar rule whose head (including the colon) has | |
| 1008 | + been already read. | |
| 1040 | 1009 | |
| 1041 | 1010 | define (APM_LexerState,ReadGrammarRuleResult) |
| 1042 | 1011 | read_after_colon |
| ... | ... | @@ -1056,8 +1025,8 @@ define (APM_LexerState,ReadGrammarRuleResult) |
| 1056 | 1025 | }. |
| 1057 | 1026 | |
| 1058 | 1027 | |
| 1059 | - Below is a function which reads a grammar rule whose head has | |
| 1060 | - already been read (not including the colon). | |
| 1028 | + Below is a function which reads a grammar rule whose head has already been read (not | |
| 1029 | + including the colon). | |
| 1061 | 1030 | |
| 1062 | 1031 | define (APM_LexerState,ReadGrammarRuleResult) |
| 1063 | 1032 | read_after_head |
| ... | ... | @@ -1073,8 +1042,7 @@ define (APM_LexerState,ReadGrammarRuleResult) |
| 1073 | 1042 | else (ls,syntax_error). |
| 1074 | 1043 | |
| 1075 | 1044 | |
| 1076 | - Below is a function which reads a grammar rule whose head name has | |
| 1077 | - already been read. | |
| 1045 | + Below is a function which reads a grammar rule whose head name has already been read. | |
| 1078 | 1046 | |
| 1079 | 1047 | define (APM_LexerState,ReadGrammarRuleResult) |
| 1080 | 1048 | read_after_head_name |
| ... | ... | @@ -1098,7 +1066,7 @@ define (APM_LexerState,ReadGrammarRuleResult) |
| 1098 | 1066 | |
| 1099 | 1067 | |
| 1100 | 1068 | |
| 1101 | - Below is a function, which reads a complete grammar rule from a file. | |
| 1069 | + Below is a function, which reads a complete grammar rule from a file. | |
| 1102 | 1070 | |
| 1103 | 1071 | define (APM_LexerState,ReadGrammarRuleResult) |
| 1104 | 1072 | read_grammar_rule |
| ... | ... | @@ -1145,11 +1113,10 @@ define (APM_LexerState,ReadGrammarRulesResult) |
| 1145 | 1113 | |
| 1146 | 1114 | *** (2.5) Finding non terminals. |
| 1147 | 1115 | |
| 1148 | - So far, the grammar has been read, but all symbols have been stored | |
| 1149 | - as terminals. We must establish the list of names of all non | |
| 1150 | - terminals (they simply appear at the head of grammar rules, and | |
| 1151 | - change in grammar rules any symbol whose name matches one of these, | |
| 1152 | - to a non terminal. | |
| 1116 | + So far, the grammar has been read, but all symbols have been stored as terminals. We | |
| 1117 | + must establish the list of names of all non terminals (they simply appear at the heads | |
| 1118 | + of grammar rules), and change any symbol in the grammar rules whose name matches one | |
| 1119 | + of these into a non terminal. | |
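This retagging pass can be sketched in Python (illustrative only; symbols are tagged with plain strings rather than the alternatives of the 'Symbol' type):

```python
def retag(rules):
    """Heads of rules are the non terminals; every body symbol whose
    name is some head becomes a non terminal, the rest stay tokens."""
    non_terminals = {head for head, _body in rules}
    return [(head,
             tuple(("non_terminal" if s in non_terminals else "terminal", s)
                   for s in body))
            for head, body in rules]

tagged = retag([("E", ("E", "PLUS", "ID")),
                ("F", ("ID",))])
```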
| 1153 | 1120 | |
| 1154 | 1121 | |
| 1155 | 1122 | define List(String) |
| ... | ... | @@ -1274,10 +1241,9 @@ define Maybe(One) |
| 1274 | 1241 | |
| 1275 | 1242 | |
| 1276 | 1243 | |
| 1277 | - The next function reads from the first separator to the third | |
| 1278 | - (last) one. It also calls the functions which will construct the | |
| 1279 | - automaton and dump it into the output file and the log file. Here | |
| 1280 | - is what it does: | |
| 1244 | + The next function reads from the first separator to the third (last) one. It also calls | |
| 1245 | + the functions which will construct the automaton and dump it into the output file and | |
| 1246 | + the log file. Here is what it does: | |
| 1281 | 1247 | |
| 1282 | 1248 | - read the name of the parser, |
| 1283 | 1249 | - read precedence rules, |
| ... | ... | @@ -1285,8 +1251,7 @@ define Maybe(One) |
| 1285 | 1251 | - construct a datum of type 'Grammar', |
| 1286 | 1252 | - call 'make_parser' |
| 1287 | 1253 | |
| 1288 | - it returns failure in case of a problem, and success(unique) | |
| 1289 | - otherwise. | |
| 1254 | + It returns failure in case of a problem, and success(unique) otherwise. | |
| 1290 | 1255 | |
| 1291 | 1256 | |
| 1292 | 1257 | type Grammar: |
| ... | ... | @@ -1356,10 +1321,9 @@ define Maybe(One) |
| 1356 | 1321 | |
| 1357 | 1322 | |
| 1358 | 1323 | |
| 1359 | - The next function dumps the content of the input file into the output | |
| 1360 | - file, until the first separator is found. In other words, it copies | |
| 1361 | - the preambule to the output. It does not use the lexer, and must | |
| 1362 | - update the line number itself. | |
| 1324 | + The next function dumps the content of the input file into the output file, until the | |
| 1325 | + first separator is found. In other words, it copies the preambule to the output. It | |
| 1326 | + does not use the lexer, and must update the line number itself. | |
| 1363 | 1327 | |
| 1364 | 1328 | define Maybe(Int32) |
| 1365 | 1329 | copy_preambule |
| ... | ... | @@ -1390,8 +1354,8 @@ define Maybe(Int32) |
| 1390 | 1354 | }. |
| 1391 | 1355 | |
| 1392 | 1356 | |
| 1393 | - The next function copies the postambule to the output. It does not | |
| 1394 | - need to count line numbers. | |
| 1357 | + The next function copies the postambule to the output. It does not need to count line | |
| 1358 | + numbers. | |
| 1395 | 1359 | |
| 1396 | 1360 | define One |
| 1397 | 1361 | copy_postambule |
| ... | ... | @@ -1413,9 +1377,8 @@ define One |
| 1413 | 1377 | |
| 1414 | 1378 | |
| 1415 | 1379 | |
| 1416 | - The next function receives the three files (input, output and the | |
| 1417 | - log file), reads the grammar and make the automaton. It proceeds in | |
| 1418 | - three steps: | |
| 1380 | + The next function receives the three files (input, output and the log file), reads the | |
| 1381 | + grammar and makes the automaton. It proceeds in three steps: | |
| 1419 | 1382 | |
| 1420 | 1383 | - copy the preambule to the output, |
| 1421 | 1384 | - create a lexer state, read the precedence rules, the grammar |
| ... | ... | @@ -1462,8 +1425,8 @@ define Maybe(Option) |
| 1462 | 1425 | |
| 1463 | 1426 | |
| 1464 | 1427 | |
| 1465 | - The next function takes the arguments of the command line and | |
| 1466 | - separates options from the source file name. | |
| 1428 | + The next function takes the arguments of the command line and separates options from | |
| 1429 | + the source file name. | |
| 1467 | 1430 | |
| 1468 | 1431 | define Maybe((String,List(Option))) |
| 1469 | 1432 | separate_options |
| ... | ... | @@ -1508,8 +1471,7 @@ define Maybe((String,List(Option))) |
| 1508 | 1471 | |
| 1509 | 1472 | |
| 1510 | 1473 | |
| 1511 | - Finally, here is the function which is made global. It performs the | |
| 1512 | - following tasks: | |
| 1474 | + Finally, here is the function which is made global. It performs the following tasks: | |
| 1513 | 1475 | |
| 1514 | 1476 | - separate options from the source file name (by calling 'separate_options'), |
| 1515 | 1477 | - open the source file, |
| ... | ... | @@ -1562,31 +1524,28 @@ global define One |
| 1562 | 1524 | *** (3) Making the parser automaton. |
| 1563 | 1525 | |
| 1564 | 1526 | |
| 1565 | - In order to exemplify our discussion we will refer in the sequel to | |
| 1566 | - the following (ambiguous) 'example grammar': | |
| 1527 | + In order to exemplify our discussion we will refer in the sequel to the following | |
| 1528 | + (ambiguous) 'example grammar': | |
| 1567 | 1529 | |
| 1568 | 1530 | S -> A |
| 1569 | 1531 | A -> |
| 1570 | 1532 | A -> a |
| 1571 | 1533 | A -> AA |
| 1572 | 1534 | |
| 1573 | - Notice that this grammar produces all sequences of a's, including | |
| 1574 | - the empty sequence. It is ambiguous since for example the sequence | |
| 1575 | - aaa may 'reduce' to S (or 'be derived' from S) in at least two | |
| 1576 | - ways: | |
| 1535 | + Notice that this grammar produces all sequences of a's, including the empty | |
| 1536 | + sequence. It is ambiguous since for example the sequence aaa may 'reduce' to S (or 'be | |
| 1537 | + derived' from S) in at least two ways: | |
| 1577 | 1538 | |
| 1578 | 1539 | S -> A -> AA -> AAA -> AAa -> Aaa -> aaa |
| 1579 | 1540 | S -> A -> AA -> Aa -> AAa -> Aaa -> aaa |
| 1580 | 1541 | |
| 1581 | - even if we use only 'rightmost' derivations, which means that when | |
| 1582 | - we follow the arrows, the non terminal which is replaced is always | |
| 1583 | - the rightmost one. It is the case above, as one may easily | |
| 1584 | - check. In the first case the tree structure of our sequence is | |
| 1585 | - a(aa), while in the second case, it is (aa)a. | |
| 1542 | + even if we use only 'rightmost' derivations, which means that when we follow the | |
| 1543 | + arrows, the non terminal which is replaced is always the rightmost one. This is the case | |
| 1544 | + above, as one may easily check. In the first case the tree structure of our sequence is | |
| 1545 | + a(aa), while in the second case, it is (aa)a. | |
| 1586 | 1546 | |
| 1587 | - The automaton will realize the first of our two derivations above | |
| 1588 | - as follows (the dot represents the current position of reading from | |
| 1589 | - the input): | |
| 1547 | + The automaton will realize the first of our two derivations above as follows (the dot | |
| 1548 | + represents the current position of reading from the input): | |
| 1590 | 1549 | |
| 1591 | 1550 | .aaa shift |
| 1592 | 1551 | a.aa reduce using rule A -> a |
| ... | ... | @@ -1612,26 +1571,24 @@ global define One |
| 1612 | 1571 | A. reduce using rule S -> A (accept) |
| 1613 | 1572 | S. |
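The shift/reduce mechanics of the trace above can be replayed by hand. Below is a small illustration in Python (not APM code); the action sequence is supplied manually here, whereas the real automaton chooses it from its parsing tables.

```python
# A hand-driven replay of the trace above for the grammar
#   S -> A,  A -> ,  A -> a,  A -> AA

def replay(tokens, actions):
    stack, rest = [], list(tokens)
    for act in actions:
        if act == "shift":              # move one token onto the stack
            stack.append(rest.pop(0))
        else:                           # reduce with rule (lhs, rhs)
            lhs, rhs = act
            n = len(stack) - len(rhs)
            assert stack[n:] == rhs, "top of stack must match the rhs"
            stack[n:] = [lhs]           # pop the rhs, push the lhs
    return "".join(stack), rest

# The derivation above, giving the tree structure a(aa):
A_a, A_AA, S_A = ("A", ["a"]), ("A", ["A", "A"]), ("S", ["A"])
actions = ["shift", A_a, "shift", A_a, "shift", A_a, A_AA, A_AA, S_A]
print(replay("aaa", actions))           # ('S', []) : input accepted
```

Note that reductions only ever touch the top of the stack, which is the key fact exploited by the state construction below.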
| 1614 | 1573 | |
| 1615 | - The ambiguity is realized here by the choice we have in the | |
| 1616 | - situation: | |
| 1574 | + The ambiguity is realized here by the choice we have in the situation: | |
| 1617 | 1575 | |
| 1618 | 1576 | AA.a |
| 1619 | 1577 | |
| 1620 | - We may either reduce using rule A -> AA or shift. | |
| 1578 | + We may either reduce using rule A -> AA or shift. | |
| 1621 | 1579 | |
| 1622 | - However, this grammar is much more ambiguous than this. We could | |
| 1623 | - for example have the following sequence: | |
| 1580 | + However, this grammar is even more ambiguous than that. We could, for example, have the | |
| 1581 | + following sequence: | |
| 1624 | 1582 | |
| 1625 | 1583 | AA.a reduce using rule A -> |
| 1626 | 1584 | AAA.a reduce using rule A -> AA |
| 1627 | 1585 | AA.a |
| 1628 | 1586 | |
| 1629 | - which is obviously undesirable. In other words, our grammar has not | |
| 1630 | - only a shift/reduce conflict, but at least one reduce/reduce | |
| 1631 | - conflict. | |
| 1587 | + which is obviously undesirable. In other words, our grammar has not only a shift/reduce | |
| 1588 | + conflict, but at least one reduce/reduce conflict. | |
| 1632 | 1589 | |
| 1633 | - If we want to produce the same language (all the sequences of a's) | |
| 1634 | - with a non ambiguous grammar, we should use this one: | |
| 1590 | + If we want to produce the same language (all the sequences of a's) with a non ambiguous | |
| 1591 | + grammar, we should use this one: | |
| 1635 | 1592 | |
| 1636 | 1593 | S -> A |
| 1637 | 1594 | A -> |
| ... | ... | @@ -1652,18 +1609,16 @@ global define One |
| 1652 | 1609 | |
| 1653 | 1610 | *** (3.1) Computing 'First'. |
| 1654 | 1611 | |
| 1655 | - Any symbol in a grammar represents a set of sequences of tokens, | |
| 1656 | - namely all sequences of tokens which reduce to this symbol. We also | |
| 1657 | - say that such a sequence is derived from the symbol, or that it is | |
| 1658 | - an 'instance' of the symbol. | |
| 1612 | + Any symbol in a grammar represents a set of sequences of tokens, namely all sequences | |
| 1613 | + of tokens which reduce to this symbol. We also say that such a sequence is derived from | |
| 1614 | + the symbol, or that it is an 'instance' of the symbol. | |
| 1659 | 1615 | |
| 1660 | - To any symbol we associate a finite set of 'extended tokens'. Here | |
| 1661 | - an extended token is either 'e' (representing the absence of a | |
| 1662 | - token) or a normal token, or the end marker '$'. | |
| 1616 | + To any symbol we associate a finite set of 'extended tokens'. Here an extended token is | |
| 1617 | + either 'e' (representing the absence of a token) or a normal token, or the end marker | |
| 1618 | + '$'. | |
| 1663 | 1619 | |
| 1664 | - By definition, 'First(X)' is the set of all tokens which may come | |
| 1665 | - first in an instance of 'X', plus 'e' if the empty sequence is an | |
| 1666 | - instance of 'X'. | |
| 1620 | + By definition, 'First(X)' is the set of all tokens which may come first in an instance | |
| 1621 | + of 'X', plus 'e' if the empty sequence is an instance of 'X'. | |
| 1667 | 1622 | |
| 1668 | 1623 | For our example grammar, we have: |
| 1669 | 1624 | |
| ... | ... | @@ -1680,10 +1635,9 @@ type ExToken: |
| 1680 | 1635 | dollar. // the end marker |
| 1681 | 1636 | |
| 1682 | 1637 | |
| 1683 | - However, computing 'First' in general is not so easy. This is | |
| 1684 | - a saturation process. The main work is to compute 'First' for non | |
| 1685 | - terminals, since it is trivial for tokens. Here is how we can do | |
| 1686 | - this. | |
| 1638 | + However, computing 'First' in general is not so easy. This is a saturation process. The | |
| 1639 | + main work is to compute 'First' for non terminals, since it is trivial for tokens. Here | |
| 1640 | + is how we can do this. | |
| 1687 | 1641 | |
| 1688 | 1642 | (1) to each non terminal associate the empty list, i.e. put |
| 1689 | 1643 | First(A) = [ ]. |
| ... | ... | @@ -1700,11 +1654,11 @@ type ExToken: |
| 1700 | 1654 | - e is not in First(B) then add all of First(B) |
| 1701 | 1655 | to First(A). |
| 1702 | 1656 | |
| 1703 | - Of course, productions are added to the grammar only for computing | |
| 1704 | - 'First', not for any other computation. | |
| 1657 | + Of course, productions are added to the grammar only for computing 'First', not for any | |
| 1658 | + other computation. | |
| 1705 | 1659 | |
| 1706 | - We also need to compute 'First(X_1...X_k) for any sequence of | |
| 1707 | - symbols. This is done by induction on k: | |
| 1660 | + We also need to compute 'First(X_1...X_k)' for any sequence of symbols. This is done by | |
| 1661 | + induction on k: | |
| 1708 | 1662 | |
| 1709 | 1663 | First() = [e] |
| 1710 | 1664 | First(X_1...X_k) = |
| ... | ... | @@ -1712,8 +1666,8 @@ type ExToken: |
| 1712 | 1666 | - else First(X_1). |
| 1713 | 1667 | |
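The two computations just described (saturation for the non terminals, and the induction for First of a sequence) can be sketched as follows. This is a Python illustration, not the Anubis code, run on the ambiguous example grammar.

```python
# Compute the 'first function' by saturation for the example grammar
#   S -> A,  A -> ,  A -> a,  A -> AA
# 'e' represents the absence of a token (the empty sequence).
EMPTY = "e"
rules = [("S", ["A"]), ("A", []), ("A", ["a"]), ("A", ["A", "A"])]
nonterms = {lhs for lhs, _ in rules}

first = {n: set() for n in nonterms}         # step (1): First(A) = []

def first_of_seq(seq):
    """First(X_1...X_k), by induction on k."""
    out = set()
    for x in seq:
        f = first[x] if x in nonterms else {x}
        out |= f - {EMPTY}
        if EMPTY not in f:                   # X_1...X_i cannot all be empty
            return out
    return out | {EMPTY}                     # every symbol may be empty

changed = True                               # step (2): saturate
while changed:
    changed = False
    for lhs, rhs in rules:
        add = first_of_seq(rhs)
        if not add <= first[lhs]:
            first[lhs] |= add
            changed = True

# First(S) = First(A) = {e, a}
```

The loop stops as soon as one full pass over the rules adds nothing, exactly as 'saturate_first' below counts passes until a fixed point.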
| 1714 | 1668 | |
| 1715 | - In practice, we compute only what we call a 'first function', which | |
| 1716 | - is an association list: | |
| 1669 | + In practice, we compute only what we call a 'first function', which is an association | |
| 1670 | + list: | |
| 1717 | 1671 | |
| 1718 | 1672 | [ |
| 1719 | 1673 | (A,[...]), |
| ... | ... | @@ -1721,12 +1675,12 @@ type ExToken: |
| 1721 | 1675 | ... |
| 1722 | 1676 | ] |
| 1723 | 1677 | |
| 1724 | - of type List((String,List(ExToken))), where 'A', 'B',... are the | |
| 1725 | - non terminals, and [...] the list of extended tokens which may come | |
| 1726 | - first in an instance of the corresponding non terminal. | |
| 1678 | + of type List((String,List(ExToken))), where 'A', 'B',... are the non terminals, and | |
| 1679 | + [...] the list of extended tokens which may come first in an instance of the | |
| 1680 | + corresponding non terminal. | |
| 1727 | 1681 | |
| 1728 | - The next function computes: (l1 -[e]) union l2. However, | |
| 1729 | - 'e' may belong to l2, and in that case will belong to the result. | |
| 1682 | + The next function computes: (l1 - [e]) union l2. However, 'e' may belong to l2, and in | |
| 1683 | + that case will belong to the result. | |
| 1730 | 1684 | |
| 1731 | 1685 | define List(ExToken) |
| 1732 | 1686 | merge_except_empty |
| ... | ... | @@ -1746,8 +1700,8 @@ define List(ExToken) |
| 1746 | 1700 | }. |
| 1747 | 1701 | |
| 1748 | 1702 | |
| 1749 | - We will need to convert an extended token to a grammar symbol. 'e' | |
| 1750 | - should never be converted. | |
| 1703 | + We will need to convert an extended token to a grammar symbol. 'e' should never be | |
| 1704 | + converted. | |
| 1751 | 1705 | |
| 1752 | 1706 | define Symbol |
| 1753 | 1707 | to_symbol |
| ... | ... | @@ -1762,8 +1716,8 @@ define Symbol |
| 1762 | 1716 | }. |
| 1763 | 1717 | |
| 1764 | 1718 | |
| 1765 | - The function below, constructs the initial stage of our 'first | |
| 1766 | - function'. In this stage all lists of tokens are empty. | |
| 1719 | + The function below constructs the initial stage of our 'first function'. In this stage | |
| 1720 | + all lists of tokens are empty. | |
| 1767 | 1721 | |
| 1768 | 1722 | define List((String,List(ExToken))) |
| 1769 | 1723 | initial_stage |
| ... | ... | @@ -1777,9 +1731,8 @@ define List((String,List(ExToken))) |
| 1777 | 1731 | }. |
| 1778 | 1732 | |
| 1779 | 1733 | |
| 1780 | - We will also need to find the value of a non terminal (given by its | |
| 1781 | - name) in our 'first function'. This search should always be | |
| 1782 | - successful. | |
| 1734 | + We will also need to find the value of a non terminal (given by its name) in our 'first | |
| 1735 | + function'. This search should always be successful. | |
| 1783 | 1736 | |
| 1784 | 1737 | define List(ExToken) |
| 1785 | 1738 | first |
| ... | ... | @@ -1814,8 +1767,7 @@ define List(ExToken) |
| 1814 | 1767 | }. |
| 1815 | 1768 | |
| 1816 | 1769 | |
| 1817 | - Finally, we may compute 'First(u)' for any sequence of grammar | |
| 1818 | - symbols 'u'. | |
| 1770 | + Finally, we may compute 'First(u)' for any sequence of grammar symbols 'u'. | |
| 1819 | 1771 | |
| 1820 | 1772 | define List(ExToken) |
| 1821 | 1773 | first |
| ... | ... | @@ -1837,11 +1789,10 @@ define List(ExToken) |
| 1837 | 1789 | |
| 1838 | 1790 | |
| 1839 | 1791 | |
| 1840 | - The following function adds a token to a set of tokens in a 'first | |
| 1841 | - function'. It is given the extended token 'x' to be added, the | |
| 1842 | - name of the non terminal under which it should be added, and the 'first | |
| 1843 | - function' into which this operation should be performed. The | |
| 1844 | - grammar is not used, but must be transmitted via terminal calls. | |
| 1792 | + The following function adds a token to a set of tokens in a 'first function'. It is | |
| 1793 | + given the extended token 'x' to be added, the name of the non terminal under which it | |
| 1794 | + should be added, and the 'first function' in which this operation should be | |
| 1795 | + performed. The grammar is not used, but must be transmitted via terminal calls. | |
| 1845 | 1796 | |
| 1846 | 1797 | define (List((String,List(ExToken))),List(GrammarRule)) |
| 1847 | 1798 | add |
| ... | ... | @@ -1864,8 +1815,7 @@ define (List((String,List(ExToken))),List(GrammarRule)) |
| 1864 | 1815 | }. |
| 1865 | 1816 | |
| 1866 | 1817 | |
| 1867 | - The next function tests if a given non terminal may represent the | |
| 1868 | - empty sequence. | |
| 1818 | + The next function tests if a given non terminal may represent the empty sequence. | |
| 1869 | 1819 | |
| 1870 | 1820 | define Bool |
| 1871 | 1821 | may_be_empty |
| ... | ... | @@ -1876,8 +1826,8 @@ define Bool |
| 1876 | 1826 | member(empty,first(name,f)). |
| 1877 | 1827 | |
| 1878 | 1828 | |
| 1879 | - The following function adds all elements of a set of extended | |
| 1880 | - tokens to a 'first list' in a given 'first function'. | |
| 1829 | + The following function adds all elements of a set of extended tokens to a 'first list' | |
| 1830 | + in a given 'first function'. | |
| 1881 | 1831 | |
| 1882 | 1832 | define (List((String,List(ExToken))),List(GrammarRule)) |
| 1883 | 1833 | add_all_of |
| ... | ... | @@ -1919,8 +1869,8 @@ define (List((String,List(ExToken))),List(GrammarRule)) |
| 1919 | 1869 | }. |
| 1920 | 1870 | |
| 1921 | 1871 | |
| 1922 | - The following function works out one grammar rule for the addition | |
| 1923 | - of elements to 'First lists'. | |
| 1872 | + The following function works out one grammar rule for the addition of elements to | |
| 1873 | + 'First lists'. | |
| 1924 | 1874 | |
| 1925 | 1875 | define (List((String,List(ExToken))),List(GrammarRule)) |
| 1926 | 1876 | first_work_rule |
| ... | ... | @@ -1950,10 +1900,9 @@ define (List((String,List(ExToken))),List(GrammarRule)) |
| 1950 | 1900 | }. |
| 1951 | 1901 | |
| 1952 | 1902 | |
| 1953 | - The next function makes one step of completion of first sets (only | |
| 1954 | - for non terminals), making one action for each rule in the | |
| 1955 | - grammar. We need to return both the 'first function' 'f' and the | |
| 1956 | - grammar, because they change during the process. | |
| 1903 | + The next function makes one step of completion of first sets (only for non terminals), | |
| 1904 | + making one action for each rule in the grammar. We need to return both the 'first | |
| 1905 | + function' 'f' and the grammar, because they change during the process. | |
| 1957 | 1906 | |
| 1958 | 1907 | define (List((String,List(ExToken))),List(GrammarRule)) |
| 1959 | 1908 | first_one_step |
| ... | ... | @@ -1971,7 +1920,7 @@ define (List((String,List(ExToken))),List(GrammarRule)) |
| 1971 | 1920 | }. |
| 1972 | 1921 | |
| 1973 | 1922 | |
| 1974 | - The next function saturates a 'first function'. | |
| 1923 | + The next function saturates a 'first function'. | |
| 1975 | 1924 | |
| 1976 | 1925 | define (List((String,List(ExToken))),List(GrammarRule),Int32) |
| 1977 | 1926 | saturate_first |
| ... | ... | @@ -1988,8 +1937,7 @@ define (List((String,List(ExToken))),List(GrammarRule),Int32) |
| 1988 | 1937 | else saturate_first(f_new,l_new,count+1). |
| 1989 | 1938 | |
| 1990 | 1939 | |
| 1991 | - We need to extract the list of all non terminals from the | |
| 1992 | - grammar. | |
| 1940 | + We need to extract the list of all non terminals from the grammar. | |
| 1993 | 1941 | |
| 1994 | 1942 | define List(String) |
| 1995 | 1943 | non_terminals |
| ... | ... | @@ -2020,8 +1968,7 @@ define List(String) |
| 2020 | 1968 | |
| 2021 | 1969 | |
| 2022 | 1970 | |
| 2023 | - Here is the function which computes the 'first function' associated | |
| 2024 | - to a given grammar. | |
| 1971 | + Here is the function which computes the 'first function' associated with a given grammar. | |
| 2025 | 1972 | |
| 2026 | 1973 | define (List((String,List(ExToken))),Int32) |
| 2027 | 1974 | first_function |
| ... | ... | @@ -2041,36 +1988,32 @@ define (List((String,List(ExToken))),Int32) |
| 2041 | 1988 | |
| 2042 | 1989 | *** (3.2) Scenarii. |
| 2043 | 1990 | |
| 2044 | - As we saw previously, reductions using a grammar rule, occur only | |
| 2045 | - on top of stack. If the stack (as far as grammar symbols are | |
| 2046 | - concerned) is: | |
| 1991 | + As we saw previously, reductions using a grammar rule occur only on top of the stack. If | |
| 1992 | + the stack (as far as grammar symbols are concerned) is: | |
| 2047 | 1993 | |
| 2048 | 1994 | ... u |
| 2049 | 1995 | |
| 2050 | - i.e. if it ends by u (a sequence of grammar symbols), and if there | |
| 2051 | - is a production of the form: | |
| 1996 | + i.e. if it ends with u (a sequence of grammar symbols), and if there is a production of | |
| 1997 | + the form: | |
| 2052 | 1998 | |
| 2053 | 1999 | A -> uv |
| 2054 | 2000 | |
| 2055 | - then it is possible that after having read an instance of v, we | |
| 2056 | - reduce using that rule. Furthermore, the automaton is able to look | |
| 2057 | - at the next token to be read (it has one token of | |
| 2058 | - 'lookahead'). This helps to make decisions, as we will see later, | |
| 2059 | - using precedence and association rules. In particular, the | |
| 2060 | - automaton knows which token is allowded as the lookahead for a | |
| 2061 | - given reduction. | |
| 2001 | + then it is possible that after having read an instance of v, we reduce using that | |
| 2002 | + rule. Furthermore, the automaton is able to look at the next token to be read (it has | |
| 2003 | + one token of 'lookahead'). This helps to make decisions, as we will see later, using | |
| 2004 | + precedence and association rules. In particular, the automaton knows which token is | |
| 2005 | + allowed as the lookahead for a given reduction. | |
| 2062 | 2006 | |
| 2063 | - Hence, we introduce the notion of a scenario. A 'scenario' is a pair, | |
| 2064 | - denoted (in these explanations): | |
| 2007 | + Hence, we introduce the notion of a scenario. A 'scenario' is a pair, denoted (in these | |
| 2008 | + explanations): | |
| 2065 | 2009 | |
| 2066 | 2010 | (A ->u.v , (a_1,...,a_k)) |
| 2067 | 2011 | |
| 2068 | - where A -> uv is a production (whose right hand side has been split | |
| 2069 | - into two parts u and v, separated by a dot, where u and/or v may be | |
| 2070 | - empty), and where (a_1,...,a_k) is a non empty set of tokens. | |
| 2012 | + where A -> uv is a production (whose right hand side has been split into two parts u | |
| 2013 | + and v, separated by a dot, where u and/or v may be empty), and where (a_1,...,a_k) is a | |
| 2014 | + non empty set of tokens. | |
| 2071 | 2015 | |
| 2072 | - In the case of our example grammar, here are all the possible | |
| 2073 | - left part of scenarii: | |
| 2016 | + In the case of our example grammar, here are all the possible left parts of scenarii: | |
| 2074 | 2017 | |
| 2075 | 2018 | S -> .A |
| 2076 | 2019 | S -> A. |
| ... | ... | @@ -2081,35 +2024,31 @@ define (List((String,List(ExToken))),Int32) |
| 2081 | 2024 | A -> A.A |
| 2082 | 2025 | A -> AA. |
| 2083 | 2026 | |
| 2084 | - That a scenario (A -> u.v, E) is 'possible' in some state s means | |
| 2085 | - that the top of stack is described by u (one slot for one symbol), | |
| 2086 | - and that reduction using the given grammar rule may occur if the | |
| 2087 | - lookahead token (at the time the reduction takes place) belongs to | |
| 2088 | - 'E'. | |
| 2027 | + That a scenario (A -> u.v, E) is 'possible' in some state s means that the top of stack | |
| 2028 | + is described by u (one slot for one symbol), and that reduction using the given grammar | |
| 2029 | + rule may occur if the lookahead token (at the time the reduction takes place) belongs | |
| 2030 | + to 'E'. | |
| 2089 | 2031 | |
| 2090 | - It is clear that, the grammar being given as a finite set of rules | |
| 2091 | - (and a finite sets of tokens), there is only a finite number of | |
| 2092 | - scenarii. | |
| 2032 | + It is clear that, the grammar being given as a finite set of rules (and a finite set | |
| 2033 | + of tokens), there is only a finite number of scenarii. | |
| 2093 | 2034 | |
| 2094 | 2035 | Two scenarii: |
| 2095 | 2036 | |
| 2096 | 2037 | (A -> u.v , E) |
| 2097 | 2038 | (B -> w.t , F) |
| 2098 | 2039 | |
| 2099 | - are called 'compatible' if either u is a postfix of w, or w a | |
| 2100 | - postfix of u. This simply means that there exists a stack for which | |
| 2101 | - the two scenarii are possible. The top of that stack must have | |
| 2102 | - the longuest of u and w on its top. | |
| 2040 | + are called 'compatible' if either u is a postfix of w, or w a postfix of u. This simply | |
| 2041 | + means that there exists a stack for which the two scenarii are possible. That stack | |
| 2042 | + must have the longer of u and w on its top. | |
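The postfix test behind this definition is simple to state. Here is a sketch in Python (not the Anubis code), with the dotted prefixes given as plain lists of symbol names:

```python
# Two dotted prefixes u1 and u2 are compatible iff one is a postfix of
# the other, i.e. their last min(len(u1), len(u2)) symbols coincide.
def compatible(u1, u2):
    n = min(len(u1), len(u2))
    return u1[len(u1) - n:] == u2[len(u2) - n:]
```

For the example grammar, the prefixes of A -> AA. and A -> A.A are compatible (["A"] is a postfix of ["A", "A"]), and the empty prefix of A -> .a is compatible with everything.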
| 2103 | 2043 | |
| 2104 | 2044 | Two scenarii: |
| 2105 | 2045 | |
| 2106 | 2046 | (A -> u.v , E) |
| 2107 | 2047 | (A -> u.v , F) |
| 2108 | 2048 | |
| 2109 | - are called 'similar' if they have the same left part (same | |
| 2110 | - production splitted at the same place). They differ only by the | |
| 2111 | - sets of tokens E and F. Two such scenarii may be joined together into | |
| 2112 | - the unique scenario: | |
| 2049 | + are called 'similar' if they have the same left part (same production split at the | |
| 2050 | + same place). They differ only by the sets of tokens E and F. Two such scenarii may be | |
| 2051 | + joined together into the unique scenario: | |
| 2113 | 2052 | |
| 2114 | 2053 | (A -> u.v , G) |
| 2115 | 2054 | |
| ... | ... | @@ -2125,9 +2064,9 @@ type Scenario: |
| 2125 | 2064 | List(ExToken), // E |
| 2126 | 2065 | Maybe(Int32)). // precedence level of grammar rule |
| 2127 | 2066 | |
| 2128 | - 'u' is stored in reverse order, because the most common operation | |
| 2129 | - is to kake the head of 'v' and put it in front of 'u', so that the | |
| 2130 | - dot in the scenario advances past one grammar symbol. | |
| 2067 | + 'u' is stored in reverse order, because the most common operation is to take the head | |
| 2068 | + of 'v' and put it in front of 'u', so that the dot in the scenario advances past one | |
| 2069 | + grammar symbol. | |
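For illustration, here is how that dot advance looks with 'u' reversed. This is a Python sketch, not the Anubis representation; the lookahead set E and the precedence level of the Scenario type are omitted.

```python
# A scenario's left part is (head, u_reversed, v). Advancing the dot
# moves the head of v onto the front of u_reversed: two cheap list
# operations at the front, no traversal of u.
def advance_dot(head, u_rev, v):
    return (head, [v[0]] + u_rev, v[1:])

# A -> A.A  (u = "A", v = "A")  advances to  A -> AA.
assert advance_dot("A", ["A"], ["A"]) == ("A", ["A", "A"], [])
```

Storing u already reversed avoids reversing it on every advance, which is why the Anubis type keeps it that way.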
| 2131 | 2070 | |
| 2132 | 2071 | |
| 2133 | 2072 | |
| ... | ... | @@ -2137,30 +2076,25 @@ type Scenario: |
| 2137 | 2076 | |
| 2138 | 2077 | *** (3.3) States. |
| 2139 | 2078 | |
| 2140 | - A state of our automaton is a finite set of two by two compatible | |
| 2141 | - scenarii, which does not contain any two similar | |
| 2142 | - scenarii. Intuitively, the scenarii in a state are simply those | |
| 2143 | - which are still possible in this state. | |
| 2079 | + A state of our automaton is a finite set of pairwise compatible scenarii, which does | |
| 2080 | + not contain any two similar scenarii. Intuitively, the scenarii in a state are simply | |
| 2081 | + those which are still possible in this state. | |
| 2144 | 2082 | |
| 2145 | - The 'core' of a state is what remains if we ignore | |
| 2146 | - lookaheads. States which do not differ by the core are called | |
| 2147 | - 'similar'. | |
| 2083 | + The 'core' of a state is what remains if we ignore lookaheads. States which do not | |
| 2084 | + differ by the core are called 'similar'. | |
| 2148 | 2085 | |
| 2149 | - Could'nt we consider similar states as equivalent ? The answer is | |
| 2150 | - no in theory. But the difference of behavior of the automaton in similar | |
| 2151 | - states is negligible in practice. This is the reason why we will | |
| 2152 | - identify similar states (merging lists of lookahead for similar | |
| 2153 | - scenarii). | |
| 2086 | + Couldn't we consider similar states as equivalent? The answer is no in theory. But the | |
| 2087 | + difference of behavior of the automaton in similar states is negligible in | |
| 2088 | + practice. This is the reason why we will identify similar states (merging lists of | |
| 2089 | + lookahead for similar scenarii). | |
| 2154 | 2090 | |
| 2155 | - But let's see what the difference is really. Clearly, since similar | |
| 2156 | - states differ only by the lookaheads, the same shift and/or | |
| 2157 | - reduces may arise. The difference is only in the decision to make | |
| 2158 | - in case of a conflict. However, since the user has plenty of tools | |
| 2159 | - to influence such decisions, there is no need to make any | |
| 2160 | - distinction between similar states. | |
| 2091 | + But let's see what the difference is really. Clearly, since similar states differ only | |
| 2092 | + by the lookaheads, the same shifts and/or reduces may arise. The difference is only in | |
| 2093 | + the decision to make in case of a conflict. However, since the user has plenty of tools | |
| 2094 | + to influence such decisions, there is no need to make any distinction between similar | |
| 2095 | + states. | |
| 2161 | 2096 | |
| 2162 | - Of course we represent states (up to a certain point) using the | |
| 2163 | - type 'List(Scenario)'. | |
| 2097 | + Of course we represent states (up to a certain point) using the type 'List(Scenario)'. | |
| 2164 | 2098 | |
| 2165 | 2099 | |
| 2166 | 2100 | |
| ... | ... | @@ -2185,9 +2119,8 @@ define Bool |
| 2185 | 2119 | (n1,u1,v1) = (n2,u2,v2). |
| 2186 | 2120 | |
| 2187 | 2121 | |
| 2188 | - The next function takes a scenario 's' and a state, and returns this | |
| 2189 | - state from which an eventual scenario similar to 's' has been | |
| 2190 | - dropped. | |
| 2122 | + The next function takes a scenario 's' and a state, and returns this state from which | |
| 2123 | + any scenario similar to 's' has been dropped. | |
| 2191 | 2124 | |
| 2192 | 2125 | define Maybe(List(Scenario)) |
| 2193 | 2126 | drop_similar |
| ... | ... | @@ -2229,9 +2162,8 @@ define Bool |
| 2229 | 2162 | }. |
| 2230 | 2163 | |
| 2231 | 2164 | |
| 2232 | - The next function tests if a list if scenarii contains only | |
| 2233 | - scenarii with the splitting dot at the left end (i.e. in front of | |
| 2234 | - the right member of the rule). | |
| 2165 | + The next function tests if a list of scenarii contains only scenarii with the splitting | |
| 2166 | + dot at the left end (i.e. in front of the right member of the rule). | |
| 2235 | 2167 | |
| 2236 | 2168 | define Bool |
| 2237 | 2169 | has_only_front_dots |
| ... | ... | @@ -2251,9 +2183,8 @@ define Bool |
| 2251 | 2183 | }. |
| 2252 | 2184 | |
| 2253 | 2185 | |
| 2254 | - The next function tests if a given non saturated state has a | |
| 2255 | - saturated version similar to some saturated state. It does this | |
| 2256 | - without saturating the first state. | |
| 2186 | + The next function tests if a given non saturated state has a saturated version similar | |
| 2187 | + to some saturated state. It does this without saturating the first state. | |
| 2257 | 2188 | |
| 2258 | 2189 | define Bool |
| 2259 | 2190 | saturated_is_similar |
| ... | ... | @@ -2286,18 +2217,17 @@ define Bool |
| 2286 | 2217 | |
| 2287 | 2218 | (A -> u.Bv , E) |
| 2288 | 2219 | |
| 2289 | - (where B is a non terminal), it is possible that the next sequence | |
| 2290 | - of tokens to be read matches B. This means that, if B -> w is any | |
| 2291 | - B-production, the scenario | |
| 2220 | + (where B is a non terminal), it is possible that the next sequence of tokens to be read | |
| 2221 | + matches B. This means that, if B -> w is any B-production, the scenario | |
| 2292 | 2222 | |
| 2293 | 2223 | (B -> .w, ?) |
| 2294 | 2224 | |
| 2295 | - should also be possible in the same state. Now, what are the | |
| 2296 | - acceptable lookaheads for this scenario ? They are obviously all | |
| 2297 | - the tokens which may begin an instance of va, for any a in E. | |
| 2225 | + should also be possible in the same state. Now, what are the acceptable lookaheads for | |
| 2226 | + this scenario? They are obviously all the tokens which may begin an instance of va, | |
| 2227 | + for any a in E. | |
| 2298 | 2228 | |
| 2299 | - This remark provides a procedure for 'saturating' states. A state | |
| 2300 | - is 'saturated' if whenever it contains: | |
| 2229 | + This remark provides a procedure for 'saturating' states. A state is 'saturated' if | |
| 2230 | + whenever it contains: | |
| 2301 | 2231 | |
| 2302 | 2232 | (A -> u.Bv , (a_1,...,a_k)) |
| 2303 | 2233 | |
| ... | ... | @@ -2308,8 +2238,8 @@ define Bool |
| 2308 | 2238 | |
| 2309 | 2239 | for all B-productions B -> w. |
| 2310 | 2240 | |
| 2311 | - In the sequel, we will compute saturated states, but states are | |
| 2312 | - often more conveniently represented by their non saturated version. | |
| 2241 | + In the sequel, we will compute saturated states, but states are often more conveniently | |
| 2242 | + represented by their non saturated version. | |
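The whole saturation procedure can be sketched as a worklist closure. This is a Python illustration, not the Anubis code: a state is modelled as a dictionary mapping each dotted rule (lhs, u, v) to its lookahead set, and the First values are those of the running example grammar.

```python
# Saturate a state: whenever (A -> u.Bv, E) is present, the state must
# also contain (B -> .w, union over a in E of First(v a)) for every
# B-production B -> w.
NONTERMS = {"S", "A"}
FIRST = {"S": {"e", "a"}, "A": {"e", "a"}}   # as computed in (3.1)
RULES = [("S", ("A",)), ("A", ()), ("A", ("a",)), ("A", ("A", "A"))]

def first_of_seq(seq):
    out = set()
    for x in seq:
        f = FIRST.get(x, {x})
        out |= f - {"e"}
        if "e" not in f:
            return out
    return out | {"e"}

def saturate(state):
    work = list(state.items())
    while work:
        (lhs, u, v), las = work.pop()
        if not v or v[0] not in NONTERMS:    # dot not in front of a non terminal
            continue
        B, rest = v[0], v[1:]
        look = set()
        for a in las:                        # union of First(rest a)
            look |= first_of_seq(rest + (a,)) - {"e"}
        for l, r in RULES:                   # add (B -> .w, look), merging
            if l == B:                       # with any similar scenario
                item = (B, (), r)
                new = look - state.get(item, set())
                if new:
                    state.setdefault(item, set()).update(new)
                    work.append((item, new))
    return state

# Saturating the start state  { S -> .A , {$} }:
state = saturate({("S", (), ("A",)): {"$"}})
```

Only genuinely new lookaheads are put back on the worklist, which is what guarantees termination, as the remark about 'already_present' below also stresses.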
| 2313 | 2243 | |
| 2314 | 2244 | |
| 2315 | 2245 | Below is a function which computes union First(va_i): |
| ... | ... | @@ -2329,11 +2259,10 @@ define List(ExToken) |
| 2329 | 2259 | }. |
| 2330 | 2260 | |
| 2331 | 2261 | |
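For intuition, the computation just described can be sketched in Python (a loose model with our own names and a set-based representation, not APM's Anubis code; 'first' and 'nullable' are assumed to be precomputed):

```python
def first_of(seq, lookaheads, first, nullable):
    """Union of First(seq + [a]) over all extended tokens a in 'lookaheads'.

    'first' maps each non terminal to the set of tokens that may begin one
    of its instances; 'nullable' is the set of non terminals that can derive
    the empty string.  Tokens are plain strings not appearing in 'first'.
    """
    result = set()
    for sym in seq:
        if sym in first:                 # non terminal: add its First set
            result |= first[sym]
            if sym not in nullable:      # seq cannot be emptied past sym
                return result
        else:                            # token: it is the only possible start
            result.add(sym)
            return result
    # the whole of seq may vanish, so each lookahead itself may come first
    return result | set(lookaheads)
```

For the example grammar A -> | a | AA (so First(A) = {a} and A is nullable), `first_of(['A'], ['$'], ...)` yields both 'a' and '$'.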
| 2332 | - The next function tests if a given state is similar to some state | |
| 2333 | - in a given list of states. This is needed for our saturation | |
| 2334 | - process, because we must not add to a state a scenario which | |
| 2335 | - already belongs (maybe in a similar form) to that state. Otherwise, | |
| 2336 | - our process would never end. | |
| 2262 | + The next function tests if a given state is similar to some state in a given list of | |
| 2263 | + states. This is needed for our saturation process, because we must not add to a state a | |
| 2264 | + scenario which already belongs (maybe in a similar form) to that state. Otherwise, our | |
| 2265 | + process would never end. | |
| 2337 | 2266 | |
| 2338 | 2267 | define Bool |
| 2339 | 2268 | already_present |
| ... | ... | @@ -2351,10 +2280,9 @@ define Bool |
| 2351 | 2280 | }. |
| 2352 | 2281 | |
| 2353 | 2282 | |
| 2354 | - The next function is given a (new) scenario to be inserted | |
| 2355 | - into a list of scenarii. If this list contains a similar scenario, | |
| 2356 | - the new scenario is just merged to that one. Otherwise, it is | |
| 2357 | - simply added to the list. | |
| 2283 | + The next function is given a (new) scenario to be inserted into a list of scenarii. If | |
| 2284 | + this list contains a similar scenario, the new scenario is just merged with that | 
| 2285 | + one. Otherwise, it is simply added to the list. | |
| 2358 | 2286 | |
| 2359 | 2287 | define List(Scenario) |
| 2360 | 2288 | insert_scenario |
| ... | ... | @@ -2375,8 +2303,8 @@ define List(Scenario) |
| 2375 | 2303 | }. |
| 2376 | 2304 | |
| 2377 | 2305 | |
| 2378 | - The next function extracts the symbols from the right hand side of | |
| 2379 | - a grammar rule (dropping the 'term' part). | |
| 2306 | + The next function extracts the symbols from the right hand side of a grammar rule | |
| 2307 | + (dropping the 'term' part). | |
| 2380 | 2308 | |
| 2381 | 2309 | define List(Symbol) |
| 2382 | 2310 | symbols |
| ... | ... | @@ -2391,9 +2319,8 @@ define List(Symbol) |
| 2391 | 2319 | }. |
| 2392 | 2320 | |
| 2393 | 2321 | |
| 2394 | - The following function adds to a given state 's', all the scenarii | |
| 2395 | - of the form (B -> .w , F), for all B-productions. The set of | |
| 2396 | - lookaheads F is given. | |
| 2322 | + The following function adds to a given state 's' all the scenarii of the form (B -> .w | 
| 2323 | + , F), for all B-productions. The set of lookaheads F is given. | |
| 2397 | 2324 | |
| 2398 | 2325 | define List(Scenario) |
| 2399 | 2326 | add_scenarii |
| ... | ... | @@ -2429,24 +2356,20 @@ define List(Scenario) |
| 2429 | 2356 | |
| 2430 | 2357 | |
| 2431 | 2358 | |
| 2432 | - The next function performs one step in the saturation of a | |
| 2433 | - state. This step consists in a loop on all scenarii in the | |
| 2434 | - state. The list l is the list of scenarii which have not yet been | |
| 2435 | - used for saturation, while 'all' is the set of all known scenarii | |
| 2436 | - in the state at any time. | |
| 2359 | + The next function performs one step in the saturation of a state. This step consists in | |
| 2360 | + a loop on all scenarii in the state. The list l is the list of scenarii which have not | |
| 2361 | + yet been used for saturation, while 'all' is the set of all known scenarii in the state | |
| 2362 | + at any time. | |
| 2437 | 2363 | |
| 2438 | - For each scenario ('sc1' below), of the form (A -> u.v , E), we | |
| 2439 | - first check the form of 'v'. If 'v' is empty the scenario does not | |
| 2440 | - participate to saturation, and we just re-enter the loop with the | |
| 2441 | - tail of 'l' instead of 'l'. | |
| 2364 | + For each scenario ('sc1' below), of the form (A -> u.v , E), we first check the form of | |
| 2365 | + 'v'. If 'v' is empty the scenario does not participate in saturation, and we just | 
| 2366 | + re-enter the loop with the tail of 'l' instead of 'l'. | |
| 2442 | 2367 | |
| 2443 | - If 'v' is not empty, it has a first symbol ('_B' below). This _B | |
| 2444 | - cannot be a $. If it is a token, the scenario does not participate | |
| 2445 | - to saturation, like above. | |
| 2368 | + If 'v' is not empty, it has a first symbol ('_B' below). This _B cannot be a $. If it | |
| 2369 | + is a token, the scenario does not participate in saturation, as above. | 
| 2446 | 2370 | |
| 2447 | - Now, if _B is a non terminal, we add to 'all' all the scenarii | |
| 2448 | - derived by the previous function from B-productions, and we continue | |
| 2449 | - our loop. | |
| 2371 | + Now, if _B is a non terminal, we add to 'all' all the scenarii derived by the previous | |
| 2372 | + function from B-productions, and we continue our loop. | |
| 2450 | 2373 | |
| 2451 | 2374 | define List(Scenario) |
| 2452 | 2375 | saturate_state_one_step |
| ... | ... | @@ -2487,8 +2410,8 @@ define List(Scenario) |
| 2487 | 2410 | }. |
| 2488 | 2411 | |
| 2489 | 2412 | |
| 2490 | - Now, saturating a state is just performing saturation steps until a | |
| 2491 | - step does not change the state any more. | |
| 2413 | + Now, saturating a state is just performing saturation steps until a step does not | |
| 2414 | + change the state any more. | |
| 2492 | 2415 | |
| 2493 | 2416 | define List(Scenario) |
| 2494 | 2417 | saturate_state |
| ... | ... | @@ -2511,69 +2434,56 @@ define List(Scenario) |
| 2511 | 2434 | |
| 2512 | 2435 | *** (3.6) The initial state. |
| 2513 | 2436 | |
| 2514 | - The non terminal S represents the totality of what we want to read | |
| 2515 | - from the input. More precisely, if the input is correct, it is an | |
| 2516 | - instance of S. Hence, since there is only one S-production S -> A, | |
| 2517 | - our reading (if successful) will end by a reduction via this rule, | |
| 2518 | - and it will be correct if and only if the lookahead token is the | |
| 2519 | - end marker: $. | |
| 2437 | + The non terminal S represents the totality of what we want to read from the input. More | |
| 2438 | + precisely, if the input is correct, it is an instance of S. Hence, since there is only | |
| 2439 | + one S-production S -> A, our reading (if successful) will end by a reduction via this | |
| 2440 | + rule, and it will be correct if and only if the lookahead token is the end marker: $. | |
| 2520 | 2441 | |
| 2521 | - Hence, at the beginning, there is obviously one and only one | |
| 2522 | - wanted scenario, which is: | |
| 2442 | + Hence, at the beginning, there is obviously one and only one wanted scenario, which is: | |
| 2523 | 2443 | |
| 2524 | 2444 | (S -> .A , ($)) |
| 2525 | 2445 | |
| 2526 | - This scenario (which will be called the 'initial scenario') needs | |
| 2527 | - to belong to the initial state. In fact, the initial state is | |
| 2528 | - simply the smallest saturated state which contains this | |
| 2529 | - scenario. In the case of our example, this saturated state will be | |
| 2530 | - (after two steps of saturation): | |
| 2446 | + This scenario (which will be called the 'initial scenario') needs to belong to the | |
| 2447 | + initial state. In fact, the initial state is simply the smallest saturated state which | |
| 2448 | + contains this scenario. In the case of our example, this saturated state will be (after | |
| 2449 | + two steps of saturation): | |
| 2531 | 2450 | |
| 2532 | 2451 | (S -> .A , ($)) |
| 2533 | 2452 | (A -> . , (a,$)) |
| 2534 | 2453 | (A -> .a , (a,$)) |
| 2535 | 2454 | (A -> .AA , (a,$)) |
| 2536 | 2455 | |
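As a cross-check, the saturation process can be modelled in Python. The sketch below uses our own representation (items are (head, rhs, dot) cores mapped to lookahead sets), with First and nullability hard-coded for this toy grammar; it reproduces the four scenarii listed above:

```python
# Example grammar: S -> A ; A -> | a | AA.  Hand-computed helper data.
GRAMMAR = {'S': [['A']], 'A': [[], ['a'], ['A', 'A']]}
FIRST = {'A': {'a'}}
NULLABLE = {'A'}

def first_of_seq(seq, lookaheads):
    """Union of First(seq + [a]) over the given lookaheads."""
    result = set()
    for sym in seq:
        if sym in GRAMMAR:
            result |= FIRST[sym]
            if sym not in NULLABLE:
                return result
        else:
            result.add(sym)
            return result
    return result | set(lookaheads)

def saturate(state):
    """Add (B -> .w , First(v a)) for every scenario (A -> u.Bv , E) and
    a in E, merging lookaheads of similar scenarii, until a step changes
    nothing (cf. 'saturate_state')."""
    items = {(h, tuple(r), d): set(la) for (h, r, d, la) in state}
    changed = True
    while changed:
        changed = False
        for (head, rhs, dot), las in list(items.items()):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                B, v = rhs[dot], rhs[dot + 1:]
                f = first_of_seq(v, las)
                for w in GRAMMAR[B]:
                    old = items.setdefault((B, tuple(w), 0), set())
                    if not f <= old:
                        old |= f
                        changed = True
    return items

state0 = saturate([('S', ['A'], 0, {'$'})])
```

Two passes are needed, exactly as noted above: the first adds the A-scenarii with lookahead {$}, the second widens their lookaheads to {a, $}.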
| 2537 | - Note that the rule S -> A appears only one time in the | |
| 2538 | - initial state since the state saturation process cannot produce a | |
| 2539 | - scenario using this rule. | |
| 2540 | - | |
| 2541 | - Now the state generation process will produce a state with the | |
| 2542 | - scenario (S -> A. , ($)). Obviously, we cannot have other scenarii | |
| 2543 | - using this rule. | |
| 2544 | - | |
| 2545 | - The state which contains the scenario (S -> A. , ($)) is our | |
| 2546 | - 'accepting state'. Indeed, the input has been read entirely only | |
| 2547 | - when we are on the point to reduce using this scenario. In that | |
| 2548 | - case the next token to be read is the end marker, and we 'accept' | |
| 2549 | - the input. | |
| 2550 | - | |
| 2551 | - However, we may have a reduce/reduce conflict with this | |
| 2552 | - scenario. It is the case in our example grammar. Indeed, in state | |
| 2553 | - 2 (see below), and if the next token to be read is the end marker, | |
| 2554 | - we may either reduce using the scenario (S -> A. , ($)) or the | |
| 2555 | - scenario (A -> . , (a,$)). Notice that it is not possible to have a | |
| 2556 | - shift/reduce conflict with scenario (S -> A. ,($)), because the | |
| 2557 | - token '$' cannot be shifted (it cannot appear in the right member | |
| 2558 | - of a rule). | |
| 2559 | - | |
| 2560 | - Of course the user cannot choose between these two reductions | |
| 2561 | - because he does'nt know about the existence of rule S -> A. | |
| 2562 | - | |
| 2563 | - Nevertheless, in that case, we avoid the conflict by reducing | |
| 2564 | - systematically using rule (S -> A. , ($)). This may be justified as | |
| 2565 | - follows. | |
| 2566 | - | |
| 2567 | - The initial state contains the initial scenario, and scenarii | |
| 2568 | - obtained by saturation, i.e. with the dot in front of the right | |
| 2569 | - member. Hence the accepting state may only contain the accepting | |
| 2570 | - scenario, scenarii of the form (? -> A.? , ?) (because we make a | |
| 2571 | - transition on A between the two states), and scenarii with the | |
| 2572 | - dot in front of the right member. Hence all scenarii in the | |
| 2573 | - accepting state have at most one symbol on the left of the | |
| 2574 | - dot. This means that if a reduce/reduce conflict arises between the | |
| 2575 | - accepting scenario and another scenario, this other scenario is | |
| 2576 | - either of the form: | |
| 2456 | + Note that the rule S -> A appears only once in the initial state since the state | 
| 2457 | + saturation process cannot produce a scenario using this rule. | |
| 2458 | + | |
| 2459 | + Now the state generation process will produce a state with the scenario (S -> A. , | |
| 2460 | + ($)). Obviously, we cannot have other scenarii using this rule. | |
| 2461 | + | |
| 2462 | + The state which contains the scenario (S -> A. , ($)) is our 'accepting state'. Indeed, | |
| 2463 | + the input has been read entirely only when we are about to reduce using this | 
| 2464 | + scenario. In that case the next token to be read is the end marker, and we 'accept' the | |
| 2465 | + input. | |
| 2466 | + | |
| 2467 | + However, we may have a reduce/reduce conflict with this scenario. This is the case in our | 
| 2468 | + example grammar. Indeed, in state 2 (see below), and if the next token to be read is | |
| 2469 | + the end marker, we may either reduce using the scenario (S -> A. , ($)) or the scenario | |
| 2470 | + (A -> . , (a,$)). Notice that it is not possible to have a shift/reduce conflict with | |
| 2471 | + scenario (S -> A. ,($)), because the token '$' cannot be shifted (it cannot appear in | |
| 2472 | + the right member of a rule). | |
| 2473 | + | |
| 2474 | + Of course the user cannot choose between these two reductions because he doesn't know | 
| 2475 | + about the existence of rule S -> A. | |
| 2476 | + | |
| 2477 | + Nevertheless, in that case, we avoid the conflict by reducing systematically using rule | |
| 2478 | + (S -> A. , ($)). This may be justified as follows. | |
| 2479 | + | |
| 2480 | + The initial state contains the initial scenario, and scenarii obtained by saturation, | |
| 2481 | + i.e. with the dot in front of the right member. Hence the accepting state may only | |
| 2482 | + contain the accepting scenario, scenarii of the form (? -> A.? , ?) (because we make a | |
| 2483 | + transition on A between the two states), and scenarii with the dot in front of the | |
| 2484 | + right member. Hence all scenarii in the accepting state have at most one symbol on the | |
| 2485 | + left of the dot. This means that if a reduce/reduce conflict arises between the | |
| 2486 | + accepting scenario and another scenario, this other scenario is either of the form: | |
| 2577 | 2487 | |
| 2578 | 2488 | (B -> . , ($ ...)) |
| 2579 | 2489 | |
| ... | ... | @@ -2584,9 +2494,9 @@ define List(Scenario) |
| 2584 | 2494 | In the first case, ??? |
| 2585 | 2495 | |
| 2586 | 2496 | |
| 2587 | - The following function constructs the non saturated initial state | |
| 2588 | - for a given grammar. It simply looks for the unique S-production, | |
| 2589 | - and constructs state 0 containing the unique initial scenario. | |
| 2497 | + The following function constructs the non saturated initial state for a given | |
| 2498 | + grammar. It simply looks for the unique S-production, and constructs state 0 containing | |
| 2499 | + the unique initial scenario. | |
| 2590 | 2500 | |
| 2591 | 2501 | define List(Scenario) |
| 2592 | 2502 | initial_state |
| ... | ... | @@ -2611,20 +2521,18 @@ define List(Scenario) |
| 2611 | 2521 | |
| 2612 | 2522 | *** (3.7) Transitions. |
| 2613 | 2523 | |
| 2614 | - Of course our automaton has transitions. It has two kinds of | |
| 2615 | - transitions: those which result from the reading of a token, and | |
| 2616 | - those which result from the reduction via a rule, after a sequence | |
| 2617 | - of tokens has been read which is an instance of the right side of | |
| 2618 | - this rule. The first ones are labelled by tokens, while the others | |
| 2619 | - are labelled by non terminals. | |
| 2524 | + Of course our automaton has transitions. It has two kinds of transitions: those which | |
| 2525 | + result from the reading of a token, and those which result from the reduction via a | |
| 2526 | + rule, after a sequence of tokens has been read which is an instance of the right side | |
| 2527 | + of this rule. The first ones are labelled by tokens, while the others are labelled by | |
| 2528 | + non terminals. | |
| 2620 | 2529 | |
| 2621 | 2530 | If in some state, we have the scenario: |
| 2622 | 2531 | |
| 2623 | 2532 | (A -> u.av , E) |
| 2624 | 2533 | |
| 2625 | - (where 'a' is a token) then, if the next token to be read is 'a', | |
| 2626 | - it is clear that the transition will be performed to a state | |
| 2627 | - containing the scenario: | |
| 2534 | + (where 'a' is a token) then, if the next token to be read is 'a', it is clear that the | |
| 2535 | + transition will be performed to a state containing the scenario: | |
| 2628 | 2536 | |
| 2629 | 2537 | (A -> ua.v , E) |
| 2630 | 2538 | |
| ... | ... | @@ -2634,15 +2542,14 @@ define List(Scenario) |
| 2634 | 2542 | |
| 2635 | 2543 | (A -> u.Bv , E) |
| 2636 | 2544 | |
| 2637 | - and if, after reading some tokens, we reduce via this B-production and | |
| 2638 | - return to this state, we will have to make a transition to a state | |
| 2639 | - containing: | |
| 2545 | + and if, after reading some tokens, we reduce via this B-production and return to this | |
| 2546 | + state, we will have to make a transition to a state containing: | |
| 2640 | 2547 | |
| 2641 | 2548 | (A -> uB.v , E) |
| 2642 | 2549 | |
| 2643 | 2550 | (E again unchanged). |
| 2644 | 2551 | |
| 2645 | - All our transitions will occur in one of these two situations. | |
| 2552 | + All our transitions will occur in one of these two situations. | |
| 2646 | 2553 | |
| 2647 | 2554 | |
| 2648 | 2555 | |
| ... | ... | @@ -2654,11 +2561,12 @@ define List(Scenario) |
| 2654 | 2561 | |
| 2655 | 2562 | *** (3.8) Generating the states. |
| 2656 | 2563 | |
| 2657 | - Which states do we needs ? We need the initial state, and all the | |
| 2658 | - states which are reachable from it via one of the two above kinds | |
| 2659 | - of transitions. This gives the method for generating states. | |
| 2564 | + Which states do we need ? We need the initial state, and all the states which are | 
| 2565 | + reachable from it via one of the two above kinds of transitions. This gives the method | |
| 2566 | + for generating states. | |
| 2660 | 2567 | |
| 2661 | 2568 | (1) when creating a new state, saturate it, |
| 2569 | + | |
| 2662 | 2570 | (2) for each symbol for which there are scenarii in the state with |
| 2663 | 2571 | this symbol after the dot, construct the state needed for the |
| 2664 | 2572 | corresponding transition. |
| ... | ... | @@ -2728,8 +2636,8 @@ define List(Scenario) |
| 2728 | 2636 | *** (3.9) Making the automaton. |
| 2729 | 2637 | |
| 2730 | 2638 | |
| 2731 | - The following function takes a scenario (A -> u.Xv , E), where X is | |
| 2732 | - any grammar symbol, and a list of lists of scenarii of the form: | |
| 2639 | + The following function takes a scenario (A -> u.Xv , E), where X is any grammar symbol, | |
| 2640 | + and a list of lists of scenarii of the form: | |
| 2733 | 2641 | |
| 2734 | 2642 | [ |
| 2735 | 2643 | [ |
| ... | ... | @@ -2740,17 +2648,14 @@ define List(Scenario) |
| 2740 | 2648 | ... |
| 2741 | 2649 | ] |
| 2742 | 2650 | |
| 2743 | - i.e. such that in each list (called a 'class'), the scenarii (? -> | |
| 2744 | - u.? , ?) have the same symbol as the last one in 'u' (i.e. the | |
| 2745 | - first one in our representation, since 'u' is stored in reverse | |
| 2746 | - order). The class above is said ''corresponding to Y''. | |
| 2651 | + i.e. such that in each list (called a 'class'), the scenarii (? -> u.? , ?) have the | |
| 2652 | + same symbol as the last one in 'u' (i.e. the first one in our representation, since 'u' | |
| 2653 | + is stored in reverse order). The class above is said to ''correspond to Y''. | 
| 2747 | 2654 | |
| 2748 | - The function looks for a class corresponding to X. If it exists the | |
| 2749 | - scenario is added to this class, after its dot has been put past | |
| 2750 | - X. Otherwise, it makes a new class. | |
| 2655 | + The function looks for a class corresponding to X. If it exists the scenario is added | |
| 2656 | + to this class, after its dot has been put past X. Otherwise, it makes a new class. | |
| 2751 | 2657 | |
| 2752 | - If the scenario has no symbol after the dot, it is not classified | |
| 2753 | - at all. | |
| 2658 | + If the scenario has no symbol after the dot, it is not classified at all. | |
| 2754 | 2659 | |
| 2755 | 2660 | define List(List(Scenario)) |
| 2756 | 2661 | classify |
| ... | ... | @@ -2794,14 +2699,13 @@ define List(List(Scenario)) |
| 2794 | 2699 | |
| 2795 | 2700 | |
| 2796 | 2701 | |
| 2797 | - The function 'next_states' takes a state 'state', and produces the | |
| 2798 | - list of all states which may be reached from 'state' via a single | |
| 2799 | - transition (either on shifting a token or after reduction to a non | |
| 2800 | - terminal). | |
| 2702 | + The function 'next_states' takes a state 'state', and produces the list of all states | |
| 2703 | + which may be reached from 'state' via a single transition (either on shifting a token | |
| 2704 | + or after reduction to a non terminal). | |
| 2801 | 2705 | |
| 2802 | - It works as follows. It partitions 'state' so that each element of | |
| 2803 | - the partition has scenarii with the same symbol after the dot. Then | |
| 2804 | - the dot is put past this symbol. For example, if 'state' is: | |
| 2706 | + It works as follows. It partitions 'state' so that each element of the partition has | |
| 2707 | + scenarii with the same symbol after the dot. Then the dot is put past this symbol. For | |
| 2708 | + example, if 'state' is: | |
| 2805 | 2709 | |
| 2806 | 2710 | [ |
| 2807 | 2711 | (A -> u.av , E) |
| ... | ... | @@ -2822,10 +2726,10 @@ define List(List(Scenario)) |
| 2822 | 2726 | ] |
| 2823 | 2727 | |
| 2824 | 2728 | |
| 2825 | - The next function takes a (non saturated) state, and computes the | |
| 2826 | - list of all (non saturated) states which may be the target of a | |
| 2827 | - transition (either on a token or on a non terminal) from that | |
| 2828 | - state. It transforms a state into a set of classes like the above. | |
| 2729 | + The next function takes a (non saturated) state, and computes the list of all (non | |
| 2730 | + saturated) states which may be the target of a transition (either on a token or on a | |
| 2731 | + non terminal) from that state. It transforms a state into a set of classes like the | |
| 2732 | + above. | |
| 2829 | 2733 | |
| 2830 | 2734 | define List(List(Scenario)) |
| 2831 | 2735 | next_states |
| ... | ... | @@ -2843,12 +2747,11 @@ define List(List(Scenario)) |
| 2843 | 2747 | |
| 2844 | 2748 | |
| 2845 | 2749 | |
| 2846 | - Now, in order to compute our automaton (of type | |
| 2847 | - 'List(List(Scenario))'), we must start with the initial non | |
| 2848 | - saturated state and add 'next' states until no more state may be | |
| 2849 | - added. Of course, we add states only if they are not already | |
| 2850 | - present in the automaton. More presisely, if there is a similar | |
| 2851 | - state in the automaton, we must merge those two states. | |
| 2750 | + Now, in order to compute our automaton (of type 'List(List(Scenario))'), we must start | |
| 2751 | + with the initial non saturated state and add 'next' states until no more state may be | |
| 2752 | + added. Of course, we add states only if they are not already present in the | |
| 2753 | + automaton. More precisely, if there is a similar state in the automaton, we must merge | 
| 2754 | + those two states. | |
| 2852 | 2755 | |
| 2853 | 2756 | Here is how we merge states. |
| 2854 | 2757 | |
| ... | ... | @@ -2909,8 +2812,7 @@ define List(List(Scenario)) |
| 2909 | 2812 | }. |
| 2910 | 2813 | |
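For intuition, merging two similar states amounts to unioning the lookahead sets of matching scenarii. A loose Python sketch (our representation, mapping scenario cores (head, rhs, dot) to lookahead sets, not APM's):

```python
def merge_states(s1, s2):
    """Merge two states by unioning the lookahead sets of scenarii with
    the same core.  For truly 'similar' states the cores coincide; this
    sketch tolerates extra cores for simplicity."""
    merged = {core: set(las) for core, las in s1.items()}
    for core, las in s2.items():
        merged.setdefault(core, set()).update(las)
    return merged
```

For example, merging (A -> a. , (a)) with (A -> a. , ($)) yields (A -> a. , (a,$)).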
| 2911 | 2814 | |
| 2912 | - At each step of the construction of our automaton, we have two | |
| 2913 | - lists: | |
| 2815 | + At each step of the construction of our automaton, we have two lists: | |
| 2914 | 2816 | |
| 2915 | 2817 | - the list 'have_next' of those states for which next states |
| 2916 | 2818 | have been already constructed, |
| ... | ... | @@ -2997,8 +2899,8 @@ define List(List(Scenario)) |
| 2997 | 2899 | |
| 2998 | 2900 | *** (4.1) Numbering states and adding transitions lists. |
| 2999 | 2901 | |
| 3000 | - Now that our states are established, we need to rework them. Here | |
| 3001 | - are the operations performed: | |
| 2902 | + Now that our states are established, we need to rework them. Here are the operations | |
| 2903 | + performed: | |
| 3002 | 2904 | |
| 3003 | 2905 | - Put an identifying number on each state (beginning at 0) |
| 3004 | 2906 | |
| ... | ... | @@ -3011,7 +2913,7 @@ type IntermediateState: |
| 3011 | 2913 | List((Symbol,Int32)) transitions). |
| 3012 | 2914 | |
| 3013 | 2915 | |
| 3014 | - The next function just add numbers identifying states. | |
| 2916 | + The next function just adds numbers identifying states. | 
| 3015 | 2917 | |
| 3016 | 2918 | define List(IntermediateState) |
| 3017 | 2919 | number |
| ... | ... | @@ -3027,8 +2929,8 @@ define List(IntermediateState) |
| 3027 | 2929 | }. |
| 3028 | 2930 | |
| 3029 | 2931 | |
| 3030 | - The next function gives the number identifying a non saturated | |
| 3031 | - state in a list of intermediate states. | |
| 2932 | + The next function gives the number identifying a non saturated state in a list of | |
| 2933 | + intermediate states. | |
| 3032 | 2934 | |
| 3033 | 2935 | define Int32 |
| 3034 | 2936 | find_id |
| ... | ... | @@ -3047,11 +2949,10 @@ define Int32 |
| 3047 | 2949 | }. |
| 3048 | 2950 | |
| 3049 | 2951 | |
| 3050 | - The next function takes a class (a list of scenarii with the same | |
| 3051 | - grammar symbol Y before the dot) and an automaton in the form os a | |
| 3052 | - list of intermediate states, and returns the pair (Y,n), where Y is the | |
| 3053 | - previous grammar symbol and n the integer identifying that class in | |
| 3054 | - the automaton. | |
| 2952 | + The next function takes a class (a list of scenarii with the same grammar symbol Y | |
| 2953 | + before the dot) and an automaton in the form of a list of intermediate states, and | 
| 2954 | + returns the pair (Y,n), where Y is the previous grammar symbol and n the integer | |
| 2955 | + identifying that class in the automaton. | |
| 3055 | 2956 | |
| 3056 | 2957 | |
| 3057 | 2958 | define (Symbol,Int32) |
| ... | ... | @@ -3075,10 +2976,9 @@ define (Symbol,Int32) |
| 3075 | 2976 | |
| 3076 | 2977 | |
| 3077 | 2978 | |
| 3078 | - The following function takes a partition of a state (in the form of | |
| 3079 | - a list of classes), an automaton (in the form of a list of | |
| 3080 | - intermediate states), and returns a list of pairs (X,n) saying ``if | |
| 3081 | - transition is on X, then go to state n''. | |
| 2979 | + The following function takes a partition of a state (in the form of a list of classes), | |
| 2980 | + an automaton (in the form of a list of intermediate states), and returns a list of | |
| 2981 | + pairs (X,n) saying ``if transition is on X, then go to state n''. | |
| 3082 | 2982 | |
| 3083 | 2983 | define List((Symbol,Int32)) |
| 3084 | 2984 | make_transitions |
| ... | ... | @@ -3099,8 +2999,7 @@ define List((Symbol,Int32)) |
| 3099 | 2999 | |
| 3100 | 3000 | |
| 3101 | 3001 | |
| 3102 | - The next function adds transitions to all intermediate states in | |
| 3103 | - our automaton. | |
| 3002 | + The next function adds transitions to all intermediate states in our automaton. | |
| 3104 | 3003 | |
| 3105 | 3004 | define List(IntermediateState) |
| 3106 | 3005 | add_transitions |
| ... | ... | @@ -3143,15 +3042,15 @@ define List(IntermediateState) |
| 3143 | 3042 | |
| 3144 | 3043 | ( A-> u.v , E) |
| 3145 | 3044 | |
| 3146 | - and if v is not empty, E is no more needed. Such a scenario is | |
| 3147 | - called a 'shifting' scenario, because it will cause the shifting of | |
| 3148 | - either a token or of an instance of a non terminal. | |
| 3045 | + and if v is not empty, E is no longer needed. Such a scenario is called a 'shifting' | 
| 3046 | + scenario, because it will cause the shifting of either a token or of an instance of a | |
| 3047 | + non terminal. | |
| 3149 | 3048 | |
| 3150 | - On the contrary, scenarii of the form | |
| 3049 | + On the contrary, scenarii of the form | |
| 3151 | 3050 | |
| 3152 | 3051 | (A -> u. , E) |
| 3153 | 3052 | |
| 3154 | - are called 'reducing' scenarii, because they call for a reduction. | |
| 3053 | + are called 'reducing' scenarii, because they call for a reduction. | |
| 3155 | 3054 | |
| 3156 | 3055 | |
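A scenario's kind thus depends only on the dot position, which a small Python sketch (our own tuple representation, not APM's 'separate') makes concrete:

```python
def separate(state):
    """Split a state into reducing scenarii (dot at the end of the right
    member) and shifting scenarii (a symbol still follows the dot)."""
    reducing, shifting = [], []
    for item in state:
        _head, rhs, dot, _lookaheads = item
        (reducing if dot == len(rhs) else shifting).append(item)
    return reducing, shifting

# Some scenarii of state 2 of the example: S -> A. and A -> . are
# reducing, A -> A.A is shifting.
state2 = [('S', ('A',), 1, ('$',)),
          ('A', ('A', 'A'), 1, ('a', '$')),
          ('A', (), 0, ('a', '$'))]
reducing, shifting = separate(state2)
```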
| 3157 | 3056 | type NonEmptyList($T): |
| ... | ... | @@ -3184,12 +3083,11 @@ type NewState: |
| 3184 | 3083 | List(Conflict) conflicts). |
| 3185 | 3084 | |
| 3186 | 3085 | |
| 3187 | - Given an automaton in the form of a list of intermediate states, we | |
| 3188 | - transform it into an automaton in the form of a list of new | |
| 3189 | - states. This is a state by state operation. | |
| 3086 | + Given an automaton in the form of a list of intermediate states, we transform it into | |
| 3087 | + an automaton in the form of a list of new states. This is a state by state operation. | |
| 3190 | 3088 | |
| 3191 | - The next function checks if a precedence level may be deduced from | |
| 3192 | - the right member of the rule. | |
| 3089 | + The next function checks if a precedence level may be deduced from the right member of | |
| 3090 | + the rule. | |
| 3193 | 3091 | |
| 3194 | 3092 | |
| 3195 | 3093 | define Maybe(Int32) |
| ... | ... | @@ -3235,8 +3133,8 @@ define Maybe(Int32) |
| 3235 | 3133 | |
| 3236 | 3134 | |
| 3237 | 3135 | |
| 3238 | - For each state, we just need to separate the list of scenarii, and | |
| 3239 | - slightly rearrange each of them. | |
| 3136 | + For each state, we just need to separate the list of scenarii, and slightly rearrange | |
| 3137 | + each of them. | |
| 3240 | 3138 | |
| 3241 | 3139 | define (List(ReducingScenario),List(ShiftingScenario)) |
| 3242 | 3140 | separate |
| ... | ... | @@ -3262,9 +3160,8 @@ define (List(ReducingScenario),List(ShiftingScenario)) |
| 3262 | 3160 | }. |
| 3263 | 3161 | |
| 3264 | 3162 | |
| 3265 | - The next function establishes the list of conflict in a given | |
| 3266 | - state, from the two lists of reducing scenarii and shifting | |
| 3267 | - scenarii. | |
| 3163 | + The next function establishes the list of conflicts in a given state, from the two lists | 
| 3164 | + of reducing scenarii and shifting scenarii. | |
| 3268 | 3165 | |
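For intuition, conflict detection in a single state can be sketched in Python: a shift/reduce conflict arises when a reducing scenario's lookahead set contains a shiftable token, and a reduce/reduce conflict when two reducing scenarii share a lookahead. The data below encodes state 2 of the example grammar (rule numbers and all names are ours):

```python
def conflicts(reducing, shiftable_tokens):
    """List the conflicts among reducing scenarii, each given as a pair
    (rule number, lookahead set), against the tokens that can be shifted."""
    found = set()
    for i, (rule_i, las_i) in enumerate(reducing):
        for tok in las_i & shiftable_tokens:          # shift/reduce
            found.add(('shift/reduce', tok, rule_i))
        for rule_j, las_j in reducing[i + 1:]:        # reduce/reduce
            for tok in las_i & las_j:
                found.add(('reduce/reduce', tok, rule_i, rule_j))
    return found

# State 2: (S -> A. , ($)) is rule 1, (A -> . , (a,$)) is rule 2, and
# the token 'a' can be shifted.
found = conflicts([(1, {'$'}), (2, {'a', '$'})], {'a'})
```

The result agrees with the state-2 row of the decision table given in section (4.3): a shift/reduce conflict on 'a' and a reduce/reduce conflict on '$'.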
| 3269 | 3166 | |
| 3270 | 3167 | define List($T) |
| ... | ... | @@ -3409,10 +3306,9 @@ define Int32 |
| 3409 | 3306 | *** (4.3) Making decisions. |
| 3410 | 3307 | |
| 3411 | 3308 | |
| 3412 | - We will now examine our states to decide what to do in the presence | |
| 3413 | - of a given lookahead. In other words, we must construct our | |
| 3414 | - 'action' function. We continue with the same example. We record all | |
| 3415 | - possibilities in the following table: | |
| 3309 | + We will now examine our states to decide what to do in the presence of a given | |
| 3310 | + lookahead. In other words, we must construct our 'action' function. We continue with | |
| 3311 | + the same example. We record all possibilities in the following table: | |
| 3416 | 3312 | |
| 3417 | 3313 | | a $ |
| 3418 | 3314 | --+------------------------- |
| ... | ... | @@ -3421,15 +3317,13 @@ define Int32 |
| 3421 | 3317 | 2 | s1/r2 r1/r2 |
| 3422 | 3318 | 3 | s1/r2/r4 r2/r4 |
| 3423 | 3319 | |
| 3424 | - Indeed, in state 0, if we see an 'a' we may either shift and go to | |
| 3425 | - state 1, or reduce using rule 2 (A -> ). If we see a '$' we can | |
| 3426 | - only reduce using rule 2. In state 1, we can only reduce using rule | |
| 3427 | - 3 (A -> a). In state 2, if we see 'a', we ca shift and go to state | |
| 3428 | - 1, or reduce using rule 2 (A -> ). If we see a '$' we can reduce | |
| 3429 | - using either rule 1 (S -> A) or rule 2 (A -> ). In state 3, if we | |
| 3430 | - see 'a', we can shift and go to state 1, or reduce using either | |
| 3431 | - rule 2 (A -> ) or rule 4 (A -> AA). If we see '$', we can reduce | |
| 3432 | - using either rule 2 or rule 4. | |
| 3320 | + Indeed, in state 0, if we see an 'a' we may either shift and go to state 1, or reduce | |
| 3321 | + using rule 2 (A -> ). If we see a '$' we can only reduce using rule 2. In state 1, we | |
| 3322 | + can only reduce using rule 3 (A -> a). In state 2, if we see 'a', we can shift and go to | 
| 3323 | + state 1, or reduce using rule 2 (A -> ). If we see a '$' we can reduce using either | |
| 3324 | + rule 1 (S -> A) or rule 2 (A -> ). In state 3, if we see 'a', we can shift and go to | |
| 3325 | + state 1, or reduce using either rule 2 (A -> ) or rule 4 (A -> AA). If we see '$', we | |
| 3326 | + can reduce using either rule 2 or rule 4. | |
| 3433 | 3327 | |
| 3434 | 3328 | Hence, as expected, the example grammar is highly ambiguous. |
| 3435 | 3329 | |
| ... | ... | @@ -3517,8 +3411,8 @@ define Int32 |
| 3517 | 3411 | |
| 3518 | 3412 | |
| 3519 | 3413 | |
| 3520 | - Finally, here is a tool to print a 'first function'. We begin by a | |
| 3521 | - function printing a list of extended tokens. | |
| 3414 | + Finally, here is a tool to print a 'first function'. We begin by a function printing a | |
| 3415 | + list of extended tokens. | |
| 3522 | 3416 | |
| 3523 | 3417 | define One |
| 3524 | 3418 | |
| ... | ... | @@ -3923,14 +3817,6 @@ define One |
| 3923 | 3817 | } |
| 3924 | 3818 | }. |
| 3925 | 3819 | |
| 3926 | - define One | |
| 3927 | ||
| 3928 | - ( | |
| 3929 | - WAddr(Int8) file, | |
| 3930 | - String s | |
| 3931 | - ) = | |
| 3932 | - print(file,s,0). | |
| 3933 | - | |
| 3934 | 3820 | define One |
| 3935 | 3821 | trace_body |
| 3936 | 3822 | ( |
| ... | ... | @@ -4011,9 +3897,8 @@ define One |
| 4011 | 3897 | |
| 4012 | 3898 | read trace_apg.anubis |
| 4013 | 3899 | |
| 4014 | - The function 'make_parser' receives the grammar read from the | |
| 4015 | - source file (together with its name, its precedence and association | |
| 4016 | - rules), and also the two output files. | |
| 3900 | + The function 'make_parser' receives the grammar read from the source file (together | |
| 3901 | + with its name, its precedence and association rules), and also the two output files. | |
| 4017 | 3902 | |
| 4018 | 3903 | define Maybe(One) |
| 4019 | 3904 | make_parser | ... | ... |