| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592 |
- BTYACC -- backtracking yacc
- ===========================
- BTYACC was created by Chris Dodd using ideas from many
- places and lots of code from the Berkeley Yacc
- distribution, which is a public domain yacc clone put
- together by the good folks at Berkeley. This code is
- distributed with NO WARRANTEE and is public domain.
- It is certain to contain bugs, which you should
- report to: [email protected].
- Vadim Maslov of Siber Systems <[email protected]>
- considerably modified BTYACC to make it suitable
- for production environment.
- Several people have suggested bug fixes that
- were incorporated into BtYacc.
- See the README.BYACC files for more about
- Berkeley Yacc and other sources of info.
- http://www.siber.com/btyacc/ is the current home of BtYacc.
- It is provided courtesy of Siber Systems http://www.siber.com/.
- Version 3.0 changes
- -------------------
- by Vadim Maslov
- Changes mostly occurred in btyaccpa.ske file that
- contains the parsing shift/reduce/backtrack algorithm.
- Version 3.0 innovations focus on:
- - text position computation and propagation,
- - industrial-strength error processing and recovery.
- ** Added mechanism for computing and propagating
- text position of tokens and non-terminals.
- Compilers often need to build AST trees such that every node
- in a tree can relate to the parsed program source it came from.
- The following applications are very likely to need this:
- - debuggers that show actual source of the debugged program,
- - source-to-source translators that want
- unchanged parts of the tree to generate the unchanged code.
- The new YYPOSN mechanism added in this version of BtYacc
- helps you in automating the text position computation
- and in assigning the computed text positions to the AST.
- This mechanism is successfully used in commercial
- parsers and source-to-source translators.
- In standard Yaccs every token and every non-terminal
- has an YYSTYPE semantic value attached to it.
- In this new version every token and every non-terminal
- also has an YYPOSN text position attached to it.
- YYPOSN is a user-defined type that can be anything and
- that has a meaning of text position attached to
- token or non-terminal.
- In addition to semantic value stack BtYacc now maintains
- text position stack. Behavior of the text position stack
- is similar to the behavior of the semantic value stack.
- If using text position mechanism,
- you need to define the following:
- YYPOSN Preprocessor variable that contains C/C++ type of
- the text position attached to
- every token and non-terminal.
- yyposn Global variable of type YYPOSN.
- The lexer must assign text position of
- the returned token to yyposn, just like it assigns
- semantic value of the returned token to yylval.
- YYREDUCEPOSNFUNC
- Preprocessor variable that points to unction that
- is called after the grammar rule reduction
- to reduce text positions located on the stack.
-
- This function is called by BtYacc to reduce text
- positions. The function is called immediately after
- the regular rule reduction occurs.
-
- The function has the following prototype:
- void ReducePosn(YYPOSN &ret,
- YYPOSN *terms,
- YYSTYPE *term_vals,
- int term_no,
- int stk_pos,
- int yychar,
- YYPOSN &yyposn,
- UserType extra);
- The function arguments are:
- - ret
- Reference to the text position returned by
- the rule. The function must write the computed
- text position returned by the rule to ret.
- This is analogue of the $$ semantic value.
- - term_posns
- Array of the right-hand side rule components
- YYPOSN text positions. These are analogues of
- $1, $2, ..., $N in the text position world.
- - term_vals
- Array of the right-hand side (RHS) rule components
- YYSTYPE values. These are the $1,...,$N themselves.
- - term_no
- Number of the components in RHS of the reduced rule.
- Equal to size of arrays term_posns and term_vals.
- Also equal to N in $1,...,$N in the reduced rule.
- - stk_pos
- YYSTYPE/YYPOSN stack position before the reduction.
- - yychar
- Lookahead token that immediately follows
- the reduced RHS components.
- - yyposn
- YYPOSN of the token that immediately follows
- the reduced RHS components.
- - extra
- User-defined extra argument passed to ReducePosn.
-
- Typically this function extracts text positions from
- the right-hand side rule components and either
- assigns them to the returned $$ structure/tree or
- if no $$ value is returned, puts them into
- the ret text position from where
- it will be picked up by the later reduced rules.
-
- YYREDUCEPOSNFUNCARG
- Extra user-defined argument passed to
- the ReducePosn function. This argument can use
- any variables defined in btyaccpa.ske.
- ** Added code to btyaccpa.ske that automatically cleans up
- semantic semantic values and text positions of tokens
- and non-terminals that are discarded and deleted as
- a result of error processing.
- In the previous versions the discarded token and non-terminal
- semantic values were not cleaned that caused quite severe
- leaks. The only way to fix it was to add garbage collection
- to YYSTYPE class.
- Now BtYacc skeleton calls delete functions for semantic
- values and positions of the discarded tokens and
- non-terminals.
- You need to define the following functions that BtYacc
- calls when it needs to delete semantic value or text position.
- YYDELETEVAL
- User-defined function that is called by BtYacc
- to delete semantic value of the token or non-terminal.
-
- The user-defined function must have the prototype:
- void DeleteYYval(YYSTYPE v, int type);
- v is semantic value to delete,
- type is one of the following:
- 0 discarding token
- 1 discarding state
- 2 cleaning up stack when aborting
- YYDELETEPOSN
- User-defined function that is called by BtYacc
- to delete text position of the token or non-terminal.
- The user-defined function must have the prototype:
- void DeleteYYposn(YYPOSN p, int type);
- v is semantic value to delete,
- type is one of the following:
- 0 discarding token
- 1 discarding state
- 2 cleaning up stack when aborting
- ** User can define "detailed" syntax error processing
- function that reports an *exact* position of
- the token that caused the error.
- If you define preprocessor variable YYERROR_DETAILED in
- your grammar then you need define the following
- error processing function:
- void yyerror_detailed(char *text,
- int errt,
- YYSTYPE &errt_value,
- YYPOSN &errt_posn);
- It receives the following arguments:
- text Error message.
- errt Code of the token that caused the error.
- errt_value Value of the token that caused the error.
- errt_posn Text position of token that caused error.
- ** Dropped compatibility with C.
- Compatibility with C became increasingly difficult
- to maintain as new features were added to btyaccpa.ske.
- So we dropped it. If anybody wants to make the new version
- compatible with C, we would gladly accept the changes.
- Meanwhile we expect that you use C++ to write grammar
- actions and everything else in grammar files.
- Since C is (in a sense) subset of C++, your C-based
- grammar may work if you use C++ compiler to compile it.
- Version 3.0 bugs fixed
- ----------------------
- Matthias Meixner <[email protected]> fixed a bug:
- BtYacc does not correctly handle typenames, if one typename
- is a prefix of another one and if this type is used after
- the longer one. In this case BTYacc produces invalid code.
- Version 2.1 changes
- -------------------
- by Vadim Maslov
- ** Added preprocessor statements to BtYacc that are similar
- in function and behavior to C/C++ preprocessor statements.
- These statements are used to:
- - Introduce modularity into a grammar by breaking it
- into several *.y files and assembling different
- grammars from the *.y modules using %include and %ifdef.
- - Have several versions of the same grammar
- by using %ifdef and $endif.
- - To include automatically generated grammar fragment.
- For instance, we use %include to include
- automatically generated list of tokens.
- Preprocessor statements are:
- %define <var-name>
- Define preprocessor variable named <var-name>.
- %ifdef <var-name>
- If preprocessor variable named <var-name>
- is defined by %define, then process the text from
- this %ifdef to the closing %endif.
- %endif
- Closing bracket for %ifdef preprocessor statement.
- Only one nesting level of %ifdef-%endif is allowed.
- %include <file-name>
- Process contents of the file named <file-name>.
- If <file-name> is a relative name, it is looked up
- in a directory in which btyacc was started.
- Only one nesting level of %include is allowed.
- Version 2.0 changes
- -------------------
- by Vadim Maslov
- ** Changed 16-bit short numbers to 32-bit int numbers in
- grammar tables, so that huge grammar tables (tables that
- are larger than 32768 elements) resulting from huge
- grammars (Cobol grammar, for instance) can work correctly.
- You need to have 32-bit integer to index table bigger than
- 32768 elements, 16-bit integer is not enough.
- The original BtYacc just generated non-working tables
- larger than 32768 elements without even notifying about
- the table overflow.
- ** Make error recovery work correctly when error happens
- while processing nested conflicts. Original BtYacc could
- infinitely cycle in certain situations that involved error
- recovery while in nested conflict.
- More detailed explanation: when we have nested conflicts
- (conflict that happens while trial-processing another
- conflict), it leads btyacc into NP-complete searching of
- conflict tree. The ultimate goal is YYVALID operator that
- selects a particular branch of that tree as a valid one.
- If no YYVALID is found on the tree, then error recovery
- takes over. The problem with this is that error recovery
- is started in the same state context that exists on the
- last surveyed branch of the conflict tree. Sometimes this
- last branch may be of zero length and it results in
- recovering to exactly the same state as existed before
- entering the conflict. BtYacc cycles then.
- We solved this problem by memorizing the longest path in
- the conflict tree while browsing it. If we ever get into
- error recovery, we restore state that existed on the
- longest path. Effectively we say: if we have an error,
- let us move forward as far as we possibly could while we
- were browsing the conflict tree.
- ** Introduce YYVALID_NESTED operation in addition to
- simply YYVALID. When we have a nested conflict (conflict
- while processing in trial mode for another conflict), we
- want to relate YYVALID to a particular level of conflict
- being in trial.
- Since we mostly anticipate only 2-level nested conflicts
- YYVALID_NESTED tells the parser to satisfy only the
- internal conflict. Therefore, in 1-level conflict
- situation YYVALID_NESTED acts like a regular YYVALID, but
- in 2-level conflict it is a no-op and the other YYVALID
- for outer conflict will be searched for.
- ** Improved handling of situation where /tmp directory is
- missing. Original btyacc just died quietly when /tmp
- directory was missing. We added code that states the
- problem explicitly. While on UNIX /tmp directory is always
- present, it may be missing on WIN32 systems, therefore
- diagnosing this situation is important.
- Version 1.0 changes: BackTracking
- =================================
- by Chris Dodd
- BTYACC is a modified version of yacc that supports
- automatic backtracking and semantic disambiguation to
- parse ambiguous grammars, as well as syntactic sugar for
- inherited attributes (which tend to introduce conflicts).
- Whenever a btyacc generated parser runs into a
- shift-reduce or reduce-reduce error in the parse table, it
- remembers the current parse point (yacc stack and input
- stream state), and goes into trial parse mode. It then
- continues parsing, ignoring most rule actions. If it runs
- into an error (either through the parse table or through
- an action calling YYERROR), it backtracks to the most
- recent conflict point and tries a different alternative.
- If it finds a successful parse (reaches the end of the
- input or an action calls YYVALID), it backtracks to the
- point where it first entered trial parse mode, and
- continues with a full parse (executing all actions),
- following the path of the successful trial.
- Actions in btyacc come in two flavors -- {}-actions, which
- are only executed when not in trial mode, and []-actions
- which are executed regardless of mode. There are also
- inherited attributes, which look like arguments (they are
- enclosed in "()") and act like []-actions.
- What this buys you:
- * No more lexer feedback hack. In yacc grammars for C, a
- standard hack, know as the "lexer feedback hack" is used
- to find typedef names. The lexer uses semantic
- information to decide if any given identifier is a
- typedef-name or not and returns a special token. With
- btyacc, you no longer need to do this; the lexer should
- just always return an identifier. The btyacc grammar then
- needs a rule of the form:
- typename: ID [ if (!IsTypeName(LookupId($1))) YYERROR; ]
- While the hack works adequately well for parsing C, it
- becomes a nightmare when you try to parse something like
- C++, where treating an ID as a typedef becomes heavily
- dependent on context.
- * Easy disambiguation via simple ordering. Btyacc runs
- its trials via the rule "try shifting first, then try
- reducing by the order that the conflicting rules appear in
- the input file". This means you can deal with semantic a
- disambiguation rule like:
- [1] If it looks like a declaration it is, otherwise
- [2] If it looks like an expression it is, otherwise
- [3] it is a syntax error
- [Ellis&Stroustrup, Annotated C++ Reference Manual, p93]
- To deal with this, you need only put all the rules for
- declarations before the rules for expressions in the
- grammar file.
- * No extra cost if you do not use it. Backtracking is
- only triggered when the parse hits a shift/reduce or
- reduce/reduce conflict in the table. If you have no
- conflicts in your grammar, there is no extra cost, other
- than some extra code which will never be invoked.
- * C++ and ANSI C compatible parsers. The parsers produced
- by btyacc can be compiled with C++ correctly. If you
- "#define" YYSTYPE to be some C++ type with constructor and
- destructor, everything will work fine. My favorite is
- "#define YYSTYPE SmartPointer", where SmartPointer is a
- smart pointer type that does garbage collection on the
- pointed to objects.
- BTYACC was originally written to make it easy to write a
- C++ parser (my goal was to be able to use the grammar out
- of the back of the ARM with as few modifications as
- possible). Anyone who has ever looked at Jim Roskind
- public domain C++ yacc grammar, or the yacc-based grammar
- used in g++ knows how difficult this is. BTYACC is very
- useful for parsing any ambiguous grammar, particularly
- ones that come from trying to merge two (or more) complete
- grammars.
- Limitations of the backtracking: Currently, the generated
- parser does NO pruning of alternate parsing paths. To
- avoid an exponential explosion of possible paths (and
- parsing time), you need to manually tell the parser when
- it can throw away saved paths using YYVALID. In practice,
- this turns out to be fairly easy to do. A C++ parser (for
- example) can just put a [YYVALID;] after every complete
- declaration and statement rule, corresponding to pruning
- the backtracking state after seeing a ';' or '}' -- there
- will never be a situation in which it is useful to
- backtrack past either of these.
- Inherited attributes in btyacc:
- Inherited attributes look a lot like function arguments to
- non-terminals, which is what they end up being in a
- recursive descent parser, but NOT how they are implemented
- in btyacc. Basically they are just syntactic sugar for
- embedded semantic actions and $0, $-1, ... in normal yacc.
- btyacc gives you two big advantages besides just the
- syntax:
- 1. it does type checking on the inherited attributes,
- so you do not have to specify $<type>0 and makes sure
- you give the correct number of arguments (inherited
- attributes) to every use of a non-terminal.
- 2. It "collapses" identical actions from that are produced
- from inherited attributes. This eliminates many
- potential reduce-reduce conflicts arising from
- the inherited attributes.
- You use inherited attributes by declaring the types of the
- attributes in the preamble with a type declaration and
- declaring names of the attributes on the lhs of the yacc
- rule. You can of course have more than one rule with the
- same lhs, and you can even give them different names in
- each, but the type and number must be the same.
- Here is a small example:
- /* lhs takes 2 inherited attributes */
- %type <t1> lhs(<t1>, <t2>)
- stuff(<t1>, <t2>)
- %%
- lhs($i1, $i2) : { $$ = $i1 }
- | lhs($i1, $i2) stuff($1,$i2) { $$ = $2; }
- This is roughly equivalent to the following yacc code:
- lhs :
- { $$ = $<t1>-1; }
- | lhs [ $<t1>$ = $-1; ] [ $<t2>$ = $<t2>0; ] stuff
- { $$ = $4; }
- ;
- See the file "test/t2.y" for a longer and more complete
- example. At the current time, the start symbol cannot
- have any arguments.
- Variant parsers:
- Btyacc supports the -S flag to use a different parser
- skeleton, changing the way that the parser is called and
- used. The skeleton "push.skel" is included to produce a
- "passive" parser that you feed tokens to (rather than
- having the parser call a separate yylex routine). With
- push.skel, yyparse is defined as follows:
- int yyparse(int token, YYSTYPE yylval)
- You should call yyparse repeatedly with successive tokens
- of input. It returns 0 if more input is needed, 1 for a
- successful parse, and -1 for an unrecoverable parse error.
- Miscellaneous Features in ver. 1.0
- ----------------------------------
- by Chris Dodd
- The -r option has been implemented. The -r option tells
- Yacc to put the read-only tables in y.tab.c and the code and
- variables in y.code.c. Keith Bostic asked for this option so
- that :yyfix could be eliminated.
- The -l and -t options have been implemented. The -l
- option tells Yacc not to include #line directives in the code
- it produces. The -t option causes debugging code to be
- included in the compiled parser.
- The code for error recovery has been changed to
- implement the same algorithm as AT&T Yacc. There will still
- be differences in the way error recovery works because AT&T
- Yacc uses more default reductions than Berkeley Yacc.
- The environment variable TMPDIR determines the directory
- where temporary files will be created. If TMPDIR is defined,
- temporary files will be created in the directory whose
- pathname is the value of TMPDIR. By default, temporary files
- are created in /tmp.
- The keywords are now case-insensitive. For example,
- %nonassoc, %NONASSOC, %NonAssoc, and %nOnAsSoC are
- all equivalent.
- Commas and semicolons that are not part of C code are
- treated as commentary.
- Line-end comments, as in BCPL, are permitted. Line-end
- comments begin with // and end at the next end-of-line.
- Line-end comments are permitted in C code; they are converted
- to C comments on output.
- The form of y.output files has been changed to look more
- like those produced by AT&T Yacc.
- A new kind of declaration has been added.
- The form of the declaration is
- %ident string
- where string is a sequence of characters beginning with a
- double quote and ending with either a double quote or the
- next end-of-line, whichever comes first. The declaration
- will cause a #ident directive to be written near the start
- of the output file.
- If a parser has been compiled with debugging code, that
- code can be enabled by setting an environment variable.
- If the environment variable YYDEBUG is set to 0, debugging
- output is suppressed. If it is set to 1, debugging output
- is written to standard output.
- Building BtYacc
- ---------------
- by Chris Dodd and Vadim Maslov
- We used GCC and GNU make to compile BtYacc both on UNIX and
- WIN32 paltforms. You are welcome to try different
- combinations of makes and compilers. Most likely it will
- work, but it may require Makefile changes.
- There is no config script.
- Just type "make" and it should compile.
- AWK. If you want to change file btyaccpa.ske (backtracking
- parser skeleton), you will need awk to compile it into
- skeleton.c file. We used GNU AWK (gawk) version 3.0.
- It is known that using older versions of gawk
- may create problems in compilation, because older awks
- have problems with backslashes at the end of a line.
- For MSDOS, there a "makefile.dos" that should do the trick.
- Note: makefile.dos was not tested for a long time.
- The result of compilation should be a single executable called
- "btyacc" which you can install anywhere you like;
- it does not require any other files in the distribution to run.
- Legal Stuff
- -----------
- by Chris Dodd and Vadim Maslov
- In English: BtYacc is freeware. BtYacc is distributed with
- no warranty whatsoever. The author and any other contributors
- take no responsibility for any and all consequences of its use.
- In Legalese: LIMITATION OF LIABILITY. NEITHER SIBER SYSTEMS
- NOR ANY OF ITS LICENSORS NOR ANY BTYACC CONTRIBUTOR SHALL BE
- LIABLE FOR ANY INDIRECT, INCIDENTAL, SPECIAL OR CONSEQUENTIAL
- DAMAGES, OR DAMAGES FOR LOSS OF PROFITS, REVENUE, DATA OR
- DATA USE, CAUSED BY BTYACC AND INCURRED BY CUSTOMER OR ANY
- THIRD PARTY, WHETHER IN AN ACTION IN CONTRACT OR TORT, EVEN
- IF SIBER SYSTEMS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
- DAMAGES.
|