README 21 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592
  1. BTYACC -- backtracking yacc
  2. ===========================
  3. BTYACC was created by Chris Dodd using ideas from many
  4. places and lots of code from the Berkeley Yacc
  5. distribution, which is a public domain yacc clone put
  6. together by the good folks at Berkeley. This code is
  7. distributed with NO WARRANTEE and is public domain.
  8. It is certain to contain bugs, which you should
  9. report to: [email protected].
  10. Vadim Maslov of Siber Systems <[email protected]>
  11. considerably modified BTYACC to make it suitable
  12. for production environment.
  13. Several people have suggested bug fixes that
  14. were incorporated into BtYacc.
  15. See the README.BYACC files for more about
  16. Berkeley Yacc and other sources of info.
  17. http://www.siber.com/btyacc/ is the current home of BtYacc.
  18. It is provided courtesy of Siber Systems http://www.siber.com/.
  19. Version 3.0 changes
  20. -------------------
  21. by Vadim Maslov
  22. Changes mostly occurred in btyaccpa.ske file that
  23. contains the parsing shift/reduce/backtrack algorithm.
  24. Version 3.0 innovations focus on:
  25. - text position computation and propagation,
  26. - industrial-strength error processing and recovery.
  27. ** Added mechanism for computing and propagating
  28. text position of tokens and non-terminals.
  29. Compilers often need to build AST trees such that every node
  30. in a tree can relate to the parsed program source it came from.
  31. The following applications are very likely to need this:
  32. - debuggers that show actual source of the debugged program,
  33. - source-to-source translators that want
  34. unchanged parts of the tree to generate the unchanged code.
  35. The new YYPOSN mechanism added in this version of BtYacc
  36. helps you in automating the text position computation
  37. and in assigning the computed text positions to the AST.
  38. This mechanism is successfully used in commercial
  39. parsers and source-to-source translators.
  40. In standard Yaccs every token and every non-terminal
  41. has an YYSTYPE semantic value attached to it.
  42. In this new version every token and every non-terminal
  43. also has an YYPOSN text position attached to it.
  44. YYPOSN is a user-defined type that can be anything and
  45. that has a meaning of text position attached to
  46. token or non-terminal.
  47. In addition to semantic value stack BtYacc now maintains
  48. text position stack. Behavior of the text position stack
  49. is similar to the behavior of the semantic value stack.
  50. If using text position mechanism,
  51. you need to define the following:
  52. YYPOSN Preprocessor variable that contains C/C++ type of
  53. the text position attached to
  54. every token and non-terminal.
  55. yyposn Global variable of type YYPOSN.
  56. The lexer must assign text position of
  57. the returned token to yyposn, just like it assigns
  58. semantic value of the returned token to yylval.
  59. YYREDUCEPOSNFUNC
  60. Preprocessor variable that points to unction that
  61. is called after the grammar rule reduction
  62. to reduce text positions located on the stack.
  63. This function is called by BtYacc to reduce text
  64. positions. The function is called immediately after
  65. the regular rule reduction occurs.
  66. The function has the following prototype:
  67. void ReducePosn(YYPOSN &ret,
  68. YYPOSN *terms,
  69. YYSTYPE *term_vals,
  70. int term_no,
  71. int stk_pos,
  72. int yychar,
  73. YYPOSN &yyposn,
  74. UserType extra);
  75. The function arguments are:
  76. - ret
  77. Reference to the text position returned by
  78. the rule. The function must write the computed
  79. text position returned by the rule to ret.
  80. This is analogue of the $$ semantic value.
  81. - term_posns
  82. Array of the right-hand side rule components
  83. YYPOSN text positions. These are analogues of
  84. $1, $2, ..., $N in the text position world.
  85. - term_vals
  86. Array of the right-hand side (RHS) rule components
  87. YYSTYPE values. These are the $1,...,$N themselves.
  88. - term_no
  89. Number of the components in RHS of the reduced rule.
  90. Equal to size of arrays term_posns and term_vals.
  91. Also equal to N in $1,...,$N in the reduced rule.
  92. - stk_pos
  93. YYSTYPE/YYPOSN stack position before the reduction.
  94. - yychar
  95. Lookahead token that immediately follows
  96. the reduced RHS components.
  97. - yyposn
  98. YYPOSN of the token that immediately follows
  99. the reduced RHS components.
  100. - extra
  101. User-defined extra argument passed to ReducePosn.
  102. Typically this function extracts text positions from
  103. the right-hand side rule components and either
  104. assigns them to the returned $$ structure/tree or
  105. if no $$ value is returned, puts them into
  106. the ret text position from where
  107. it will be picked up by the later reduced rules.
  108. YYREDUCEPOSNFUNCARG
  109. Extra user-defined argument passed to
  110. the ReducePosn function. This argument can use
  111. any variables defined in btyaccpa.ske.
  112. ** Added code to btyaccpa.ske that automatically cleans up
  113. semantic semantic values and text positions of tokens
  114. and non-terminals that are discarded and deleted as
  115. a result of error processing.
  116. In the previous versions the discarded token and non-terminal
  117. semantic values were not cleaned that caused quite severe
  118. leaks. The only way to fix it was to add garbage collection
  119. to YYSTYPE class.
  120. Now BtYacc skeleton calls delete functions for semantic
  121. values and positions of the discarded tokens and
  122. non-terminals.
  123. You need to define the following functions that BtYacc
  124. calls when it needs to delete semantic value or text position.
  125. YYDELETEVAL
  126. User-defined function that is called by BtYacc
  127. to delete semantic value of the token or non-terminal.
  128. The user-defined function must have the prototype:
  129. void DeleteYYval(YYSTYPE v, int type);
  130. v is semantic value to delete,
  131. type is one of the following:
  132. 0 discarding token
  133. 1 discarding state
  134. 2 cleaning up stack when aborting
  135. YYDELETEPOSN
  136. User-defined function that is called by BtYacc
  137. to delete text position of the token or non-terminal.
  138. The user-defined function must have the prototype:
  139. void DeleteYYposn(YYPOSN p, int type);
  140. v is semantic value to delete,
  141. type is one of the following:
  142. 0 discarding token
  143. 1 discarding state
  144. 2 cleaning up stack when aborting
  145. ** User can define "detailed" syntax error processing
  146. function that reports an *exact* position of
  147. the token that caused the error.
  148. If you define preprocessor variable YYERROR_DETAILED in
  149. your grammar then you need define the following
  150. error processing function:
  151. void yyerror_detailed(char *text,
  152. int errt,
  153. YYSTYPE &errt_value,
  154. YYPOSN &errt_posn);
  155. It receives the following arguments:
  156. text Error message.
  157. errt Code of the token that caused the error.
  158. errt_value Value of the token that caused the error.
  159. errt_posn Text position of token that caused error.
  160. ** Dropped compatibility with C.
  161. Compatibility with C became increasingly difficult
  162. to maintain as new features were added to btyaccpa.ske.
  163. So we dropped it. If anybody wants to make the new version
  164. compatible with C, we would gladly accept the changes.
  165. Meanwhile we expect that you use C++ to write grammar
  166. actions and everything else in grammar files.
  167. Since C is (in a sense) subset of C++, your C-based
  168. grammar may work if you use C++ compiler to compile it.
  169. Version 3.0 bugs fixed
  170. ----------------------
  171. Matthias Meixner <[email protected]> fixed a bug:
  172. BtYacc does not correctly handle typenames, if one typename
  173. is a prefix of another one and if this type is used after
  174. the longer one. In this case BTYacc produces invalid code.
  175. Version 2.1 changes
  176. -------------------
  177. by Vadim Maslov
  178. ** Added preprocessor statements to BtYacc that are similar
  179. in function and behavior to C/C++ preprocessor statements.
  180. These statements are used to:
  181. - Introduce modularity into a grammar by breaking it
  182. into several *.y files and assembling different
  183. grammars from the *.y modules using %include and %ifdef.
  184. - Have several versions of the same grammar
  185. by using %ifdef and $endif.
  186. - To include automatically generated grammar fragment.
  187. For instance, we use %include to include
  188. automatically generated list of tokens.
  189. Preprocessor statements are:
  190. %define <var-name>
  191. Define preprocessor variable named <var-name>.
  192. %ifdef <var-name>
  193. If preprocessor variable named <var-name>
  194. is defined by %define, then process the text from
  195. this %ifdef to the closing %endif.
  196. %endif
  197. Closing bracket for %ifdef preprocessor statement.
  198. Only one nesting level of %ifdef-%endif is allowed.
  199. %include <file-name>
  200. Process contents of the file named <file-name>.
  201. If <file-name> is a relative name, it is looked up
  202. in a directory in which btyacc was started.
  203. Only one nesting level of %include is allowed.
  204. Version 2.0 changes
  205. -------------------
  206. by Vadim Maslov
  207. ** Changed 16-bit short numbers to 32-bit int numbers in
  208. grammar tables, so that huge grammar tables (tables that
  209. are larger than 32768 elements) resulting from huge
  210. grammars (Cobol grammar, for instance) can work correctly.
  211. You need to have 32-bit integer to index table bigger than
  212. 32768 elements, 16-bit integer is not enough.
  213. The original BtYacc just generated non-working tables
  214. larger than 32768 elements without even notifying about
  215. the table overflow.
  216. ** Make error recovery work correctly when error happens
  217. while processing nested conflicts. Original BtYacc could
  218. infinitely cycle in certain situations that involved error
  219. recovery while in nested conflict.
  220. More detailed explanation: when we have nested conflicts
  221. (conflict that happens while trial-processing another
  222. conflict), it leads btyacc into NP-complete searching of
  223. conflict tree. The ultimate goal is YYVALID operator that
  224. selects a particular branch of that tree as a valid one.
  225. If no YYVALID is found on the tree, then error recovery
  226. takes over. The problem with this is that error recovery
  227. is started in the same state context that exists on the
  228. last surveyed branch of the conflict tree. Sometimes this
  229. last branch may be of zero length and it results in
  230. recovering to exactly the same state as existed before
  231. entering the conflict. BtYacc cycles then.
  232. We solved this problem by memorizing the longest path in
  233. the conflict tree while browsing it. If we ever get into
  234. error recovery, we restore state that existed on the
  235. longest path. Effectively we say: if we have an error,
  236. let us move forward as far as we possibly could while we
  237. were browsing the conflict tree.
  238. ** Introduce YYVALID_NESTED operation in addition to
  239. simply YYVALID. When we have a nested conflict (conflict
  240. while processing in trial mode for another conflict), we
  241. want to relate YYVALID to a particular level of conflict
  242. being in trial.
  243. Since we mostly anticipate only 2-level nested conflicts
  244. YYVALID_NESTED tells the parser to satisfy only the
  245. internal conflict. Therefore, in 1-level conflict
  246. situation YYVALID_NESTED acts like a regular YYVALID, but
  247. in 2-level conflict it is a no-op and the other YYVALID
  248. for outer conflict will be searched for.
  249. ** Improved handling of situation where /tmp directory is
  250. missing. Original btyacc just died quietly when /tmp
  251. directory was missing. We added code that states the
  252. problem explicitly. While on UNIX /tmp directory is always
  253. present, it may be missing on WIN32 systems, therefore
  254. diagnosing this situation is important.
  255. Version 1.0 changes: BackTracking
  256. =================================
  257. by Chris Dodd
  258. BTYACC is a modified version of yacc that supports
  259. automatic backtracking and semantic disambiguation to
  260. parse ambiguous grammars, as well as syntactic sugar for
  261. inherited attributes (which tend to introduce conflicts).
  262. Whenever a btyacc generated parser runs into a
  263. shift-reduce or reduce-reduce error in the parse table, it
  264. remembers the current parse point (yacc stack and input
  265. stream state), and goes into trial parse mode. It then
  266. continues parsing, ignoring most rule actions. If it runs
  267. into an error (either through the parse table or through
  268. an action calling YYERROR), it backtracks to the most
  269. recent conflict point and tries a different alternative.
  270. If it finds a successful parse (reaches the end of the
  271. input or an action calls YYVALID), it backtracks to the
  272. point where it first entered trial parse mode, and
  273. continues with a full parse (executing all actions),
  274. following the path of the successful trial.
  275. Actions in btyacc come in two flavors -- {}-actions, which
  276. are only executed when not in trial mode, and []-actions
  277. which are executed regardless of mode. There are also
  278. inherited attributes, which look like arguments (they are
  279. enclosed in "()") and act like []-actions.
  280. What this buys you:
  281. * No more lexer feedback hack. In yacc grammars for C, a
  282. standard hack, know as the "lexer feedback hack" is used
  283. to find typedef names. The lexer uses semantic
  284. information to decide if any given identifier is a
  285. typedef-name or not and returns a special token. With
  286. btyacc, you no longer need to do this; the lexer should
  287. just always return an identifier. The btyacc grammar then
  288. needs a rule of the form:
  289. typename: ID [ if (!IsTypeName(LookupId($1))) YYERROR; ]
  290. While the hack works adequately well for parsing C, it
  291. becomes a nightmare when you try to parse something like
  292. C++, where treating an ID as a typedef becomes heavily
  293. dependent on context.
  294. * Easy disambiguation via simple ordering. Btyacc runs
  295. its trials via the rule "try shifting first, then try
  296. reducing by the order that the conflicting rules appear in
  297. the input file". This means you can deal with semantic a
  298. disambiguation rule like:
  299. [1] If it looks like a declaration it is, otherwise
  300. [2] If it looks like an expression it is, otherwise
  301. [3] it is a syntax error
  302. [Ellis&Stroustrup, Annotated C++ Reference Manual, p93]
  303. To deal with this, you need only put all the rules for
  304. declarations before the rules for expressions in the
  305. grammar file.
  306. * No extra cost if you do not use it. Backtracking is
  307. only triggered when the parse hits a shift/reduce or
  308. reduce/reduce conflict in the table. If you have no
  309. conflicts in your grammar, there is no extra cost, other
  310. than some extra code which will never be invoked.
  311. * C++ and ANSI C compatible parsers. The parsers produced
  312. by btyacc can be compiled with C++ correctly. If you
  313. "#define" YYSTYPE to be some C++ type with constructor and
  314. destructor, everything will work fine. My favorite is
  315. "#define YYSTYPE SmartPointer", where SmartPointer is a
  316. smart pointer type that does garbage collection on the
  317. pointed to objects.
  318. BTYACC was originally written to make it easy to write a
  319. C++ parser (my goal was to be able to use the grammar out
  320. of the back of the ARM with as few modifications as
  321. possible). Anyone who has ever looked at Jim Roskind
  322. public domain C++ yacc grammar, or the yacc-based grammar
  323. used in g++ knows how difficult this is. BTYACC is very
  324. useful for parsing any ambiguous grammar, particularly
  325. ones that come from trying to merge two (or more) complete
  326. grammars.
  327. Limitations of the backtracking: Currently, the generated
  328. parser does NO pruning of alternate parsing paths. To
  329. avoid an exponential explosion of possible paths (and
  330. parsing time), you need to manually tell the parser when
  331. it can throw away saved paths using YYVALID. In practice,
  332. this turns out to be fairly easy to do. A C++ parser (for
  333. example) can just put a [YYVALID;] after every complete
  334. declaration and statement rule, corresponding to pruning
  335. the backtracking state after seeing a ';' or '}' -- there
  336. will never be a situation in which it is useful to
  337. backtrack past either of these.
  338. Inherited attributes in btyacc:
  339. Inherited attributes look a lot like function arguments to
  340. non-terminals, which is what they end up being in a
  341. recursive descent parser, but NOT how they are implemented
  342. in btyacc. Basically they are just syntactic sugar for
  343. embedded semantic actions and $0, $-1, ... in normal yacc.
  344. btyacc gives you two big advantages besides just the
  345. syntax:
  346. 1. it does type checking on the inherited attributes,
  347. so you do not have to specify $<type>0 and makes sure
  348. you give the correct number of arguments (inherited
  349. attributes) to every use of a non-terminal.
  350. 2. It "collapses" identical actions from that are produced
  351. from inherited attributes. This eliminates many
  352. potential reduce-reduce conflicts arising from
  353. the inherited attributes.
  354. You use inherited attributes by declaring the types of the
  355. attributes in the preamble with a type declaration and
  356. declaring names of the attributes on the lhs of the yacc
  357. rule. You can of course have more than one rule with the
  358. same lhs, and you can even give them different names in
  359. each, but the type and number must be the same.
  360. Here is a small example:
  361. /* lhs takes 2 inherited attributes */
  362. %type <t1> lhs(<t1>, <t2>)
  363. stuff(<t1>, <t2>)
  364. %%
  365. lhs($i1, $i2) : { $$ = $i1 }
  366. | lhs($i1, $i2) stuff($1,$i2) { $$ = $2; }
  367. This is roughly equivalent to the following yacc code:
  368. lhs :
  369. { $$ = $<t1>-1; }
  370. | lhs [ $<t1>$ = $-1; ] [ $<t2>$ = $<t2>0; ] stuff
  371. { $$ = $4; }
  372. ;
  373. See the file "test/t2.y" for a longer and more complete
  374. example. At the current time, the start symbol cannot
  375. have any arguments.
  376. Variant parsers:
  377. Btyacc supports the -S flag to use a different parser
  378. skeleton, changing the way that the parser is called and
  379. used. The skeleton "push.skel" is included to produce a
  380. "passive" parser that you feed tokens to (rather than
  381. having the parser call a separate yylex routine). With
  382. push.skel, yyparse is defined as follows:
  383. int yyparse(int token, YYSTYPE yylval)
  384. You should call yyparse repeatedly with successive tokens
  385. of input. It returns 0 if more input is needed, 1 for a
  386. successful parse, and -1 for an unrecoverable parse error.
  387. Miscellaneous Features in ver. 1.0
  388. ----------------------------------
  389. by Chris Dodd
  390. The -r option has been implemented. The -r option tells
  391. Yacc to put the read-only tables in y.tab.c and the code and
  392. variables in y.code.c. Keith Bostic asked for this option so
  393. that :yyfix could be eliminated.
  394. The -l and -t options have been implemented. The -l
  395. option tells Yacc not to include #line directives in the code
  396. it produces. The -t option causes debugging code to be
  397. included in the compiled parser.
  398. The code for error recovery has been changed to
  399. implement the same algorithm as AT&T Yacc. There will still
  400. be differences in the way error recovery works because AT&T
  401. Yacc uses more default reductions than Berkeley Yacc.
  402. The environment variable TMPDIR determines the directory
  403. where temporary files will be created. If TMPDIR is defined,
  404. temporary files will be created in the directory whose
  405. pathname is the value of TMPDIR. By default, temporary files
  406. are created in /tmp.
  407. The keywords are now case-insensitive. For example,
  408. %nonassoc, %NONASSOC, %NonAssoc, and %nOnAsSoC are
  409. all equivalent.
  410. Commas and semicolons that are not part of C code are
  411. treated as commentary.
  412. Line-end comments, as in BCPL, are permitted. Line-end
  413. comments begin with // and end at the next end-of-line.
  414. Line-end comments are permitted in C code; they are converted
  415. to C comments on output.
  416. The form of y.output files has been changed to look more
  417. like those produced by AT&T Yacc.
  418. A new kind of declaration has been added.
  419. The form of the declaration is
  420. %ident string
  421. where string is a sequence of characters beginning with a
  422. double quote and ending with either a double quote or the
  423. next end-of-line, whichever comes first. The declaration
  424. will cause a #ident directive to be written near the start
  425. of the output file.
  426. If a parser has been compiled with debugging code, that
  427. code can be enabled by setting an environment variable.
  428. If the environment variable YYDEBUG is set to 0, debugging
  429. output is suppressed. If it is set to 1, debugging output
  430. is written to standard output.
  431. Building BtYacc
  432. ---------------
  433. by Chris Dodd and Vadim Maslov
  434. We used GCC and GNU make to compile BtYacc both on UNIX and
  435. WIN32 paltforms. You are welcome to try different
  436. combinations of makes and compilers. Most likely it will
  437. work, but it may require Makefile changes.
  438. There is no config script.
  439. Just type "make" and it should compile.
  440. AWK. If you want to change file btyaccpa.ske (backtracking
  441. parser skeleton), you will need awk to compile it into
  442. skeleton.c file. We used GNU AWK (gawk) version 3.0.
  443. It is known that using older versions of gawk
  444. may create problems in compilation, because older awks
  445. have problems with backslashes at the end of a line.
  446. For MSDOS, there a "makefile.dos" that should do the trick.
  447. Note: makefile.dos was not tested for a long time.
  448. The result of compilation should be a single executable called
  449. "btyacc" which you can install anywhere you like;
  450. it does not require any other files in the distribution to run.
  451. Legal Stuff
  452. -----------
  453. by Chris Dodd and Vadim Maslov
  454. In English: BtYacc is freeware. BtYacc is distributed with
  455. no warranty whatsoever. The author and any other contributors
  456. take no responsibility for any and all consequences of its use.
  457. In Legalese: LIMITATION OF LIABILITY. NEITHER SIBER SYSTEMS
  458. NOR ANY OF ITS LICENSORS NOR ANY BTYACC CONTRIBUTOR SHALL BE
  459. LIABLE FOR ANY INDIRECT, INCIDENTAL, SPECIAL OR CONSEQUENTIAL
  460. DAMAGES, OR DAMAGES FOR LOSS OF PROFITS, REVENUE, DATA OR
  461. DATA USE, CAUSED BY BTYACC AND INCURRED BY CUSTOMER OR ANY
  462. THIRD PARTY, WHETHER IN AN ACTION IN CONTRACT OR TORT, EVEN
  463. IF SIBER SYSTEMS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
  464. DAMAGES.