How to remove global backtracking from your grammar - ANTLR 3. More complex grammars, especially those taken from sources which provide only LR grammars and translated literally into ANTLR representation, generate errors such as the ones as below. This tutorial explains the basic steps to resolve the issue without enabling global backtracking. It is assumed that the reader already has a syntactically correct ANTLR grammar available. Also ANTLRworks is being used as the IDE - for this kind of work it is near ideal. The guidelines covered in this tutorial originate from Jim Idle but are enriched with explanations that stem from my personal experience with ANTLR. This also means that I may not have seen every possible problem or that I haven't really found the optimal solution for a particular problem. Feedback is always welcome! Preparations. Create a new temporary file into which the header (like options and @members) of the old file is copied. If a separate . tokens file is referenced then either save the new file in the same directory or copy the . Is there any general procedure for it? We present a new method for removing left C / C++ / MFC > ATL / WTL / STL.
REmove Left Factoring Published on: Mar 3, 2016. Published in: Education. Transcripts - Program to remove Left factoring. How to write a C program for left factoring? 1 Questions & Answers Place. More questions about Education, School Subjects, Computer Programming. Change also the grammar name. Make sure that no (mutual) left- recursive rules are included - those can't be solved with the techniques detailed below. First step. Copy a small chunk of rules from the original grammar. It is advisable to include the start rule from the beginning. If you happen to have sorted the rules into groups (a facility available in ANTLRworks) then use the groups as delimiters. If the groups are too big you can divide them further. Second step. Copying only parts of the grammar will make ANTLRworks complain about unknown rules - rules which haven't been copied yet. To remove the warnings, create new rule instances. It is recommend to use the keywords as the sole content of the rule and to use for each rule a different keyword. This firstly allows the warnings to be suppressed by ANTLR(works) and prevents new misleading errors due to empty rules which aren't empty in the original grammar. Third step. Now check the grammar. Beware: ANTLRworks reports successful parser generation even if ANTLR emits a warning (newer versions of ANTLRworks may have been fixed already). One has to check the rules, if some of them turned red. If yes, then solve the issue at hand. After solving all open issues repeat these three steps for the next chunk of rules until all rules have been copied into the temporary file. Look at chapter 1. TDAR (The Definitive ANTLR Reference) about Syntactic Predicates for some tips and general explanations. But there are some issues which are not mentioned and only when working with my C# grammar I became aware of these issues. There are four ways to influence ANTLR's ability to generate a non- ambiguous grammar: Using left- factoring, syntactic predicates, backtracking and the k option. The last one doesn't actually turn an ambiguous grammar into a non- ambiguous one, but the reverse may be true. If no look- ahead (unbounded look- ahead) is specified, ANTLR uses a flexible number of look- ahead tokens. Unbounded look- ahead tends to be slower than a bounded look- ahead. Many grammars, however, defeat this optimization in their original form. The recommendation is to use unbounded look- ahead until everything else has been resolved. Then determine the lowest number of look- ahead tokens, which causes no warnings, and reduce the value by one. Give the complaining rules a local k option with the previous value. Repeat this until you get to . The first is left- factoring. Look at the following rule: As you can see, the two L at the beginning prevent that ANTLR can decide which path is the right path. As a consequence, ANTLR excludes the second one. Left- factoring can solve this. The name of this technique stems from the fact that you factor out the common part, which is always on the left side: Left- factoring can be done even between several rules. The starting point is then slightly different. Furthermore, there is another possibility to solve this issue - syntactic predicates. Syntactic Predicates. Syntactic predicates tell ANTLR: . If yes, then take this path, if no, try the next alternative. This speeds things up as ANTLR executes this rule when the other rules aren't chosen. An issue remains: Two solutions for the same problem provoke the question, when to use which solution. I prefer left- factoring when it can be done easily, even beyond rule boundaries. Left- factoring is faster than using predicates and backtracking. But there are cases in which it is too complicated or impossible to separate the common subset without changing the meaning of the grammar. Or the rules to differentiate between cases aren't represented by the existing grammar rules. Then a predicate is the right solution. Predicates appear in parentheses and may contain any regular combination of lexer and parser rules. This allows for the following snippet, where cast. The parser simply tests which alternative is the right one. In fact, I haven't had to use it so far myself, so I can't say when it is required. My recommendation is to use left- factoring and syntactic predicates as much as possible. I also noted that there is some confusion regarding backtracking and syntactic predicates. Terence Parr uses the image that backtracking is like going into every branch of a maze, marking every taken turn on the walls, before jotting down the right path into a notebook, while syntactic predicates are like sending trained monkeys into certain branches and wait for an OK from them. As far as my understanding goes, the difference between backtracking and predicates is that backtracking requires the actual execution of the participating rules (although no actions are executed), using implicitly a stack. Also backtracking creates a predicate at each (non- ambiguous) alternative and checks the entire rule instead of only the necessary tokens. Predicates on the other hand allow the parser to look ahead if the token stream matches by computing a DFA (deterministic finite automaton) that recognizes the token sequence described by the predicate. Only if the DFA can recognize the sequence the corresponding subrule is selected. In effect, backtracking is more powerful, but also slower than syntactic predicates. It is possible to use backtracking on rules locally. This is preferable over the global option. Techniques to Solve Ambiguities. A) Do not create new ambiguities. It may sound simple or silly but solving an ambiguity, while another one shows then up, is a fairly common occurrence. It is possible that ANTLR(works) has silently swallowed a warning before, so the chosen solution is actually correct. But the more common case from my experience is the accidental creation of an ambiguity which has not been in the original grammar. Usually it means that you have chosen the wrong way to solve the first ambiguity. I had the case that the rule for the instance constructor became a true superset for the static constructor, as I added the . This allowed the instance constructor rule to match the static constructor sequence. The solution was to subsume the static constructor rule into the instance constructor rule and to check later, what kind of constructor it actually is (see B). B) Create a two- pass parser if your grammar includes context- dependent rules which are unsolvable with the local information. An example is the decision whether . You can require like in C that the symbol table already has an entry for . Other languages like Java allow a more form- free placement. Only in the second pass the parser can know whether . The solution is to create a syntactic superset and to left- factor the commonalities. The second pass looks for forbidden combinations and can then even create better error messages. An example would be from C# where both classes and methods have the . The C# grammar provides each rule with a specialized modifier rule which has to be replaced with a common modifier rule. This modifier rule has all possible modifiers and has been left- factored into the parent rules. C) Respect dependencies when working on a left- edge ambiguity - one symbol appears in several subrules at the first place. The problem in such a case is that the context is provided by other rules and may cause a new ambiguity. An example of a real case is . If one were to solve this with left- factoring, one would have to create ? Unfortunately, there is some interaction in this situation: ? Now the input sequence . The proper way to handle this rule is to use syntactic predicates and not left- factoring. D) Never use . I had a case where the ternary operator . Assuming that you do not find any feasible alternatives beyond global backtracking, there is a way to find out for sure that you have encountered such a situation. Global backtracking suppresses the warning, but local backtracking does not. F) Solve all ambiguities, but start with the ones you know how to solve. Maybe solving easier ambiguities first makes solving harder ones later easier. If you can't think of a good way, then let it rest for a while and try again later. Also ask on the email list for those which seem to be unsolvable. Fourth step. Once satisfied with results, put the contents back into the original file.
0 Comments
Leave a Reply. |
Details
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |