doc.odin 2.1 KB

12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758
  1. /*
  2. package regex_optimizer implements an optimizer which acts upon the AST of a
  3. parsed regular expression pattern, transforming it in-place without moving to a
  4. compilation step.
  5. Where possible, it aims to reduce branching as much as possible in the
  6. expression by reducing usage of `|`.
  7. Here is a summary of the optimizations that it will do:
  8. * Class Simplification : `[aab]` => `[ab]`
  9. `[aa]` => `[a]`
  10. * Class Reduction : `[a]` => `a`
  11. * Range Construction : `[abc]` => `[a-c]`
  12. * Rune Merging into Range : `[aa-c]` => `[a-c]`
  13. * Range Merging : `[a-cc-e]` => `[a-e]`
  14. `[a-cd-e]` => `[a-e]`
  15. `[a-cb-e]` => `[a-e]`
  16. * Alternation to Optional : `a|` => `a?`
  17. * Alternation to Optional Non-Greedy : `|a` => `a??`
  18. * Alternation Reduction : `a|a` => `a`
  19. * Alternation to Class : `a|b` => `[ab]`
  20. * Class Union : `[a0]|[b1]` => `[a0b1]`
  21. `[a-b]|c` => `[a-bc]`
  22. `a|[b-c]` => `[b-ca]`
  23. * Wildcard Reduction : `a|.` => `.`
  24. `.|a` => `.`
  25. `[ab]|.` => `.`
  26. `.|[ab]` => `.`
  27. * Common Suffix Elimination : `blueberry|strawberry` => `(?:blue|straw)berry`
  28. * Common Prefix Elimination : `abi|abe` => `ab(?:i|e)`
  29. * Composition: Consume All to Anchored End
  30. `.*$` => <special opcode>
  31. `.+$` => `.` <special opcode>
  32. Possible future improvements:
  33. - Change the AST of alternations to be a list instead of a tree, so that
  34. constructions such as `(ab|bb|cb)` can be considered in whole by the affix
  35. elimination optimizations.
  36. - Introduce specialized opcodes for certain classes of repetition.
  37. - Add Common Infix Elimination.
  38. - Measure the precise finite minimum and maximum of a pattern, if available,
  39. and check against that on any strings before running the virtual machine.
  40. */
  41. package regex_optimizer