/*
package regex implements a complete suite for using Regular Expressions to
match and capture text.

Regular expressions use a pattern language to describe the text that a string
must contain in order to match.

Odin's regex library implements the following features:

	Alternation:           `apple|cherry`
	Classes:               `[0-9_]`
	Classes, negated:      `[^0-9_]`
	Shorthands:            `\d\s\w`
	Shorthands, negated:   `\D\S\W`
	Wildcards:             `.`
	Repeat, optional:      `a*`
	Repeat, at least once: `a+`
	Repetition:            `a{1,2}`
	Optional:              `a?`
	Group, capture:        `([0-9])`
	Group, non-capture:    `(?:[0-9])`
	Start & End Anchors:   `^hello$`
	Word Boundaries:       `\bhello\b`
	Non-Word Boundaries:   `hello\B`

These specifiers can be composed together, such as an optional group:
`(?:hello)?`

This package also supports the non-greedy variants of the repeating and
optional specifiers by appending a `?` to them.
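
As a minimal sketch of the difference between the two variants, the program
below runs a greedy and a non-greedy pattern against the same input. It assumes
this package is imported as `core:text/regex`, that it provides `create`,
`match`, and `destroy` procedures, and that the whole match is stored in
`groups[0]`. The helper `show_match` and the sample text are hypothetical.

```odin
package main

import "core:fmt"
import "core:text/regex"

// show_match is a hypothetical helper: it compiles `pattern`, matches it
// against `text`, and prints the whole match if there is one.
show_match :: proc(pattern, text: string) {
	rx, err := regex.create(pattern)
	if err != nil {
		fmt.eprintln("could not compile", pattern, "->", err)
		return
	}
	defer regex.destroy(rx)

	capture, ok := regex.match(rx, text)
	defer regex.destroy(capture)

	if ok && len(capture.groups) > 0 {
		fmt.println(pattern, "matched", capture.groups[0])
	} else {
		fmt.println(pattern, "did not match")
	}
}

main :: proc() {
	text := "<b>bold</b>"
	show_match(`<.+>`,  text) // greedy: `.+` takes as much text as it can
	show_match(`<.+?>`, text) // non-greedy: `.+?` takes as little as it can
}
```

The greedy form is the one to reach for when the longest possible match is
wanted; the non-greedy form stops at the first point where the rest of the
pattern can succeed.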

All of the supported shorthand classes are ASCII-based, even when compiling in
Unicode mode. This is for the sake of general performance and simplicity, as
there are thousands of Unicode codepoints which would qualify as either a
digit, space, or word character, many of which could be irrelevant depending
on what is being matched.

Here are the shorthand class equivalencies:

	\d: [0-9]
	\s: [\t\n\f\r ]
	\w: [0-9A-Z_a-z]

If you need your own shorthands, you can compose strings together like so:

	MY_HEX :: "[0-9A-Fa-f]"
	PATTERN :: MY_HEX + "-" + MY_HEX

The compiler will handle turning multiple identical classes into references to
the same set of matching runes, so there's no penalty for doing it like this.
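
A pattern built from a user-defined shorthand can be tried out with a short
program. This is only a sketch: it assumes the package is imported as
`core:text/regex` and provides `create`, `match`, and `destroy`. The capture
groups wrapped around `MY_HEX` and the sample input are illustrative additions.

```odin
package main

import "core:fmt"
import "core:text/regex"

// A user-defined shorthand for one hexadecimal digit.
MY_HEX :: "[0-9A-Fa-f]"

// Constant strings are concatenated at compile time, so PATTERN is the single
// string "([0-9A-Fa-f]+)-([0-9A-Fa-f]+)".
PATTERN :: "(" + MY_HEX + "+)-(" + MY_HEX + "+)"

main :: proc() {
	rx, err := regex.create(PATTERN)
	if err != nil {
		fmt.eprintln("could not compile pattern:", err)
		return
	}
	defer regex.destroy(rx)

	capture, ok := regex.match(rx, "range: 1f-A0")
	defer regex.destroy(capture)

	if ok {
		// groups[0] is assumed to hold the whole match, followed by each
		// capture group in order.
		for group, i in capture.groups {
			fmt.println(i, group)
		}
	}
}
```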

	``Some people, when confronted with a problem, think
	  "I know, I'll use regular expressions." Now they have two problems.''

	     - Jamie Zawinski

Regular expressions have gathered a reputation over the decades for often being
chosen as the wrong tool for the job. Here, we will clarify a few cases in
which RegEx might be good or bad.

**When is it a good time to use RegEx?**

- You don't know at compile-time what patterns of text the program will need to
  match when it's running.
  - As an example, you are making a client which can be configured by the user
    to trigger on certain text patterns received from a server.
  - For another example, you need a way for users of a text editor to compose
    matching strings that are more intricate than a simple substring lookup.
- The text you're matching against is small (< 64 KiB) and your patterns aren't
  overly complicated with branches (alternations, repeats, and optionals).
- None of the above applies, but your project doesn't warrant long-term
  maintenance.

**When is it a bad time to use RegEx?**

- You know at compile-time the grammar you're parsing; a hand-made parser has
  the potential to be more maintainable and readable.
- The grammar you're parsing has certain validation steps that lend themselves
  to forming complicated expressions, such as e-mail addresses, URIs, dates,
  postal codes, credit cards, et cetera. Using RegEx to validate these
  structures is almost always a bad sign.
- The text you're matching against is big (> 1 MiB); you would be better served
  by first dividing the text into manageable chunks and using some heuristic to
  locate the most likely location of a match before applying RegEx against it.
- You value high performance and low memory usage; RegEx will always have a
  certain overhead which increases with the complexity of the pattern.
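
The advice about large inputs can be sketched as follows: a cheap literal
search narrows the text down to a small window, and only that window is handed
to the regular expression. This assumes `strings.index` from `core:strings` and
the `create`, `match`, and `destroy` procedures of this package. The literal
"error", the pattern, and the 128-byte `WINDOW` are hypothetical choices, and
only the first candidate location is examined.

```odin
package main

import "core:fmt"
import "core:strings"
import "core:text/regex"

// WINDOW is a hypothetical upper bound on how much text the pattern may need.
WINDOW :: 128

main :: proc() {
	// Stand-in for a large input; in practice this might be megabytes long.
	text := "ok ok ok ok error code=404 ok ok ok"

	rx, err := regex.create(`error code=(\d+)`)
	if err != nil {
		fmt.eprintln("could not compile pattern:", err)
		return
	}
	defer regex.destroy(rx)

	// Heuristic: the pattern can only match where the literal "error" occurs,
	// so locate that with a plain substring search first.
	start := strings.index(text, "error")
	if start < 0 {
		fmt.println("no candidate location found")
		return
	}

	// Apply the regular expression to a bounded window, not the whole text.
	// The input here is ASCII, so slicing by byte offset cannot split a rune.
	end := min(start + WINDOW, len(text))
	capture, ok := regex.match(rx, text[start:end])
	defer regex.destroy(capture)

	if ok && len(capture.groups) > 1 {
		fmt.println("code:", capture.groups[1])
	}
}
```

A real program would loop over every candidate location and pick a window size
that is guaranteed to cover the longest text the pattern can match.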

The implementation of this package has been optimized, but it will never be as
thoroughly performant as a hand-made parser. In comparison, there are just too
many intermediate steps, assumptions, and generalizations in what it takes to
handle a regular expression.
*/
package regex