notes.txt 2.8 KB

TODO:
* Need to go through everything and square it with RightToLeft matching.
  The support for this was built into an early version, and lots of things built
  afterwards are not savvy about bi-directional matching. Things that spring to
  mind: Regex match methods should start at 0 or text.Length depending on
  direction. Do split and replace need changes? Match should be aware of its
  direction (already applied some of this to NextMatch logic). The interpreter
  needs to check left and right bounds. Anchoring and substring discovery need
  to be reworked. RTL matches are going to have anchors on the right, i.e. $, \Z
  and \z. This should be added to the anchor logic. QuickSearch needs to work in
  reverse. There may be other stuff... work through the code.
* Add ECMAScript support to the parser. For example, [.\w\s\d] map to ECMA
  categories instead of canonical ones [DONE]. There's different behaviour on
  backreference/octal disambiguation. Find out what the runtime behavioural
  difference is for cyclic backreferences, e.g. (?(1)abc\1) - this is only briefly
  mentioned in the spec. I couldn't find much on this in the ECMAScript
  specification either.
* Octal/backreference parsing needs a big fix. The rules are ridiculously complex.
* Add a check in QuickSearch for single character substrings. This is likely to
  be a common case. There's no need to go through a shift table. Also, have a
  look at just computing a relevant subset of the shift table and using an
  (offset, size) pair to help test inclusion. Characters not in the table get
  the default len + 1 shift.
* Improve the perl test suite. Run under MS runtime to generate checksums for
  each trial. Checksums should incorporate: all captures (index, length) for all
  groups; names of explicit capturing groups, and the numbers they map to. Any
  other state? RegexTrial.Execute() will then compare result and checksum.
* The pattern (?(1?)a|b) should fail: Perl fails, the MS implementation
  fails, but I pass. The documentation says that the construct (?(X)...) can be
  processed in two ways. If X is a valid group number, or a valid group name,
  then the expression becomes a capture conditional - the (...) part is
  executed only if X has been captured. If X is not a group number or name, then
  it is treated as a positive lookahead, and (...) is only executed if the
  lookahead succeeds. My code does the latter, but on further investigation it
  appears that both Perl and MS fail to recognize an expression assertion if the
  first character of the assertion is a number - which instead suggests a
  capture conditional. The exception raised is something like "invalid group
  number". I get the feeling that my behaviour is more correct, but it's not
  consistent with the other implementations, so it should probably be changed.
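
Sketch for the QuickSearch item above. This is a rough Python illustration, not the existing QuickSearch code: it assumes a Sunday-style quick search in which the character just past the current window selects the shift. The names build_shift_table and quick_search are made up for the example. It shows the single-character shortcut, and a shift table that only covers the code point range used by the pattern, tested with an (offset, size) pair.

```python
def build_shift_table(pattern):
    """Return (offset, table) covering only the code points used in pattern.

    table[ord(c) - offset] is the shift for character c. Slots for characters
    inside the range but absent from the pattern hold the default len + 1.
    """
    codes = [ord(c) for c in pattern]
    offset = min(codes)
    size = max(codes) - offset + 1
    table = [len(pattern) + 1] * size
    for i, code in enumerate(codes):
        # Later (rightmost) occurrences overwrite earlier ones.
        table[code - offset] = len(pattern) - i
    return offset, table


def quick_search(text, pattern, start=0):
    """Return the index of the first occurrence of pattern at or after start, or -1."""
    m = len(pattern)
    n = len(text)
    if m == 0:
        return start if start <= n else -1
    if m == 1:
        # Single character substring: no shift table needed.
        return text.find(pattern, start)

    offset, table = build_shift_table(pattern)
    size = len(table)
    default = m + 1

    pos = start
    while pos + m <= n:
        if text[pos:pos + m] == pattern:
            return pos
        if pos + m >= n:
            break  # no character after the window to drive the shift
        idx = ord(text[pos + m]) - offset
        # (offset, size) inclusion test; anything outside gets the default shift.
        pos += table[idx] if 0 <= idx < size else default
    return -1
```

The table costs max - min + 1 slots, so this only pays off when the pattern's characters sit close together; a pattern mixing distant code points would need some other representation.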
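
Sketch for the test suite checksum item above. This is only an illustration of what such a checksum could incorporate, written against Python's re module rather than the MS runtime. The names capture_record and match_checksum are hypothetical, and SHA-256 is an arbitrary choice. Python keeps only the last capture per group, so "all captures" is reduced to one (index, length) pair per group here.

```python
import hashlib
import re


def capture_record(pattern, text, flags=0):
    """List the state a checksum would cover, one string per fact."""
    regex = re.compile(pattern, flags)
    record = []
    # Names of explicit capturing groups and the numbers they map to.
    for name, number in sorted(regex.groupindex.items()):
        record.append(f"name:{name}={number}")
    match = regex.search(text)
    if match is None:
        record.append("nomatch")
        return record
    # (index, length) for every group, group 0 being the whole match.
    for group in range(regex.groups + 1):
        start, end = match.span(group)
        if start == -1:
            record.append(f"group:{group}=unset")
        else:
            record.append(f"group:{group}={start},{end - start}")
    return record


def match_checksum(pattern, text, flags=0):
    """Hash the capture record so a trial can store a single short string."""
    data = "\n".join(capture_record(pattern, text, flags))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()
```

A trial would store the checksum generated under the reference runtime and compare it with the one computed from its own result.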
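
Sketch for the (?(X)...) item above. This illustrates the behaviour the note says Perl and MS appear to have. It is not taken from either implementation. The name classify_condition and the rule that valid group numbers run from 1 to group_count are assumptions made for the example.

```python
import string


def classify_condition(cond, group_count, group_names):
    """Decide how the X in (?(X)...) should be treated.

    cond        -- the raw text between the inner parentheses
    group_count -- number of capturing groups in the pattern (assumed 1-based)
    group_names -- dict mapping explicit group names to group numbers

    Returns ("capture", number) or ("lookahead", cond).
    Raises ValueError when cond starts with a digit but is not a valid group number.
    """
    if cond and cond[0] in string.digits:
        # A leading digit commits to a capture conditional, never a lookahead.
        if all(c in string.digits for c in cond) and 1 <= int(cond) <= group_count:
            return ("capture", int(cond))
        raise ValueError(f"invalid group number: {cond!r}")
    if cond in group_names:
        return ("capture", group_names[cond])
    # Anything else is treated as a positive lookahead expression.
    return ("lookahead", cond)
```

Under this rule the condition "1?" from (?(1?)a|b) is rejected instead of being parsed as a lookahead for an optional "1".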