News about PCRE2 releases
-------------------------

Version 10.43 16-February-2024
------------------------------

There are quite a lot of changes in this release (see ChangeLog and git log for
a list). Those that are not bugfixes or code tidies are:

* The JIT code no longer supports the ARMv5 architecture.

* A new function pcre2_get_match_data_heapframes_size() for finer heap control.

* New option flags to restrict the interaction between ASCII and non-ASCII
  characters for caseless matching and \d and friends. There are also new
  pattern constructs to control these flags from within a pattern.

* Upgrade to Unicode 15.0.0.

* Treat a NULL pattern with zero length as an empty string.

* Added support for limited-length variable-length lookbehind assertions, with
  a default maximum length of 255 characters (same as Perl) but with a function
  to adjust the limit.

* Support for LoongArch in JIT.

* Perl changed the meaning of (for example) {,3}, which was not previously
  recognized as a quantifier. Now it means {0,3} and PCRE2 has also changed.
  Note that {,} is still not a quantifier.

* Following Perl, allow spaces and tabs after { and before } in all Perl-
  compatible items that use braces, and also around commas in quantifiers. The
  one exception in PCRE2 is \u{...}, which is from ECMAScript, not Perl, and
  PCRE2 follows ECMAScript usage.

* Changed the meaning of \w and its synonyms and derivatives (\b and \B) in UCP
  mode to follow Perl. It now matches characters whose general categories are L
  or N or whose particular categories are Mn (non-spacing mark) or Pc
  (connector punctuation).

* Changed the default meaning of [:xdigit:] in UCP mode to follow Perl. It now
  matches the "fullwidth" versions of hex digits. PCRE2_EXTRA_ASCII_DIGIT can
  be used to keep it ASCII only.

* Make PCRE2_UCP the default in UTF mode in pcre2grep and add --no-ucp,
  --case-restrict and --posix-digit.

* Add --group-separator and --no-group-separator to pcre2grep.
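
The \w rule described in the list above can be modelled in a few lines of
Python. This is only an illustrative sketch, not PCRE2 code: the helper name
is_ucp_word_char is made up for the example, and Python's unicodedata tables
may correspond to a different Unicode version from the one used by PCRE2.

```python
import unicodedata


def is_ucp_word_char(ch: str) -> bool:
    """Model of the stated rule: general category L or N, or Mn or Pc.

    Hypothetical helper for illustration; PCRE2 uses its own Unicode tables.
    """
    cat = unicodedata.category(ch)
    return cat[0] in ("L", "N") or cat in ("Mn", "Pc")


# A letter, a digit, an underscore (Pc), a combining acute accent (Mn),
# a hyphen (Pd) and a space (Zs).
for ch in ["a", "5", "_", "\u0301", "-", " "]:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_ucp_word_char(ch))
```

Under this rule a spacing combining mark (category Mc) is not a word
character, whereas a non-spacing mark (Mn) is.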

Version 10.42 11-December-2022
------------------------------

This is an unexpectedly early release to fix a problem that was introduced in
10.41. ChangeLog number 19 (GitHub #139) added the default definition of
PCRE2_CALL_CONVENTION to pcre2posix.c instead of pcre2posix.h, which meant that
programs including pcre2posix.h but not pcre2.h couldn't compile. A new test
that checks this case has been added.

A couple of other minor issues are fixed, and a patch for an intermittent JIT
fault is included. See ChangeLog and the Git log.

Version 10.41 06-December-2022
------------------------------

This is another mainly bug-fixing and code-tidying release. There is one
significant upgrade to pcre2grep: it now behaves like GNU grep when matching
more than one pattern and a later pattern matches at an earlier point in the
subject when the matched substrings are being identified by colour or by
offsets.

Version 10.40 15-April-2022
---------------------------

This is mostly a bug-fixing and code-tidying release. However, there are some
extensions to Unicode property handling:

* Added support for Bidi_Class and a number of binary Unicode properties,
  including Bidi_Control.

* A number of changes to script matching for \p and \P:

  (a) Script extensions for a character are now coded as a bitmap instead of
      a list of script numbers, which should be faster and does not need a
      loop.

  (b) Added the syntax \p{script:xxx} and \p{script_extensions:xxx} (synonyms
      sc and scx).

  (c) Changed \p{scriptname} from being the same as \p{sc:scriptname} to being
      the same as \p{scx:scriptname} because this change happened in Perl at
      release 5.26.

  (d) The standard Unicode 4-letter abbreviations for script names are now
      recognized.

  (e) In accordance with Unicode and Perl's "loose matching" rules, spaces,
      hyphens, and underscores are ignored in property names, which are then
      matched independent of case.

As always, see ChangeLog for a list of all changes (also the Git log).
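
The "loose matching" of property names mentioned in item (e) can be pictured
with the following Python sketch. It models only what is stated above
(ignore spaces, hyphens and underscores, then compare without regard to
case). The function names are hypothetical and this is not how PCRE2 itself
is written.

```python
def loose_key(name: str) -> str:
    """Drop spaces, hyphens and underscores, then fold case."""
    return "".join(c for c in name if c not in " -_").casefold()


def same_property(a: str, b: str) -> bool:
    """True if two property names are equal under the loose rule."""
    return loose_key(a) == loose_key(b)


print(same_property("Script_Extensions", "script extensions"))  # True
print(same_property("Bidi_Class", "bidi-class"))                # True
print(same_property("sc", "scx"))                               # False
```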

Version 10.39 29-October-2021
-----------------------------

This release is happening soon after 10.38 because the bug fix is important.

1. Fix incorrect detection of alternatives in first character search in JIT.

2. Update to Unicode 14.0.0.

3. Some code cleanups (see ChangeLog).

Version 10.38 01-October-2021
-----------------------------

As well as some bug fixes and tidies (as always, see ChangeLog for details),
the documentation is updated to list the new URLs, following the move of the
source repository to GitHub and the mailing list to Google Groups.

* The CMake build system can now build both static and shared libraries in one
  go.

* Following Perl's lead, \K is now locked out in lookaround assertions by
  default, but an option is provided to re-enable the previous behaviour.

Version 10.37 26-May-2021
-------------------------

A few more bug fixes and tidies. The only change of real note is the removal of
the actual POSIX names regcomp etc. from the POSIX wrapper library because
these have caused issues for some applications (see 10.33 #2 below).

Version 10.36 04-December-2020
------------------------------

Again, mainly bug fixes and tidies. The only enhancements are the addition of
GNU grep's -m (aka --max-count) option to pcre2grep, and also unifying the
handling of substitution strings for both -O and callouts in pcre2grep, with
the addition of $x{...} and $o{...} to allow for characters whose code points
are greater than 255 in Unicode mode.
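
As a rough picture of what the $x{...} and $o{...} items do, the Python sketch
below expands them inside a replacement string. It assumes, as the letters
suggest, that $x takes hexadecimal digits and $o takes octal digits. The
function name is hypothetical, and the other substitution items that
pcre2grep accepts are not modelled.

```python
import re

# $x{hex digits} or $o{octal digits}
_CODE_POINT_ITEM = re.compile(r"\$x\{([0-9A-Fa-f]+)\}|\$o\{([0-7]+)\}")


def expand_code_points(template: str) -> str:
    """Replace each $x{...} or $o{...} item with the character it names."""

    def repl(m: re.Match) -> str:
        if m.group(1) is not None:
            return chr(int(m.group(1), 16))  # hexadecimal code point
        return chr(int(m.group(2), 8))       # octal code point

    return _CODE_POINT_ITEM.sub(repl, template)


# U+20AC (EURO SIGN) is well above 255; ascii() keeps the output printable.
print(ascii(expand_code_points("price: $x{20AC}5")))
print(ascii(expand_code_points("price: $o{20254}5")))
```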

NOTE: there is an outstanding issue with JIT support for macOS on arm64
hardware. For details, please see Bugzilla issue #2618.

Version 10.35 15-April-2020
---------------------------

Bugfixes, tidies, and a few new enhancements.

1. Capturing groups that contain recursive backreferences to themselves are no
longer automatically atomic, because the restriction is no longer necessary
as a result of the 10.30 restructuring.

2. Several new options for pcre2_substitute().

3. When Unicode is supported and PCRE2_UCP is set without PCRE2_UTF, Unicode
character properties are used for upper/lower case computations on characters
whose code points are greater than 127.

4. The character tables (for low-valued characters) can now more easily be
saved and restored in binary.

5. Updated to Unicode 13.0.0.

Version 10.34 21-November-2019
------------------------------

Another release with a few enhancements as well as bugfixes and tidies. The
main new features are:

1. There is now some support for matching in invalid UTF strings.

2. Non-atomic positive lookarounds are implemented in the pcre2_match()
interpreter, but not in JIT.

3. Added two new functions: pcre2_get_match_data_size() and
pcre2_maketables_free().

4. Upgraded to Unicode 12.1.0.

Version 10.33 16-April-2019
---------------------------

Yet more bugfixes, tidies, and a few enhancements, summarized here (see
ChangeLog for the full list):

1. Callouts from pcre2_substitute() are now available.

2. The POSIX functions are now all called pcre2_regcomp() etc., with wrapper
functions that use the standard POSIX names. However, in pcre2posix.h the POSIX
names are defined as macros. This should help avoid linking with the wrong
library in some environments, while still exporting the POSIX names for
pre-existing programs that use them.

3. Some new options:

   (a) PCRE2_EXTRA_ESCAPED_CR_IS_LF makes \r behave as \n.

   (b) PCRE2_EXTRA_ALT_BSUX enables support for ECMAScript 6's \u{hh...}
       construct.

   (c) PCRE2_COPY_MATCHED_SUBJECT causes a copy of a matched subject to be
       made, instead of just remembering a pointer.

4. Some new Perl features:

   (a) Perl 5.28's experimental alphabetic names for atomic groups and
       lookaround assertions, for example, (*pla:...) and (*atomic:...).

   (b) The new Perl "script run" features (*script_run:...) and
       (*atomic_script_run:...) aka (*sr:...) and (*asr:...).

   (c) When PCRE2_UTF is set, allow non-ASCII letters and decimal digits in
       capture group names.

5. --disable-percent-zt disables the use of %zu and %td in formatting strings
in pcre2test. They were already automatically disabled for VC and older C
compilers.

6. Some changes related to callouts in pcre2grep:

   (a) Support for running an external program under VMS has been added, in
       addition to Windows and fork() support.

   (b) --disable-pcre2grep-callout-fork restricts the callout support in
       pcre2grep to the inbuilt echo facility.

Version 10.32 10-September-2018
-------------------------------

This is another mainly bugfix and tidying release with a few minor
enhancements. These are the main ones:

1. pcre2grep now supports the inclusion of binary zeros in patterns that are
read from files via the -f option.

2. ./configure now supports --enable-jit=auto, which automatically enables JIT
if the hardware supports it.

3. In pcre2_dfa_match(), internal recursive calls no longer use the stack for
local workspace and local ovectors. Instead, an initial block of stack is
reserved, but if this is insufficient, heap memory is used. The heap limit
parameter now applies to pcre2_dfa_match().

4. Updated to Unicode version 11.0.0.

5. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported.

6. Added support for \N{U+dddd}, but only in Unicode mode.

7. Added support for (?^) to unset all imnsx options.

Version 10.31 12-February-2018
------------------------------

This is mainly a bugfix and tidying release (see ChangeLog for full details).
However, there are some minor enhancements.

1. New pcre2_config() options: PCRE2_CONFIG_NEVER_BACKSLASH_C and
PCRE2_CONFIG_COMPILED_WIDTHS.

2. New pcre2_pattern_info() option PCRE2_INFO_EXTRAOPTIONS to retrieve the
extra compile time options.

3. There are now public names for all the pcre2_compile() error numbers.

4. Added PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK bits to a new
field callout_flags in callout blocks.

Version 10.30 14-August-2017
----------------------------

The full list of changes that includes bugfixes and tidies is, as always, in
ChangeLog. These are the most important new features:

1. The main interpreter, pcre2_match(), has been refactored into a new version
that does not use recursive function calls (and therefore the system stack) for
remembering backtracking positions. This makes --disable-stack-for-recursion a
NOOP. The new implementation allows backtracking into recursive group calls in
patterns, making it more compatible with Perl, and also fixes some other
previously hard-to-do issues. For patterns that have a lot of backtracking, the
heap is now used, and there is an explicit limit on the amount, settable by
pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). The "recursion limit" is retained,
but is renamed as "depth limit" (though the old names remain for
compatibility).

There is also a change in the way callouts from pcre2_match() are handled. The
offset_vector field in the callout block is no longer a pointer to the
actual ovector that was passed to the matching function in the match data
block. Instead it points to an internal ovector of a size large enough to hold
all possible captured substrings in the pattern.

2. The new option PCRE2_ENDANCHORED insists that a pattern match must end at
the end of the subject.

3. The new option PCRE2_EXTENDED_MORE implements Perl's /xx feature, and
pcre2test is upgraded to support it. Setting within the pattern by (?xx) is
also supported.

4. (?n) can be used to set PCRE2_NO_AUTO_CAPTURE, because Perl now has this.

5. Additional compile options in the compile context are now available, and the
first two are: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES and
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.

6. The newline type PCRE2_NEWLINE_NUL is now available.

7. The match limit value now also applies to pcre2_dfa_match() as there are
patterns that can use up a lot of resources without necessarily recursing very
deeply.

8. The option REG_PEND (a GNU extension) is now available for the POSIX
wrapper. Also there is a new option PCRE2_LITERAL which is used to support
REG_NOSPEC.

9. PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD are implemented for the
benefit of pcre2grep, and pcre2grep's -F, -w, and -x options are re-implemented
using PCRE2_LITERAL, PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This
is tidier and also fixes some bugs.

10. The Unicode tables are upgraded from Unicode 8.0.0 to Unicode 10.0.0.

11. There are some experimental functions for converting foreign patterns
(globs and POSIX patterns) into PCRE2 patterns.
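
The idea behind item 1 of the 10.30 list, remembering backtracking positions
in heap memory under an explicit limit instead of in recursive function
calls, can be illustrated with a toy matcher. The Python sketch below is not
PCRE2's implementation. It handles only literal characters, "." and a
postfix "*", and the names full_match, frame_limit and HeapLimitError are
invented for the example. The frame_limit parameter merely plays the role
that a heap limit plays in the text above.

```python
class HeapLimitError(Exception):
    """Raised when too many backtracking positions are being remembered."""


def parse(pattern):
    """Split a toy pattern into (character, starred) tokens."""
    tokens = []
    i = 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        tokens.append((pattern[i], starred))
        i += 2 if starred else 1
    return tokens


def full_match(pattern, subject, frame_limit=1000):
    """Match the whole subject without any recursive calls."""
    tokens = parse(pattern)
    # Each frame is a saved position: (token index, subject index).
    stack = [(0, 0)]
    while stack:
        ti, si = stack.pop()  # backtrack to the most recent saved position
        while True:
            if ti == len(tokens):
                if si == len(subject):
                    return True
                break  # pattern used up but subject is not: backtrack
            ch, starred = tokens[ti]
            fits = si < len(subject) and ch in (".", subject[si])
            if starred:
                if fits:
                    # Greedy repeat: remember "stop repeating here" first.
                    if len(stack) >= frame_limit:
                        raise HeapLimitError("backtracking frame limit hit")
                    stack.append((ti + 1, si))
                    si += 1
                else:
                    ti += 1
            elif fits:
                ti += 1
                si += 1
            else:
                break  # mismatch: backtrack
    return False


print(full_match("a*a", "aaa"))   # True, after backtracking once
print(full_match(".*c", "ab"))    # False
```

Because the saved positions live in an ordinary list, the depth of
backtracking is bounded by the chosen limit and not by the size of the
system stack.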

Version 10.23 14-February-2017
------------------------------

1. ChangeLog has the details of a lot of bug fixes and tidies.

2. There has been a major refactoring of the pcre2_compile.c file. Most syntax
checking is now done in the pre-pass that identifies capturing groups. This has
reduced the amount of duplication and made the code tidier. While doing this,
some minor bugs and Perl incompatibilities were fixed (see ChangeLog for
details).

3. Back references are now permitted in lookbehind assertions when there are
no duplicated group numbers (that is, (?| has not been used), and, if the
reference is by name, there is only one group of that name. The referenced
group must, of course, be of fixed length.

4. \g{+<number>} (e.g. \g{+2} ) is now supported. It is a "forward back
reference" and can be useful in repetitions (compare \g{-<number>} ). Perl does
not recognize this syntax.

5. pcre2grep now automatically expands its buffer up to a maximum set by
--max-buffer-size.

6. The -t option (grand total) has been added to pcre2grep.

7. A new function called pcre2_code_copy_with_tables() exists to copy a
compiled pattern along with a private copy of the character tables that it
uses.

8. A user supplied a number of patches to upgrade pcre2grep under Windows and
tidy the code.

9. Several updates have been made to pcre2test and test scripts (see
ChangeLog).
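
The relative references in item 4 of the 10.23 list can be turned into
absolute group numbers by counting the capture groups opened before the
reference. The Python sketch below shows that arithmetic only. The function
name is hypothetical, and it assumes that \g{-1} names the most recently
opened group and \g{+1} the next group to be opened.

```python
def resolve_relative_group(relative: int, groups_opened: int) -> int:
    """Convert a relative reference to an absolute capture group number.

    relative      -- the signed number inside \\g{...}, such as -1 or +2
    groups_opened -- capture groups opened before the reference appears
    """
    if relative == 0:
        raise ValueError("a relative reference cannot be zero")
    if relative < 0:
        absolute = groups_opened + relative + 1  # count backwards
    else:
        absolute = groups_opened + relative      # count forwards
    if absolute < 1:
        raise ValueError("reference points before the first group")
    return absolute


# In (a)(b)\g{+2}(c)(d), two groups are open at the reference, so
# \g{+2} refers to group 4, while \g{-1} there would refer to group 2.
print(resolve_relative_group(+2, 2))  # 4
print(resolve_relative_group(-1, 2))  # 2
```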

Version 10.22 29-July-2016
--------------------------

1. ChangeLog has the details of a number of bug fixes.

2. The POSIX wrapper function regcomp() did not previously support back
references and subroutine calls if called with the REG_NOSUB option. It now
does.

3. A new function, pcre2_code_copy(), is added, to make a copy of a compiled
pattern.

4. Support for string callouts is added to pcre2grep.

5. Added the PCRE2_NO_JIT option to pcre2_match().

6. The pcre2_get_error_message() function now returns with a negative error
code if the error number it is given is unknown.

7. Several updates have been made to pcre2test and test scripts (see
ChangeLog).

Version 10.21 12-January-2016
-----------------------------

1. Many bugs have been fixed. A large number of them were provoked only by very
strange pattern input, and were discovered by fuzzers. Some others were
discovered by code auditing. See ChangeLog for details.

2. The Unicode tables have been updated to Unicode version 8.0.0.

3. For Perl compatibility in EBCDIC environments, ranges such as a-z in a
class, where both values are literal letters in the same case, omit the
non-letter EBCDIC code points within the range.

4. There have been a number of enhancements to the pcre2_substitute() function,
giving more flexibility to replacement facilities. It is now also possible to
cause the function to return the needed buffer size if the one given is too
small.

5. The PCRE2_ALT_VERBNAMES option causes the "name" parts of special verbs such
as (*THEN:name) to be processed for backslashes and to take note of
PCRE2_EXTENDED.

6. PCRE2_INFO_HASBACKSLASHC makes it possible for a client to find out if a
pattern uses \C, and --never-backslash-C makes it possible to compile a version
of PCRE2 in which the use of \C is always forbidden.

7. A limit to the length of pattern that can be handled can now be set by
calling pcre2_set_max_pattern_length().

8. When matching an unanchored pattern, a match can be required to begin within
a given number of code units after the start of the subject by calling
pcre2_set_offset_limit().

9. The pcre2test program has been extended to test new facilities, and it can
now run the tests when LF on its own is not a valid newline sequence.

10. The RunTest script has also been updated to enable more tests to be run.

11. There have been some minor performance enhancements.
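
The effect of the offset limit in item 8 of the 10.21 list can be modelled
with Python's own re module. This is only a sketch of the behaviour
described there, not of the PCRE2 API: the function name is invented, and
Python patterns are not PCRE2 patterns. Since an unanchored search reports
the leftmost match, it is enough to check where that match starts.

```python
import re


def search_with_offset_limit(pattern, subject, offset_limit):
    """Return the leftmost match only if it starts within offset_limit.

    offset_limit is counted from the start of the subject; a match that
    would have to begin further along is treated as no match at all.
    """
    m = re.search(pattern, subject)
    if m is not None and m.start() <= offset_limit:
        return m
    return None


print(search_with_offset_limit(r"[0-9]+", "abc123", 3) is not None)  # True
print(search_with_offset_limit(r"[0-9]+", "abc123", 2) is not None)  # False
```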

Version 10.20 30-June-2015
--------------------------

1. Callouts with string arguments and the pcre2_callout_enumerate() function
have been implemented.

2. The PCRE2_NEVER_BACKSLASH_C option, which locks out the use of \C, is added.

3. The PCRE2_ALT_CIRCUMFLEX option lets ^ match after a newline at the end of a
subject in multiline mode.

4. The way named subpatterns are handled has been refactored. The previous
approach had several bugs.

5. The handling of \c in EBCDIC environments has been changed to conform to the
perlebcdic document. This is an incompatible change.

6. Bugs have been mended, many of them discovered by fuzzers.

Version 10.10 06-March-2015
---------------------------

1. Serialization and de-serialization functions have been added to the API,
making it possible to save and restore sets of compiled patterns, though
restoration must be done in the same environment that was used for compilation.

2. The (*NO_JIT) feature has been added; this makes it possible for a pattern
creator to specify that JIT is not to be used.

3. A number of bugs have been fixed. In particular, bugs that caused building
on Windows using CMake to fail have been mended.

Version 10.00 05-January-2015
-----------------------------

Version 10.00 is the first release of PCRE2, a revised API for the PCRE
library. Changes prior to 10.00 are logged in the ChangeLog file for the old
API, up to item 20 for release 8.36. New programs are recommended to use the
new library. Programs that use the original (PCRE1) API will need changing
before linking with the new library.

****