- News about PCRE2 releases
- -------------------------
- Version 10.43 16-February-2024
- ------------------------------
- There are quite a lot of changes in this release (see ChangeLog and git log for
- a list). Those that are not bugfixes or code tidies are:
- * The JIT code no longer supports ARMv5 architecture.
- * A new function pcre2_get_match_data_heapframes_size() for finer heap control.
- * New option flags to restrict the interaction between ASCII and non-ASCII
- characters for caseless matching and \d and friends. There are also new
- pattern constructs to control these flags from within a pattern.
- * Upgrade to Unicode 15.0.0.
- * Treat a NULL pattern with zero length as an empty string.
- * Added support for limited-length variable-length lookbehind assertions, with
- a default maximum length of 255 characters (same as Perl) but with a function
- to adjust the limit.
- * Support for LoongArch in JIT.
- * Perl changed the meaning of (for example) {,3}, which was not previously
- recognized as a quantifier. Now it means {0,3}, and PCRE2 has also changed.
- Note that {,} is still not a quantifier.
- * Following Perl, allow spaces and tabs after { and before } in all Perl-
- compatible items that use braces, and also around commas in quantifiers. The
- one exception in PCRE2 is \u{...}, which is from ECMAScript, not Perl, and
- PCRE2 follows ECMAScript usage.
- * Changed the meaning of \w and its synonyms and derivatives (\b and \B) in UCP
- mode to follow Perl. It now matches characters whose general categories are L
- or N or whose particular categories are Mn (non-spacing mark) or Pc
- (connector punctuation).
- * Changed the default meaning of [:xdigit:] in UCP mode to follow Perl. It now
- matches the "fullwidth" versions of hex digits. PCRE2_EXTRA_ASCII_DIGIT can
- be used to keep it ASCII only.
- * Make PCRE2_UCP the default in UTF mode in pcre2grep and add --no-ucp,
- --case-restrict and --posix-digit.
- * Add --group-separator and --no-group-separator to pcre2grep.
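The 10.43 quantifier rules above ({,3} now means {0,3}, {,} is still not a quantifier, and spaces and tabs are allowed after {, before }, and around the comma) can be sketched as a small classifier. This is only an illustration of the rules as stated in these notes, not PCRE2's own parser; the names parse_quantifier, skip_blanks, read_number and UNLIMITED are hypothetical, and the sketch does not check numeric overflow or that the minimum does not exceed the maximum.

```c
#include <ctype.h>
#include <stddef.h>

#define UNLIMITED (-1L)   /* hypothetical marker for "no upper bound" */

/* Skip the spaces and tabs that are now permitted inside braces. */
const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t') p++;
    return p;
}

/* Read decimal digits at *pp. Returns 1 and advances *pp if at least one
   digit was present; otherwise returns 0 and leaves everything unchanged. */
int read_number(const char **pp, long *out)
{
    const char *p = *pp;
    long v = 0;
    if (!isdigit((unsigned char)*p)) return 0;
    while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
    *pp = p;
    *out = v;
    return 1;
}

/* Returns 1 and fills min/max if text (starting at '{') has quantifier
   form; returns 0 if it should be treated as literal text. */
int parse_quantifier(const char *text, long *min, long *max)
{
    const char *p = text;
    long lo = 0, hi = UNLIMITED;
    int have_min, have_max;

    if (*p != '{') return 0;
    p = skip_blanks(p + 1);
    have_min = read_number(&p, &lo);
    p = skip_blanks(p);
    if (*p == ',') {
        p = skip_blanks(p + 1);
        have_max = read_number(&p, &hi);
        p = skip_blanks(p);
        if (!have_min && !have_max) return 0;   /* {,} is not a quantifier */
    } else {
        if (!have_min) return 0;                /* {} or non-numeric content */
        hi = lo;                                /* {n} means exactly n */
    }
    if (*p != '}') return 0;
    *min = lo;
    *max = hi;
    return 1;
}
```

With this sketch, "{,3}" and "{ 0 , 3 }" both yield the range 0 to 3, while "{,}" is rejected and would be left as literal text.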
- Version 10.42 11-December-2022
- ------------------------------
- This is an unexpectedly early release to fix a problem that was introduced in
- 10.41. ChangeLog number 19 (GitHub #139) added the default definition of
- PCRE2_CALL_CONVENTION to pcre2posix.c instead of pcre2posix.h, which meant that
- programs including pcre2posix.h but not pcre2.h couldn't compile. A new test
- that checks this case has been added.
- A couple of other minor issues are also fixed, and a patch for an intermittent
- JIT fault is also included. See ChangeLog and the Git log.
- Version 10.41 06-December-2022
- ------------------------------
- This is another mainly bug-fixing and code-tidying release. There is one
- significant upgrade to pcre2grep: it now behaves like GNU grep when matching
- more than one pattern and a later pattern matches at an earlier point in the
- subject when the matched substrings are being identified by colour or by
- offsets.
- Version 10.40 15-April-2022
- ---------------------------
- This is mostly a bug-fixing and code-tidying release. However, there are some
- extensions to Unicode property handling:
- * Added support for Bidi_Class and a number of binary Unicode properties,
- including Bidi_Control.
- * A number of changes to script matching for \p and \P:
- (a) Script extensions for a character are now coded as a bitmap instead of
- a list of script numbers, which should be faster and does not need a
- loop.
- (b) Added the syntax \p{script:xxx} and \p{script_extensions:xxx} (synonyms
- sc and scx).
- (c) Changed \p{scriptname} from being the same as \p{sc:scriptname} to being
- the same as \p{scx:scriptname} because this change happened in Perl at
- release 5.26.
- (d) The standard Unicode 4-letter abbreviations for script names are now
- recognized.
- (e) In accordance with Unicode and Perl's "loose matching" rules, spaces,
- hyphens, and underscores are ignored in property names, which are then
- matched independent of case.
- As always, see ChangeLog for a list of all changes (also the Git log).
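The "loose matching" rule in item (e) of the 10.40 notes can be sketched as a comparison that skips spaces, hyphens, and underscores and ignores ASCII case. This is a minimal illustration of the rule as described above, not the code PCRE2 uses; loose_equal and next_significant are hypothetical names.

```c
#include <ctype.h>

/* Advance past the characters that loose matching ignores. */
const char *next_significant(const char *p)
{
    while (*p == ' ' || *p == '-' || *p == '_') p++;
    return p;
}

/* Returns 1 if the two property names are equal once spaces, hyphens and
   underscores are dropped and ASCII case is ignored; otherwise 0. */
int loose_equal(const char *a, const char *b)
{
    for (;;) {
        a = next_significant(a);
        b = next_significant(b);
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        if (*a == '\0') return 1;   /* both strings ended together */
        a++;
        b++;
    }
}
```

Under this sketch "Script_Extensions", "script extensions" and "SCRIPT-EXTENSIONS" all compare equal, whereas "sc" and "scx" remain distinct.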
- Version 10.39 29-October-2021
- -----------------------------
- This release is happening soon after 10.38 because the bug fix is important.
- 1. Fix incorrect detection of alternatives in first character search in JIT.
- 2. Update to Unicode 14.0.0.
- 3. Some code cleanups (see ChangeLog).
- Version 10.38 01-October-2021
- -----------------------------
- As well as some bug fixes and tidies (as always, see ChangeLog for details),
- the documentation is updated to list the new URLs, following the move of the
- source repository to GitHub and the mailing list to Google Groups.
- * The CMake build system can now build both static and shared libraries in one
- go.
- * Following Perl's lead, \K is now locked out in lookaround assertions by
- default, but an option is provided to re-enable the previous behaviour.
- Version 10.37 26-May-2021
- -------------------------
- A few more bug fixes and tidies. The only change of real note is the removal of
- the actual POSIX names regcomp etc. from the POSIX wrapper library because
- these have caused issues for some applications (see 10.33 #2 below).
- Version 10.36 04-December-2020
- ------------------------------
- Again, mainly bug fixes and tidies. The only enhancements are the addition of
- GNU grep's -m (aka --max-count) option to pcre2grep, and also unifying the
- handling of substitution strings for both -O and callouts in pcre2grep, with
- the addition of $x{...} and $o{...} to allow for characters whose code points
- are greater than 255 in Unicode mode.
- NOTE: there is an outstanding issue with JIT support for MacOS on arm64
- hardware. For details, please see Bugzilla issue #2618.
- Version 10.35 15-April-2020
- ---------------------------
- Bugfixes, tidies, and a few new enhancements.
- 1. Capturing groups that contain recursive backreferences to themselves are no
- longer automatically atomic, because the restriction is no longer necessary
- as a result of the 10.30 restructuring.
- 2. Several new options for pcre2_substitute().
- 3. When Unicode is supported and PCRE2_UCP is set without PCRE2_UTF, Unicode
- character properties are used for upper/lower case computations on characters
- whose code points are greater than 127.
- 4. The character tables (for low-valued characters) can now more easily be
- saved and restored in binary.
- 5. Updated to Unicode 13.0.0.
- Version 10.34 21-November-2019
- ------------------------------
- Another release with a few enhancements as well as bugfixes and tidies. The
- main new features are:
- 1. There is now some support for matching in invalid UTF strings.
- 2. Non-atomic positive lookarounds are implemented in the pcre2_match()
- interpreter, but not in JIT.
- 3. Added two new functions: pcre2_get_match_data_size() and
- pcre2_maketables_free().
- 4. Upgraded to Unicode 12.1.0.
- Version 10.33 16-April-2019
- ---------------------------
- Yet more bugfixes, tidies, and a few enhancements, summarized here (see
- ChangeLog for the full list):
- 1. Callouts from pcre2_substitute() are now available.
- 2. The POSIX functions are now all called pcre2_regcomp() etc., with wrapper
- functions that use the standard POSIX names. However, in pcre2posix.h the POSIX
- names are defined as macros. This should help avoid linking with the wrong
- library in some environments, while still exporting the POSIX names for
- pre-existing programs that use them.
- 3. Some new options:
- (a) PCRE2_EXTRA_ESCAPED_CR_IS_LF makes \r behave as \n.
- (b) PCRE2_EXTRA_ALT_BSUX enables support for ECMAScript 6's \u{hh...}
- construct.
- (c) PCRE2_COPY_MATCHED_SUBJECT causes a copy of a matched subject to be
- made, instead of just remembering a pointer.
- 4. Some new Perl features:
- (a) Perl 5.28's experimental alphabetic names for atomic groups and
- lookaround assertions, for example, (*pla:...) and (*atomic:...).
- (b) The new Perl "script run" features (*script_run:...) and
- (*atomic_script_run:...) aka (*sr:...) and (*asr:...).
- (c) When PCRE2_UTF is set, allow non-ASCII letters and decimal digits in
- capture group names.
- 5. --disable-percent-zt disables the use of %zu and %td in formatting strings
- in pcre2test. They were already automatically disabled for VC and older C
- compilers.
- 6. Some changes related to callouts in pcre2grep:
- (a) Support for running an external program under VMS has been added, in
- addition to Windows and fork() support.
- (b) --disable-pcre2grep-callout-fork restricts the callout support in
- pcre2grep to the inbuilt echo facility.
- Version 10.32 10-September-2018
- -------------------------------
- This is another mainly bugfix and tidying release with a few minor
- enhancements. These are the main ones:
- 1. pcre2grep now supports the inclusion of binary zeros in patterns that are
- read from files via the -f option.
- 2. ./configure now supports --enable-jit=auto, which automatically enables JIT
- if the hardware supports it.
- 3. In pcre2_dfa_match(), internal recursive calls no longer use the stack for
- local workspace and local ovectors. Instead, an initial block of stack is
- reserved, but if this is insufficient, heap memory is used. The heap limit
- parameter now applies to pcre2_dfa_match().
- 4. Updated to Unicode version 11.0.0.
- 5. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported.
- 6. Added support for \N{U+dddd}, but only in Unicode mode.
- 7. Added support for (?^) to unset all imnsx options.
- Version 10.31 12-February-2018
- ------------------------------
- This is mainly a bugfix and tidying release (see ChangeLog for full details).
- However, there are some minor enhancements.
- 1. New pcre2_config() options: PCRE2_CONFIG_NEVER_BACKSLASH_C and
- PCRE2_CONFIG_COMPILED_WIDTHS.
- 2. New pcre2_pattern_info() option PCRE2_INFO_EXTRAOPTIONS to retrieve the
- extra compile time options.
- 3. There are now public names for all the pcre2_compile() error numbers.
- 4. Added PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK bits to a new
- field callout_flags in callout blocks.
- Version 10.30 14-August-2017
- ----------------------------
- The full list of changes that includes bugfixes and tidies is, as always, in
- ChangeLog. These are the most important new features:
- 1. The main interpreter, pcre2_match(), has been refactored into a new version
- that does not use recursive function calls (and therefore the system stack) for
- remembering backtracking positions. This makes --disable-stack-for-recursion a
- NOOP. The new implementation allows backtracking into recursive group calls in
- patterns, making it more compatible with Perl, and also fixes some other
- previously hard-to-do issues. For patterns that have a lot of backtracking, the
- heap is now used, and there is an explicit limit on the amount, settable by
- pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). The "recursion limit" is retained,
- but is renamed as "depth limit" (though the old names remain for
- compatibility).
- There is also a change in the way callouts from pcre2_match() are handled. The
- offset_vector field in the callout block is no longer a pointer to the
- actual ovector that was passed to the matching function in the match data
- block. Instead it points to an internal ovector of a size large enough to hold
- all possible captured substrings in the pattern.
- 2. The new option PCRE2_ENDANCHORED insists that a pattern match must end at
- the end of the subject.
- 3. The new option PCRE2_EXTENDED_MORE implements Perl's /xx feature, and
- pcre2test is upgraded to support it. Setting within the pattern by (?xx) is
- also supported.
- 4. (?n) can be used to set PCRE2_NO_AUTO_CAPTURE, because Perl now has this.
- 5. Additional compile options in the compile context are now available, and the
- first two are: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES and
- PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.
- 6. The newline type PCRE2_NEWLINE_NUL is now available.
- 7. The match limit value now also applies to pcre2_dfa_match() as there are
- patterns that can use up a lot of resources without necessarily recursing very
- deeply.
- 8. The option REG_PEND (a GNU extension) is now available for the POSIX
- wrapper. Also there is a new option PCRE2_LITERAL which is used to support
- REG_NOSPEC.
- 9. PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD are implemented for the
- benefit of pcre2grep, and pcre2grep's -F, -w, and -x options are re-implemented
- using PCRE2_LITERAL, PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This
- is tidier and also fixes some bugs.
- 10. The Unicode tables are upgraded from Unicode 8.0.0 to Unicode 10.0.0.
- 11. There are some experimental functions for converting foreign patterns
- (globs and POSIX patterns) into PCRE2 patterns.
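The idea behind item 1 of the 10.30 notes (backtracking positions remembered on the heap rather than through recursive calls, with an explicit limit on the amount of heap) can be illustrated with a toy matcher. This is a sketch under stated assumptions, not PCRE2's implementation: it handles only literals, '.', and 'x*', matches the whole subject, and every name in it (toy_match, toy_frame, the TOY_ codes) is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

enum { TOY_MATCH = 1, TOY_NOMATCH = 0, TOY_HEAPLIMIT = -1 };

/* One remembered backtracking position: where to resume in the pattern
   and in the subject. */
typedef struct { size_t p, s; } toy_frame;

/* Whole-string match for a toy syntax (literals, '.', 'x*'). Saved
   positions live in a heap vector that may not exceed heap_limit bytes,
   so no recursion on the system stack is needed. */
int toy_match(const char *pat, const char *subj, size_t heap_limit)
{
    size_t plen = strlen(pat), slen = strlen(subj);
    size_t p = 0, s = 0, used = 0, cap = 0;
    toy_frame *frames = NULL;
    int rc = TOY_NOMATCH;

    for (;;) {
        if (p == plen && s == slen) { rc = TOY_MATCH; break; }
        if (p < plen) {
            int here = s < slen && (pat[p] == '.' || pat[p] == subj[s]);
            if (p + 1 < plen && pat[p + 1] == '*') {
                if (!here) { p += 2; continue; }   /* no more repeats */
                if (used == cap) {                 /* grow the vector */
                    size_t ncap = cap ? cap * 2 : 8;
                    toy_frame *n;
                    if (ncap * sizeof *frames > heap_limit) {
                        rc = TOY_HEAPLIMIT;
                        break;
                    }
                    n = realloc(frames, ncap * sizeof *frames);
                    if (n == NULL) { rc = TOY_HEAPLIMIT; break; }
                    frames = n;
                    cap = ncap;
                }
                frames[used].p = p + 2;            /* alternative: stop here */
                frames[used].s = s;
                used++;
                s++;                               /* greedy: take one more */
                continue;
            }
            if (here) { p++; s++; continue; }
        }
        if (used == 0) break;                      /* nothing left to try */
        used--;                                    /* backtrack */
        p = frames[used].p;
        s = frames[used].s;
    }
    free(frames);
    return rc;
}
```

Calling toy_match("a*ab", "aaab", 1024) succeeds after backtracking once, while the same call with a very small heap_limit reports TOY_HEAPLIMIT instead of exhausting memory, which is the kind of control that an explicit heap limit is meant to give.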
- Version 10.23 14-February-2017
- ------------------------------
- 1. ChangeLog has the details of a lot of bug fixes and tidies.
- 2. There has been a major re-factoring of the pcre2_compile.c file. Most syntax
- checking is now done in the pre-pass that identifies capturing groups. This has
- reduced the amount of duplication and made the code tidier. While doing this,
- some minor bugs and Perl incompatibilities were fixed (see ChangeLog for
- details).
- 3. Back references are now permitted in lookbehind assertions when there are
- no duplicated group numbers (that is, (?| has not been used), and, if the
- reference is by name, there is only one group of that name. The referenced
- group must, of course, be of fixed length.
- 4. \g{+<number>} (e.g. \g{+2} ) is now supported. It is a "forward back
- reference" and can be useful in repetitions (compare \g{-<number>} ). Perl does
- not recognize this syntax.
- 5. pcre2grep now automatically expands its buffer up to a maximum set by
- --max-buffer-size.
- 6. The -t option (grand total) has been added to pcre2grep.
- 7. A new function called pcre2_code_copy_with_tables() exists to copy a
- compiled pattern along with a private copy of the character tables that it
- uses.
- 8. A user supplied a number of patches to upgrade pcre2grep under Windows and
- tidy the code.
- 9. Several updates have been made to pcre2test and test scripts (see
- ChangeLog).
- Version 10.22 29-July-2016
- --------------------------
- 1. ChangeLog has the details of a number of bug fixes.
- 2. The POSIX wrapper function regcomp() did not previously support back
- references and subroutine calls if called with the REG_NOSUB option. It now
- does.
- 3. A new function, pcre2_code_copy(), is added, to make a copy of a compiled
- pattern.
- 4. Support for string callouts is added to pcre2grep.
- 5. Added the PCRE2_NO_JIT option to pcre2_match().
- 6. The pcre2_get_error_message() function now returns with a negative error
- code if the error number it is given is unknown.
- 7. Several updates have been made to pcre2test and test scripts (see
- ChangeLog).
- Version 10.21 12-January-2016
- -----------------------------
- 1. Many bugs have been fixed. A large number of them were provoked only by very
- strange pattern input, and were discovered by fuzzers. Some others were
- discovered by code auditing. See ChangeLog for details.
- 2. The Unicode tables have been updated to Unicode version 8.0.0.
- 3. For Perl compatibility in EBCDIC environments, ranges such as a-z in a
- class, where both values are literal letters in the same case, omit the
- non-letter EBCDIC code points within the range.
- 4. There have been a number of enhancements to the pcre2_substitute() function,
- giving more flexibility to replacement facilities. It is now also possible to
- cause the function to return the needed buffer size if the one given is too
- small.
- 5. The PCRE2_ALT_VERBNAMES option causes the "name" parts of special verbs such
- as (*THEN:name) to be processed for backslashes and to take note of
- PCRE2_EXTENDED.
- 6. PCRE2_INFO_HASBACKSLASHC makes it possible for a client to find out if a
- pattern uses \C, and --never-backslash-C makes it possible to compile a version
- of PCRE2 in which the use of \C is always forbidden.
- 7. A limit to the length of pattern that can be handled can now be set by
- calling pcre2_set_max_pattern_length().
- 8. When matching an unanchored pattern, a match can be required to begin within
- a given number of code units after the start of the subject by calling
- pcre2_set_offset_limit().
- 9. The pcre2test program has been extended to test new facilities, and it can
- now run the tests when LF on its own is not a valid newline sequence.
- 10. The RunTest script has also been updated to enable more tests to be run.
- 11. There have been some minor performance enhancements.
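The EBCDIC range behaviour in item 3 of the 10.21 notes can be sketched as follows. The sketch assumes the usual EBCDIC letter layout (as in code pages 037 and 1047), where lower-case letters occupy the three runs 0x81-0x89, 0x91-0x99 and 0xA2-0xA9 and upper-case letters the runs 0xC1-0xC9, 0xD1-0xD9 and 0xE2-0xE9. It is not PCRE2's code, the function names are hypothetical, and it does not model the requirement that the range endpoints be written as literal letters.

```c
/* EBCDIC letters are not contiguous: each case is split into three runs. */
int ebcdic_is_lower(unsigned c)
{
    return (c >= 0x81 && c <= 0x89) || (c >= 0x91 && c <= 0x99) ||
           (c >= 0xA2 && c <= 0xA9);
}

int ebcdic_is_upper(unsigned c)
{
    return (c >= 0xC1 && c <= 0xC9) || (c >= 0xD1 && c <= 0xD9) ||
           (c >= 0xE2 && c <= 0xE9);
}

/* Does code point c belong to the class range lo-hi? When both endpoints
   are letters of the same case, only letters of that case are included,
   so the non-letter code points inside the range are omitted. */
int ebcdic_range_matches(unsigned lo, unsigned hi, unsigned c)
{
    if (c < lo || c > hi) return 0;
    if (ebcdic_is_lower(lo) && ebcdic_is_lower(hi)) return ebcdic_is_lower(c);
    if (ebcdic_is_upper(lo) && ebcdic_is_upper(hi)) return ebcdic_is_upper(c);
    return 1;   /* any other range is a plain code point range */
}
```

For the range a-z (0x81 to 0xA9), code point 0x8A, which lies between i and j, is excluded, while j (0x91) is included.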
- Version 10.20 30-June-2015
- --------------------------
- 1. Callouts with string arguments and the pcre2_callout_enumerate() function
- have been implemented.
- 2. The PCRE2_NEVER_BACKSLASH_C option, which locks out the use of \C, is added.
- 3. The PCRE2_ALT_CIRCUMFLEX option lets ^ match after a newline at the end of a
- subject in multiline mode.
- 4. The way named subpatterns are handled has been refactored. The previous
- approach had several bugs.
- 5. The handling of \c in EBCDIC environments has been changed to conform to the
- perlebcdic document. This is an incompatible change.
- 6. Bugs have been mended, many of them discovered by fuzzers.
- Version 10.10 06-March-2015
- ---------------------------
- 1. Serialization and de-serialization functions have been added to the API,
- making it possible to save and restore sets of compiled patterns, though
- restoration must be done in the same environment that was used for compilation.
- 2. The (*NO_JIT) feature has been added; this makes it possible for a pattern
- creator to specify that JIT is not to be used.
- 3. A number of bugs have been fixed. In particular, bugs that caused building
- on Windows using CMake to fail have been mended.
- Version 10.00 05-January-2015
- -----------------------------
- Version 10.00 is the first release of PCRE2, a revised API for the PCRE
- library. Changes prior to 10.00 are logged in the ChangeLog file for the old
- API, up to item 20 for release 8.36. New programs are recommended to use the
- new library. Programs that use the original (PCRE1) API will need changing
- before linking with the new library.
- ****