@@ -1,6 +1,944 @@
Change Log for PCRE2
--------------------

+Version 10.39 29-October-2021
+-----------------------------
+
+1. Fix incorrect detection of alternatives in first character search in JIT.
+
+2. Merged patch from @carenas (GitHub #28):
+
+ Visual Studio 2013 includes support for %zu and %td, so let newer
+ versions of it avoid the fallback, and while at it, make sure that
+ the first check is for DISABLE_PERCENT_ZT so it will always be
+ honoured if chosen.
+
+ ptrdiff_t is signed, so use a signed type instead, and make sure
+ that an appropriate width is chosen if pointers are 64-bit wide and
+ long is not (e.g. 64-bit Windows).
+
+ IMHO removing the cast (and therefore the possibility of truncation)
+ makes the code cleaner, and the fallback is likely portable enough,
+ as all 64-bit POSIX systems are LP64; Windows is the exception.
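
Below is a minimal C sketch of the format selection described in this entry, assuming a C99 `<stdio.h>`. Only `DISABLE_PERCENT_ZT` and the Visual Studio 2013 cut-off (`_MSC_VER` 1800) come from the entry. The `SKETCH_*` macros and the two helper functions are hypothetical and are not pcre2test's own names. This fallback casts to `long long`/`unsigned long long`, which cannot truncate even where `long` is 32 bits. The actual patch is described as removing the cast instead.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical format selection. DISABLE_PERCENT_ZT is tested first so that
   it is always honoured; Visual Studio 2013 (_MSC_VER 1800) and later
   understand %zu and %td. */
#if defined(DISABLE_PERCENT_ZT) || (defined(_MSC_VER) && _MSC_VER < 1800)
/* Fallback: ptrdiff_t is signed, so print it through a signed type that is
   64 bits wide even where long is only 32 bits (64-bit Windows). */
#define SKETCH_PTRDIFF_FMT "%lld"
#define SKETCH_PTRDIFF_ARG(x) ((long long)(x))
#define SKETCH_SIZE_FMT "%llu"
#define SKETCH_SIZE_ARG(x) ((unsigned long long)(x))
#else
#define SKETCH_PTRDIFF_FMT "%td"
#define SKETCH_PTRDIFF_ARG(x) (x)
#define SKETCH_SIZE_FMT "%zu"
#define SKETCH_SIZE_ARG(x) (x)
#endif

/* Write the signed distance from "from" to "to" into buf, which must hold at
   least 24 bytes. Returns the number of characters written. */
int format_offset(char *buf, const char *from, const char *to)
{
  ptrdiff_t diff = to - from;
  return sprintf(buf, SKETCH_PTRDIFF_FMT, SKETCH_PTRDIFF_ARG(diff));
}

/* Write a size_t length into buf (at least 24 bytes). */
int format_length(char *buf, size_t len)
{
  return sprintf(buf, SKETCH_SIZE_FMT, SKETCH_SIZE_ARG(len));
}
```

Compiling this sketch with `-DDISABLE_PERCENT_ZT` selects the fallback branch whatever the compiler is. Either branch prints a negative difference such as -3 correctly.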
+
+3. Merged patch from @carenas (GitHub #29) to update to Unicode 14.0.0.
+
+4. Merged patch from @carenas (GitHub #30):
+
+ * Cleanup: remove references to no longer used stdint.h
+
+ Since 19c50b9d (Unconditionally use inttypes.h instead of trying for stdint.h
+ (simplification) and remove the now unnecessary inclusion in
+ pcre2_internal.h., 2018-11-14), stdint.h is no longer used.
+
+ Remove checks for it in autotools and CMake and document better the expected
+ build failures for systems that might have stdint.h (C99) and not inttypes.h
+ (from POSIX), like old Windows.
+
+ * Cleanup: remove detection for inttypes.h which is a hard dependency
+
+ CMake checks for standard headers are not meant to be used for hard
+ dependencies, so they would prevent a possible fallback from working.
+
+ Alternatively, the header could be checked to make the configuration fail
+ instead of breaking the build, but that was punted, as it was missing anyway
+ from autotools.
+
+5. Merged patch from @carenas (GitHub #32):
+
+ * jit: allow building with ancient MSVC versions
+
+ Visual Studio older than 2013 fails to build with JIT enabled, because it is
+ unable to parse non-C89-compatible syntax, with mixed declarations and code.
+ While most recent compilers wouldn't even report this as a warning since it
+ is valid C99, it can also be made visible by adding the
+ -Wdeclaration-after-statement flag to gcc/clang at build time.
+
+ Move the code below the affected definitions.
+
+ * pcre2grep: avoid mixing declarations with code
+
+ Since d5a61ee8 (Patch to detect (and ignore) symlink loops in pcre2grep,
+ 2021-08-28), code will fail to build in a strict C89 compiler.
+
+ Reformat slightly to make it C89 compatible again.
+
+
+Version 10.38 01-October-2021
+-----------------------------
+
+1. Fix invalid single character repetition issues in JIT when the repetition
+is inside a capturing bracket and the bracket is preceded by character
+literals.
+
+2. Installed revised CMake configuration files provided by Jan-Willem Blokland.
+This extends the CMake build system to build both static and shared libraries
+in one go, builds the static library with PIC, and exposes PCRE2 libraries
+using the CMake config files. JWB provided these notes:
+
+- Introduced CMake variable BUILD_STATIC_LIBS to build the static library.
+
+- Made a small modification to config-cmake.h.in by removing the PCRE2_STATIC
+ variable. Added PCRE2_STATIC variable to the static build using the
+ target_compile_definitions() function.
+
+- Extended the CMake config files.
+
+ - Introduced CMake variable PCRE2_USE_STATIC_LIBS to easily switch between
+ the static and shared libraries.
+
+ - Added the PCRE2_STATIC variable to the target compile definitions for the
+ import of the static library.
+
+Building static and shared libraries using MSVC results in a name clash of
+the libraries. Both static and shared library builds create, for example, the
+file pcre2-8.lib. Therefore, I decided to change the static library names by
+adding "-static". For example, pcre2-8.lib has become pcre2-8-static.lib.
+[Comment by PH: this is MSVC-specific. It doesn't happen on Linux.]
+
+3. Increased the minimum release number for CMake to 3.0.0 because versions
+older than 2.8.12 are deprecated (it was set to 2.8.5) and cause warnings.
+Even 3.0.0 is quite old; it was released in 2014.
+
+4. Implemented a modified version of Thomas Tempelmann's pcre2grep patch for
+detecting symlink loops. This is dependent on the availability of realpath(),
+which is now tested for in ./configure and CMakeLists.txt.
+
+5. Implemented a modified version of Thomas Tempelmann's patch for faster
+case-independent "first code unit" searches for unanchored patterns in 8-bit
+mode in the interpreters. Instead of just remembering whether one case matched
+or not, it remembers the position of a previous match so as to avoid
+unnecessary repeated searching.
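
The position-remembering idea can be sketched in C as shown here. This is a hypothetical illustration, not PCRE2's code: `fcu_cache`, `fcu_find()` and the `searches` counter are invented names. The sketch assumes the search start only moves forwards (bumpalong) while the subject end stays fixed. Under that assumption, two things follow:

- A remembered position at or after the new start is still the first occurrence of that case.
- A case that was not found once never needs to be searched for again.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cache for a caseless "first code unit" search. pos[i] is the
   first occurrence of code unit i found by the latest search for it. */
typedef struct {
  const unsigned char *pos[2];
  int done[2];        /* nonzero: code unit i does not occur again */
  unsigned searches;  /* memchr() calls made, for illustration only */
} fcu_cache;

void fcu_init(fcu_cache *c)
{
  c->pos[0] = c->pos[1] = NULL;
  c->done[0] = c->done[1] = 0;
  c->searches = 0;
}

/* Return the first occurrence of cu[0] or cu[1] in [start, end), or NULL.
   start must never move backwards between calls, and end must not change. */
const unsigned char *fcu_find(fcu_cache *c, const unsigned char *start,
  const unsigned char *end, const unsigned char cu[2])
{
  const unsigned char *best = NULL;
  int i;
  for (i = 0; i < 2; i++)
    {
    if (c->done[i]) continue;                 /* known to be absent */
    if (c->pos[i] == NULL || c->pos[i] < start)
      {
      /* The remembered position is stale (or there is none): search again. */
      c->searches++;
      c->pos[i] = memchr(start, cu[i], (size_t)(end - start));
      if (c->pos[i] == NULL) { c->done[i] = 1; continue; }
      }
    if (best == NULL || c->pos[i] < best) best = c->pos[i];
    }
  return best;
}
```

Take the subject "xxAyyaz" with code units 'a' and 'A', and three calls starting at offsets 0, 3 and 6. They return offsets 2 and 5 and then no match, using four memchr() calls. Re-searching both cases on every call would use six.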
+
+6. Perl now locks out \K in lookarounds, so PCRE2 now does the same by default.
+However, just in case anybody was relying on the old behaviour, there is an
+option called PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK that enables the old behaviour.
+An option has also been added to pcre2grep to enable this.
+
+7. Re-enable a JIT optimization which was unintentionally disabled in 10.35.
+
+8. There is a loop counter to catch excessively crazy patterns when checking
+the lengths of lookbehinds at compile time. This was incorrectly getting reset
+whenever a lookahead was processed, leading to some fuzzer-generated patterns
+taking a very long time to compile when (?|) was present in the pattern,
+because (?|) disables caching of group lengths.
+
+
+Version 10.37 26-May-2021
+-------------------------
+
+1. Change RunGrepTest to use tr instead of sed when testing with binary
+zero bytes, because sed varies a lot from system to system and has problems
+with binary zeros. This is from Bugzilla #2681. Patch from Jeremie
+Courreges-Anglas via Nam Nguyen. This fixes RunGrepTest for OpenBSD. Later:
+it broke it for at least one version of Solaris, where tr can't handle binary
+zeros. However, that system had /usr/xpg4/bin/tr installed, which works OK, so
+RunGrepTest now checks for that command and uses it if found.
+
+2. Compiling with gcc 10.2's -fanalyzer option showed up a hypothetical problem
+with a NULL dereference. I don't think this case could ever occur in practice,
+but I have put in a check in order to get rid of the compiler error.
+
+3. An alternative patch for CMakeLists.txt because 10.36 #4 breaks CMake on
+Windows. Patch from [email protected] fixes bugzilla #2688.
+
+4. Two bugs related to over-large numbers have been fixed so the behaviour is
+now the same as Perl.
+
+ (a) A pattern such as /\214748364/ gave an overflow error instead of being
+ treated as the octal number \214 followed by literal digits.
+
+ (b) A sequence such as {65536 that has no terminating } (and so is not a
+ quantifier) was nevertheless giving an error saying that a quantifier number
+ was too big.
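
Both fixes come down to deciding what a sequence is before complaining about its size. The C sketch below is hypothetical, and its function names are invented. It assumes two rules:

- A backslash followed by digits that cannot be a back reference is read as up to three octal digits.
- 65535 is the largest value accepted in {n,m}.

Back-reference detection and the min <= max check are left out.

```c
#include <ctype.h>
#include <stddef.h>

/* Assumed largest value allowed in a {n,m} quantifier. */
#define SKETCH_MAX_REPEAT 65535ul

/* (a) Read at most three octal digits: "214748364" yields octal 214
   (decimal 140) and leaves "748364" to be treated as literal digits.
   Returns the number of digits consumed. */
size_t read_octal3(const char *p, unsigned *value)
{
  size_t n = 0;
  unsigned v = 0;
  while (n < 3 && p[n] >= '0' && p[n] <= '7')
    {
    v = v * 8 + (unsigned)(p[n] - '0');
    n++;
    }
  *value = v;
  return n;
}

/* Accumulate decimal digits, saturating just above the limit so that a very
   long digit string cannot overflow. Returns a pointer past the digits. */
static const char *read_count(const char *q, unsigned long *out)
{
  unsigned long v = 0;
  while (isdigit((unsigned char)*q))
    {
    if (v <= SKETCH_MAX_REPEAT) v = v * 10 + (unsigned long)(*q - '0');
    q++;
    }
  *out = v;
  return q;
}

/* (b) Classify text starting at '{'. The whole syntax is checked before any
   size check, so "{65536" with no closing brace is literal text.
   Returns 0: not a quantifier, 1: valid quantifier, -1: number too big. */
int classify_brace(const char *p)
{
  unsigned long min, max = 0;
  const char *q = p + 1;
  if (!isdigit((unsigned char)*q)) return 0;
  q = read_count(q, &min);
  if (*q == ',') q = read_count(q + 1, &max);
  if (*q != '}') return 0;
  if (min > SKETCH_MAX_REPEAT || max > SKETCH_MAX_REPEAT) return -1;
  return 1;
}
```

With this ordering, `{65536` is left as literal text, while `{65536}` is still rejected as too big.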
+
+5. A run of autoconf suggested that configure.ac was out-of-date with respect
+to the latest autoconf. Running autoupdate made some valid changes, some valid
+suggestions, and also some invalid changes, which were fixed by hand. Autoconf
+now runs clean and the resulting "configure" seems to work, so I hope nothing
+is broken. Later: the requirement for autoconf 2.70 broke some automatic test
+robots. It doesn't seem to be necessary: trying a reduction to 2.60.
+
+6. The pattern /a\K.(?0)*/ when matched against "abac" by the interpreter gave
+the answer "bac", whereas Perl and JIT both yield "c". This was because the
+effect of \K was not propagating back from the full pattern recursion. Other
+recursions such as /(a\K.(?1)*)/ did not have this problem.
+
+7. Restore single character repetition optimization in JIT. Currently fewer
+character repetitions are optimized than in 10.34.
+
+8. When the names of the functions in the POSIX wrapper were changed to
+pcre2_regcomp() etc. (see change 10.33 #4 below), functions with the original
+names were left in the library so that pre-compiled programs would still work.
+However, this has proved troublesome when programs link with several libraries,
+some of which use PCRE2 via the POSIX interface while others use a native POSIX
+library. For this reason, the POSIX function names are removed in this release.
+The macros in pcre2posix.h should ensure that re-compiling fixes any programs
+that haven't been compiled since before 10.33.
+
+
+Version 10.36 04-December-2020
+------------------------------
+
+1. Add CET_CFLAGS so that when Intel CET is enabled, -mshstk is passed to the
+compiler. This fixes https://bugs.exim.org/show_bug.cgi?id=2578. Patch for
+Makefile.am and configure.ac by H.J. Lu. Equivalent patch for CMakeLists.txt
+invented by PH.
+
+2. Fix infinite loop when a single byte newline is searched in JIT when
+invalid utf8 mode is enabled.
+
+3. Updated CMakeLists.txt with patch from Wolfgang Stöggl (Bugzilla #2584):
+
+ - Include GNUInstallDirs and use ${CMAKE_INSTALL_LIBDIR} instead of hardcoded
+ lib. This allows differentiation between lib and lib64.
+ CMAKE_INSTALL_LIBDIR is used for installation of libraries and also for
+ pkgconfig file generation.
+
+ - Add the version of PCRE2 to the configuration summary like ./configure
+ does.
+
+ - Fix typo: MACTHED_STRING->MATCHED_STRING
+
+4. Updated CMakeLists.txt with another patch from Wolfgang Stöggl (Bugzilla
+#2588):
+
+ - Add escaped double quotes around include directory in CMakeLists.txt to
+ allow spaces in directory names.
+
+ - This fixes a cmake error, if the path of the pcre2 source contains a space.
+
+5. Updated CMakeLists.txt with a patch from B. Scott Michel: CMake's
+documentation suggests using CHECK_SYMBOL_EXISTS over CHECK_FUNCTION_EXISTS.
+Moreover, these functions come from specific header files, which need to be
+specified (and, thankfully, are the same on both the Linux and WinXX
+platforms.)
+
+6. Added a (uint32_t) cast to prevent a compiler warning in pcre2_compile.c.
+
+7. Applied a patch from Wolfgang Stöggl (Bugzilla #2600) to fix postfix for
+debug Windows builds using CMake. This also updated configure so that it
+generates *.pc files and pcre2-config with the same content, as in the past.
+
+8. If a pattern ended with (?(VERSION=n.d where n is any number but d is just a
+single digit, the code unit beyond d was being read (i.e. there was a read
+buffer overflow). Fixes ClusterFuzz 23779.
+
+9. After the rework in r1235, certain character ranges were incorrectly
+handled by an optimization in JIT. Furthermore a wrong offset was used to
+read a value from a buffer which could lead to memory overread.
+
+10. Unnoticed for many years was the fact that delimiters other than / in the
+testinput1 and testinput4 files could cause incorrect behaviour when these
+files were processed by perltest.sh. There were several tests that used quotes
+as delimiters, and it was just luck that they didn't go wrong with perltest.sh.
+All the patterns in testinput1 and testinput4 now use / as their delimiter.
+This fixes Bugzilla #2641.
+
+11. Perl has started to give an error for \K within lookarounds (though there
+are cases where it doesn't). PCRE2 still allows this, so the tests that include
+this case have been moved from test 1 to test 2.
+
+12. Further to 10 above, pcre2test has been updated to detect and grumble if a
+delimiter other than / is used after #perltest.
+
+13. Fixed a bug with PCRE2_MATCH_INVALID_UTF in 8-bit mode when PCRE2_CASELESS
+was set and PCRE2_NO_START_OPTIMIZE was not set. The optimization for finding
+the start of a match was not resetting correctly after a failed match on the
+first valid fragment of the subject, possibly causing incorrect "no match"
+returns on subsequent fragments. For example, the pattern /A/ failed to match
+the subject \xe5A. Fixes Bugzilla #2642.
+
+14. Fixed a bug in character set matching when JIT is enabled and both Unicode
+scripts and Unicode classes are present at the same time.
+
+15. Added GNU grep's -m (aka --max-count) option to pcre2grep.
+
+16. Refactored substitution processing in pcre2grep strings, both for the -O
+option and when dealing with callouts. There is now a single function that
+handles $ expansion in all cases (instead of multiple copies of almost
+identical code). This means that the same escape sequences are available
+everywhere, which was not previously the case. At the same time, the escape
+sequences $x{...} and $o{...} have been introduced, to allow for characters
+whose code points are greater than 255 in Unicode mode.
+
+17. Applied the patch from Bugzilla #2628 to RunGrepTest. This does an explicit
+test for a version of sed that can handle binary zero, instead of assuming that
+any Linux version will work. Later: replaced $(...) by `...` because not all
+shells recognize the former.
+
+18. Fixed a word boundary check bug in JIT when partial matching is enabled.
+
+19. Fix ARM64 compilation warning in JIT. Patch by Carlo.
+
+20. A bug in the RunTest script meant that if the first part of test 2 failed,
+the failure was not reported.
+
+21. Test 2 was failing when run from a directory other than the source
+directory. This failure was previously missed in RunTest because of 20 above.
+Fixes added to both RunTest and RunTest.bat.
+
+22. Patch to CMakeLists.txt from Daniel to fix problem with testing under
+Windows.
+
+
+Version 10.35 09-May-2020
+-------------------------
+
+1. Use PCRE2_MATCH_EMPTY flag to detect empty matches in JIT.
+
+2. Fix ARMv5 JIT improper handling of labels right after a constant pool.
+
+3. A JIT bug has been fixed that allowed the fields of the compiled pattern to
+be read before its existence was checked.
+
+4. Back in the PCRE1 days, capturing groups that contained recursive back
+references to themselves were made atomic (version 8.01, change 18) because
+after the end of a repeated group, the captured substrings had their values
+from the final repetition, not from an earlier repetition that might be the
+destination of a backtrack. This feature was documented, and was carried over
+into PCRE2. However, it has now been realized that the major refactoring that
+was done for 10.30 has made this atomicizing unnecessary, and it is confusing
+when users are unaware of it, making some patterns appear not to be working as
+expected. Capture values of recursive back references in repeated groups are
+now correctly backtracked, so this unnecessary restriction has been removed.
+
+5. Added PCRE2_SUBSTITUTE_LITERAL.
+
+6. Avoid some VS compiler warnings.
+
+7. Added PCRE2_SUBSTITUTE_MATCHED.
+
+8. Added (?* and (?<* as synonyms for (*napla: and (*naplb: to match another
+regex engine. The Perl regex folks are aware of this usage and have made a note
+about it.
+
+9. When an assertion is repeated, PCRE2 used to limit the maximum repetition to
+1, believing that repeating an assertion is pointless. However, if a positive
+assertion contains capturing groups, repetition can be useful. In any case, an
+assertion could always be wrapped in a repeated group. The only restriction
+that is now imposed is that an unlimited maximum is changed to one more than
+the minimum.
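
The remaining restriction is small enough to state as code. This C sketch is hypothetical; the `repeat_bounds` type and the `SKETCH_REPEAT_UNLIMITED` marker are invented for illustration. An unlimited maximum on a repeated assertion is replaced by the minimum plus one, and every other quantifier is left unchanged.

```c
/* Hypothetical marker for an unlimited maximum, as produced by * or +. */
#define SKETCH_REPEAT_UNLIMITED (-1)

/* Hypothetical representation of a quantifier's bounds. */
typedef struct {
  int min;
  int max;   /* SKETCH_REPEAT_UNLIMITED when there is no upper limit */
} repeat_bounds;

/* Apply the rule above to a quantifier on an assertion: an unlimited
   maximum becomes min + 1; finite bounds are kept as they are. */
repeat_bounds assertion_repeat(repeat_bounds r)
{
  if (r.max == SKETCH_REPEAT_UNLIMITED) r.max = r.min + 1;
  return r;
}
```

Under this rule, `*` on an assertion becomes {0,1}, `+` becomes {1,2}, and an explicit {2,3} keeps its bounds.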
+
+10. Fix *THEN verbs in lookahead assertions in JIT.
+
+11. Added PCRE2_SUBSTITUTE_REPLACEMENT_ONLY.
+
+12. The JIT stack should be freed when the low-level stack allocation fails.
+
+13. In pcre2grep, if the final line in a scanned file is output but does not
+end with a newline sequence, add a newline according to the --newline setting.
+
+14. (?(DEFINE)...) groups were not being handled correctly when checking for
+the fixed length of a lookbehind assertion. Such a group within a lookbehind
+should be skipped, as it does not contribute to the length of the group.
+Instead, the (DEFINE) group was being processed, and if at the end of the
+lookbehind, that end was not correctly recognized. Errors such as "lookbehind
+assertion is not fixed length" and also "internal error: bad code value in
+parsed_skip()" could result.
+
+15. Put a limit of 1000 on recursive calls in pcre2_study() when searching
+nested groups for starting code units, in order to avoid stack overflow issues.
+If the limit is reached, it just gives up trying for this optimization.
+
+16. The control verb chain list must always be restored when exiting from a
+recurse function in JIT.
+
+17. Fix a crash which occurs when the character type of an invalid UTF
+character is decoded in JIT.
+
+18. Changes in many areas of the code so that when Unicode is supported and
+PCRE2_UCP is set without PCRE2_UTF, Unicode character properties are used for
+upper/lower case computations on characters whose code points are greater than
+127.
+
+19. The function for checking UTF-16 validity was returning an incorrect offset
+for the start of the error when a high surrogate was not followed by a valid
+low surrogate. This caused incorrect behaviour, for example when
+PCRE2_MATCH_INVALID_UTF was set and a match started immediately following the
+invalid high surrogate, such as /aa/ matching "\x{d800}aa".
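
Here is a C sketch of UTF-16 validation that reports the offset of the first code unit of the bad character. For an unpaired high surrogate, that offset is the one this fix corrects. The function name and error codes are hypothetical and are not PCRE2's `PCRE2_ERROR_UTF16_*` values. `<inttypes.h>` is used for `uint16_t`, in line with 10.39 #4 above.

```c
#include <stddef.h>
#include <inttypes.h>

/* Hypothetical error codes for this sketch only. */
enum { U16_OK = 0, U16_MISSING_LOW = 1, U16_ISOLATED_LOW = 2 };

/* Validate UTF-16. On error, *erroroffset is the index of the first code
   unit of the invalid character; for an unpaired high surrogate that is the
   high surrogate itself, not the unit that follows it. */
int check_utf16(const uint16_t *s, size_t len, size_t *erroroffset)
{
  size_t i = 0;
  while (i < len)
    {
    uint16_t c = s[i];
    if (c >= 0xd800 && c <= 0xdbff)             /* high surrogate */
      {
      if (i + 1 >= len || s[i + 1] < 0xdc00 || s[i + 1] > 0xdfff)
        {
        *erroroffset = i;
        return U16_MISSING_LOW;
        }
      i += 2;                                   /* valid surrogate pair */
      }
    else if (c >= 0xdc00 && c <= 0xdfff)        /* low surrogate on its own */
      {
      *erroroffset = i;
      return U16_ISOLATED_LOW;
      }
    else i++;
    }
  return U16_OK;
}
```

In this sketch, "\x{d800}aa" fails with offset 0. A matcher that skips invalid fragments could therefore resume at index 1, where /aa/ matches.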
+
+20. If a DEFINE group immediately preceded a lookbehind assertion, the pattern
+could be mis-compiled and therefore not match correctly. This is the example
+that found this: /(?(DEFINE)(?<foo>bar))(?<![-a-z0-9])word/ which failed to
+match "word" because the "move back" value was set to zero.
+
+21. Following a request from a user, some extensions and tidies to the
+character tables handling have been done:
+
+ (a) The dftables auxiliary program is renamed pcre2_dftables, but it is still
+ not installed for public use.
+
+ (b) There is now a -b option for pcre2_dftables, which causes the tables to
+ be written in binary. There is also a -help option.
+
+ (c) PCRE2_CONFIG_TABLES_LENGTH is added to pcre2_config() so that an
+ application that wants to save tables in binary knows how long they are.
+
+22. Changed setting of CMAKE_MODULE_PATH in CMakeLists.txt from SET to
+LIST(APPEND...) to allow a setting from the command line to be included.
+
+23. Updated to Unicode 13.0.0.
+
+24. CMake build now checks for secure_getenv() and strerror(). Patch by Carlo.
+
+25. Avoid using [-1] as a suffix in pcre2test because it can provoke a compiler
+warning.
+
+26. Added tests for __attribute__((uninitialized)) to both the configure and
+CMake build files, and then applied this attribute to the variable called
+stack_frames_vector[] in pcre2_match(). When implemented, this disables
+automatic initialization (a facility in clang), which can take time on big
+variables.
+
+27. Updated CMakeLists.txt (patches by Uwe Korn) to add support for
+pcre2-config, the libpcre*.pc files, SOVERSION, VERSION and the
+MACHO_*_VERSIONS settings for CMake builds.
+
+28. Another patch to CMakeLists.txt to check for mkostemp (configure already
+does). Patch by Carlo Marcelo Arenas Belon.
+
+29. Check for the existence of memfd_create in both CMake and configure
+configurations. Patch by Carlo Marcelo Arenas Belon.
+
+30. Restrict the configuration setting for the SELinux compatible execmem
+allocator (change 10.30/44) to Linux and NetBSD.
+
+
+Version 10.34 21-November-2019
+------------------------------
+
+1. The maximum number of capturing subpatterns is 65535 (documented), but no
+check on this was ever implemented. This omission has been rectified; it fixes
+ClusterFuzz 14376.
+
+2. Improved the invalid utf32 support of the JIT compiler. Now it correctly
+detects invalid characters in the 0xd800-0xdfff range.
+
+3. Fix minor typo bug in JIT compile when \X is used in a non-UTF string.
+
+4. Add support for matching in invalid UTF strings to the pcre2_match()
+interpreter, and integrate with the existing JIT support via the new
+PCRE2_MATCH_INVALID_UTF compile-time option.
+
+5. Give more error detail for invalid UTF-8 when detected in pcre2grep.
+
+6. Add support for invalid UTF-8 to pcre2grep.
+
+7. Adjust the limit for "must have" code unit searching, in particular,
+increase it substantially for non-anchored patterns.
+
+8. Allow (*ACCEPT) to be quantified, because an ungreedy quantifier with a zero
+minimum is potentially useful.
+
+9. Some changes to the way the minimum subject length is handled:
+
+ * When PCRE2_NO_START_OPTIMIZE is set, no minimum length is computed;
+ pcre2test now omits this item instead of showing a value of zero.
+
+ * An incorrect minimum length could be calculated for a pattern that
+ contained (*ACCEPT) inside a quantified group whose minimum repetition was
+ zero, for example /A(?:(*ACCEPT))?B/, which incorrectly computed a minimum
+ of 2. The minimum length scan no longer happens for a pattern that
+ contains (*ACCEPT).
+
+ * When no minimum length is set by the normal scan, but a first and/or last
+ code unit is recorded, set the minimum to 1 or 2 as appropriate.
+
+ * When a pattern contains multiple groups with the same number, a back
+ reference cannot know which one to scan for a minimum length. This used to
+ cause the minimum length finder to give up with no result. Now it treats
+ such references as not adding to the minimum length (which it should have
+ done all along).
+
+ * Furthermore, the above action now happens only if the back reference is to
+ a group that exists more than once in a pattern instead of any back
+ reference in a pattern with duplicate numbers.
+
+10. A (*MARK) value inside a successful condition was not being returned by the
+interpretive matcher (it was returned by JIT). This bug has been mended.
+
+11. A bug in pcre2grep meant that -o without an argument (or -o0) didn't work
+if the pattern had more than 32 capturing parentheses. This is fixed. In
+addition (a) the default limit for groups requested by -o<n> has been raised to
+50, (b) the new --om-capture option changes the limit, and (c) an error is
+raised if -o asks for a group that is above the limit.
+
+12. The quantifier {1} was always being ignored, but this is incorrect when it
+is made possessive and applied to an item in parentheses, because a
+parenthesized item may contain multiple branches or other backtracking points,
+for example /(a|ab){1}+c/ or /(a+){1}+a/.
+
+13. For partial matches, pcre2test was always showing the maximum lookbehind
+characters, flagged with "<", which is misleading when the lookbehind didn't
+actually look behind the start (because it was later in the pattern). Showing
+all consulted preceding characters for partial matches is now controlled by the
+existing "allusedtext" modifier and, as for complete matches, this facility is
+available only for non-JIT matching, because JIT does not maintain the first
+and last consulted characters.
+
+14. DFA matching (using pcre2_dfa_match()) was not recognising a partial match
+if the end of the subject was encountered in a lookahead (conditional or
+otherwise), an atomic group, or a recursion.
+
+15. Give error if pcre2test -t, -T, -tm or -TM is given an argument of zero.
+
+16. Check for integer overflow when computing lookbehind lengths. Fixes
+Clusterfuzz issue 15636.
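
One generic way to make such a length computation overflow-safe is to test each step against `INT_MAX` before performing it. The helper below is a hypothetical C sketch, not the check PCRE2 actually added. It assumes non-negative lengths and repeat counts, and it reports failure instead of wrapping.

```c
#include <limits.h>

/* Hypothetical helper: add item_len * reps to *total. Returns 1 on success,
   or 0 (leaving *total unchanged) if either the multiplication or the
   addition would exceed INT_MAX. All inputs are assumed non-negative. */
int add_repeated_length(int *total, int item_len, int reps)
{
  if (reps != 0 && item_len > INT_MAX / reps) return 0;   /* product overflows */
  if (item_len * reps > INT_MAX - *total) return 0;       /* sum overflows */
  *total += item_len * reps;
  return 1;
}
```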
+
+17. Implemented non-atomic positive lookaround assertions.
+
+18. If a lookbehind contained a lookahead that contained another lookbehind
+within it, the nested lookbehind was not correctly processed. For example, if
+/(?<=(?=(?<=a)))b/ was matched to "ab" it gave no match instead of matching
+"b".
+
+19. Implemented pcre2_get_match_data_size().
+
+20. Three alterations to partial matching:
+
+ (a) The definition of a partial match is slightly changed: if a pattern
+ contains any lookbehinds, an empty partial match may be given, because this
+ is another situation where adding characters to the current subject can
+ lead to a full match. Example: /c*+(?<=[bc])/ with subject "ab".
+
+ (b) Similarly, if a pattern could match an empty string, an empty partial
+ match may be given. Example: /(?![ab]).*/ with subject "ab". This case
+ applies only to PCRE2_PARTIAL_HARD.
+
+ (c) An empty string partial hard match can be returned for \z and \Z as it
+ is documented that they shouldn't match.
+
+21. A branch that started with (*ACCEPT) was not being recognized as one that
+could match an empty string.
+
+22. Corrected pcre2_set_character_tables() tables data type: was const unsigned
+char * instead of const uint8_t *, as generated by pcre2_maketables().
+
+23. Upgraded to Unicode 12.1.0.
+
+24. Add -jitfast command line option to pcre2test (to make all the jit options
+available directly).
+
+25. Make pcre2test -C show if libreadline or libedit is supported.
+
+26. If the length of one branch of a group exceeded 65535 (the maximum value
+that is remembered as a minimum length), the whole group's length was
+incorrectly recorded as 65535, leading to incorrect "no match" when start-up
+optimizations were in force.
+
+27. The "rightmost consulted character" value was not always correct; in
+particular, if a pattern ended with a negative lookahead, characters that were
+inspected in that lookahead were not included.
+
+28. Add the pcre2_maketables_free() function.
+
+29. The start-up optimization that looks for a unique initial matching
+code unit in the interpretive engines uses memchr() in 8-bit mode. When the
+search is caseless, it was doing so inefficiently, which ended up slowing down
+the match drastically when the subject was very long. The revised code (a)
+remembers if one case is not found, so it never repeats the search for that
+case after a bumpalong and (b) when one case has been found, it searches only
+up to that position for an earlier occurrence of the other case. This fix
+applies to both interpretive pcre2_match() and to pcre2_dfa_match().
+
+30. While scanning to find the minimum length of a group, if any branch has
+minimum length zero, there is no need to scan any subsequent branches (a small
+compile-time performance improvement).
+
+31. Installed a .gitignore file on a user's suggestion. When using the svn
+repository with git (through git svn) this helps keep it tidy.
+
+32. Add underflow check in JIT which may occur when the value of subject
+string pointer is close to 0.
+
+33. Arrange for classes such as [Aa], which contain just the two cases of the
+same character, to be treated as a single caseless character. This causes the
+first and required code unit optimizations to kick in where relevant.
+
+34. Improve the bitmap of starting bytes for positive classes that include wide
+characters, but no property types, in UTF-8 mode. Previously, on encountering
+such a class, the bits for all bytes from \xc4 upwards were set, thus
+specifying any character with codepoint >= 0x100. Now the only bits that are
+set are for the relevant bytes that start the wide characters. This can give a
+noticeable performance improvement.
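
In UTF-8, a character's first byte alone determines where it can start, so a class only needs the bits for those lead bytes. The hypothetical C helper below sets one bit per code point in a 256-bit (32-byte) bitmap that holds one bit per byte value. U+0100, the first code point above 0xff, encodes as C4 80, which is why the old code set every bit from \xc4 upwards. The sketch assumes valid Unicode scalar values and handles single code points only, not ranges.

```c
#include <inttypes.h>

/* Hypothetical helper: set the bit for the UTF-8 lead byte of code point cp
   in a 256-bit start bitmap (one bit per possible first byte). */
void set_utf8_lead_bit(uint8_t bitmap[32], uint32_t cp)
{
  unsigned lead;
  if (cp < 0x80) lead = cp;                          /* ASCII: one byte */
  else if (cp < 0x800) lead = 0xc0 | (cp >> 6);      /* two-byte form */
  else if (cp < 0x10000) lead = 0xe0 | (cp >> 12);   /* three-byte form */
  else lead = 0xf0 | (cp >> 18);                     /* four-byte form */
  bitmap[lead / 8] |= (uint8_t)(1u << (lead % 8));
}
```

For a class containing U+0100 and U+4E00, only the bits for 0xc4 and 0xe4 end up set.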
+
+35. If the bitmap of starting code units contains only 1 or 2 bits, replace it
+with a single starting code unit (1 bit) or a caseless single starting code
+unit if the two relevant characters are case-partners. This is particularly
+relevant to the 8-bit library, though it applies to all. It can give a
+performance boost for patterns such as [Ww]ord and (word|WORD). However, this
+optimization doesn't happen if there is a "required" code unit of the same
+value (because the search for a "required" code unit starts at the match start
+for non-unique first code unit patterns, but after a unique first code unit,
+and patterns such as a*a need the former action).
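
The reduction can be sketched in C as follows, under three assumptions:

- The `start_reduction` type and `reduce_start_bitmap()` are hypothetical names.
- Case partners are detected with `tolower()` in the default "C" locale, so only ASCII letters pair up.
- The "required code unit of the same value" exception is left to the caller.

```c
#include <ctype.h>
#include <inttypes.h>

/* Hypothetical result of reducing a 256-bit start bitmap. */
typedef struct {
  int kind;      /* 0: keep bitmap, 1: single code unit, 2: caseless unit */
  unsigned cu;   /* the code unit; for kind 2, its lower-case form */
} start_reduction;

/* One bit set: use it as a single first code unit. Exactly two bits set
   that are case partners: use a caseless first code unit. Otherwise keep
   the bitmap. */
start_reduction reduce_start_bitmap(const uint8_t bitmap[32])
{
  start_reduction r = { 0, 0 };
  unsigned c, found[2], n = 0;
  for (c = 0; c < 256; c++)
    {
    if ((bitmap[c / 8] & (1u << (c % 8))) == 0) continue;
    if (n == 2) return r;                  /* three or more bits: keep it */
    found[n++] = c;
    }
  if (n == 1)
    {
    r.kind = 1;
    r.cu = found[0];
    }
  else if (n == 2 && tolower((int)found[0]) == tolower((int)found[1]))
    {
    r.kind = 2;
    r.cu = (unsigned)tolower((int)found[0]);
    }
  return r;
}
```

For (word|WORD), the bitmap holds just 'W' and 'w', which this sketch reduces to a caseless 'w'. Any third starting byte, or a pair that are not case partners, keeps the bitmap.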
|
|
|
+
|
|
|
+36. Small patch to pcre2posix.c to set the erroroffset field to -1 immediately
|
|
|
+after a successful compile, instead of at the start of matching to avoid a
|
|
|
+sanitizer complaint (regexec is supposed to be thread safe).
|
|
|
+
|
|
|
+37. Add NEON vectorization to JIT to speed up matching of first character and
|
|
|
+pairs of characters on ARM64 CPUs.
|
|
|
+
|
|
|
+38. If a non-ASCII character was the first in a starting assertion in a
|
|
|
+caseless match, the "first code unit" optimization did not get the casing
|
|
|
+right, and the assertion failed to match a character in the other case if it
|
|
|
+did not start with the same code unit.
|
|
|
+
|
|
|
+39. Fixed the incorrect computation of jump sizes on x86 CPUs in JIT. A masking
|
|
|
+operation was incorrectly removed in r1136. Reported by Ralf Junker.
|
|
|
+
|
|
|
+
|
|
|
+Version 10.33 16-April-2019
+---------------------------
+
+1. Added "allvector" to pcre2test to make it easy to check the part of the
+ovector that shouldn't be changed, in particular after substitute and failed or
+partial matches.
+
+2. Fix subject buffer overread in JIT when UTF is disabled and \X or \R has
+a greater than 1 fixed quantifier. This issue was found by Yunho Kim.
+
+3. Added support for callouts from pcre2_substitute(). After 10.33-RC1, but
+prior to release, fixed a bug that caused a crash if pcre2_substitute() was
+called with a NULL match context.
+
+4. The POSIX functions are now all called pcre2_regcomp() etc., with wrapper
+functions that use the standard POSIX names. However, in pcre2posix.h the POSIX
+names are defined as macros. This should help avoid linking with the wrong
+library in some environments while still exporting the POSIX names for
+pre-existing programs that use them. (The Debian alternative names are also
+defined as macros, but not documented.)
+
+5. Fix an xclass matching issue in JIT.
+
+6. Implement PCRE2_EXTRA_ESCAPED_CR_IS_LF (see Bugzilla 2315).
+
+7. Implement the Perl 5.28 experimental alphabetic names for atomic groups and
+lookaround assertions, for example, (*pla:...) and (*atomic:...). These are
+characterized by a lower case letter following (*, and to simplify coding for
+this, the character tables created by pcre2_maketables() were updated to add a
+new "is lower case letter" bit. At the same time, the now unused "is
+hexadecimal digit" bit was removed. The default tables in
+src/pcre2_chartables.c.dist are updated.
+
+8. Implement the new Perl "script run" features (*script_run:...) and
+(*atomic_script_run:...) aka (*sr:...) and (*asr:...).
+
+9. Fixed two typos in change 22 for 10.21, which added special handling for
+ranges such as a-z in EBCDIC environments. The original code probably never
+worked, though there were no bug reports.
+
+10. Implement PCRE2_COPY_MATCHED_SUBJECT for pcre2_match() (including JIT via
+pcre2_match()) and pcre2_dfa_match(), but *not* the pcre2_jit_match() fast
+path. Also, when a match fails, set the subject field in the match data to NULL
+for tidiness - none of the substring extractors should reference this after
+match failure.
+
+11. If a pattern started with a subroutine call that had a quantifier with a
+minimum of zero, an incorrect "match must start with this character" could be
+recorded. Example: /(?&xxx)*ABC(?<xxx>XYZ)/ would (incorrectly) expect 'A' to
+be the first character of a match.
+
+12. The heap limit checking code in pcre2_dfa_match() could suffer from
+overflow if the heap limit was set very large. This could cause incorrect "heap
+limit exceeded" errors.
+
+13. Add "kibibytes" to the heap limit output from pcre2test -C to make the
+units clear.
+
+14. Add a call to pcre2_jit_free_unused_memory() in pcre2grep, for tidiness.
+
+15. Updated the VMS-specific code in pcre2test on the advice of a VMS user.
+
+16. Removed the unnecessary inclusion of stdint.h (or inttypes.h) from
+pcre2_internal.h as it is now included by pcre2.h. Also, change 17 for 10.32
+below was unnecessarily complicated, as inttypes.h is a Standard C header,
+which is defined to be a superset of stdint.h. Instead of conditionally
+including stdint.h or inttypes.h, pcre2.h now unconditionally includes
+inttypes.h. This supports environments that do not have stdint.h but do have
+inttypes.h, which are known to exist. A note in the autotools documentation
+says (November 2018) that there are none known that are the other way round.
+
+17. Added --disable-percent-zt to "configure" (and equivalent to CMake) to
+forcibly disable the use of %zu and %td in formatting strings because there is
+at least one version of VMS that claims to be C99 but does not support these
+modifiers.
+
+18. Added --disable-pcre2grep-callout-fork, which restricts the callout support
+in pcre2grep to the inbuilt echo facility. This may be useful in environments
+that do not support fork().
+
+19. Fix two instances of <= 0 being applied to unsigned integers (the VMS
+compiler complains).
+
+20. Added "fork" support for VMS to pcre2grep, for running an external program
+via a string callout.
+
+21. Improve MAP_JIT flag usage on MacOS. Patch by Rich Siegel.
+
+22. If a pattern started with (*MARK), (*COMMIT), (*PRUNE), (*SKIP), or (*THEN)
+followed by ^ it was not recognized as anchored.
+
+23. The RunGrepTest script used to cut out the test of NUL characters for
+Solaris and MacOS as printf and sed can't handle them. It seems that the *BSD
+systems can't either. I've inverted the test so that only those systems that
+are known to work (currently only Linux) try to run this test.
+
+24. Some tests in RunGrepTest appended to testtrygrep from two different file
+descriptors instead of redirecting stderr to stdout. This worked on Linux, but
+it was reported not to on other systems, causing the tests to fail.
+
+25. In the RunTest script, make the test for stack setting use the same value
+for the stack as it needs for -bigstack.
+
+26. Insert a cast in pcre2_dfa_match.c to suppress a compiler warning.
+
+26. With PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL set, escape sequences such as \s
+which are valid in character classes, but not as the end of ranges, were being
+treated as literals. An example is [_-\s] (but not [\s-_] because that gave an
+error at the *start* of a range). Now an "invalid range" error is given
+independently of PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.
+
+27. Related to 26 above, PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL was affecting known
+escape sequences such as \eX when they appeared invalidly in a character class.
+Now the option applies only to unrecognized or malformed escape sequences.
+
+28. Fix word boundary in JIT compiler. Patch by Mike Munday.
+
+29. The pcre2_dfa_match() function was incorrectly handling conditional version
+tests such as (?(VERSION>=0)...) when the version test was true. Incorrect
+processing or a crash could result.
+
+30. When PCRE2_UTF is set, allow non-ASCII letters and decimal digits in group
+names, as Perl does. There was a small bug in this new code, found by
+ClusterFuzz 12950, fixed before release.
+
+31. Implemented PCRE2_EXTRA_ALT_BSUX to support ECMAScript 6's \u{hhh}
+construct.
+
+32. Compile \p{Any} to be the same as . in DOTALL mode, so that it benefits
+from auto-anchoring if \p{Any}* starts a pattern.
+
+33. Compile invalid UTF check in JIT test when only pcre32 is enabled.
+
+34. For some time now, CMake has been warning about the setting of policy
+CMP0026 to "OLD" in CMakeLists.txt, and hinting that the feature might be
+removed in a future version. A request for CMake expertise on the list produced
+no result, so I have now hacked CMakeLists.txt along the lines of some changes
+I found on the Internet. The new code no longer needs the policy setting, and
+it appears to work fine on Linux.
+
+35. Setting --enable-jit=auto for an out-of-tree build failed because the
+source directory was not always in the search path for AC_TRY_COMPILE. Patch
+from Ross Burton.
+
+36. Disable SSE2 JIT optimizations on x86 CPUs when SSE2 is not available.
+Patch by Guillem Jover.
+
+37. Changed expressions such as 1<<10 to 1u<<10 in many places because compiler
+warnings were reported.
+
+38. Using the clang compiler with sanitizing options causes runtime complaints
+about truncation for statements such as x = ~x when x is an 8-bit value; it
+seems to compute ~x as a 32-bit value. Changing such statements to x = 255 ^ x
+gets rid of the warnings. There were also two missing casts in pcre2test.
+
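The truncation in entry 38 comes from C's integer promotions: a uint8_t operand is converted to int before ~ is applied, so ~x is a negative int that must then be narrowed back to 8 bits. A minimal C99 sketch (the function names are illustrative only) showing why 255 ^ x avoids the complaint while producing the same 8-bit result:

```c
#include <assert.h>
#include <stdint.h>

/* x is promoted to int before ~ is applied, so the result is negative. */
static int complement_as_int(uint8_t x) {
  return ~x;                  /* for x == 0x0f this is -16, not 0xf0 */
}

/* Narrowing the negative int back to uint8_t changes its value (-16 becomes
   240); clang's -fsanitize=implicit-conversion checks can report this. */
static uint8_t complement_tilde(uint8_t x) {
  return ~x;
}

/* 255 ^ x is also an int, but it always lies in 0..255, so converting it
   back to uint8_t loses nothing and there is no truncation to report. */
static uint8_t complement_xor(uint8_t x) {
  return 255 ^ x;
}

/* Both forms yield the same 8-bit value; only the intermediate differs. */
static void check_same_result(uint8_t x) {
  assert(complement_tilde(x) == complement_xor(x));
}
```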
+
+Version 10.32 10-September-2018
+-------------------------------
+
+1. When matching using the REG_STARTEND feature of the POSIX API with a
+non-zero starting offset, unset capturing groups with lower numbers than a
+group that did capture something were not being correctly returned as "unset"
+(that is, with offset values of -1).
+
+2. When matching using the POSIX API, pcre2test used to omit listing unset
+groups altogether. Now it shows those that come before any actual captures as
+"<unset>", as happens for non-POSIX matching.
+
+3. Running "pcre2test -C" always stated "\R matches CR, LF, or CRLF only",
+whatever the build configuration was. It now correctly says "\R matches all
+Unicode newlines" in the default case when --enable-bsr-anycrlf has not been
+specified. Similarly, running "pcre2test -C bsr" never produced the result
+ANY.
+
+4. Matching the pattern /(*UTF)\C[^\v]+\x80/ against an 8-bit string containing
+multi-code-unit characters caused bad behaviour and possibly a crash. This
+issue was fixed for other kinds of repeat in release 10.20 by change 19, but
+repeating character classes were overlooked.
+
+5. pcre2grep now supports the inclusion of binary zeros in patterns that are
+read from files via the -f option.
+
+6. A small fix to pcre2grep to avoid compiler warnings for -Wformat-overflow=2.
+
+7. Added --enable-jit=auto support to configure.ac.
+
+8. Added some dummy variables to the heapframe structure in 16-bit and 32-bit
+modes for the benefit of m68k, where pointers can be 16-bit aligned. The
+dummies force 32-bit alignment and this ensures that the structure is a
+multiple of PCRE2_SIZE, a requirement that is tested at compile time. In other
+architectures, alignment requirements take care of this automatically.
+
+9. When returning an error from pcre2_pattern_convert(), ensure the error
+offset is set to zero for early errors.
+
+10. A number of patches for Windows support from Daniel Richard G:
+
+ (a) List of error numbers in Runtest.bat corrected (it was not the same as in
+ Runtest).
+
+ (b) pcre2grep snprintf() workaround as used elsewhere in the tree.
+
+ (c) Support for non-C99 snprintf() that returns -1 in the overflow case.
+
+11. Minor tidy of pcre2_dfa_match() code.
+
+12. Refactored pcre2_dfa_match() so that the internal recursive calls no longer
+use the stack for local workspace and local ovectors. Instead, an initial block
+of stack is reserved, but if this is insufficient, heap memory is used. The
+heap limit parameter now applies to pcre2_dfa_match().
+
+13. If a "find limits" test of DFA matching in pcre2test resulted in too many
+matches for the ovector, no matches were displayed.
+
+14. Removed an occurrence of ctrl/Z from test 6 because Windows treats it as
+EOF. The test looks to have come from a fuzzer.
+
+15. If PCRE2 was built with a default match limit a lot greater than the
+default default of 10 000 000, some JIT tests of the match limit no longer
+failed. All such tests now set 10 000 000 as the upper limit.
+
+16. Another Windows-related patch for pcre2grep to ensure that WIN32 is
+undefined under Cygwin.
+
+17. Test for the presence of stdint.h and inttypes.h in configure and CMake and
+include whichever exists (stdint preferred) instead of unconditionally
+including stdint. This makes life easier for old and non-standard systems.
+
+18. Further changes to improve portability, especially to old and/or
+non-standard systems:
+
+ (a) Put all printf arguments in RunGrepTest into single, not double, quotes,
+ and use \0 not \x00 for binary zero.
+
+ (b) Avoid the use of C++ (i.e. BCPL) // comments.
+
+ (c) Parameterize the use of %zu in pcre2test to make it like %td. For both of
+ these now, if using MSVC or a standard C before C99, %lu is used with a
+ cast if necessary.
+
+19. Applied a contributed patch to CMakeLists.txt to increase the stack size
+when linking pcre2test with MSVC. This gets rid of a stack overflow error in
+the standard set of tests.
+
+20. Output a warning in pcre2test when ignoring the "altglobal" modifier when
+it is given with the "replace" modifier.
+
+21. In both pcre2test and pcre2_substitute(), with global matching, a pattern
+that matched an empty string, but never at the starting match offset, was not
+handled in a Perl-compatible way. The pattern /(?<=\G.)/ is an example of such
+a pattern. Because \G is in a lookbehind assertion, there has to be a
+"bumpalong" before there can be a match. The automatic "advance by one
+character after an empty string match" rule is therefore inappropriate. A more
+complicated algorithm has now been implemented.
+
+22. When checking to see if a lookbehind is of fixed length, lookaheads were
+correctly ignored, but qualifiers on lookaheads were not being ignored, leading
+to an incorrect "lookbehind assertion is not fixed length" error.
+
+23. The VERSION condition test was reading fractional PCRE2 version numbers
+such as the 04 in 10.04 incorrectly and hence giving wrong results.
+
+24. Updated to Unicode version 11.0.0. As well as the usual addition of new
+scripts and characters, this involved re-jigging the grapheme break property
+algorithm because Unicode has changed the way emojis are handled.
+
+25. Fixed an obscure bug that struck when there were two atomic groups not
+separated by something with a backtracking point. There could be an incorrect
+backtrack into the first of the atomic groups. A complicated example is
+/(?>a(*:1))(?>b)(*SKIP:1)x|.*/ matched against "abc", where the *SKIP
+shouldn't find a MARK (because it is in an atomic group), but it did.
+
+26. Upgraded the perltest.sh script: (1) #pattern lines can now be used to set
+a list of modifiers for all subsequent patterns - only those that the script
+recognizes are meaningful; (2) #subject lines can be used to set or unset a
+default "mark" modifier; (3) Unsupported #command lines give a warning when
+they are ignored; (4) Mark data is output only if the "mark" modifier is
+present.
+
+27. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported.
+
+28. A (*MARK) name was not being passed back for positive assertions that were
+terminated by (*ACCEPT).
+
+29. Add support for \N{U+dddd}, but only in Unicode mode.
+
+30. Add support for (?^) for unsetting all imnsx options.
+
+31. The PCRE2_EXTENDED (/x) option only ever discarded space characters whose
+code point was less than 256 and that were recognized by the lookup table
+generated by pcre2_maketables(), which uses isspace() to identify white space.
+Now, when Unicode support is compiled, PCRE2_EXTENDED also discards U+0085,
+U+200E, U+200F, U+2028, and U+2029, which are additional characters defined by
+Unicode as "Pattern White Space". This makes PCRE2 compatible with Perl.
+
+32. In certain circumstances, option settings within patterns were not being
+correctly processed. For example, the pattern /((?i)A)(?m)B/ incorrectly
+matched "ab". (The (?m) setting lost the fact that (?i) should be reset at the
+end of its group during the parse process, but without another setting such as
+(?m) the compile phase got it right.) This bug was introduced by the
+refactoring in release 10.23.
+
+33. PCRE2 uses bcopy() if available when memmove() is not, and it used simply
+to define memmove() as a function call to bcopy(). This hasn't been tested for
+a long time because in pcre2test the result of memmove() was being used,
+whereas bcopy() doesn't return a result. This feature has now been refactored
+always to call an emulation function when there is no memmove(). The emulation
+makes use of bcopy() when available.
+
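Entry 33 hinges on the fact that memmove() returns its destination pointer whereas bcopy() returns nothing, so a macro mapping one onto the other breaks any caller that uses the result. A minimal C sketch of an emulation function in that spirit; the name `emulated_memmove` and the `HAVE_BCOPY` macro are assumptions for illustration, not necessarily PCRE2's identifiers:

```c
#include <assert.h>
#include <stddef.h>
#ifdef HAVE_BCOPY
#include <strings.h>
#endif

/* Hypothetical stand-in for memmove() on systems that lack it. */
static void *emulated_memmove(void *dest, const void *src, size_t n) {
  assert(n == 0 || (dest != NULL && src != NULL));
#ifdef HAVE_BCOPY
  bcopy(src, dest, n);            /* note the reversed argument order */
#else
  unsigned char *d = dest;
  const unsigned char *s = src;
  /* Copy backwards when the destination starts above the source, so that
     overlapping bytes are read before they are overwritten. Comparing the
     pointers assumes both point into the same object, as they do for an
     overlapping move. */
  if (d > s) {
    for (size_t i = n; i > 0; i--) d[i - 1] = s[i - 1];
  } else {
    for (size_t i = 0; i < n; i++) d[i] = s[i];
  }
#endif
  return dest;                    /* what callers of memmove() rely on */
}
```

Returning `dest` in both branches is the point of the design: code such as pcre2test that uses memmove()'s result keeps working whichever path is compiled.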
+34. When serializing a pattern, set the memctl, executable_jit, and tables
+fields (that is, all the fields that contain pointers) to zeros so that the
+result of serializing is always the same. These fields are re-set when the
+pattern is deserialized.
+
+35. In a pattern such as /[^\x{100}-\x{ffff}]*[\x80-\xff]/ which has a repeated
+negative class with no characters less than 0x100 followed by a positive class
+with only characters less than 0x100, the first class was incorrectly being
+auto-possessified, causing incorrect match failures.
+
+36. Removed the character type bit ctype_meta, which dates from PCRE1 and is
+not used in PCRE2.
+
+37. Tidied up unnecessarily complicated macros used in the escapes table.
+
+38. Since 10.21, the new testoutput8-16-4 file has accidentally been omitted
+from distribution tarballs, owing to a typo in Makefile.am which had
+testoutput8-16-3 twice. Now fixed.
+
+39. If the only branch in a conditional subpattern was anchored, the whole
+subpattern was treated as anchored, when it should not have been, since the
+assumed empty second branch cannot be anchored. Demonstrated by test patterns
+such as /(?(1)^())b/ or /(?(?=^))b/.
+
+40. A repeated conditional subpattern that could match an empty string was
+always assumed to be unanchored. Now it is checked just like any other
+repeated conditional subpattern, and can be found to be anchored if the minimum
+quantifier is one or more. I can't see much use for a repeated anchored
+pattern, but the behaviour is now consistent.
+
+41. Minor addition to pcre2_jit_compile.c to avoid a static analyzer complaint
+(for an event that could never occur but you had to have external information
+to know that).
+
+42. If before the first match in a file that was being searched by pcre2grep
+there was a line that was sufficiently long to cause the input buffer to be
+expanded, the variable holding the location of the end of the previous match
+was being adjusted incorrectly, and could cause an overflow warning from a code
+sanitizer. However, as the value is used only to print pending "after" lines
+when the next match is reached (and there are no such lines in this case) this
+bug could do no damage.
+

Version 10.31 12-February-2018
------------------------------
@@ -304,8 +1242,8 @@ tests to improve coverage.
31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
pcre2test, a crash could occur.

-32. Make -bigstack in RunTest allocate a 64Mb stack (instead of 16 MB) so that
-all the tests can run with clang's sanitizing options.
+32. Make -bigstack in RunTest allocate a 64MiB stack (instead of 16MiB) so
+that all the tests can run with clang's sanitizing options.

33. Implement extra compile options in the compile context and add the first
one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
@@ -898,9 +1836,9 @@ to the same code as '.' when PCRE2_DOTALL is set).
40. Fix two clang compiler warnings in pcre2test when only one code unit width
is supported.

-41. Upgrade RunTest to automatically re-run test 2 with a large (64M) stack if
-it fails when running the interpreter with a 16M stack (and if changing the
-stack size via pcre2test is possible). This avoids having to manually set a
+41. Upgrade RunTest to automatically re-run test 2 with a large (64MiB) stack
+if it fails when running the interpreter with a 16MiB stack (and if changing
+the stack size via pcre2test is possible). This avoids having to manually set a
large stack size when testing with clang.

42. Fix register overwite in JIT when SSE2 acceleration is enabled.