
Updated to pcre 10.39.

Brucey, 3 years ago
parent
commit
047b39e04c
100 changed files with 8488 additions and 5159 deletions
  1. 2 1
      regex.mod/common.bmx
  2. 6 6
      regex.mod/pcre/AUTHORS
  3. 438 133
      regex.mod/pcre/CMakeLists.txt
  4. 943 5
      regex.mod/pcre/ChangeLog
  5. 18 7
      regex.mod/pcre/CheckMan
  6. 39 26
      regex.mod/pcre/HACKING
  7. 3 3
      regex.mod/pcre/INSTALL
  8. 13 13
      regex.mod/pcre/LICENCE
  9. 35 15
      regex.mod/pcre/Makefile.am
  10. 421 284
      regex.mod/pcre/Makefile.in
  11. 162 2
      regex.mod/pcre/NEWS
  12. 46 23
      regex.mod/pcre/NON-AUTOTOOLS-BUILD
  13. 1 1
      regex.mod/pcre/PrepareRelease
  14. 107 85
      regex.mod/pcre/README
  15. 102 34
      regex.mod/pcre/RunGrepTest
  16. 7 2
      regex.mod/pcre/RunGrepTest.bat
  17. 11 6
      regex.mod/pcre/RunTest
  18. 525 523
      regex.mod/pcre/RunTest.bat
  19. 185 119
      regex.mod/pcre/aclocal.m4
  20. 10 9
      regex.mod/pcre/ar-lib
  21. 15 0
      regex.mod/pcre/cmake/pcre2-config-version.cmake.in
  22. 145 0
      regex.mod/pcre/cmake/pcre2-config.cmake.in
  23. 6 6
      regex.mod/pcre/compile
  24. 7 4
      regex.mod/pcre/config-cmake.h.in
  25. 432 375
      regex.mod/pcre/config.guess
  26. 1008 1070
      regex.mod/pcre/config.sub
  27. 217 336
      regex.mod/pcre/configure
  28. 134 41
      regex.mod/pcre/configure.ac
  29. 4 4
      regex.mod/pcre/depcomp
  30. 46 23
      regex.mod/pcre/doc/html/NON-AUTOTOOLS-BUILD.txt
  31. 107 85
      regex.mod/pcre/doc/html/README.txt
  32. 7 1
      regex.mod/pcre/doc/html/index.html
  33. 29 13
      regex.mod/pcre/doc/html/pcre2.html
  34. 2 1
      regex.mod/pcre/doc/html/pcre2_code_free.html
  35. 8 2
      regex.mod/pcre/doc/html/pcre2_compile.html
  36. 2 1
      regex.mod/pcre/doc/html/pcre2_compile_context_free.html
  37. 2 1
      regex.mod/pcre/doc/html/pcre2_convert_context_free.html
  38. 2 1
      regex.mod/pcre/doc/html/pcre2_converted_pattern_free.html
  39. 12 4
      regex.mod/pcre/doc/html/pcre2_dfa_match.html
  40. 2 1
      regex.mod/pcre/doc/html/pcre2_general_context_free.html
  41. 39 0
      regex.mod/pcre/doc/html/pcre2_get_match_data_size.html
  42. 8 1
      regex.mod/pcre/doc/html/pcre2_jit_compile.html
  43. 1 1
      regex.mod/pcre/doc/html/pcre2_jit_free_unused_memory.html
  44. 3 1
      regex.mod/pcre/doc/html/pcre2_jit_match.html
  45. 8 3
      regex.mod/pcre/doc/html/pcre2_jit_stack_assign.html
  46. 2 2
      regex.mod/pcre/doc/html/pcre2_jit_stack_create.html
  47. 3 2
      regex.mod/pcre/doc/html/pcre2_jit_stack_free.html
  48. 1 1
      regex.mod/pcre/doc/html/pcre2_maketables.html
  49. 44 0
      regex.mod/pcre/doc/html/pcre2_maketables_free.html
  50. 5 3
      regex.mod/pcre/doc/html/pcre2_match.html
  51. 2 1
      regex.mod/pcre/doc/html/pcre2_match_context_free.html
  52. 3 2
      regex.mod/pcre/doc/html/pcre2_match_data_create.html
  53. 9 6
      regex.mod/pcre/doc/html/pcre2_match_data_create_from_pattern.html
  54. 9 3
      regex.mod/pcre/doc/html/pcre2_match_data_free.html
  55. 3 2
      regex.mod/pcre/doc/html/pcre2_pattern_info.html
  56. 6 3
      regex.mod/pcre/doc/html/pcre2_serialize_decode.html
  57. 8 3
      regex.mod/pcre/doc/html/pcre2_serialize_encode.html
  58. 4 3
      regex.mod/pcre/doc/html/pcre2_serialize_free.html
  59. 2 2
      regex.mod/pcre/doc/html/pcre2_serialize_get_number_of_codes.html
  60. 7 4
      regex.mod/pcre/doc/html/pcre2_set_character_tables.html
  61. 5 2
      regex.mod/pcre/doc/html/pcre2_set_compile_extra_options.html
  62. 1 1
      regex.mod/pcre/doc/html/pcre2_set_glob_separator.html
  63. 43 0
      regex.mod/pcre/doc/html/pcre2_set_substitute_callout.html
  64. 23 6
      regex.mod/pcre/doc/html/pcre2_substitute.html
  65. 1 1
      regex.mod/pcre/doc/html/pcre2_substring_free.html
  66. 2 1
      regex.mod/pcre/doc/html/pcre2_substring_list_free.html
  67. 2 2
      regex.mod/pcre/doc/html/pcre2_substring_nametable_scan.html
  68. 396 175
      regex.mod/pcre/doc/html/pcre2api.html
  69. 101 51
      regex.mod/pcre/doc/html/pcre2build.html
  70. 19 9
      regex.mod/pcre/doc/html/pcre2callout.html
  71. 84 65
      regex.mod/pcre/doc/html/pcre2compat.html
  72. 6 5
      regex.mod/pcre/doc/html/pcre2convert.html
  73. 27 18
      regex.mod/pcre/doc/html/pcre2demo.html
  74. 294 190
      regex.mod/pcre/doc/html/pcre2grep.html
  75. 82 48
      regex.mod/pcre/doc/html/pcre2jit.html
  76. 17 17
      regex.mod/pcre/doc/html/pcre2limits.html
  77. 42 30
      regex.mod/pcre/doc/html/pcre2matching.html
  78. 262 319
      regex.mod/pcre/doc/html/pcre2partial.html
  79. 329 229
      regex.mod/pcre/doc/html/pcre2pattern.html
  80. 27 19
      regex.mod/pcre/doc/html/pcre2perform.html
  81. 92 67
      regex.mod/pcre/doc/html/pcre2posix.html
  82. 19 8
      regex.mod/pcre/doc/html/pcre2serialize.html
  83. 166 69
      regex.mod/pcre/doc/html/pcre2syntax.html
  84. 266 126
      regex.mod/pcre/doc/html/pcre2test.html
  85. 261 69
      regex.mod/pcre/doc/html/pcre2unicode.html
  86. 7 1
      regex.mod/pcre/doc/index.html.src
  87. 15 15
      regex.mod/pcre/doc/pcre2-config.txt
  88. 28 14
      regex.mod/pcre/doc/pcre2.3
  89. 366 270
      regex.mod/pcre/doc/pcre2.txt
  90. 3 2
      regex.mod/pcre/doc/pcre2_code_free.3
  91. 10 3
      regex.mod/pcre/doc/pcre2_compile.3
  92. 3 2
      regex.mod/pcre/doc/pcre2_compile_context_free.3
  93. 3 2
      regex.mod/pcre/doc/pcre2_convert_context_free.3
  94. 3 2
      regex.mod/pcre/doc/pcre2_converted_pattern_free.3
  95. 12 5
      regex.mod/pcre/doc/pcre2_dfa_match.3
  96. 3 2
      regex.mod/pcre/doc/pcre2_general_context_free.3
  97. 27 0
      regex.mod/pcre/doc/pcre2_get_match_data_size.3
  98. 8 2
      regex.mod/pcre/doc/pcre2_jit_compile.3
  99. 1 1
      regex.mod/pcre/doc/pcre2_jit_free_unused_memory.3
  100. 4 2
      regex.mod/pcre/doc/pcre2_jit_match.3

+ 2 - 1
regex.mod/common.bmx

@@ -1,4 +1,4 @@
-' Copyright (c) 2007-2020 Bruce A Henderson
+' Copyright (c) 2007-2021 Bruce A Henderson
 ' All rights reserved.
 '
 ' Redistribution and use in source and binary forms, with or without
@@ -43,6 +43,7 @@ Import "pcre/src/pcre2_match_data.c"
 Import "pcre/src/pcre2_newline.c"
 Import "pcre/src/pcre2_ord2utf.c"
 Import "pcre/src/pcre2_pattern_info.c"
+Import "pcre/src/pcre2_script_run.c"
 Import "pcre/src/pcre2_string_utils.c"
 Import "pcre/src/pcre2_study.c"
 Import "pcre/src/pcre2_substitute.c"

+ 6 - 6
regex.mod/pcre/AUTHORS

@@ -2,13 +2,13 @@ THE MAIN PCRE2 LIBRARY CODE
 ---------------------------
 
 Written by:       Philip Hazel
-Email local part: ph10
-Email domain:     cam.ac.uk
+Email local part: Philip.Hazel
+Email domain:     gmail.com
 
-University of Cambridge Computing Service,
+Retired from University of Cambridge Computing Service,
 Cambridge, England.
 
-Copyright (c) 1997-2018 University of Cambridge
+Copyright (c) 1997-2021 University of Cambridge
 All rights reserved
 
 
@@ -19,7 +19,7 @@ Written by:       Zoltan Herczeg
 Email local part: hzmester
 Emain domain:     freemail.hu
 
-Copyright(c) 2010-2018 Zoltan Herczeg
+Copyright(c) 2010-2021 Zoltan Herczeg
 All rights reserved.
 
 
@@ -30,7 +30,7 @@ Written by:       Zoltan Herczeg
 Email local part: hzmester
 Emain domain:     freemail.hu
 
-Copyright(c) 2009-2018 Zoltan Herczeg
+Copyright(c) 2009-2021 Zoltan Herczeg
 All rights reserved.
 
 ####

+ 438 - 133
regex.mod/pcre/CMakeLists.txt

@@ -1,6 +1,5 @@
 # CMakeLists.txt
 #
-#
 # This file enables PCRE2 to be built with the CMake configuration and build
 # tool. Download CMake in source or binary form from http://www.cmake.org/
 # Converted to support PCRE2 from the original PCRE file, August 2014.
@@ -80,18 +79,44 @@
 # 2017-03-11 PH turned HEAP_MATCH_RECURSE into a NO-OP for 10.30
 # 2017-04-08 PH added HEAP_LIMIT
 # 2017-06-15 ZH added SUPPORT_JIT_SEALLOC support
+# 2018-06-19 PH added checks for stdint.h and inttypes.h (later removed)
+# 2018-06-27 PH added Daniel's patch to increase the stack for MSVC
+# 2018-11-14 PH removed unnecessary checks for stdint.h and inttypes.h
+# 2018-11-16 PH added PCRE2GREP_SUPPORT_CALLOUT_FORK support and tidied
+# 2019-02-16 PH hacked to avoid CMP0026 policy issue (see comments below)
+# 2020-03-16 PH renamed dftables as pcre2_dftables (as elsewhere)
+# 2020-03-24 PH changed CMAKE_MODULE_PATH definition to add, not replace
+# 2020-04-08 Carlo added function check for secure_getenv, fixed strerror
+# 2020-04-16 enh added check for __attribute__((uninitialized))
+# 2020-04-25 PH applied patches from Uwe Korn to support pkg-config and
+#            library versioning.
+# 2020-04-25 Carlo added function check for mkostemp used in ProtExecAllocator
+# 2020-04-28 PH added function check for memfd_create based on Carlo's patch
+# 2020-05-25 PH added a check for Intel CET
+# 2020-12-03 PH altered the definition of pcre2test as suggested by Daniel
+# 2021-06-29 JWSB added the option to build static library with PIC.
+# 2021-07-05 JWSB modified such both the static and shared library can be
+#            build in one go.
+# 2021-08-28 PH increased minimum version
+# 2021-08-28 PH added test for realpath()
 
 PROJECT(PCRE2 C)
 
-# Increased minimum to 2.8.0 to support newer add_test features. Set policy
-# CMP0026 to avoid warnings for the use of LOCATION in GET_TARGET_PROPERTY.
+# Increased minimum to 2.8.5 to support GNUInstallDirs.
+# Increased minimum to 3.0.0 because older than 2.8.12 is deprecated.
+CMAKE_MINIMUM_REQUIRED(VERSION 3.0.0)
+
+# Set policy CMP0026 to avoid warnings for the use of LOCATION in
+# GET_TARGET_PROPERTY. This should no longer be required.
+# CMAKE_POLICY(SET CMP0026 OLD)
 
-CMAKE_MINIMUM_REQUIRED(VERSION 2.8.0)
-CMAKE_POLICY(SET CMP0026 OLD)
+# For FindReadline.cmake. This was changed to allow setting CMAKE_MODULE_PATH
+# on the command line.
+# SET(CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake)
 
-SET(CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake) # for FindReadline.cmake
+LIST(APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake)
 
-SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -I${PROJECT_SOURCE_DIR}/src")
+INCLUDE_DIRECTORIES(${PROJECT_SOURCE_DIR}/src)
 
 # external packages
 FIND_PACKAGE( BZip2 )
@@ -101,29 +126,60 @@ FIND_PACKAGE( Editline )
 
 # Configuration checks
 
-INCLUDE(CheckIncludeFile)
+INCLUDE(CheckCSourceCompiles)
 INCLUDE(CheckFunctionExists)
+INCLUDE(CheckSymbolExists)
+INCLUDE(CheckIncludeFile)
 INCLUDE(CheckTypeSize)
+INCLUDE(GNUInstallDirs) # for CMAKE_INSTALL_LIBDIR
 
 CHECK_INCLUDE_FILE(dirent.h     HAVE_DIRENT_H)
-CHECK_INCLUDE_FILE(stdint.h     HAVE_STDINT_H)
-CHECK_INCLUDE_FILE(inttypes.h   HAVE_INTTYPES_H)
 CHECK_INCLUDE_FILE(sys/stat.h   HAVE_SYS_STAT_H)
 CHECK_INCLUDE_FILE(sys/types.h  HAVE_SYS_TYPES_H)
 CHECK_INCLUDE_FILE(unistd.h     HAVE_UNISTD_H)
 CHECK_INCLUDE_FILE(windows.h    HAVE_WINDOWS_H)
 
-CHECK_FUNCTION_EXISTS(bcopy     HAVE_BCOPY)
-CHECK_FUNCTION_EXISTS(memmove   HAVE_MEMMOVE)
-CHECK_FUNCTION_EXISTS(strerror  HAVE_STRERROR)
+CHECK_SYMBOL_EXISTS(bcopy         "strings.h"  HAVE_BCOPY)
+CHECK_SYMBOL_EXISTS(memfd_create  "sys/mman.h" HAVE_MEMFD_CREATE)
+CHECK_SYMBOL_EXISTS(memmove       "string.h"   HAVE_MEMMOVE)
+CHECK_SYMBOL_EXISTS(realpath      "stdlib.h"   HAVE_REALPATH)
+CHECK_SYMBOL_EXISTS(secure_getenv "stdlib.h"   HAVE_SECURE_GETENV)
+CHECK_SYMBOL_EXISTS(strerror      "string.h"   HAVE_STRERROR)
+
+set(ORIG_CMAKE_REQUIRED_FLAGS ${CMAKE_REQUIRED_FLAGS})
+set(CMAKE_REQUIRED_FLAGS "${CMAKE_REQUIRED_FLAGS} -Werror")
+CHECK_C_SOURCE_COMPILES(
+  "int main() { char buf[128] __attribute__((uninitialized)); (void)buf; return 0; }"
+  HAVE_ATTRIBUTE_UNINITIALIZED
+)
+set(CMAKE_REQUIRED_FLAGS ${ORIG_CMAKE_REQUIRED_FLAGS})
+
+# Check whether Intel CET is enabled, and if so, adjust compiler flags. This
+# code was written by PH, trying to imitate the logic from the autotools
+# configuration.
+
+CHECK_C_SOURCE_COMPILES(
+  "#ifndef __CET__
+   #error CET is not enabled
+   #endif
+   int main() { return 0; }"
+  INTEL_CET_ENABLED
+)
+
+IF (INTEL_CET_ENABLED)
+  SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mshstk")
+ENDIF(INTEL_CET_ENABLED)
+
+
 
 # User-configurable options
 #
 # Note: CMakeSetup displays these in alphabetical order, regardless of
 # the order we use here.
 
-SET(BUILD_SHARED_LIBS OFF CACHE BOOL
-    "Build shared libraries instead of static ones.")
+SET(BUILD_SHARED_LIBS OFF CACHE BOOL "Build shared libraries.")
+
+OPTION(BUILD_STATIC_LIBS "Build static libraries." ON)
 
 OPTION(PCRE2_BUILD_PCRE2_8 "Build 8 bit PCRE2 library" ON)
 
@@ -131,8 +187,12 @@ OPTION(PCRE2_BUILD_PCRE2_16 "Build 16 bit PCRE2 library" OFF)
 
 OPTION(PCRE2_BUILD_PCRE2_32 "Build 32 bit PCRE2 library" OFF)
 
+OPTION(PCRE2_STATIC_PIC "Build the static library with the option position independent code enabled." OFF)
+
 OPTION(PCRE2_DEBUG "Include debugging code" OFF)
 
+OPTION(PCRE2_DISABLE_PERCENT_ZT "Disable the use of %zu and %td (rarely needed)" OFF)
+
 SET(PCRE2_EBCDIC OFF CACHE BOOL
     "Use EBCDIC coding instead of ASCII. (This is rarely used outside of mainframe systems.)")
 
@@ -146,7 +206,7 @@ SET(PCRE2_PARENS_NEST_LIMIT "250" CACHE STRING
     "Default nested parentheses limit. See PARENS_NEST_LIMIT in config.h.in for details.")
 
 SET(PCRE2_HEAP_LIMIT "20000000" CACHE STRING
-    "Default limit on heap memory (kilobytes). See HEAP_LIMIT in config.h.in for details.")
+    "Default limit on heap memory (kibibytes). See HEAP_LIMIT in config.h.in for details.")
 
 SET(PCRE2_MATCH_LIMIT "10000000" CACHE STRING
     "Default limit on internal looping. See MATCH_LIMIT in config.h.in for details.")
@@ -169,15 +229,22 @@ SET(PCRE2_HEAP_MATCH_RECURSE OFF CACHE BOOL
 SET(PCRE2_SUPPORT_JIT OFF CACHE BOOL
     "Enable support for Just-in-time compiling.")
 
-SET(PCRE2_SUPPORT_JIT_SEALLOC OFF CACHE BOOL
-    "Enable SELinux compatible execmem allocator in JIT.")
+IF(${CMAKE_SYSTEM_NAME} MATCHES Linux|NetBSD)
+    SET(PCRE2_SUPPORT_JIT_SEALLOC OFF CACHE BOOL
+        "Enable SELinux compatible execmem allocator in JIT (experimental).")
+ELSE(${CMAKE_SYSTEM_NAME} MATCHES Linux|NetBSD)
+    SET(PCRE2_SUPPORT_JIT_SEALLOC IGNORE)
+ENDIF(${CMAKE_SYSTEM_NAME} MATCHES Linux|NetBSD)
 
-SET(PCRE2_SUPPORT_PCRE2GREP_JIT ON CACHE BOOL
+SET(PCRE2GREP_SUPPORT_JIT ON CACHE BOOL
     "Enable use of Just-in-time compiling in pcre2grep.")
 
-SET(PCRE2_SUPPORT_PCRE2GREP_CALLOUT ON CACHE BOOL
+SET(PCRE2GREP_SUPPORT_CALLOUT ON CACHE BOOL
     "Enable callout string support in pcre2grep.")
 
+SET(PCRE2GREP_SUPPORT_CALLOUT_FORK ON CACHE BOOL
+    "Enable callout string fork support in pcre2grep.")
+
 SET(PCRE2_SUPPORT_UNICODE ON CACHE BOOL
     "Enable support for Unicode and UTF-8/UTF-16/UTF-32 encoding.")
 
@@ -247,9 +314,9 @@ ENDIF(PCRE2_SUPPORT_LIBREADLINE)
 
 # Prepare build configuration
 
-IF(NOT BUILD_SHARED_LIBS)
-        SET(PCRE2_STATIC 1)
-ENDIF(NOT BUILD_SHARED_LIBS)
+IF(NOT BUILD_SHARED_LIBS AND NOT BUILD_STATIC_LIBS)
+        MESSAGE(FATAL_ERROR "At least one of BUILD_SHARED_LIBS or BUILD_STATIC_LIBS must be enabled.")
+ENDIF(NOT BUILD_SHARED_LIBS AND NOT BUILD_STATIC_LIBS)
 
 IF(NOT PCRE2_BUILD_PCRE2_8 AND NOT PCRE2_BUILD_PCRE2_16 AND NOT PCRE2_BUILD_PCRE2_32)
         MESSAGE(FATAL_ERROR "At least one of PCRE2_BUILD_PCRE2_8, PCRE2_BUILD_PCRE2_16 or PCRE2_BUILD_PCRE2_32 must be enabled")
@@ -293,21 +360,40 @@ IF(PCRE2_SUPPORT_JIT)
 ENDIF(PCRE2_SUPPORT_JIT)
 
 IF(PCRE2_SUPPORT_JIT_SEALLOC)
-        SET(SLJIT_PROT_EXECUTABLE_ALLOCATOR 1)
+        SET(CMAKE_REQUIRED_DEFINITIONS -D_GNU_SOURCE)
+	CHECK_SYMBOL_EXISTS(mkostemp stdlib.h REQUIRED)
+        UNSET(CMAKE_REQUIRED_DEFINITIONS)
+        IF(${REQUIRED})
+                IF(${CMAKE_SYSTEM_NAME} MATCHES Linux|NetBSD)
+                        ADD_DEFINITIONS(-D_GNU_SOURCE)
+                        SET(SLJIT_PROT_EXECUTABLE_ALLOCATOR 1)
+                ELSE(${CMAKE_SYSTEM_NAME} MATCHES Linux|NetBSD)
+                        MESSAGE(FATAL_ERROR "Your configuration is not supported")
+                ENDIF(${CMAKE_SYSTEM_NAME} MATCHES Linux|NetBSD)
+        ELSE(${REQUIRED})
+                SET(PCRE2_SUPPORT_JIT_SEALLOC OFF)
+        ENDIF(${REQUIRED})
 ENDIF(PCRE2_SUPPORT_JIT_SEALLOC)
 
-IF(PCRE2_SUPPORT_PCRE2GREP_JIT)
+IF(PCRE2GREP_SUPPORT_JIT)
         SET(SUPPORT_PCRE2GREP_JIT 1)
-ENDIF(PCRE2_SUPPORT_PCRE2GREP_JIT)
+ENDIF(PCRE2GREP_SUPPORT_JIT)
 
-IF(PCRE2_SUPPORT_PCRE2GREP_CALLOUT)
+IF(PCRE2GREP_SUPPORT_CALLOUT)
         SET(SUPPORT_PCRE2GREP_CALLOUT 1)
-ENDIF(PCRE2_SUPPORT_PCRE2GREP_CALLOUT)
+        IF(PCRE2GREP_SUPPORT_CALLOUT_FORK)
+                SET(SUPPORT_PCRE2GREP_CALLOUT_FORK 1)
+        ENDIF(PCRE2GREP_SUPPORT_CALLOUT_FORK)
+ENDIF(PCRE2GREP_SUPPORT_CALLOUT)
 
 IF(PCRE2_SUPPORT_VALGRIND)
         SET(SUPPORT_VALGRIND 1)
 ENDIF(PCRE2_SUPPORT_VALGRIND)
 
+IF(PCRE2_DISABLE_PERCENT_ZT)
+        SET(DISABLE_PERCENT_ZT 1)
+ENDIF(PCRE2_DISABLE_PERCENT_ZT)
+
 # This next one used to reference ${READLINE_LIBRARY})
 # but I was advised to add the NCURSES test as well, along with
 # some modifications to cmake/FindReadline.cmake which should
@@ -382,12 +468,13 @@ file(STRINGS ${PROJECT_SOURCE_DIR}/configure.ac
   LIMIT_COUNT 50 # Read only the first 50 lines of the file
 )
 
-set(SEARCHED_VARIABLES "pcre2_major" "pcre2_minor" "pcre2_prerelease" "pcre2_date")
+set(SEARCHED_VARIABLES "pcre2_major" "pcre2_minor" "pcre2_prerelease" "pcre2_date"
+  "libpcre2_posix_version" "libpcre2_8_version" "libpcre2_16_version" "libpcre2_32_version")
 foreach(configure_line ${configure_lines})
     foreach(_substitution_variable ${SEARCHED_VARIABLES})
         string(TOUPPER ${_substitution_variable} _substitution_variable_upper)
         if (NOT ${_substitution_variable_upper})
-            string(REGEX MATCH "m4_define\\(${_substitution_variable}, \\[(.*)\\]" MACTHED_STRING ${configure_line})
+            string(REGEX MATCH "m4_define\\(${_substitution_variable}, *\\[(.*)\\]" MATCHED_STRING ${configure_line})
             if (CMAKE_MATCH_1)
                 set(${_substitution_variable_upper} ${CMAKE_MATCH_1})
             endif()
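Note on the `string(REGEX MATCH ...)` change in this hunk: the new pattern adds ` *` after the comma, so an `m4_define` line is still recognised when the bracketed value is preceded by more than one space; the old pattern required exactly one. The same regular expression can be tried in Python as a minimal sketch. The helper name `read_m4_define` and the sample lines are hypothetical and are not taken from configure.ac.

```python
import re


def read_m4_define(line, name):
    """Return the bracketed value of an m4_define line, or None if it does not match.

    Mirrors the pattern added in the diff: one space after the comma,
    then any number of further spaces, then [value].
    """
    pattern = r"m4_define\(" + re.escape(name) + r", *\[(.*)\]"
    match = re.search(pattern, line)
    return match.group(1) if match else None


# Hypothetical sample lines, one tightly spaced and one padded.
print(read_m4_define("m4_define(pcre2_major, [10])", "pcre2_major"))
print(read_m4_define("m4_define(libpcre2_8_version,     [10:4:10])", "libpcre2_8_version"))
```

The padded second line is the kind of input the old single-space pattern would have rejected.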
@@ -395,21 +482,83 @@ foreach(configure_line ${configure_lines})
     endforeach()
 endforeach()
 
+macro(PARSE_LIB_VERSION VARIABLE_PREFIX)
+  string(REPLACE ":" ";" ${VARIABLE_PREFIX}_VERSION_LIST ${${VARIABLE_PREFIX}_VERSION})
+  list(GET ${VARIABLE_PREFIX}_VERSION_LIST 0 ${VARIABLE_PREFIX}_VERSION_CURRENT)
+  list(GET ${VARIABLE_PREFIX}_VERSION_LIST 1 ${VARIABLE_PREFIX}_VERSION_REVISION)
+  list(GET ${VARIABLE_PREFIX}_VERSION_LIST 2 ${VARIABLE_PREFIX}_VERSION_AGE)
+
+  math(EXPR ${VARIABLE_PREFIX}_SOVERSION "${${VARIABLE_PREFIX}_VERSION_CURRENT} - ${${VARIABLE_PREFIX}_VERSION_AGE}")
+  math(EXPR ${VARIABLE_PREFIX}_MACHO_COMPATIBILITY_VERSION "${${VARIABLE_PREFIX}_VERSION_CURRENT} + 1")
+  math(EXPR ${VARIABLE_PREFIX}_MACHO_CURRENT_VERSION "${${VARIABLE_PREFIX}_VERSION_CURRENT} + 1")
+  set(${VARIABLE_PREFIX}_MACHO_CURRENT_VERSION "${${VARIABLE_PREFIX}_MACHO_CURRENT_VERSION}.${${VARIABLE_PREFIX}_VERSION_REVISION}}")
+  set(${VARIABLE_PREFIX}_VERSION "${${VARIABLE_PREFIX}_SOVERSION}.${${VARIABLE_PREFIX}_VERSION_AGE}.${${VARIABLE_PREFIX}_VERSION_REVISION}")
+endmacro()
+
+PARSE_LIB_VERSION(LIBPCRE2_POSIX)
+PARSE_LIB_VERSION(LIBPCRE2_8)
+PARSE_LIB_VERSION(LIBPCRE2_16)
+PARSE_LIB_VERSION(LIBPCRE2_32)
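Note on `PARSE_LIB_VERSION` above: the macro splits a libtool-style `current:revision:age` string, derives `SOVERSION` as `current - age`, rebuilds `VERSION` as `SOVERSION.age.revision`, and bases both Mach-O versions on `current + 1`. The arithmetic can be followed in this minimal Python sketch. The function name and the input `10:4:10` are hypothetical (not read from configure.ac), and the sketch leaves out the trailing `}` that the `MACHO_CURRENT_VERSION` line in the diff appends, on the assumption that it is unintended.

```python
def parse_lib_version(libtool_version):
    """Derive the values the macro sets from a 'current:revision:age' string."""
    current, revision, age = (int(part) for part in libtool_version.split(":"))
    soversion = current - age
    return {
        "soversion": soversion,
        "version": f"{soversion}.{age}.{revision}",
        "macho_compatibility_version": current + 1,
        "macho_current_version": f"{current + 1}.{revision}",
    }


# Hypothetical input: current=10, revision=4, age=10.
info = parse_lib_version("10:4:10")
print(info["soversion"])              # 0
print(info["version"])                # 0.10.4
print(info["macho_current_version"])  # 11.4
```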
+
 CONFIGURE_FILE(src/pcre2.h.in
                ${PROJECT_BINARY_DIR}/pcre2.h
                @ONLY)
 
-# What about pcre2-config and libpcre2.pc?
+# Make sure to not link debug libs
+# against release libs and vice versa
+IF(WIN32)
+  SET(CMAKE_DEBUG_POSTFIX "d")
+ENDIF(WIN32)
+
+# Generate pkg-config files
+
+SET(PACKAGE_VERSION "${PCRE2_MAJOR}.${PCRE2_MINOR}")
+SET(prefix ${CMAKE_INSTALL_PREFIX})
+
+SET(exec_prefix "\${prefix}")
+SET(libdir "\${exec_prefix}/${CMAKE_INSTALL_LIBDIR}")
+SET(includedir "\${prefix}/include")
+IF(WIN32 AND (CMAKE_BUILD_TYPE MATCHES Debug))
+  SET(LIB_POSTFIX ${CMAKE_DEBUG_POSTFIX})
+ENDIF()
+CONFIGURE_FILE(libpcre2-posix.pc.in libpcre2-posix.pc @ONLY)
+SET(pkg_config_files ${pkg_config_files} "${CMAKE_CURRENT_BINARY_DIR}/libpcre2-posix.pc")
+
+IF(PCRE2_BUILD_PCRE2_8)
+  CONFIGURE_FILE(libpcre2-8.pc.in libpcre2-8.pc @ONLY)
+  SET(pkg_config_files ${pkg_config_files} "${CMAKE_CURRENT_BINARY_DIR}/libpcre2-8.pc")
+  SET(enable_pcre2_8 "yes")
+ELSE()
+  SET(enable_pcre2_8 "no")
+ENDIF()
+
+IF(PCRE2_BUILD_PCRE2_16)
+  CONFIGURE_FILE(libpcre2-16.pc.in libpcre2-16.pc @ONLY)
+  SET(pkg_config_files ${pkg_config_files} "${CMAKE_CURRENT_BINARY_DIR}/libpcre2-16.pc")
+  SET(enable_pcre2_16 "yes")
+ELSE()
+  SET(enable_pcre2_16 "no")
+ENDIF()
+
+IF(PCRE2_BUILD_PCRE2_32)
+  CONFIGURE_FILE(libpcre2-32.pc.in libpcre2-32.pc @ONLY)
+  SET(pkg_config_files ${pkg_config_files} "${CMAKE_CURRENT_BINARY_DIR}/libpcre2-32.pc")
+  SET(enable_pcre2_32 "yes")
+ELSE()
+  SET(enable_pcre2_32 "no")
+ENDIF()
+
+CONFIGURE_FILE(pcre2-config.in pcre2-config @ONLY)
 
 # Character table generation
 
 OPTION(PCRE2_REBUILD_CHARTABLES "Rebuild char tables" OFF)
 IF(PCRE2_REBUILD_CHARTABLES)
-  ADD_EXECUTABLE(dftables src/dftables.c)
+  ADD_EXECUTABLE(pcre2_dftables src/pcre2_dftables.c)
   ADD_CUSTOM_COMMAND(
     COMMENT "Generating character tables (pcre2_chartables.c) for current locale"
-    DEPENDS dftables
-    COMMAND dftables
+    DEPENDS pcre2_dftables
+    COMMAND pcre2_dftables
     ARGS        ${PROJECT_BINARY_DIR}/pcre2_chartables.c
     OUTPUT      ${PROJECT_BINARY_DIR}/pcre2_chartables.c
   )
@@ -441,6 +590,7 @@ SET(PCRE2_SOURCES
   src/pcre2_newline.c
   src/pcre2_ord2utf.c
   src/pcre2_pattern_info.c
+  src/pcre2_script_run.c
   src/pcre2_serialize.c
   src/pcre2_string_utils.c
   src/pcre2_study.c
@@ -455,39 +605,35 @@ SET(PCRE2_SOURCES
 SET(PCRE2POSIX_HEADERS src/pcre2posix.h)
 SET(PCRE2POSIX_SOURCES src/pcre2posix.c)
 
-IF(MINGW AND NOT PCRE2_STATIC)
-IF (EXISTS ${PROJECT_SOURCE_DIR}/pcre2.rc)
-ADD_CUSTOM_COMMAND(OUTPUT ${PROJECT_SOURCE_DIR}/pcre2.o
-PRE-LINK
-COMMAND windres ARGS pcre2.rc pcre2.o
-WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}
-COMMENT Using pcre2 coff info in mingw build)
-SET(PCRE2_SOURCES
-  ${PCRE2_SOURCES} ${PROJECT_SOURCE_DIR}/pcre2.o
-)
-ENDIF(EXISTS ${PROJECT_SOURCE_DIR}/pcre2.rc)
-IF (EXISTS ${PROJECT_SOURCE_DIR}/pcre2posix.rc)
-ADD_CUSTOM_COMMAND(OUTPUT ${PROJECT_SOURCE_DIR}/pcre2posix.o
-PRE-LINK
-COMMAND windres ARGS pcre2posix.rc pcre2posix.o
-WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}
-COMMENT Using pcre2posix coff info in mingw build)
-SET(PCRE2POSIX_SOURCES
-  ${PCRE2POSIX_SOURCES} ${PROJECT_SOURCE_DIR}/pcre2posix.o
-)
-ENDIF(EXISTS ${PROJECT_SOURCE_DIR}/pcre2posix.rc)
-ENDIF(MINGW AND NOT PCRE2_STATIC)
-
-IF(MSVC AND NOT PCRE2_STATIC)
-IF (EXISTS ${PROJECT_SOURCE_DIR}/pcre2.rc)
-SET(PCRE2_SOURCES
-  ${PCRE2_SOURCES} pcre2.rc)
-ENDIF(EXISTS ${PROJECT_SOURCE_DIR}/pcre2.rc)
-IF (EXISTS ${PROJECT_SOURCE_DIR}/pcre2posix.rc)
-SET(PCRE2POSIX_SOURCES
-  ${PCRE2POSIX_SOURCES} pcre2posix.rc)
-ENDIF (EXISTS ${PROJECT_SOURCE_DIR}/pcre2posix.rc)
-ENDIF(MSVC AND NOT PCRE2_STATIC)
+IF(MINGW AND BUILD_SHARED_LIBS)
+  IF (EXISTS ${PROJECT_SOURCE_DIR}/pcre2.rc)
+    ADD_CUSTOM_COMMAND(OUTPUT ${PROJECT_SOURCE_DIR}/pcre2.o
+      PRE-LINK
+      COMMAND windres ARGS pcre2.rc pcre2.o
+      WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}
+      COMMENT Using pcre2 coff info in mingw build)
+    SET(PCRE2_SOURCES ${PCRE2_SOURCES} ${PROJECT_SOURCE_DIR}/pcre2.o)
+  ENDIF(EXISTS ${PROJECT_SOURCE_DIR}/pcre2.rc)
+
+  IF (EXISTS ${PROJECT_SOURCE_DIR}/pcre2posix.rc)
+    ADD_CUSTOM_COMMAND(OUTPUT ${PROJECT_SOURCE_DIR}/pcre2posix.o
+      PRE-LINK
+      COMMAND windres ARGS pcre2posix.rc pcre2posix.o
+      WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}
+      COMMENT Using pcre2posix coff info in mingw build)
+    SET(PCRE2POSIX_SOURCES ${PCRE2POSIX_SOURCES} ${PROJECT_SOURCE_DIR}/pcre2posix.o)
+  ENDIF(EXISTS ${PROJECT_SOURCE_DIR}/pcre2posix.rc)
+ENDIF(MINGW AND BUILD_SHARED_LIBS)
+
+IF(MSVC AND BUILD_SHARED_LIBS)
+  IF (EXISTS ${PROJECT_SOURCE_DIR}/pcre2.rc)
+    SET(PCRE2_SOURCES ${PCRE2_SOURCES} pcre2.rc)
+  ENDIF(EXISTS ${PROJECT_SOURCE_DIR}/pcre2.rc)
+
+  IF (EXISTS ${PROJECT_SOURCE_DIR}/pcre2posix.rc)
+    SET(PCRE2POSIX_SOURCES ${PCRE2POSIX_SOURCES} pcre2posix.rc)
+  ENDIF (EXISTS ${PROJECT_SOURCE_DIR}/pcre2posix.rc)
+ENDIF(MSVC AND BUILD_SHARED_LIBS)
 
 # Fix static compilation with MSVC: https://bugs.exim.org/show_bug.cgi?id=1681
 # This code was taken from the CMake wiki, not from WebM.
@@ -510,71 +656,187 @@ IF(MSVC)
 ENDIF(MSVC)
 
 SET(CMAKE_INCLUDE_CURRENT_DIR 1)
-# needed to make sure to not link debug libs
-# against release libs and vice versa
-IF(WIN32)
-  SET(CMAKE_DEBUG_POSTFIX "d")
-ENDIF(WIN32)
 
 SET(targets)
 
 # 8-bit library
 
 IF(PCRE2_BUILD_PCRE2_8)
-ADD_LIBRARY(pcre2-8 ${PCRE2_HEADERS} ${PCRE2_SOURCES} ${PROJECT_BINARY_DIR}/config.h)
-SET_PROPERTY(TARGET pcre2-8
-  PROPERTY COMPILE_DEFINITIONS PCRE2_CODE_UNIT_WIDTH=8)
-SET(targets ${targets} pcre2-8)
-ADD_LIBRARY(pcre2-posix ${PCRE2POSIX_HEADERS} ${PCRE2POSIX_SOURCES})
-SET_PROPERTY(TARGET pcre2-posix
-  PROPERTY COMPILE_DEFINITIONS PCRE2_CODE_UNIT_WIDTH=8)
-SET(targets ${targets} pcre2-posix)
-TARGET_LINK_LIBRARIES(pcre2-posix pcre2-8)
-
-IF(MINGW AND NOT PCRE2_STATIC)
-  IF(NON_STANDARD_LIB_PREFIX)
-    SET_TARGET_PROPERTIES(pcre2-8 pcre2-posix PROPERTIES PREFIX "")
-  ENDIF(NON_STANDARD_LIB_PREFIX)
-  IF(NON_STANDARD_LIB_SUFFIX)
-    SET_TARGET_PROPERTIES(pcre2-8 pcre2-posix PROPERTIES SUFFIX "-0.dll")
-  ENDIF(NON_STANDARD_LIB_SUFFIX)
-ENDIF(MINGW AND NOT PCRE2_STATIC)
+  IF(BUILD_STATIC_LIBS)
+    ADD_LIBRARY(pcre2-8-static STATIC ${PCRE2_HEADERS} ${PCRE2_SOURCES} ${PROJECT_BINARY_DIR}/config.h)
+    SET_TARGET_PROPERTIES(pcre2-8-static PROPERTIES
+      COMPILE_DEFINITIONS PCRE2_CODE_UNIT_WIDTH=8
+      MACHO_COMPATIBILITY_VERSION "${LIBPCRE2_8_MACHO_COMPATIBILITY_VERSION}"
+      MACHO_CURRENT_VERSION "${LIBPCRE2_8_MACHO_CURRENT_VERSION}"
+      VERSION ${LIBPCRE2_8_VERSION}
+      SOVERSION ${LIBPCRE2_8_SOVERSION})
+    TARGET_COMPILE_DEFINITIONS(pcre2-8-static PUBLIC PCRE2_STATIC)
+    SET(targets ${targets} pcre2-8-static)
+    ADD_LIBRARY(pcre2-posix-static STATIC ${PCRE2POSIX_HEADERS} ${PCRE2POSIX_SOURCES})
+    SET_TARGET_PROPERTIES(pcre2-posix-static PROPERTIES
+      COMPILE_DEFINITIONS PCRE2_CODE_UNIT_WIDTH=8
+      MACHO_COMPATIBILITY_VERSION "${LIBPCRE2_POSIX_MACHO_COMPATIBILITY_VERSION}"
+      MACHO_CURRENT_VERSION "${LIBPCRE2_POSIX_MACHO_CURRENT_VERSION}"
+      VERSION ${LIBPCRE2_POSIX_VERSION}
+      SOVERSION ${LIBPCRE2_POSIX_SOVERSION})
+    TARGET_LINK_LIBRARIES(pcre2-posix-static pcre2-8-static)
+    TARGET_COMPILE_DEFINITIONS(pcre2-posix-static PUBLIC PCRE2_STATIC)
+    SET(targets ${targets} pcre2-posix-static)
+
+    IF(MSVC)
+      SET_TARGET_PROPERTIES(pcre2-8-static PROPERTIES OUTPUT_NAME pcre2-8-static)
+      SET_TARGET_PROPERTIES(pcre2-posix-static PROPERTIES OUTPUT_NAME pcre2-posix-static)
+    ELSE(MSVC)
+      SET_TARGET_PROPERTIES(pcre2-8-static PROPERTIES OUTPUT_NAME pcre2-8)
+      SET_TARGET_PROPERTIES(pcre2-posix-static PROPERTIES OUTPUT_NAME pcre2-posix)
+    ENDIF(MSVC)
+    IF(PCRE2_STATIC_PIC)
+      SET_TARGET_PROPERTIES(pcre2-8-static pcre2-posix-static PROPERTIES POSITION_INDEPENDENT_CODE 1)
+    ENDIF(PCRE2_STATIC_PIC)
+  ENDIF(BUILD_STATIC_LIBS)
+
+  IF(BUILD_SHARED_LIBS)
+    ADD_LIBRARY(pcre2-8-shared SHARED ${PCRE2_HEADERS} ${PCRE2_SOURCES} ${PROJECT_BINARY_DIR}/config.h)
+    SET_TARGET_PROPERTIES(pcre2-8-shared PROPERTIES
+      COMPILE_DEFINITIONS PCRE2_CODE_UNIT_WIDTH=8
+      MACHO_COMPATIBILITY_VERSION "${LIBPCRE2_8_MACHO_COMPATIBILITY_VERSION}"
+      MACHO_CURRENT_VERSION "${LIBPCRE2_8_MACHO_CURRENT_VERSION}"
+      VERSION ${LIBPCRE2_8_VERSION}
+      SOVERSION ${LIBPCRE2_8_SOVERSION}
+      OUTPUT_NAME pcre2-8)
+    SET(targets ${targets} pcre2-8-shared)
+    ADD_LIBRARY(pcre2-posix-shared SHARED ${PCRE2POSIX_HEADERS} ${PCRE2POSIX_SOURCES})
+    SET_TARGET_PROPERTIES(pcre2-posix-shared PROPERTIES
+      COMPILE_DEFINITIONS PCRE2_CODE_UNIT_WIDTH=8
+      MACHO_COMPATIBILITY_VERSION "${LIBPCRE2_POSIX_MACHO_COMPATIBILITY_VERSION}"
+      MACHO_CURRENT_VERSION "${LIBPCRE2_POSIX_MACHO_CURRENT_VERSION}"
+      VERSION ${LIBPCRE2_POSIX_VERSION}
+      SOVERSION ${LIBPCRE2_POSIX_SOVERSION}
+      OUTPUT_NAME pcre2-posix)
+    TARGET_LINK_LIBRARIES(pcre2-posix-shared pcre2-8-shared)
+    SET(targets ${targets} pcre2-posix-shared)
+
+    IF(MINGW)
+      IF(NON_STANDARD_LIB_PREFIX)
+        SET_TARGET_PROPERTIES(pcre2-8-shared pcre2-posix-shared PROPERTIES PREFIX "")
+      ENDIF(NON_STANDARD_LIB_PREFIX)
+      IF(NON_STANDARD_LIB_SUFFIX)
+        SET_TARGET_PROPERTIES(pcre2-8-shared pcre2-posix-shared PROPERTIES SUFFIX "-0.dll")
+      ENDIF(NON_STANDARD_LIB_SUFFIX)
+    ENDIF(MINGW)
+  ENDIF(BUILD_SHARED_LIBS)
+
+  IF(BUILD_STATIC_LIBS)
+    ADD_LIBRARY(pcre2-8 ALIAS pcre2-8-static)
+    ADD_LIBRARY(pcre2-posix ALIAS pcre2-posix-static)
+  ELSE(BUILD_STATIC_LIBS)
+    ADD_LIBRARY(pcre2-8 ALIAS pcre2-8-shared)
+    ADD_LIBRARY(pcre2-posix ALIAS pcre2-posix-shared)
+  ENDIF(BUILD_STATIC_LIBS)
 ENDIF(PCRE2_BUILD_PCRE2_8)
 
 # 16-bit library
 
 IF(PCRE2_BUILD_PCRE2_16)
-ADD_LIBRARY(pcre2-16 ${PCRE2_HEADERS} ${PCRE2_SOURCES} ${PROJECT_BINARY_DIR}/config.h)
-SET_PROPERTY(TARGET pcre2-16
-  PROPERTY COMPILE_DEFINITIONS PCRE2_CODE_UNIT_WIDTH=16)
-SET(targets ${targets} pcre2-16)
-
-IF(MINGW AND NOT PCRE2_STATIC)
-  IF(NON_STANDARD_LIB_PREFIX)
-    SET_TARGET_PROPERTIES(pcre2-16 PROPERTIES PREFIX "")
-  ENDIF(NON_STANDARD_LIB_PREFIX)
-  IF(NON_STANDARD_LIB_SUFFIX)
-    SET_TARGET_PROPERTIES(pcre2-16 PROPERTIES SUFFIX "-0.dll")
-  ENDIF(NON_STANDARD_LIB_SUFFIX)
-ENDIF(MINGW AND NOT PCRE2_STATIC)
+  IF(BUILD_STATIC_LIBS)
+    ADD_LIBRARY(pcre2-16-static STATIC ${PCRE2_HEADERS} ${PCRE2_SOURCES} ${PROJECT_BINARY_DIR}/config.h)
+    SET_TARGET_PROPERTIES(pcre2-16-static PROPERTIES
+      COMPILE_DEFINITIONS PCRE2_CODE_UNIT_WIDTH=16
+      MACHO_COMPATIBILITY_VERSION "${LIBPCRE2_16_MACHO_COMPATIBILITY_VERSION}"
+      MACHO_CURRENT_VERSION "${LIBPCRE2_16_MACHO_CURRENT_VERSION}"
+      VERSION ${LIBPCRE2_16_VERSION}
+      SOVERSION ${LIBPCRE2_16_SOVERSION})
+    TARGET_COMPILE_DEFINITIONS(pcre2-16-static PUBLIC PCRE2_STATIC)
+    SET(targets ${targets} pcre2-16-static)
+
+    IF(MSVC)
+      SET_TARGET_PROPERTIES(pcre2-16-static PROPERTIES OUTPUT_NAME pcre2-16-static)
+    ELSE(MSVC)
+      SET_TARGET_PROPERTIES(pcre2-16-static PROPERTIES OUTPUT_NAME pcre2-16)
+    ENDIF(MSVC)
+    IF(PCRE2_STATIC_PIC)
+      SET_TARGET_PROPERTIES(pcre2-16-static PROPERTIES POSITION_INDEPENDENT_CODE 1)
+    ENDIF(PCRE2_STATIC_PIC)
+  ENDIF(BUILD_STATIC_LIBS)
+
+  IF(BUILD_SHARED_LIBS)
+    ADD_LIBRARY(pcre2-16-shared SHARED ${PCRE2_HEADERS} ${PCRE2_SOURCES} ${PROJECT_BINARY_DIR}/config.h)
+    SET_TARGET_PROPERTIES(pcre2-16-shared PROPERTIES
+      COMPILE_DEFINITIONS PCRE2_CODE_UNIT_WIDTH=16
+      MACHO_COMPATIBILITY_VERSION "${LIBPCRE2_16_MACHO_COMPATIBILITY_VERSION}"
+      MACHO_CURRENT_VERSION "${LIBPCRE2_16_MACHO_CURRENT_VERSION}"
+      VERSION ${LIBPCRE2_16_VERSION}
+      SOVERSION ${LIBPCRE2_16_SOVERSION}
+      OUTPUT_NAME pcre2-16)
+    SET(targets ${targets} pcre2-16-shared)
+
+    IF(MINGW)
+      IF(NON_STANDARD_LIB_PREFIX)
+        SET_TARGET_PROPERTIES(pcre2-16-shared PROPERTIES PREFIX "")
+      ENDIF(NON_STANDARD_LIB_PREFIX)
+      IF(NON_STANDARD_LIB_SUFFIX)
+        SET_TARGET_PROPERTIES(pcre2-16-shared PROPERTIES SUFFIX "-0.dll")
+      ENDIF(NON_STANDARD_LIB_SUFFIX)
+    ENDIF(MINGW)
+  ENDIF(BUILD_SHARED_LIBS)
+
+  IF(BUILD_STATIC_LIBS)
+    ADD_LIBRARY(pcre2-16 ALIAS pcre2-16-static)
+  ELSE(BUILD_STATIC_LIBS)
+    ADD_LIBRARY(pcre2-16 ALIAS pcre2-16-shared)
+  ENDIF(BUILD_STATIC_LIBS)
 ENDIF(PCRE2_BUILD_PCRE2_16)
 
 # 32-bit library
 
 IF(PCRE2_BUILD_PCRE2_32)
-ADD_LIBRARY(pcre2-32 ${PCRE2_HEADERS} ${PCRE2_SOURCES} ${PROJECT_BINARY_DIR}/config.h)
-SET_PROPERTY(TARGET pcre2-32
-  PROPERTY COMPILE_DEFINITIONS PCRE2_CODE_UNIT_WIDTH=32)
-SET(targets ${targets} pcre2-32)
-
-IF(MINGW AND NOT PCRE2_STATIC)
-  IF(NON_STANDARD_LIB_PREFIX)
-    SET_TARGET_PROPERTIES(pcre2-32 PROPERTIES PREFIX "")
-  ENDIF(NON_STANDARD_LIB_PREFIX)
-  IF(NON_STANDARD_LIB_SUFFIX)
-    SET_TARGET_PROPERTIES(pcre2-32 PROPERTIES SUFFIX "-0.dll")
-  ENDIF(NON_STANDARD_LIB_SUFFIX)
-ENDIF(MINGW AND NOT PCRE2_STATIC)
+  IF(BUILD_STATIC_LIBS)
+    ADD_LIBRARY(pcre2-32-static STATIC ${PCRE2_HEADERS} ${PCRE2_SOURCES} ${PROJECT_BINARY_DIR}/config.h)
+    SET_TARGET_PROPERTIES(pcre2-32-static PROPERTIES
+      COMPILE_DEFINITIONS PCRE2_CODE_UNIT_WIDTH=32
+      MACHO_COMPATIBILITY_VERSION "${LIBPCRE2_32_MACHO_COMPATIBILITY_VERSION}"
+      MACHO_CURRENT_VERSION "${LIBPCRE2_32_MACHO_CURRENT_VERSION}"
+      VERSION ${LIBPCRE2_32_VERSION}
+      SOVERSION ${LIBPCRE2_32_SOVERSION})
+    TARGET_COMPILE_DEFINITIONS(pcre2-32-static PUBLIC PCRE2_STATIC)
+    SET(targets ${targets} pcre2-32-static)
+
+    IF(MSVC)
+      SET_TARGET_PROPERTIES(pcre2-32-static PROPERTIES OUTPUT_NAME pcre2-32-static)
+    ELSE(MSVC)
+      SET_TARGET_PROPERTIES(pcre2-32-static PROPERTIES OUTPUT_NAME pcre2-32)
+    ENDIF(MSVC)
+    IF(PCRE2_STATIC_PIC)
+      SET_TARGET_PROPERTIES(pcre2-32-static PROPERTIES POSITION_INDEPENDENT_CODE 1)
+    ENDIF(PCRE2_STATIC_PIC)
+  ENDIF(BUILD_STATIC_LIBS)
+
+  IF(BUILD_SHARED_LIBS)
+    ADD_LIBRARY(pcre2-32-shared SHARED ${PCRE2_HEADERS} ${PCRE2_SOURCES} ${PROJECT_BINARY_DIR}/config.h)
+    SET_TARGET_PROPERTIES(pcre2-32-shared PROPERTIES
+      COMPILE_DEFINITIONS PCRE2_CODE_UNIT_WIDTH=32
+      MACHO_COMPATIBILITY_VERSION "${LIBPCRE2_32_MACHO_COMPATIBILITY_VERSION}"
+      MACHO_CURRENT_VERSION "${LIBPCRE2_32_MACHO_CURRENT_VERSION}"
+      VERSION ${LIBPCRE2_32_VERSION}
+      SOVERSION ${LIBPCRE2_32_SOVERSION}
+      OUTPUT_NAME pcre2-32)
+    SET(targets ${targets} pcre2-32-shared)
+
+    IF(MINGW)
+      IF(NON_STANDARD_LIB_PREFIX)
+        SET_TARGET_PROPERTIES(pcre2-32-shared PROPERTIES PREFIX "")
+      ENDIF(NON_STANDARD_LIB_PREFIX)
+      IF(NON_STANDARD_LIB_SUFFIX)
+        SET_TARGET_PROPERTIES(pcre2-32-shared PROPERTIES SUFFIX "-0.dll")
+      ENDIF(NON_STANDARD_LIB_SUFFIX)
+    ENDIF(MINGW)
+  ENDIF(BUILD_SHARED_LIBS)
+
+  IF(BUILD_STATIC_LIBS)
+    ADD_LIBRARY(pcre2-32 ALIAS pcre2-32-static)
+  ELSE(BUILD_STATIC_LIBS)
+    ADD_LIBRARY(pcre2-32 ALIAS pcre2-32-shared)
+  ENDIF(BUILD_STATIC_LIBS)
 ENDIF(PCRE2_BUILD_PCRE2_32)
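
The three library blocks above follow one pattern: a `-static` target, a `-shared` target, and an unsuffixed ALIAS that prefers the static build. A minimal sketch of how a parent project might consume that alias is given here. The `third_party/pcre2` path and the `myapp` target are hypothetical, and whether include directories propagate from the PCRE2 targets depends on parts of CMakeLists.txt that this hunk does not show.

```cmake
# Sketch only: "third_party/pcre2" and "myapp" are illustrative names.
cmake_minimum_required(VERSION 3.0)
project(myapp C)

# Request both flavours; the alias pcre2-8 resolves to pcre2-8-static
# whenever BUILD_STATIC_LIBS is ON (see the IF/ELSE alias block).
set(BUILD_STATIC_LIBS ON CACHE BOOL "" FORCE)
set(BUILD_SHARED_LIBS ON CACHE BOOL "" FORCE)
set(PCRE2_STATIC_PIC  ON CACHE BOOL "" FORCE)
add_subdirectory(third_party/pcre2)

add_executable(myapp main.c)
# pcre2.h requires the caller to choose a code unit width.
target_compile_definitions(myapp PRIVATE PCRE2_CODE_UNIT_WIDTH=8)
# PCRE2_STATIC is a PUBLIC definition on pcre2-8-static, so it is
# inherited here without being set by hand.
target_link_libraries(myapp PRIVATE pcre2-8)
```

Because the alias falls back to the shared target only when BUILD_STATIC_LIBS is OFF, a consumer that needs the shared library while both are built would have to name `pcre2-8-shared` explicitly.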
 
 # Executables
@@ -594,6 +856,13 @@ IF(PCRE2_BUILD_TESTS)
 
   SET(PCRE2TEST_SOURCES src/pcre2test.c)
 
+  IF(MSVC)
+    # This is needed to avoid a stack overflow error in the standard tests. The
+    # flag should be indicated with a forward-slash instead of a hyphen, but
+    # then CMake treats it as a file path.
+    SET(PCRE2TEST_LINKER_FLAGS -STACK:2500000)
+  ENDIF(MSVC)
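
The workaround above spells the flag with a hyphen because the value is routed through TARGET_LINK_LIBRARIES, which treats an item beginning with a slash as a file path. A hedged alternative, assuming CMake 3.13 or newer (this file only requires 3.0.0, so it is not a drop-in change), is to attach the option to the target directly, where it is passed to the linker as an option:

```cmake
# Sketch only: requires CMake >= 3.13 for target_link_options().
# Must appear after ADD_EXECUTABLE(pcre2test ...).
if(MSVC)
  target_link_options(pcre2test PRIVATE "/STACK:2500000")
endif()
```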
+
   ADD_EXECUTABLE(pcre2test ${PCRE2TEST_SOURCES})
   SET(targets ${targets} pcre2test)
   IF(PCRE2_BUILD_PCRE2_8)
@@ -605,7 +874,7 @@ IF(PCRE2_BUILD_TESTS)
   IF(PCRE2_BUILD_PCRE2_32)
     LIST(APPEND PCRE2TEST_LIBS pcre2-32)
   ENDIF(PCRE2_BUILD_PCRE2_32)
-  TARGET_LINK_LIBRARIES(pcre2test ${PCRE2TEST_LIBS})
+  TARGET_LINK_LIBRARIES(pcre2test ${PCRE2TEST_LIBS} ${PCRE2TEST_LINKER_FLAGS})
 
   IF(PCRE2_SUPPORT_JIT)
     ADD_EXECUTABLE(pcre2_jit_test src/pcre2_jit_test.c)
@@ -623,14 +892,27 @@ IF(PCRE2_BUILD_TESTS)
     TARGET_LINK_LIBRARIES(pcre2_jit_test ${PCRE2_JIT_TEST_LIBS})
   ENDIF(PCRE2_SUPPORT_JIT)
 
-  # exes in Debug location tested by the RunTest shell script
+  # exes in Debug location tested by the RunTest and RunGrepTest shell scripts
   # via "make test"
 
+  # The commented out code below provokes a warning about future removal
+  # of the facility, and requires policy CMP0026 to be set to "OLD". I have
+  # got fed-up with the warnings, but my plea for help on the mailing list
+  # produced no response. So, I've hacked. The new code below seems to work on
+  # Linux.
+
+#  IF(PCRE2_BUILD_PCRE2GREP)
+#    GET_TARGET_PROPERTY(PCRE2GREP_EXE pcre2grep DEBUG_LOCATION)
+#  ENDIF(PCRE2_BUILD_PCRE2GREP)
+#
+#  GET_TARGET_PROPERTY(PCRE2TEST_EXE pcre2test DEBUG_LOCATION)
+
   IF(PCRE2_BUILD_PCRE2GREP)
-    GET_TARGET_PROPERTY(PCRE2GREP_EXE pcre2grep DEBUG_LOCATION)
+    SET(PCRE2GREP_EXE $<TARGET_FILE:pcre2grep>)
   ENDIF(PCRE2_BUILD_PCRE2GREP)
 
-  GET_TARGET_PROPERTY(PCRE2TEST_EXE pcre2test DEBUG_LOCATION)
+  SET(PCRE2TEST_EXE $<TARGET_FILE:pcre2test>)
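
`$<TARGET_FILE:...>` is a generator expression, so the SET above stores the literal text; the path is only filled in by commands that evaluate generator expressions at generate time. A small sketch of two such commands follows. The test name and the output file name are hypothetical and not part of the PCRE2 build.

```cmake
# Sketch only: "pcre2test_show_config" and the output file name are
# illustrative. Both commands below evaluate generator expressions.

# add_test(NAME ... COMMAND ...) expands $<TARGET_FILE:...> per configuration.
add_test(NAME pcre2test_show_config
         COMMAND $<TARGET_FILE:pcre2test> -C)

# file(GENERATE) writes one file per configuration with the resolved path.
file(GENERATE
  OUTPUT  "${CMAKE_CURRENT_BINARY_DIR}/pcre2test_path_$<CONFIG>.txt"
  CONTENT "$<TARGET_FILE:pcre2test>\n")
```

Configure-time commands such as FILE(WRITE ...) would see the unexpanded expression instead, so any script text built from PCRE2TEST_EXE at configure time needs a path obtained another way.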
+
 
 # =================================================
   # Write out a CTest configuration file
@@ -679,7 +961,9 @@ if test \"$?\" != \"0\"; then exit 1; fi
 \@echo off
 setlocal
 SET srcdir=\"${winsrc}\"
-SET pcre2test=\"${winexe}\"
+rem The next line was replaced by the following one after a user comment.
+rem SET pcre2test=\"${winexe}\"
+SET pcre2test=\"${winbin}\\pcre2test.exe\"
 if not [%CMAKE_CONFIG_TYPE%]==[] SET pcre2test=\"${winbin}\\%CMAKE_CONFIG_TYPE%\\pcre2test.exe\"
 call %srcdir%\\RunTest.Bat
 if errorlevel 1 exit /b 1
@@ -715,11 +999,25 @@ SET(CMAKE_INSTALL_ALWAYS 1)
 
 INSTALL(TARGETS ${targets}
         RUNTIME DESTINATION bin
-        LIBRARY DESTINATION lib
-        ARCHIVE DESTINATION lib)
+        LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
+        ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR})
+INSTALL(FILES ${pkg_config_files} DESTINATION ${CMAKE_INSTALL_LIBDIR}/pkgconfig)
+INSTALL(FILES "${CMAKE_CURRENT_BINARY_DIR}/pcre2-config"
+  DESTINATION bin
+  # Set 0755 permissions
+  PERMISSIONS OWNER_WRITE OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE)
 
 INSTALL(FILES ${PCRE2_HEADERS} ${PCRE2POSIX_HEADERS} DESTINATION include)
 
+# CMake config files.
+set(PCRE2_CONFIG_IN  ${CMAKE_CURRENT_SOURCE_DIR}/cmake/pcre2-config.cmake.in)
+set(PCRE2_CONFIG_OUT ${CMAKE_CURRENT_BINARY_DIR}/cmake/pcre2-config.cmake)
+configure_file(${PCRE2_CONFIG_IN} ${PCRE2_CONFIG_OUT} @ONLY)
+set(PCRE2_CONFIG_VERSION_IN  ${CMAKE_CURRENT_SOURCE_DIR}/cmake/pcre2-config-version.cmake.in)
+set(PCRE2_CONFIG_VERSION_OUT ${CMAKE_CURRENT_BINARY_DIR}/cmake/pcre2-config-version.cmake)
+configure_file(${PCRE2_CONFIG_VERSION_IN} ${PCRE2_CONFIG_VERSION_OUT} @ONLY)
+install(FILES ${PCRE2_CONFIG_OUT} ${PCRE2_CONFIG_VERSION_OUT} DESTINATION cmake)
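
The configured files above let another project locate an installed PCRE2 through find_package() in config mode. The sketch below assumes the component name `8BIT` and the imported target `PCRE2::8BIT`, which come from cmake/pcre2-config.cmake.in and are not shown in this diff; `myapp` and the install prefix are hypothetical.

```cmake
# Sketch only: component and target names are assumed from
# cmake/pcre2-config.cmake.in; "myapp" is an illustrative target.
cmake_minimum_required(VERSION 3.0)
project(myapp C)

# Choose the static import before calling find_package().
set(PCRE2_USE_STATIC_LIBS ON)
# Configure with -DCMAKE_PREFIX_PATH=<pcre2 install prefix> if PCRE2 is
# not installed in a default search location.
find_package(PCRE2 CONFIG REQUIRED COMPONENTS 8BIT)

add_executable(myapp main.c)
target_compile_definitions(myapp PRIVATE PCRE2_CODE_UNIT_WIDTH=8)
target_link_libraries(myapp PRIVATE PCRE2::8BIT)
```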
+
 FILE(GLOB html ${PROJECT_SOURCE_DIR}/doc/html/*.html)
 FILE(GLOB man1 ${PROJECT_SOURCE_DIR}/doc/*.1)
 FILE(GLOB man3 ${PROJECT_SOURCE_DIR}/doc/*.3)
@@ -746,11 +1044,11 @@ IF(MSVC AND INSTALL_MSVC_PDB)
 ENDIF(MSVC AND INSTALL_MSVC_PDB)
 
 # Help, only for nice output
-IF(BUILD_SHARED_LIBS)
-  SET(BUILD_STATIC_LIBS OFF)
-ELSE(BUILD_SHARED_LIBS)
+IF(BUILD_STATIC_LIBS)
   SET(BUILD_STATIC_LIBS ON)
-ENDIF(BUILD_SHARED_LIBS)
+ELSE(BUILD_STATIC_LIBS)
+  SET(BUILD_STATIC_LIBS OFF)
+ENDIF(BUILD_STATIC_LIBS)
 
 IF(PCRE2_HEAP_MATCH_RECURSE)
   MESSAGE(WARNING "HEAP_MATCH_RECURSE is obsolete and does nothing.")
@@ -763,7 +1061,7 @@ IF(PCRE2_SHOW_REPORT)
   ENDIF(CMAKE_C_FLAGS)
   MESSAGE(STATUS "")
   MESSAGE(STATUS "")
-  MESSAGE(STATUS "PCRE2 configuration summary:")
+  MESSAGE(STATUS "PCRE2-${PCRE2_MAJOR}.${PCRE2_MINOR} configuration summary:")
   MESSAGE(STATUS "")
   MESSAGE(STATUS "  Install prefix .................. : ${CMAKE_INSTALL_PREFIX}")
   MESSAGE(STATUS "  C compiler ...................... : ${CMAKE_C_COMPILER}")
@@ -788,11 +1086,13 @@ IF(PCRE2_SHOW_REPORT)
   MESSAGE(STATUS "  Match depth limit ............... : ${PCRE2_MATCH_LIMIT_DEPTH}")
   MESSAGE(STATUS "  Build shared libs ............... : ${BUILD_SHARED_LIBS}")
   MESSAGE(STATUS "  Build static libs ............... : ${BUILD_STATIC_LIBS}")
+  MESSAGE(STATUS "     with PIC enabled ............. : ${PCRE2_STATIC_PIC}")
   MESSAGE(STATUS "  Build pcre2grep ................. : ${PCRE2_BUILD_PCRE2GREP}")
-  MESSAGE(STATUS "  Enable JIT in pcre2grep ......... : ${PCRE2_SUPPORT_PCRE2GREP_JIT}")
-  MESSAGE(STATUS "  Enable callouts in pcre2grep .... : ${PCRE2_SUPPORT_PCRE2GREP_CALLOUT}")
+  MESSAGE(STATUS "  Enable JIT in pcre2grep ......... : ${PCRE2GREP_SUPPORT_JIT}")
+  MESSAGE(STATUS "  Enable callouts in pcre2grep .... : ${PCRE2GREP_SUPPORT_CALLOUT}")
+  MESSAGE(STATUS "  Enable callout fork in pcre2grep. : ${PCRE2GREP_SUPPORT_CALLOUT_FORK}")
   MESSAGE(STATUS "  Buffer size for pcre2grep ....... : ${PCRE2GREP_BUFSIZE}")
-  MESSAGE(STATUS "  Build tests (implies pcre2test  . : ${PCRE2_BUILD_TESTS}")
+  MESSAGE(STATUS "  Build tests (implies pcre2test .. : ${PCRE2_BUILD_TESTS}")
   MESSAGE(STATUS "               and pcre2grep)")
   IF(ZLIB_FOUND)
     MESSAGE(STATUS "  Link pcre2grep with libz ........ : ${PCRE2_SUPPORT_LIBZ}")
@@ -815,11 +1115,16 @@ IF(PCRE2_SHOW_REPORT)
     MESSAGE(STATUS "  Link pcre2test with libreadline . : Library not found" )
   ENDIF(READLINE_FOUND)
   MESSAGE(STATUS "  Support Valgrind .................: ${PCRE2_SUPPORT_VALGRIND}")
+  IF(PCRE2_DISABLE_PERCENT_ZT)
+    MESSAGE(STATUS "  Use %zu and %td ..................: OFF" )
+  ELSE(PCRE2_DISABLE_PERCENT_ZT)
+    MESSAGE(STATUS "  Use %zu and %td ..................: AUTO" )
+  ENDIF(PCRE2_DISABLE_PERCENT_ZT)
 
-  IF(MINGW AND NOT PCRE2_STATIC)
+  IF(MINGW AND BUILD_SHARED_LIBS)
     MESSAGE(STATUS "  Non-standard dll names (prefix) . : ${NON_STANDARD_LIB_PREFIX}")
     MESSAGE(STATUS "  Non-standard dll names (suffix) . : ${NON_STANDARD_LIB_SUFFIX}")
-  ENDIF(MINGW AND NOT PCRE2_STATIC)
+  ENDIF(MINGW AND BUILD_SHARED_LIBS)
 
   IF(MSVC)
     MESSAGE(STATUS "  Install MSVC .pdb files ..........: ${INSTALL_MSVC_PDB}")

+ 943 - 5
regex.mod/pcre/ChangeLog

@@ -1,6 +1,944 @@
 Change Log for PCRE2
 --------------------
 
+Version 10.39 29-October-2021
+-----------------------------
+
+1. Fix incorrect detection of alternatives in first character search in JIT.
+
+2. Merged patch from @carenas (GitHub #28):
+
+  Visual Studio 2013 includes support for %zu and %td, so let newer
+  versions of it avoid the fallback, and while at it, make sure that
+  the first check is for DISABLE_PERCENT_ZT so it will be always
+  honoured if chosen.
+
+  ptrdiff_t is signed, so use a signed type instead, and make sure
+  that an appropriate width is chosen if pointers are 64bit wide and
+  long is not (ex: Windows 64bit).
+
+  IMHO removing the cast (and therefore the possibility of truncation)
+  makes the code cleaner and the fallback is likely portable enough
+  with all 64-bit POSIX systems doing LP64 except for Windows.
+
+3. Merged patch from @carenas (GitHub #29) to update to Unicode 14.0.0.
+
+4. Merged patch from @carenas (GitHub #30):
+
+  * Cleanup: remove references to no longer used stdint.h
+
+  Since 19c50b9d (Unconditionally use inttypes.h instead of trying for stdint.h
+  (simplification) and remove the now unnecessary inclusion in
+  pcre2_internal.h., 2018-11-14), stdint.h is no longer used.
+
+  Remove checks for it in autotools and CMake and document better the expected
+  build failures for systems that might have stdint.h (C99) and not inttypes.h
+  (from POSIX), like old Windows.
+
+  * Cleanup: remove detection for inttypes.h which is a hard dependency
+
+  CMake checks for standard headers are not meant to be used for hard
+  dependencies, so they would prevent a possible fallback from working.
+
+  Alternatively, the header could be checked to make the configuration fail
+  instead of breaking the build, but that was punted, as it was missing anyway
+  from autotools.
+
+5. Merged patch from @carenas (GitHub #32):
+
+  * jit: allow building with ancient MSVC versions
+
+  Visual Studio older than 2013 fails to build with JIT enabled, because it is
+  unable to parse non C89 compatible syntax, with mixed declarations and code.
+  While most recent compilers wouldn't even report this as a warning since it
+  is valid C99, it could be also made visible by adding to gcc/clang the
+  -Wdeclaration-after-statement flag at build time.
+
+  Move the code below the affected definitions.
+
+  * pcre2grep: avoid mixing declarations with code
+
+  Since d5a61ee8 (Patch to detect (and ignore) symlink loops in pcre2grep,
+  2021-08-28), code will fail to build in a strict C89 compiler.
+
+  Reformat slightly to make it C89 compatible again.
+
+
+Version 10.38 01-October-2021
+-----------------------------
+
+1. Fix invalid single character repetition issues in JIT when the repetition
+is inside a capturing bracket and the bracket is preceded by character
+literals.
+
+2. Installed revised CMake configuration files provided by Jan-Willem Blokland.
+This extends the CMake build system to build both static and shared libraries
+in one go, builds the static library with PIC, and exposes PCRE2 libraries
+using the CMake config files. JWB provided these notes:
+
+- Introduced CMake variable BUILD_STATIC_LIBS to build the static library.
+
+- Make a small modification to config-cmake.h.in by removing the PCRE2_STATIC
+  variable. Added PCRE2_STATIC variable to the static build using the
+  target_compile_definitions() function.
+
+- Extended the CMake config files.
+
+  - Introduced CMake variable PCRE2_USE_STATIC_LIBS to easily switch between
+    the static and shared libraries.
+
+  - Added the PCRE2_STATIC variable to the target compile definitions for the
+    import of the static library.
+
+Building static and shared libraries using MSVC results in a name clash of
+the libraries. Both static and shared library builds create, for example, the
+file pcre2-8.lib. Therefore, I decided to change the static library names by
+adding "-static". For example, pcre2-8.lib has become pcre2-8-static.lib.
+[Comment by PH: this is MSVC-specific. It doesn't happen on Linux.]
+
+3. Increased the minimum release number for CMake to 3.0.0 because older than
+2.8.12 is deprecated (it was set to 2.8.5) and causes warnings. Even 3.0.0 is
+quite old; it was released in 2014.
+
+4. Implemented a modified version of Thomas Tempelmann's pcre2grep patch for
+detecting symlink loops. This is dependent on the availability of realpath(),
+which is now tested for in ./configure and CMakeLists.txt.
+
+5. Implemented a modified version of Thomas Tempelmann's patch for faster
+case-independent "first code unit" searches for unanchored patterns in 8-bit
+mode in the interpreters. Instead of just remembering whether one case matched
+or not, it remembers the position of a previous match so as to avoid
+unnecessary repeated searching.
+
+6. Perl now locks out \K in lookarounds, so PCRE2 now does the same by default.
+However, just in case anybody was relying on the old behaviour, there is an
+option called PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK that enables the old behaviour.
+An option has also been added to pcre2grep to enable this.
+
+7. Re-enable a JIT optimization which was unintentionally disabled in 10.35.
+
+8. There is a loop counter to catch excessively crazy patterns when checking
+the lengths of lookbehinds at compile time. This was incorrectly getting reset
+whenever a lookahead was processed, leading to some fuzzer-generated patterns
+taking a very long time to compile when (?|) was present in the pattern,
+because (?|) disables caching of group lengths.
+
+
+Version 10.37 26-May-2021
+-------------------------
+
+1. Change RunGrepTest to use tr instead of sed when testing with binary
+zero bytes, because sed varies a lot from system to system and has problems
+with binary zeros. This is from Bugzilla #2681. Patch from Jeremie
+Courreges-Anglas via Nam Nguyen. This fixes RunGrepTest for OpenBSD. Later:
+it broke it for at least one version of Solaris, where tr can't handle binary
+zeros. However, that system had /usr/xpg4/bin/tr installed, which works OK, so
+RunGrepTest now checks for that command and uses it if found.
+
+2. Compiling with gcc 10.2's -fanalyzer option showed up a hypothetical problem
+with a NULL dereference. I don't think this case could ever occur in practice,
+but I have put in a check in order to get rid of the compiler error.
+
+3. An alternative patch for CMakeLists.txt because 10.36 #4 breaks CMake on
+Windows. Patch from [email protected] fixes bugzilla #2688.
+
+4. Two bugs related to over-large numbers have been fixed so the behaviour is
+now the same as Perl.
+
+  (a) A pattern such as /\214748364/ gave an overflow error instead of being
+  treated as the octal number \214 followed by literal digits.
+
+  (b) A sequence such as {65536 that has no terminating } so is not a
+  quantifier was nevertheless complaining that a quantifier number was too big.
+
+5. A run of autoconf suggested that configure.ac was out-of-date with respect
+to the latest autoconf. Running autoupdate made some valid changes, some valid
+suggestions, and also some invalid changes, which were fixed by hand. Autoconf
+now runs clean and the resulting "configure" seems to work, so I hope nothing
+is broken. Later: the requirement for autoconf 2.70 broke some automatic test
+robots. It doesn't seem to be necessary: trying a reduction to 2.60.
+
+6. The pattern /a\K.(?0)*/ when matched against "abac" by the interpreter gave
+the answer "bac", whereas Perl and JIT both yield "c". This was because the
+effect of \K was not propagating back from the full pattern recursion. Other
+recursions such as /(a\K.(?1)*)/ did not have this problem.
+
+7. Restore single character repetition optimization in JIT. Currently fewer
+character repetitions are optimized than in 10.34.
+
+8. When the names of the functions in the POSIX wrapper were changed to
+pcre2_regcomp() etc. (see change 10.33 #4 below), functions with the original
+names were left in the library so that pre-compiled programs would still work.
+However, this has proved troublesome when programs link with several libraries,
+some of which use PCRE2 via the POSIX interface while others use a native POSIX
+library. For this reason, the POSIX function names are removed in this release.
+The macros in pcre2posix.h should ensure that re-compiling fixes any programs
+that haven't been compiled since before 10.33.
+
+
+Version 10.36 04-December-2020
+------------------------------
+
+1. Add CET_CFLAGS so that when Intel CET is enabled, pass -mshstk to
+compiler. This fixes https://bugs.exim.org/show_bug.cgi?id=2578. Patch for
+Makefile.am and configure.ac by H.J. Lu. Equivalent patch for CMakeLists.txt
+invented by PH.
+
+2. Fix infinite loop when a single byte newline is searched in JIT when
+invalid utf8 mode is enabled.
+
+3. Updated CMakeLists.txt with patch from Wolfgang Stöggl (Bugzilla #2584):
+
+  - Include GNUInstallDirs and use ${CMAKE_INSTALL_LIBDIR} instead of hardcoded
+    lib. This allows differentiation between lib and lib64.
+    CMAKE_INSTALL_LIBDIR is used for installation of libraries and also for
+    pkgconfig file generation.
+
+  - Add the version of PCRE2 to the configuration summary like ./configure
+    does.
+
+  - Fix typo: MACTHED_STRING->MATCHED_STRING
+
+4. Updated CMakeLists.txt with another patch from Wolfgang Stöggl (Bugzilla
+#2588):
+
+  - Add escaped double quotes around include directory in CMakeLists.txt to
+    allow spaces in directory names.
+
+  - This fixes a cmake error, if the path of the pcre2 source contains a space.
+
+5. Updated CMakeLists.txt with a patch from B. Scott Michel: CMake's
+documentation suggests using CHECK_SYMBOL_EXISTS over CHECK_FUNCTION_EXISTS.
+Moreover, these functions come from specific header files, which need to be
+specified (and, thankfully, are the same on both the Linux and WinXX
+platforms.)
+
+6. Added a (uint32_t) cast to prevent a compiler warning in pcre2_compile.c.
+
+7. Applied a patch from Wolfgang Stöggl (Bugzilla #2600) to fix postfix for
+debug Windows builds using CMake. This also updated configure so that it
+generates *.pc files and pcre2-config with the same content, as in the past.
+
+8. If a pattern ended with (?(VERSION=n.d where n is any number but d is just a
+single digit, the code unit beyond d was being read (i.e. there was a read
+buffer overflow). Fixes ClusterFuzz 23779.
+
+9. After the rework in r1235, certain character ranges were incorrectly
+handled by an optimization in JIT. Furthermore a wrong offset was used to
+read a value from a buffer which could lead to memory overread.
+
+10. Unnoticed for many years was the fact that delimiters other than / in the
+testinput1 and testinput4 files could cause incorrect behaviour when these
+files were processed by perltest.sh. There were several tests that used quotes
+as delimiters, and it was just luck that they didn't go wrong with perltest.sh.
+All the patterns in testinput1 and testinput4 now use / as their delimiter.
+This fixes Bugzilla #2641.
+
+11. Perl has started to give an error for \K within lookarounds (though there
+are cases where it doesn't). PCRE2 still allows this, so the tests that include
+this case have been moved from test 1 to test 2.
+
+12. Further to 10 above, pcre2test has been updated to detect and grumble if a
+delimiter other than / is used after #perltest.
+
+13. Fixed a bug with PCRE2_MATCH_INVALID_UTF in 8-bit mode when PCRE2_CASELESS
+was set and PCRE2_NO_START_OPTIMIZE was not set. The optimization for finding
+the start of a match was not resetting correctly after a failed match on the
+first valid fragment of the subject, possibly causing incorrect "no match"
+returns on subsequent fragments. For example, the pattern /A/ failed to match
+the subject \xe5A. Fixes Bugzilla #2642.
+
+14. Fixed a bug in character set matching when JIT is enabled and both unicode
+scripts and unicode classes are present at the same time.
+
+15. Added GNU grep's -m (aka --max-count) option to pcre2grep.
+
+16. Refactored substitution processing in pcre2grep strings, both for the -O
+option and when dealing with callouts. There is now a single function that
+handles $ expansion in all cases (instead of multiple copies of almost
+identical code). This means that the same escape sequences are available
+everywhere, which was not previously the case. At the same time, the escape
+sequences $x{...} and $o{...} have been introduced, to allow for characters
+whose code points are greater than 255 in Unicode mode.
+
+17. Applied the patch from Bugzilla #2628 to RunGrepTest. This does an explicit
+test for a version of sed that can handle binary zero, instead of assuming that
+any Linux version will work. Later: replaced $(...) by `...` because not all
+shells recognize the former.
+
+18. Fixed a word boundary check bug in JIT when partial matching is enabled.
+
+19. Fix ARM64 compilation warning in JIT. Patch by Carlo.
+
+20. A bug in the RunTest script meant that if the first part of test 2 failed,
+the failure was not reported.
+
+21. Test 2 was failing when run from a directory other than the source
+directory. This failure was previously missed in RunTest because of 20 above.
+Fixes added to both RunTest and RunTest.bat.
+
+22. Patch to CMakeLists.txt from Daniel to fix problem with testing under
+Windows.
+
+
+Version 10.35 09-May-2020
+---------------------------
+
+1. Use PCRE2_MATCH_EMPTY flag to detect empty matches in JIT.
+
+2. Fix ARMv5 JIT improper handling of labels right after a constant pool.
+
+3. A JIT bug is fixed which allowed the fields of the compiled pattern to be
+read before its existence is checked.
+
+4. Back in the PCRE1 day, capturing groups that contained recursive back
+references to themselves were made atomic (version 8.01, change 18) because
+after the end of a repeated group, the captured substrings had their values from
+the final repetition, not from an earlier repetition that might be the
+destination of a backtrack. This feature was documented, and was carried over
+into PCRE2. However, it has now been realized that the major refactoring that
+was done for 10.30 has made this atomicizing unnecessary, and it is confusing
+when users are unaware of it, making some patterns appear not to be working as
+expected. Capture values of recursive back references in repeated groups are
+now correctly backtracked, so this unnecessary restriction has been removed.
+
+5. Added PCRE2_SUBSTITUTE_LITERAL.
+
+6. Avoid some VS compiler warnings.
+
+7. Added PCRE2_SUBSTITUTE_MATCHED.
+
+8. Added (?* and (?<* as synonyms for (*napla: and (*naplb: to match another
+regex engine. The Perl regex folks are aware of this usage and have made a note
+about it.
+
+9. When an assertion is repeated, PCRE2 used to limit the maximum repetition to
+1, believing that repeating an assertion is pointless. However, if a positive
+assertion contains capturing groups, repetition can be useful. In any case, an
+assertion could always be wrapped in a repeated group. The only restriction
+that is now imposed is that an unlimited maximum is changed to one more than
+the minimum.
+
+10. Fix *THEN verbs in lookahead assertions in JIT.
+
+11. Added PCRE2_SUBSTITUTE_REPLACEMENT_ONLY.
+
+12. The JIT stack should be freed when the low-level stack allocation fails.
+
+13. In pcre2grep, if the final line in a scanned file is output but does not
+end with a newline sequence, add a newline according to the --newline setting.
+
+14. (?(DEFINE)...) groups were not being handled correctly when checking for
+the fixed length of a lookbehind assertion. Such a group within a lookbehind
+should be skipped, as it does not contribute to the length of the group.
+Instead, the (DEFINE) group was being processed, and if it was at the end of
+the lookbehind, that end was not correctly recognized. Errors such as
+"lookbehind assertion is not fixed length" and also "internal error: bad code
+value in parsed_skip()" could result.
+
+15. Put a limit of 1000 on recursive calls in pcre2_study() when searching
+nested groups for starting code units, in order to avoid stack overflow issues.
+If the limit is reached, it just gives up trying for this optimization.
+
+16. The control verb chain list must always be restored when exiting from a
+recurse function in JIT.
+
+17. Fix a crash which occurs when the character type of an invalid UTF
+character is decoded in JIT.
+
+18. Changes in many areas of the code so that when Unicode is supported and
+PCRE2_UCP is set without PCRE2_UTF, Unicode character properties are used for
+upper/lower case computations on characters whose code points are greater than
+127.
+
+19. The function for checking UTF-16 validity was returning an incorrect offset
+for the start of the error when a high surrogate was not followed by a valid
+low surrogate. This caused incorrect behaviour, for example when
+PCRE2_MATCH_INVALID_UTF was set and a match started immediately following the
+invalid high surrogate, such as /aa/ matching "\x{d800}aa".
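
A minimal C sketch of the kind of check item 19 describes, assuming the intended error offset is the index of the high surrogate itself. The name check_utf16() and its return convention are hypothetical; this is not PCRE2's internal validity function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 0 if s[0..len) is valid UTF-16. Otherwise
   returns -1 and stores in *erroroffset the index of the code unit at which
   the bad sequence STARTS. */
int check_utf16(const uint16_t *s, size_t len, size_t *erroroffset)
{
  assert(erroroffset != NULL);
  for (size_t i = 0; i < len; i++)
    {
    uint16_t c = s[i];
    if (c < 0xd800 || c > 0xdfff) continue;   /* not a surrogate */

    if (c >= 0xdc00)                          /* isolated low surrogate */
      {
      *erroroffset = i;
      return -1;
      }

    /* c is a high surrogate, so a low surrogate must follow it. */
    if (i + 1 >= len || s[i + 1] < 0xdc00 || s[i + 1] > 0xdfff)
      {
      *erroroffset = i;   /* the high surrogate, not the unit after it */
      return -1;
      }
    i++;                  /* step over the low surrogate of a valid pair */
    }
  return 0;
}
```

With the subject from the example above (0xd800 followed by "aa"), this sketch reports offset 0, so a caller that skips the invalid unit can still try a match starting at offset 1.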
+
+20. If a DEFINE group immediately preceded a lookbehind assertion, the pattern
+could be mis-compiled and therefore not match correctly. This is the example
+that found this: /(?(DEFINE)(?<foo>bar))(?<![-a-z0-9])word/ which failed to
+match "word" because the "move back" value was set to zero.
+
+21. Following a request from a user, some extensions and tidies to the
+character tables handling have been done:
+
+  (a) The dftables auxiliary program is renamed pcre2_dftables, but it is still
+  not installed for public use.
+
+  (b) There is now a -b option for pcre2_dftables, which causes the tables to
+  be written in binary. There is also a -help option.
+
+  (c) PCRE2_CONFIG_TABLES_LENGTH is added to pcre2_config() so that an
+  application that wants to save tables in binary knows how long they are.
+
+22. Changed setting of CMAKE_MODULE_PATH in CMakeLists.txt from SET to
+LIST(APPEND...) to allow a setting from the command line to be included.
+
+23. Updated to Unicode 13.0.0.
+
+24. CMake build now checks for secure_getenv() and strerror(). Patch by Carlo.
+
+25. Avoid using [-1] as a suffix in pcre2test because it can provoke a compiler
+warning.
+
+26. Added tests for __attribute__((uninitialized)) to both the configure and
+CMake build files, and then applied this attribute to the variable called
+stack_frames_vector[] in pcre2_match(). When implemented, this disables
+automatic initialization (a facility in clang), which can take time on big
+variables.
+
+27. Updated CMakeLists.txt (patches by Uwe Korn) to add support for
+pcre2-config, the libpcre*.pc files, SOVERSION, VERSION and the
+MACHO_*_VERSIONS settings for CMake builds.
+
+28. Another patch to CMakeLists.txt to check for mkostemp (configure already
+does). Patch by Carlo Marcelo Arenas Belon.
+
+29. Check for the existence of memfd_create in both CMake and configure
+configurations. Patch by Carlo Marcelo Arenas Belon.
+
+30. Restrict the configuration setting for the SELinux compatible execmem
+allocator (change 10.30/44) to Linux and NetBSD.
+
+
+Version 10.34 21-November-2019
+------------------------------
+
+1. The maximum number of capturing subpatterns is 65535 (documented), but no
+check on this was ever implemented. This omission has been rectified; it fixes
+ClusterFuzz 14376.
+
+2. Improved the invalid UTF-32 support of the JIT compiler. Now it correctly
+detects invalid characters in the 0xd800-0xdfff range.
+
+3. Fix minor typo bug in JIT compile when \X is used in a non-UTF string.
+
+4. Add support for matching in invalid UTF strings to the pcre2_match()
+interpreter, and integrate with the existing JIT support via the new
+PCRE2_MATCH_INVALID_UTF compile-time option.
+
+5. Give more error detail for invalid UTF-8 when detected in pcre2grep.
+
+6. Add support for invalid UTF-8 to pcre2grep.
+
+7. Adjust the limit for "must have" code unit searching, in particular,
+increase it substantially for non-anchored patterns.
+
+8. Allow (*ACCEPT) to be quantified, because an ungreedy quantifier with a zero
+minimum is potentially useful.
+
+9. Some changes to the way the minimum subject length is handled:
+
+   * When PCRE2_NO_START_OPTIMIZE is set, no minimum length is computed;
+     pcre2test now omits this item instead of showing a value of zero.
+
+   * An incorrect minimum length could be calculated for a pattern that
+     contained (*ACCEPT) inside a quantified group whose minimum repetition was
+     zero, for example /A(?:(*ACCEPT))?B/, which incorrectly computed a minimum
+     of 2. The minimum length scan no longer happens for a pattern that
+     contains (*ACCEPT).
+
+   * When no minimum length is set by the normal scan, but a first and/or last
+     code unit is recorded, set the minimum to 1 or 2 as appropriate.
+
+   * When a pattern contains multiple groups with the same number, a back
+     reference cannot know which one to scan for a minimum length. This used to
+     cause the minimum length finder to give up with no result. Now it treats
+     such references as not adding to the minimum length (which it should have
+     done all along).
+
+   * Furthermore, the above action now happens only if the back reference is to
+     a group that exists more than once in a pattern instead of any back
+     reference in a pattern with duplicate numbers.
+
+10. A (*MARK) value inside a successful condition was not being returned by the
+interpretive matcher (it was returned by JIT). This bug has been mended.
+
+11. A bug in pcre2grep meant that -o without an argument (or -o0) didn't work
+if the pattern had more than 32 capturing parentheses. This is fixed. In
+addition (a) the default limit for groups requested by -o<n> has been raised to
+50, (b) the new --om-capture option changes the limit, (c) an error is raised
+if -o asks for a group that is above the limit.
+
+12. The quantifier {1} was always being ignored, but this is incorrect when it
+is made possessive and applied to an item in parentheses, because a
+parenthesized item may contain multiple branches or other backtracking points,
+for example /(a|ab){1}+c/ or /(a+){1}+a/.
+
+13. For partial matches, pcre2test was always showing the maximum lookbehind
+characters, flagged with "<", which is misleading when the lookbehind didn't
+actually look behind the start (because it was later in the pattern). Showing
+all consulted preceding characters for partial matches is now controlled by the
+existing "allusedtext" modifier and, as for complete matches, this facility is
+available only for non-JIT matching, because JIT does not maintain the first
+and last consulted characters.
+
+14. DFA matching (using pcre2_dfa_match()) was not recognising a partial match
+if the end of the subject was encountered in a lookahead (conditional or
+otherwise), an atomic group, or a recursion.
+
+15. Give error if pcre2test -t, -T, -tm or -TM is given an argument of zero.
+
+16. Check for integer overflow when computing lookbehind lengths. Fixes
+ClusterFuzz issue 15636.
+
+17. Implemented non-atomic positive lookaround assertions.
+
+18. If a lookbehind contained a lookahead that contained another lookbehind
+within it, the nested lookbehind was not correctly processed. For example, if
+/(?<=(?=(?<=a)))b/ was matched to "ab" it gave no match instead of matching
+"b".
+
+19. Implemented pcre2_get_match_data_size().
+
+20. Two alterations to partial matching:
+
+    (a) The definition of a partial match is slightly changed: if a pattern
+    contains any lookbehinds, an empty partial match may be given, because this
+    is another situation where adding characters to the current subject can
+    lead to a full match. Example: /c*+(?<=[bc])/ with subject "ab".
+
+    (b) Similarly, if a pattern could match an empty string, an empty partial
+    match may be given. Example: /(?![ab]).*/ with subject "ab". This case
+    applies only to PCRE2_PARTIAL_HARD.
+
+    (c) An empty string partial hard match can be returned for \z and \Z as it
+    is documented that they shouldn't match.
+
+21. A branch that started with (*ACCEPT) was not being recognized as one that
+could match an empty string.
+
+22. Corrected pcre2_set_character_tables() tables data type: was const unsigned
+char * instead of const uint8_t *, as generated by pcre2_maketables().
+
+23. Upgraded to Unicode 12.1.0.
+
+24. Add -jitfast command line option to pcre2test (to make all the jit options
+available directly).
+
+25. Make pcre2test -C show if libreadline or libedit is supported.
+
+26. If the length of one branch of a group exceeded 65535 (the maximum value
+that is remembered as a minimum length), the whole group's length was
+incorrectly recorded as 65535, leading to incorrect "no match" when start-up
+optimizations were in force.
+
+27. The "rightmost consulted character" value was not always correct; in
+particular, if a pattern ended with a negative lookahead, characters that were
+inspected in that lookahead were not included.
+
+28. Add the pcre2_maketables_free() function.
+
+29. The start-up optimization that looks for a unique initial matching
+code unit in the interpretive engines uses memchr() in 8-bit mode. When the
+search is caseless, it was doing so inefficiently, which ended up slowing down
+the match drastically when the subject was very long. The revised code (a)
+remembers if one case is not found, so it never repeats the search for that
+case after a bumpalong and (b) when one case has been found, it searches only
+up to that position for an earlier occurrence of the other case. This fix
+applies to both interpretive pcre2_match() and to pcre2_dfa_match().
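
The two refinements in item 29 can be sketched in C roughly as follows. This is an illustration under stated assumptions, not the code in pcre2_match(): the caseless_search struct and find_first_caseless() are hypothetical names, and the state is assumed to persist across bumpalongs over the same subject.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state kept across bumpalongs over one subject. */
typedef struct {
  int c1_absent;   /* nonzero once c1 is known not to occur in the rest */
  int c2_absent;   /* likewise for the other case */
} caseless_search;

/* Return a pointer to the first occurrence of c1 or c2 in [p, end), or NULL
   if neither occurs. */
const unsigned char *find_first_caseless(caseless_search *st,
  const unsigned char *p, const unsigned char *end,
  unsigned char c1, unsigned char c2)
{
  const unsigned char *pp1 = NULL;
  const unsigned char *pp2 = NULL;
  size_t len;

  assert(st != NULL && p != NULL && p <= end);
  len = (size_t)(end - p);

  if (!st->c1_absent)
    {
    pp1 = memchr(p, c1, len);
    if (pp1 == NULL) st->c1_absent = 1;   /* (a) never look for c1 again */
    else len = (size_t)(pp1 - p);         /* (b) c2 matters only before pp1 */
    }

  if (!st->c2_absent)
    {
    pp2 = memchr(p, c2, len);
    /* Only a miss over the whole remaining subject proves absence. */
    if (pp2 == NULL && pp1 == NULL) st->c2_absent = 1;
    }

  if (pp2 != NULL) return pp2;   /* when both are set, pp2 precedes pp1 */
  return pp1;
}
```

Because the remaining subject only shrinks as the start position advances, an "absent" flag set on one call stays valid for every later call on the same subject.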
+
+30. While scanning to find the minimum length of a group, if any branch has
+minimum length zero, there is no need to scan any subsequent branches (a small
+compile-time performance improvement).
+
+31. Installed a .gitignore file on a user's suggestion. When using the svn
+repository with git (through git svn) this helps keep it tidy.
+
+32. Add underflow check in JIT which may occur when the value of subject
+string pointer is close to 0.
+
+33. Arrange for classes such as [Aa] which contain just the two cases of the
+same character, to be treated as a single caseless character. This causes the
+first and required code unit optimizations to kick in where relevant.
+
+34. Improve the bitmap of starting bytes for positive classes that include wide
+characters, but no property types, in UTF-8 mode. Previously, on encountering
+such a class, the bits for all bytes greater than \xc4 were set, thus
+specifying any character with codepoint >= 0x100. Now the only bits that are
+set are for the relevant bytes that start the wide characters. This can give a
+noticeable performance improvement.
+
+35. If the bitmap of starting code units contains only 1 or 2 bits, replace it
+with a single starting code unit (1 bit) or a caseless single starting code
+unit if the two relevant characters are case-partners. This is particularly
+relevant to the 8-bit library, though it applies to all. It can give a
+performance boost for patterns such as [Ww]ord and (word|WORD). However, this
+optimization doesn't happen if there is a "required" code unit of the same
+value (because the search for a "required" code unit starts at the match start
+for non-unique first code unit patterns, but after a unique first code unit,
+and patterns such as a*a need the former action).
+
+36. Small patch to pcre2posix.c to set the erroroffset field to -1 immediately
+after a successful compile, instead of at the start of matching to avoid a
+sanitizer complaint (regexec is supposed to be thread safe).
+
+37. Add NEON vectorization to JIT to speed up matching of first character and
+pairs of characters on ARM64 CPUs.
+
+38. If a non-ASCII character was the first in a starting assertion in a
+caseless match, the "first code unit" optimization did not get the casing
+right, and the assertion failed to match a character in the other case if it
+did not start with the same code unit.
+
+39. Fixed the incorrect computation of jump sizes on x86 CPUs in JIT. A masking
+operation was incorrectly removed in r1136. Reported by Ralf Junker.
+
+
+Version 10.33 16-April-2019
+---------------------------
+
+1. Added "allvector" to pcre2test to make it easy to check the part of the
+ovector that shouldn't be changed, in particular after substitute and failed or
+partial matches.
+
+2. Fix subject buffer overread in JIT when UTF is disabled and \X or \R has
+a greater than 1 fixed quantifier. This issue was found by Yunho Kim.
+
+3. Added support for callouts from pcre2_substitute(). After 10.33-RC1, but
+prior to release, fixed a bug that caused a crash if pcre2_substitute() was
+called with a NULL match context.
+
+4. The POSIX functions are now all called pcre2_regcomp() etc., with wrapper
+functions that use the standard POSIX names. However, in pcre2posix.h the POSIX
+names are defined as macros. This should help avoid linking with the wrong
+library in some environments while still exporting the POSIX names for
+pre-existing programs that use them. (The Debian alternative names are also
+defined as macros, but not documented.)
+
+5. Fix an xclass matching issue in JIT.
+
+6. Implement PCRE2_EXTRA_ESCAPED_CR_IS_LF (see Bugzilla 2315).
+
+7. Implement the Perl 5.28 experimental alphabetic names for atomic groups and
+lookaround assertions, for example, (*pla:...) and (*atomic:...). These are
+characterized by a lower case letter following (* and to simplify coding for
+this, the character tables created by pcre2_maketables() were updated to add a
+new "is lower case letter" bit. At the same time, the now unused "is
+hexadecimal digit" bit was removed. The default tables in
+src/pcre2_chartables.c.dist are updated.
+
+8. Implement the new Perl "script run" features (*script_run:...) and
+(*atomic_script_run:...) aka (*sr:...) and (*asr:...).
+
+9. Fixed two typos in change 22 for 10.21, which added special handling for
+ranges such as a-z in EBCDIC environments. The original code probably never
+worked, though there were no bug reports.
+
+10. Implement PCRE2_COPY_MATCHED_SUBJECT for pcre2_match() (including JIT via
+pcre2_match()) and pcre2_dfa_match(), but *not* the pcre2_jit_match() fast
+path. Also, when a match fails, set the subject field in the match data to NULL
+for tidiness - none of the substring extractors should reference this after
+match failure.
+
+11. If a pattern started with a subroutine call that had a quantifier with a
+minimum of zero, an incorrect "match must start with this character" could be
+recorded. Example: /(?&xxx)*ABC(?<xxx>XYZ)/ would (incorrectly) expect 'A' to
+be the first character of a match.
+
+12. The heap limit checking code in pcre2_dfa_match() could suffer from
+overflow if the heap limit was set very large. This could cause incorrect "heap
+limit exceeded" errors.
+
+13. Add "kibibytes" to the heap limit output from pcre2test -C to make the
+units clear.
+
+14. Add a call to pcre2_jit_free_unused_memory() in pcre2grep, for tidiness.
+
+15. Updated the VMS-specific code in pcre2test on the advice of a VMS user.
+
+16. Removed the unnecessary inclusion of stdint.h (or inttypes.h) from
+pcre2_internal.h as it is now included by pcre2.h. Also, change 17 for 10.32
+below was unnecessarily complicated, as inttypes.h is a Standard C header,
+which is defined to be a superset of stdint.h. Instead of conditionally
+including stdint.h or inttypes.h, pcre2.h now unconditionally includes
+inttypes.h. This supports environments that do not have stdint.h but do have
+inttypes.h, which are known to exist. A note in the autotools documentation
+says (November 2018) that there are none known that are the other way round.
+
+17. Added --disable-percent-zt to "configure" (and equivalent to CMake) to
+forcibly disable the use of %zu and %td in formatting strings because there is
+at least one version of VMS that claims to be C99 but does not support these
+modifiers.
+
+18. Added --disable-pcre2grep-callout-fork, which restricts the callout support
+in pcre2grep to the inbuilt echo facility. This may be useful in environments
+that do not support fork().
+
+19. Fix two instances of <= 0 being applied to unsigned integers (the VMS
+compiler complains).
+
+20. Added "fork" support for VMS to pcre2grep, for running an external program
+via a string callout.
+
+21. Improve MAP_JIT flag usage on MacOS. Patch by Rich Siegel.
+
+22. If a pattern started with (*MARK), (*COMMIT), (*PRUNE), (*SKIP), or (*THEN)
+followed by ^ it was not recognized as anchored.
+
+23. The RunGrepTest script used to cut out the test of NUL characters for
+Solaris and MacOS as printf and sed can't handle them. It seems that the *BSD
+systems can't either. I've inverted the test so that only those OS that are
+known to work (currently only Linux) try to run this test.
+
+24. Some tests in RunGrepTest appended to testtrygrep from two different file
+descriptors instead of redirecting stderr to stdout. This worked on Linux, but
+it was reported not to on other systems, causing the tests to fail.
+
+25. In the RunTest script, make the test for stack setting use the same value
+for the stack as it needs for -bigstack.
+
+26. Insert a cast in pcre2_dfa_match.c to suppress a compiler warning.
+
+26. With PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL set, escape sequences such as \s
+which are valid in character classes, but not as the end of ranges, were being
+treated as literals. An example is [_-\s] (but not [\s-_] because that gave an
+error at the *start* of a range). Now an "invalid range" error is given
+independently of PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.
+
+27. Related to 26 above, PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL was affecting
+known escape sequences such as \eX when they appeared invalidly in a character
+class. Now the option applies only to unrecognized or malformed escape
+sequences.
+
+28. Fix word boundary in JIT compiler. Patch by Mike Munday.
+
+29. The pcre2_dfa_match() function was incorrectly handling conditional version
+tests such as (?(VERSION>=0)...) when the version test was true. Incorrect
+processing or a crash could result.
+
+30. When PCRE2_UTF is set, allow non-ASCII letters and decimal digits in group
+names, as Perl does. There was a small bug in this new code, found by
+ClusterFuzz 12950, fixed before release.
+
+31. Implemented PCRE2_EXTRA_ALT_BSUX to support ECMAScript 6's \u{hhh}
+construct.
+
+32. Compile \p{Any} to be the same as . in DOTALL mode, so that it benefits
+from auto-anchoring if \p{Any}* starts a pattern.
+
+33. Compile invalid UTF check in JIT test when only pcre32 is enabled.
+
+34. For some time now, CMake has been warning about the setting of policy
+CMP0026 to "OLD" in CMakeLists.txt, and hinting that the feature might be
+removed in a future version. A request for CMake expertise on the list produced
+no result, so I have now hacked CMakeLists.txt along the lines of some changes
+I found on the Internet. The new code no longer needs the policy setting, and
+it appears to work fine on Linux.
+
+35. Setting --enable-jit=auto for an out-of-tree build failed because the
+source directory wasn't in the search path for AC_TRY_COMPILE always. Patch
+from Ross Burton.
+
+36. Disable SSE2 JIT optimizations in x86 CPUs when SSE2 is not available.
+Patch by Guillem Jover.
+
+37. Changed expressions such as 1<<10 to 1u<<10 in many places because compiler
+warnings were reported.
+
+38. Using the clang compiler with sanitizing options causes runtime complaints
+about truncation for statements such as x = ~x when x is an 8-bit value; it
+seems to compute ~x as a 32-bit value. Changing such statements to x = 255 ^ x
+gets rid of the warnings. There were also two missing casts in pcre2test.
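
The two idioms from items 37 and 38 can be shown in a short C sketch. The helper names are hypothetical. The underlying reason is C's integer promotion: an 8-bit operand is widened to int before ~ is applied, so the result has high bits set that are lost on assignment back to 8 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Flip all eight bits of x. 255 ^ x is computed as an int, but its value
   always lies in 0..255, so narrowing it back to 8 bits loses nothing. */
uint8_t invert8(uint8_t x)
{
  return (uint8_t)(255 ^ x);
}

/* Build a single-bit mask from an unsigned constant so that the shift is
   done in unsigned arithmetic. Assumes unsigned int has at least 32 bits. */
uint32_t bit32(unsigned int n)
{
  assert(n < 32);
  return 1u << n;
}
```

Here bit32(31) is well defined, whereas 1<<31 would shift into the sign bit of a 32-bit int.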
+
+
+Version 10.32 10-September-2018
+-------------------------------
+
+1. When matching using the REG_STARTEND feature of the POSIX API with a
+non-zero starting offset, unset capturing groups with lower numbers than a
+group that did capture something were not being correctly returned as "unset"
+(that is, with offset values of -1).
+
+2. When matching using the POSIX API, pcre2test used to omit listing unset
+groups altogether. Now it shows those that come before any actual captures as
+"<unset>", as happens for non-POSIX matching.
+
+3. Running "pcre2test -C" always stated "\R matches CR, LF, or CRLF only",
+whatever the build configuration was. It now correctly says "\R matches all
+Unicode newlines" in the default case when --enable-bsr-anycrlf has not been
+specified. Similarly, running "pcre2test -C bsr" never produced the result
+ANY.
+
+4. Matching the pattern /(*UTF)\C[^\v]+\x80/ against an 8-bit string containing
+multi-code-unit characters caused bad behaviour and possibly a crash. This
+issue was fixed for other kinds of repeat in release 10.20 by change 19, but
+repeating character classes were overlooked.
+
+5. pcre2grep now supports the inclusion of binary zeros in patterns that are
+read from files via the -f option.
+
+6. A small fix to pcre2grep to avoid compiler warnings for -Wformat-overflow=2.
+
+7. Added --enable-jit=auto support to configure.ac.
+
+8. Added some dummy variables to the heapframe structure in 16-bit and 32-bit
+modes for the benefit of m68k, where pointers can be 16-bit aligned. The
+dummies force 32-bit alignment and this ensures that the structure is a
+multiple of PCRE2_SIZE, a requirement that is tested at compile time. In other
+architectures, alignment requirements take care of this automatically.
+
+9. When returning an error from pcre2_pattern_convert(), ensure the error
+offset is set zero for early errors.
+
+10. A number of patches for Windows support from Daniel Richard G:
+
+  (a) List of error numbers in RunTest.bat corrected (it was not the same as in
+      RunTest).
+
+  (b) pcre2grep snprintf() workaround as used elsewhere in the tree.
+
+  (c) Support for non-C99 snprintf() that returns -1 in the overflow case.
+
+11. Minor tidy of pcre2_dfa_match() code.
+
+12. Refactored pcre2_dfa_match() so that the internal recursive calls no longer
+use the stack for local workspace and local ovectors. Instead, an initial block
+of stack is reserved, but if this is insufficient, heap memory is used. The
+heap limit parameter now applies to pcre2_dfa_match().
+
+13. If a "find limits" test of DFA matching in pcre2test resulted in too many
+matches for the ovector, no matches were displayed.
+
+14. Removed an occurrence of ctrl/Z from test 6 because Windows treats it as
+EOF. The test looks to have come from a fuzzer.
+
+15. If PCRE2 was built with a default match limit a lot greater than the
+default default of 10 000 000, some JIT tests of the match limit no longer
+failed. All such tests now set 10 000 000 as the upper limit.
+
+16. Another Windows related patch for pcre2grep to ensure that WIN32 is
+undefined under Cygwin.
+
+17. Test for the presence of stdint.h and inttypes.h in configure and CMake and
+include whichever exists (stdint preferred) instead of unconditionally
+including stdint. This makes life easier for old and non-standard systems.
+
+18. Further changes to improve portability, especially to old and/or
+non-standard systems:
+
+  (a) Put all printf arguments in RunGrepTest into single, not double, quotes,
+      and use \0 not \x00 for binary zero.
+
+  (b) Avoid the use of C++ (i.e. BCPL) // comments.
+
+  (c) Parameterize the use of %zu in pcre2test to make it like %td. For both of
+      these now, if using MSVC or a standard C before C99, %lu is used with a
+      cast if necessary.
+
+19. Applied a contributed patch to CMakeLists.txt to increase the stack size
+when linking pcre2test with MSVC. This gets rid of a stack overflow error in
+the standard set of tests.
+
+20. Output a warning in pcre2test when ignoring the "altglobal" modifier when
+it is given with the "replace" modifier.
+
+21. In both pcre2test and pcre2_substitute(), with global matching, a pattern
+that matched an empty string, but never at the starting match offset, was not
+handled in a Perl-compatible way. The pattern /(?<=\G.)/ is an example of such
+a pattern. Because \G is in a lookbehind assertion, there has to be a
+"bumpalong" before there can be a match. The automatic "advance by one
+character after an empty string match" rule is therefore inappropriate. A more
+complicated algorithm has now been implemented.
+
+22. When checking to see if a lookbehind is of fixed length, lookaheads were
+correctly ignored, but quantifiers on lookaheads were not being ignored,
+leading to an incorrect "lookbehind assertion is not fixed length" error.
+
+23. The VERSION condition test was reading fractional PCRE2 version numbers
+such as the 04 in 10.04 incorrectly and hence giving wrong results.
+
+24. Updated to Unicode version 11.0.0. As well as the usual addition of new
+scripts and characters, this involved re-jigging the grapheme break property
+algorithm because Unicode has changed the way emojis are handled.
+
+25. Fixed an obscure bug that struck when there were two atomic groups not
+separated by something with a backtracking point. There could be an incorrect
+backtrack into the first of the atomic groups. A complicated example is
+/(?>a(*:1))(?>b)(*SKIP:1)x|.*/ matched against "abc", where the *SKIP
+shouldn't find a MARK (because it is in an atomic group), but it did.
+
+26. Upgraded the perltest.sh script: (1) #pattern lines can now be used to set
+a list of modifiers for all subsequent patterns - only those that the script
+recognizes are meaningful; (2) #subject lines can be used to set or unset a
+default "mark" modifier; (3) Unsupported #command lines give a warning when
+they are ignored; (4) Mark data is output only if the "mark" modifier is
+present.
+
+27. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported.
+
+28. A (*MARK) name was not being passed back for positive assertions that were
+terminated by (*ACCEPT).
+
+29. Add support for \N{U+dddd}, but only in Unicode mode.
+
+30. Add support for (?^) for unsetting all imnsx options.
+
+31. The PCRE2_EXTENDED (/x) option only ever discarded space characters whose
+code point was less than 256 and that were recognized by the lookup table
+generated by pcre2_maketables(), which uses isspace() to identify white space.
+Now, when Unicode support is compiled, PCRE2_EXTENDED also discards U+0085,
+U+200E, U+200F, U+2028, and U+2029, which are additional characters defined by
+Unicode as "Pattern White Space". This makes PCRE2 compatible with Perl.
+
+32. In certain circumstances, option settings within patterns were not being
+correctly processed. For example, the pattern /((?i)A)(?m)B/ incorrectly
+matched "ab". (The (?m) setting lost the fact that (?i) should be reset at the
+end of its group during the parse process, but without another setting such as
+(?m) the compile phase got it right.) This bug was introduced by the
+refactoring in release 10.23.
+
+33. PCRE2 uses bcopy() if available when memmove() is not, and it used just to
+define memmove() as a function call to bcopy(). This hasn't been tested for a
+long time because in pcre2test the result of memmove() was being used, whereas
+bcopy() doesn't return a result. This feature is now refactored always to call
+an emulation function when there is no memmove(). The emulation makes use of
+bcopy() when available.
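
A minimal C sketch of what an emulation function of this kind might look like. The name emulated_memmove() is hypothetical, and the sketch shows only a plain byte loop; it does not include the bcopy() path mentioned above. Unlike bcopy(), it returns the destination pointer, which is what callers of memmove() rely on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a missing memmove(): copies n bytes from s to d,
   copes with overlapping regions, and returns d like the real function. */
void *emulated_memmove(void *d, const void *s, size_t n)
{
  unsigned char *dest = (unsigned char *)d;
  const unsigned char *src = (const unsigned char *)s;

  assert(n == 0 || (d != NULL && s != NULL));
  if (dest > src)
    {
    /* Destination is above the source: copy backwards so that each byte is
       read before it can be overwritten. */
    dest += n;
    src += n;
    while (n-- > 0) *(--dest) = *(--src);
    }
  else
    {
    while (n-- > 0) *dest++ = *src++;
    }
  return d;
}
```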
+
+34. When serializing a pattern, set the memctl, executable_jit, and tables
+fields (that is, all the fields that contain pointers) to zeros so that the
+result of serializing is always the same. These fields are re-set when the
+pattern is deserialized.
+
+35. In a pattern such as /[^\x{100}-\x{ffff}]*[\x80-\xff]/ which has a repeated
+negative class with no characters less than 0x100 followed by a positive class
+with only characters less than 0x100, the first class was incorrectly being
+auto-possessified, causing incorrect match failures.
+
+36. Removed the character type bit ctype_meta, which dates from PCRE1 and is
+not used in PCRE2.
+
+37. Tidied up unnecessarily complicated macros used in the escapes table.
+
+38. Since 10.21, the new testoutput8-16-4 file has accidentally been omitted
+from distribution tarballs, owing to a typo in Makefile.am which had
+testoutput8-16-3 twice. Now fixed.
+
+39. If the only branch in a conditional subpattern was anchored, the whole
+subpattern was treated as anchored, when it should not have been, since the
+assumed empty second branch cannot be anchored. Demonstrated by test patterns
+such as /(?(1)^())b/ or /(?(?=^))b/.
+
+40. A repeated conditional subpattern that could match an empty string was
+always assumed to be unanchored. Now it is checked just like any other
+repeated conditional subpattern, and can be found to be anchored if the minimum
+quantifier is one or more. I can't see much use for a repeated anchored
+pattern, but the behaviour is now consistent.
+
+41. Minor addition to pcre2_jit_compile.c to avoid static analyzer complaint
+(for an event that could never occur but you had to have external information
+to know that).
+
+42. If before the first match in a file that was being searched by pcre2grep
+there was a line that was sufficiently long to cause the input buffer to be
+expanded, the variable holding the location of the end of the previous match
+was being adjusted incorrectly, and could cause an overflow warning from a code
+sanitizer. However, as the value is used only to print pending "after" lines
+when the next match is reached (and there are no such lines in this case) this
+bug could do no damage.
+
 
 
 Version 10.31 12-February-2018
 ------------------------------
@@ -304,8 +1242,8 @@ tests to improve coverage.
 31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
 pcre2test, a crash could occur.
 
-32. Make -bigstack in RunTest allocate a 64Mb stack (instead of 16 MB) so that
-all the tests can run with clang's sanitizing options.
+32. Make -bigstack in RunTest allocate a 64MiB stack (instead of 16MiB) so
+that all the tests can run with clang's sanitizing options.
 
 
 33. Implement extra compile options in the compile context and add the first
 one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
@@ -898,9 +1836,9 @@ to the same code as '.' when PCRE2_DOTALL is set).
 40. Fix two clang compiler warnings in pcre2test when only one code unit width
 is supported.
 
-41. Upgrade RunTest to automatically re-run test 2 with a large (64M) stack if
-it fails when running the interpreter with a 16M stack (and if changing the
-stack size via pcre2test is possible). This avoids having to manually set a
+41. Upgrade RunTest to automatically re-run test 2 with a large (64MiB) stack
+if it fails when running the interpreter with a 16MiB stack (and if changing
+the stack size via pcre2test is possible). This avoids having to manually set a
 large stack size when testing with clang.
 
 42. Fix register overwrite in JIT when SSE2 acceleration is enabled.

+ 18 - 7
regex.mod/pcre/CheckMan

@@ -16,6 +16,7 @@ while (scalar(@ARGV) > 0)
   
   while (<IN>)
     {  
+    $count = 0;
     $line++; 
     if (/^\s*$/)
       {
@@ -50,14 +51,24 @@ while (scalar(@ARGV) > 0)
         $yield = 1;
         }
       }
-    else
+    elsif (/\\[^ef]|\\f[^IBP]/)
       {
-      if (/\\[^ef]|\\f[^IBP]/)
-        {
-        printf "Bad backslash in line $line of $file\n";  
-        $yield = 1; 
-        } 
-      }   
+      printf "Bad backslash in line $line of $file\n";
+      $yield = 1;
+      }
+    while (/\\f[BI]/g)
+      {
+      $count++;
+      }
+    while (/\\fP/g)
+      {
+      $count--;
+      }
+    if ($count != 0)
+      {
+      printf "Mismatching formatting in line $line of $file\n";
+      $yield = 1;
+      }
     }
      
   close(IN);   
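
The per-line formatting check added to CheckMan in this hunk can be re-expressed as a short Python sketch. This is an illustration only and is not part of the commit; the function name `check_font_balance` is hypothetical. It assumes the same rule as the Perl loops: on each line, every `\fB` or `\fI` font switch should be paired with one `\fP` reset.

```python
import re


def check_font_balance(line: str) -> bool:
    """Return True when the \\fB/\\fI switches on a line equal the \\fP resets."""
    opens = len(re.findall(r"\\f[BI]", line))   # bold or italic switched on
    closes = len(re.findall(r"\\fP", line))     # previous font restored
    return opens == closes


if __name__ == "__main__":
    # Hypothetical man-page lines, used only to exercise the check.
    samples = [
        r"\fBpcre2_match()\fP returns a count",
        r"\fIvalue is never switched back",
    ]
    for number, text in enumerate(samples, start=1):
        if not check_font_balance(text):
            print(f"Mismatching formatting in line {number}")
```

Like the Perl version, the sketch compares only the counts on a single line, so it does not detect a `\fP` that appears before its `\fB`.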

+ 39 - 26
regex.mod/pcre/HACKING

@@ -195,6 +195,7 @@ META_END              End of pattern (this value is 0x80000000)
 META_FAIL             (*FAIL)
 META_KET              ) closing parenthesis
 META_LOOKAHEAD        (?= start of lookahead
+META_LOOKAHEAD_NA     (*napla: start of non-atomic lookahead
 META_LOOKAHEADNOT     (?! start of negative lookahead
 META_NOCAPTURE        (?: no capture parens
 META_PLUS             +
@@ -256,6 +257,7 @@ The following are followed by a length element, then a number of character code
 values (which should match with the length):
 
 META_MARK             (*MARK:xxxx)
+META_COMMIT_ARG       )*COMMIT:xxxx)
 META_PRUNE_ARG        (*PRUNE:xxx)
 META_SKIP_ARG         (*SKIP:xxxx)
 META_THEN_ARG         (*THEN:xxxx)
@@ -285,8 +287,9 @@ The following are also followed just by an offset, but also the lower 16 bits
 of the main word contain the length of the first branch of the lookbehind
 group; this is used when generating OP_REVERSE for that branch.
 
-META_LOOKBEHIND       (?<=
-META_LOOKBEHINDNOT    (?<!
+META_LOOKBEHIND       (?<=      start of lookbehind
+META_LOOKBEHIND_NA    (*naplb:  start of non-atomic lookbehind
+META_LOOKBEHINDNOT    (?<!      start of negative lookbehind
 
 The following are followed by two elements, the minimum and maximum. Repeat
 values are limited to 65535 (MAX_REPEAT). A maximum value of "unlimited" is
@@ -348,7 +351,7 @@ The /i, /m, or /s options (PCRE2_CASELESS, PCRE2_MULTILINE, PCRE2_DOTALL, and
 others) may be changed in the middle of patterns by items such as (?i). Their
 processing is handled entirely at compile time by generating different opcodes
 for the different settings. The runtime functions do not need to keep track of
-an options state.
+an option's state.
 
 PCRE2_DUPNAMES, PCRE2_EXTENDED, PCRE2_EXTENDED_MORE, and PCRE2_NO_AUTO_CAPTURE
 are tracked and processed during the parsing pre-pass. The others are handled
@@ -370,7 +373,7 @@ default value for LINK_SIZE is 2, except for the 32-bit library, where it can
 only be 4. The 8-bit library can be compiled to used 3-byte or 4-byte values,
 and the 16-bit library can be compiled to use 4-byte values, though this
 impairs performance. Specifing a LINK_SIZE larger than 2 for these libraries is
-necessary only when patterns whose compiled length is greater than 64K code
+necessary only when patterns whose compiled length is greater than 65535 code
 units are going to be processed. When a LINK_SIZE value uses more than one code
 unit, the most significant unit is first.
 
@@ -382,7 +385,7 @@ that are counts (e.g. quantifiers) are always two bytes long in 8-bit mode
 Opcodes with no following data
 ------------------------------
 
-These items are all just one unit long
+These items are all just one unit long:
 
   OP_END                 end of pattern
   OP_ANY                 match any one character other than newline
@@ -430,14 +433,22 @@ character). Another use is for [^] when empty classes are permitted
 (PCRE2_ALLOW_EMPTY_CLASS is set).
 
 
-Backtracking control verbs with optional data
----------------------------------------------
+Backtracking control verbs
+--------------------------
 
-(*THEN) without an argument generates the opcode OP_THEN and no following data.
-OP_MARK is followed by the mark name, preceded by a length in one code unit,
-and followed by a binary zero. For (*PRUNE), (*SKIP), and (*THEN) with
-arguments, the opcodes OP_PRUNE_ARG, OP_SKIP_ARG, and OP_THEN_ARG are used,
-with the name following in the same format as OP_MARK.
+Verbs with no arguments generate opcodes with no following data (as listed
+in the section above). 
+
+(*MARK:NAME) generates OP_MARK followed by the mark name, preceded by a
+length in one code unit, and followed by a binary zero. The name length is
+limited by the size of the code unit.
+
+(*ACCEPT:NAME) and (*FAIL:NAME) are compiled as (*MARK:NAME)(*ACCEPT) and
+(*MARK:NAME)(*FAIL) respectively.
+
+For (*COMMIT:NAME), (*PRUNE:NAME), (*SKIP:NAME), and (*THEN:NAME), the opcodes
+OP_COMMIT_ARG, OP_PRUNE_ARG, OP_SKIP_ARG, and OP_THEN_ARG are used, with the
+name following in the same format as for OP_MARK.
 
 
 Matching literal characters
@@ -706,13 +717,15 @@ Assertions
 ----------
 
 Forward assertions are also just like other subpatterns, but starting with one
-of the opcodes OP_ASSERT or OP_ASSERT_NOT. Backward assertions use the opcodes
-OP_ASSERTBACK and OP_ASSERTBACK_NOT, and the first opcode inside the assertion
-is OP_REVERSE, followed by a count of the number of characters to move back the
-pointer in the subject string. In ASCII or UTF-32 mode, the count is also the
-number of code units, but in UTF-8/16 mode each character may occupy more than
-one code unit. A separate count is present in each alternative of a lookbehind
-assertion, allowing them to have different (but fixed) lengths.
+of the opcodes OP_ASSERT, OP_ASSERT_NA (non-atomic assertion), or
+OP_ASSERT_NOT. Backward assertions use the opcodes OP_ASSERTBACK, 
+OP_ASSERTBACK_NA, and OP_ASSERTBACK_NOT, and the first opcode inside the
+assertion is OP_REVERSE, followed by a count of the number of characters to
+move back the pointer in the subject string. In ASCII or UTF-32 mode, the count
+is also the number of code units, but in UTF-8/16 mode each character may
+occupy more than one code unit. A separate count is present in each alternative
+of a lookbehind assertion, allowing each branch to have a different (but fixed)
+length.
 
 
 Conditional subpatterns
@@ -745,11 +758,11 @@ tests the PCRE2 version number. This compiles into one of the opcodes OP_TRUE
 or OP_FALSE.
 
 If a condition is not a back reference, recursion test, DEFINE, or VERSION, it
-must start with a parenthesized assertion, whose opcode normally immediately
-follows OP_COND or OP_SCOND. However, if automatic callouts are enabled, a
-callout is inserted immediately before the assertion. It is also possible to
-insert a manual callout at this point. Only assertion conditions may have
-callouts preceding the condition.
+must start with a parenthesized atomic assertion, whose opcode normally
+immediately follows OP_COND or OP_SCOND. However, if automatic callouts are
+enabled, a callout is inserted immediately before the assertion. It is also
+possible to insert a manual callout at this point. Only assertion conditions
+may have callouts preceding the condition.
 
 A condition that is the negative assertion (?!) is optimized to OP_FAIL in all
 parts of the pattern, so this is another opcode that may appear as a condition.
@@ -764,7 +777,7 @@ OP_RECURSE is followed by a LINK_SIZE value that is the offset to the starting
 bracket from the start of the whole pattern. OP_RECURSE is also used for
 "subroutine" calls, even though they are not strictly a recursion. Up till
 release 10.30 recursions were treated as atomic groups, making them
-incompatible with Perl (but PCRE had then well before Perl did). From 10.30,
+incompatible with Perl (but PCRE had them well before Perl did). From 10.30,
 backtracking into recursions is supported.
 
 Repeated recursions used to be wrapped inside OP_ONCE brackets, which not only
@@ -814,4 +827,4 @@ not a real opcode, but is used to check at compile time that tables indexed by
 opcode are the correct length, in order to catch updating errors.
 
 Philip Hazel
-21 April 2017
+12 July 2019
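
Two of the compiled-code layouts described in the HACKING hunks above can be illustrated with a minimal Python sketch. It is not PCRE2 code and makes several assumptions: it models only the 8-bit library (one code unit is one byte), the function names are hypothetical, and `OP_MARK_PLACEHOLDER` is an arbitrary stand-in because the real opcode number is defined in the PCRE2 sources and is not shown in this diff.

```python
OP_MARK_PLACEHOLDER = 0x7F  # arbitrary stand-in, NOT the real OP_MARK value


def encode_link(offset: int, link_size: int = 2) -> bytes:
    """Store an offset in link_size bytes, most significant unit first."""
    # int.to_bytes raises OverflowError when the offset does not fit,
    # e.g. anything above 65535 when link_size is 2.
    return offset.to_bytes(link_size, "big")


def encode_mark(name: bytes) -> bytes:
    """Lay out a mark as: opcode, one-unit length, name, binary zero."""
    if len(name) > 255:
        raise ValueError("name length must fit in one 8-bit code unit")
    return bytes([OP_MARK_PLACEHOLDER, len(name)]) + name + b"\x00"


if __name__ == "__main__":
    print(encode_link(0x1234).hex())        # two bytes, high byte first
    print(encode_link(70000, 3).hex())      # needs a LINK_SIZE above 2
    print(encode_mark(b"NAME").hex())
```

The 65535 limit quoted for the default LINK_SIZE of 2 falls out of the first function: 65536 no longer fits in two bytes, so a larger LINK_SIZE is required.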

+ 3 - 3
regex.mod/pcre/INSTALL

@@ -1,8 +1,8 @@
 Installation Instructions
 *************************
 
-   Copyright (C) 1994-1996, 1999-2002, 2004-2016 Free Software
-Foundation, Inc.
+   Copyright (C) 1994-1996, 1999-2002, 2004-2017, 2020-2021 Free
+Software Foundation, Inc.
 
    Copying and distribution of this file, with or without modification,
 are permitted in any medium without royalty provided the copyright
@@ -225,7 +225,7 @@ order to use an ANSI C compiler:
 
 and if that doesn't work, install pre-built binaries of GCC for HP-UX.
 
-   HP-UX 'make' updates targets which have the same time stamps as their
+   HP-UX 'make' updates targets which have the same timestamps as their
 prerequisites, which makes it generally unusable when shipped generated
 files such as 'configure' are involved.  Use GNU 'make' instead.
 

+ 13 - 13
regex.mod/pcre/LICENCE

@@ -4,11 +4,11 @@ PCRE2 LICENCE
 PCRE2 is a library of functions to support regular expressions whose syntax
 and semantics are as close as possible to those of the Perl 5 language.
 
-Release 10 of PCRE2 is distributed under the terms of the "BSD" licence, as
-specified below, with one exemption for certain binary redistributions. The
-documentation for PCRE2, supplied in the "doc" directory, is distributed under
-the same terms as the software itself. The data in the testdata directory is
-not copyrighted and is in the public domain.
+Releases 10.00 and above of PCRE2 are distributed under the terms of the "BSD"
+licence, as specified below, with one exemption for certain binary
+redistributions. The documentation for PCRE2, supplied in the "doc" directory,
+is distributed under the same terms as the software itself. The data in the
+testdata directory is not copyrighted and is in the public domain.
 
 The basic library functions are written in C and are freestanding. Also
 included in the distribution is a just-in-time compiler that can be used to
@@ -20,13 +20,13 @@ THE BASIC LIBRARY FUNCTIONS
 ---------------------------
 
 Written by:       Philip Hazel
-Email local part: ph10
-Email domain:     cam.ac.uk
+Email local part: Philip.Hazel
+Email domain:     gmail.com
 
-University of Cambridge Computing Service,
+Retired from University of Cambridge Computing Service,
 Cambridge, England.
 
-Copyright (c) 1997-2018 University of Cambridge
+Copyright (c) 1997-2021 University of Cambridge
 All rights reserved.
 
 
@@ -35,9 +35,9 @@ PCRE2 JUST-IN-TIME COMPILATION SUPPORT
 
 Written by:       Zoltan Herczeg
 Email local part: hzmester
-Emain domain:     freemail.hu
+Email domain:     freemail.hu
 
-Copyright(c) 2010-2018 Zoltan Herczeg
+Copyright(c) 2010-2021 Zoltan Herczeg
 All rights reserved.
 
 
@@ -46,9 +46,9 @@ STACK-LESS JUST-IN-TIME COMPILER
 
 Written by:       Zoltan Herczeg
 Email local part: hzmester
-Emain domain:     freemail.hu
+Email domain:     freemail.hu
 
-Copyright(c) 2009-2018 Zoltan Herczeg
+Copyright(c) 2009-2021 Zoltan Herczeg
 All rights reserved.
 
 

+ 35 - 15
regex.mod/pcre/Makefile.am

@@ -46,6 +46,7 @@ dist_html_DATA = \
   doc/html/pcre2_general_context_free.html \
   doc/html/pcre2_get_error_message.html \
   doc/html/pcre2_get_mark.html \
+  doc/html/pcre2_get_match_data_size.html \
   doc/html/pcre2_get_ovector_count.html \
   doc/html/pcre2_get_ovector_pointer.html \
   doc/html/pcre2_get_startchar.html \
@@ -56,6 +57,7 @@ dist_html_DATA = \
   doc/html/pcre2_jit_stack_create.html \
   doc/html/pcre2_jit_stack_free.html \
   doc/html/pcre2_maketables.html \
+  doc/html/pcre2_maketables_free.html \
   doc/html/pcre2_match.html \
   doc/html/pcre2_match_context_copy.html \
   doc/html/pcre2_match_context_create.html \
@@ -85,6 +87,7 @@ dist_html_DATA = \
   doc/html/pcre2_set_parens_nest_limit.html \
   doc/html/pcre2_set_recursion_limit.html \
   doc/html/pcre2_set_recursion_memory_management.html \
+  doc/html/pcre2_set_substitute_callout.html \
   doc/html/pcre2_substitute.html \
   doc/html/pcre2_substring_copy_byname.html \
   doc/html/pcre2_substring_copy_bynumber.html \
@@ -139,6 +142,7 @@ dist_man_MANS = \
   doc/pcre2_general_context_free.3 \
   doc/pcre2_get_error_message.3 \
   doc/pcre2_get_mark.3 \
+  doc/pcre2_get_match_data_size.3 \
   doc/pcre2_get_ovector_count.3 \
   doc/pcre2_get_ovector_pointer.3 \
   doc/pcre2_get_startchar.3 \
@@ -149,6 +153,7 @@ dist_man_MANS = \
   doc/pcre2_jit_stack_create.3 \
   doc/pcre2_jit_stack_free.3 \
   doc/pcre2_maketables.3 \
+  doc/pcre2_maketables_free.3 \
   doc/pcre2_match.3 \
   doc/pcre2_match_context_copy.3 \
   doc/pcre2_match_context_create.3 \
@@ -178,6 +183,7 @@ dist_man_MANS = \
   doc/pcre2_set_parens_nest_limit.3 \
   doc/pcre2_set_recursion_limit.3 \
   doc/pcre2_set_recursion_memory_management.3 \
+  doc/pcre2_set_substitute_callout.3 \
   doc/pcre2_substitute.3 \
   doc/pcre2_substring_copy_byname.3 \
   doc/pcre2_substring_copy_bynumber.3 \
@@ -231,7 +237,7 @@ noinst_PROGRAMS =
 # and 'make maintainer-clean'.
 
 CLEANFILES =
-DISTCLEANFILES = src/config.h.in~ config.h
+DISTCLEANFILES = src/config.h.in~
 MAINTAINERCLEANFILES =
 
 # Additional files to bundle with the distribution, over and above what
@@ -319,18 +325,18 @@ include_HEADERS = src/pcre2posix.h
 bin_SCRIPTS = pcre2-config
 
 ## ---------------------------------------------------------------
-## The dftables program is used to rebuild character tables before compiling
-## PCRE2, if --enable-rebuild-chartables is specified. It is not a user-visible
-## program. The default (when --enable-rebuild-chartables is not specified) is
-## to copy a distributed set of tables that are defined for ASCII code. In this
-## case, dftables is not needed.
+## The pcre2_dftables program is used to rebuild character tables before
+## compiling PCRE2, if --enable-rebuild-chartables is specified. It is not an
+## installed program. The default (when --enable-rebuild-chartables is not
+## specified) is to copy a distributed set of tables that are defined for ASCII
+## code. In this case, pcre2_dftables is not needed.
 
 if WITH_REBUILD_CHARTABLES
-noinst_PROGRAMS += dftables
-dftables_SOURCES = src/dftables.c
-src/pcre2_chartables.c: dftables$(EXEEXT)
+noinst_PROGRAMS += pcre2_dftables
+pcre2_dftables_SOURCES = src/pcre2_dftables.c
+src/pcre2_chartables.c: pcre2_dftables$(EXEEXT)
 	rm -f $@
-	./dftables$(EXEEXT) $@
+	./pcre2_dftables$(EXEEXT) $@
 else
 src/pcre2_chartables.c: $(srcdir)/src/pcre2_chartables.c.dist
 	rm -f $@
@@ -356,12 +362,15 @@ COMMON_SOURCES = \
   src/pcre2_internal.h \
   src/pcre2_intmodedep.h \
   src/pcre2_jit_compile.c \
+  src/pcre2_jit_neon_inc.h \
+  src/pcre2_jit_simd_inc.h \
   src/pcre2_maketables.c \
   src/pcre2_match.c \
   src/pcre2_match_data.c \
   src/pcre2_newline.c \
   src/pcre2_ord2utf.c \
   src/pcre2_pattern_info.c \
+  src/pcre2_script_run.c \
   src/pcre2_serialize.c \
   src/pcre2_string_utils.c \
   src/pcre2_study.c \
@@ -382,6 +391,7 @@ nodist_libpcre2_8_la_SOURCES = \
 libpcre2_8_la_CFLAGS = \
   -DPCRE2_CODE_UNIT_WIDTH=8 \
   $(VISIBILITY_CFLAGS) \
+  $(CET_CFLAGS) \
   $(AM_CFLAGS)
 libpcre2_8_la_LIBADD =
 endif # WITH_PCRE2_8
@@ -395,6 +405,7 @@ nodist_libpcre2_16_la_SOURCES = \
 libpcre2_16_la_CFLAGS = \
   -DPCRE2_CODE_UNIT_WIDTH=16 \
   $(VISIBILITY_CFLAGS) \
+  $(CET_CFLAGS) \
   $(AM_CFLAGS)
 libpcre2_16_la_LIBADD =
 endif # WITH_PCRE2_16
@@ -408,6 +419,7 @@ nodist_libpcre2_32_la_SOURCES = \
 libpcre2_32_la_CFLAGS = \
   -DPCRE2_CODE_UNIT_WIDTH=32 \
   $(VISIBILITY_CFLAGS) \
+  $(CET_CFLAGS) \
   $(AM_CFLAGS)
 libpcre2_32_la_LIBADD =
 endif # WITH_PCRE2_32
@@ -436,15 +448,15 @@ EXTRA_DIST += \
   src/sljit/sljitNativePPC_32.c \
   src/sljit/sljitNativePPC_64.c \
   src/sljit/sljitNativePPC_common.c \
+  src/sljit/sljitNativeS390X.c \
   src/sljit/sljitNativeSPARC_32.c \
   src/sljit/sljitNativeSPARC_common.c \
-  src/sljit/sljitNativeTILEGX-encoder.c \
-  src/sljit/sljitNativeTILEGX_64.c \
   src/sljit/sljitNativeX86_32.c \
   src/sljit/sljitNativeX86_64.c \
   src/sljit/sljitNativeX86_common.c \
   src/sljit/sljitProtExecAllocator.c \
-  src/sljit/sljitUtils.c
+  src/sljit/sljitUtils.c \
+  src/sljit/sljitWXExecAllocator.c
 
 # Some of the JIT sources are also in separate files that are #included.
 
@@ -528,6 +540,10 @@ noinst_PROGRAMS += pcre2fuzzcheck
 pcre2fuzzcheck_SOURCES = src/pcre2_fuzzsupport.c
 pcre2fuzzcheck_CFLAGS = -DSTANDALONE $(AM_CFLAGS)
 pcre2fuzzcheck_LDADD = libpcre2-8.la
+if WITH_GCOV
+pcre2fuzzcheck_CFLAGS += $(GCOV_CFLAGS)
+pcre2fuzzcheck_LDADD += $(GCOV_LIBS)
+endif # WITH_GCOV
 endif # WITH FUZZ_SUPPORT
 endif # WITH_PCRE2_8
 
@@ -618,8 +634,10 @@ EXTRA_DIST += \
   testdata/grepoutput \
   testdata/grepoutput8 \
   testdata/grepoutputC \
+  testdata/grepoutputCN \
   testdata/grepoutputN \
   testdata/greppatN4 \
+  testdata/testbtables \
   testdata/testinput1 \
   testdata/testinput2 \
   testdata/testinput3 \
@@ -657,7 +675,7 @@ EXTRA_DIST += \
   testdata/testoutput7 \
   testdata/testoutput8-16-2 \
   testdata/testoutput8-16-3 \
-  testdata/testoutput8-16-3 \
+  testdata/testoutput8-16-4 \
   testdata/testoutput8-32-2 \
   testdata/testoutput8-32-3 \
   testdata/testoutput8-32-4 \
@@ -841,9 +859,11 @@ endif # WITH_GCOV
 
 EXTRA_DIST += \
   cmake/COPYING-CMAKE-SCRIPTS \
+  cmake/FindEditline.cmake \
   cmake/FindPackageHandleStandardArgs.cmake \
   cmake/FindReadline.cmake \
-  cmake/FindEditline.cmake \
+  cmake/pcre2-config-version.cmake.in \
+  cmake/pcre2-config.cmake.in \
   CMakeLists.txt \
   config-cmake.h.in
 

File diff suppressed because it is too large
+ 421 - 284
regex.mod/pcre/Makefile.in


+ 162 - 2
regex.mod/pcre/NEWS

@@ -1,6 +1,166 @@
 News about PCRE2 releases
 -------------------------
 
+
+Version 10.39 29-October-2021
+-----------------------------
+
+This release is happening soon after 10.38 because the bug fix is important.
+
+1. Fix incorrect detection of alternatives in first character search in JIT.
+
+2. Update to Unicode 14.0.0.
+
+3. Some code cleanups (see ChangeLog).
+
+
+Version 10.38 01-October-2021
+-----------------------------
+
+As well as some bug fixes and tidies (as always, see ChangeLog for details),
+the documentation is updated to list the new URLs, following the move of the
+source repository to GitHub and the mailing list to Google Groups.
+
+* The CMake build system can now build both static and shared libraries in one
+go.
+
+* Following Perl's lead, \K is now locked out in lookaround assertions by
+default, but an option is provided to re-enable the previous behaviour.
+
+
+Version 10.37 26-May-2021
+-------------------------
+
+A few more bug fixes and tidies. The only change of real note is the removal of
+the actual POSIX names regcomp etc. from the POSIX wrapper library because
+these have caused issues for some applications (see 10.33 #2 below).
+
+
+Version 10.36 04-December-2020
+------------------------------
+
+Again, mainly bug fixes and tidies. The only enhancements are the addition of
+GNU grep's -m (aka --max-count) option to pcre2grep, and also unifying the
+handling of substitution strings for both -O and callouts in pcre2grep, with
+the addition of $x{...} and $o{...} to allow for characters whose code points
+are greater than 255 in Unicode mode.
+
+NOTE: there is an outstanding issue with JIT support for MacOS on arm64
+hardware. For details, please see Bugzilla issue #2618.
+
+
+Version 10.35 15-April-2020
+---------------------------
+
+Bugfixes, tidies, and a few new enhancements.
+
+1. Capturing groups that contain recursive backreferences to themselves are no
+longer automatically atomic, because the restriction is no longer necessary
+as a result of the 10.30 restructuring.
+
+2. Several new options for pcre2_substitute().
+
+3. When Unicode is supported and PCRE2_UCP is set without PCRE2_UTF, Unicode
+character properties are used for upper/lower case computations on characters
+whose code points are greater than 127.
+
+4. The character tables (for low-valued characters) can now more easily be
+saved and restored in binary.
+
+5. Updated to Unicode 13.0.0.
+
+
+Version 10.34 21-November-2019
+------------------------------
+
+Another release with a few enhancements as well as bugfixes and tidies. The
+main new features are:
+
+1. There is now some support for matching in invalid UTF strings.
+
+2. Non-atomic positive lookarounds are implemented in the pcre2_match()
+interpreter, but not in JIT.
+
+3. Added two new functions: pcre2_get_match_data_size() and
+pcre2_maketables_free().
+
+4. Upgraded to Unicode 12.1.0.
+
+
+Version 10.33 16-April-2019
+---------------------------
+
+Yet more bugfixes, tidies, and a few enhancements, summarized here (see
+ChangeLog for the full list):
+
+1. Callouts from pcre2_substitute() are now available.
+
+2. The POSIX functions are now all called pcre2_regcomp() etc., with wrapper
+functions that use the standard POSIX names. However, in pcre2posix.h the POSIX
+names are defined as macros. This should help avoid linking with the wrong
+library in some environments, while still exporting the POSIX names for
+pre-existing programs that use them.
+
+3. Some new options:
+
+   (a) PCRE2_EXTRA_ESCAPED_CR_IS_LF makes \r behave as \n.
+
+   (b) PCRE2_EXTRA_ALT_BSUX enables support for ECMAScript 6's \u{hh...}
+       construct.
+
+   (c) PCRE2_COPY_MATCHED_SUBJECT causes a copy of a matched subject to be
+       made, instead of just remembering a pointer.
+
+4. Some new Perl features:
+
+   (a) Perl 5.28's experimental alphabetic names for atomic groups and
+       lookaround assertions, for example, (*pla:...) and (*atomic:...).
+
+   (b) The new Perl "script run" features (*script_run:...) and
+       (*atomic_script_run:...) aka (*sr:...) and (*asr:...).
+
+   (c) When PCRE2_UTF is set, allow non-ASCII letters and decimal digits in
+       capture group names.
+
+5. --disable-percent-zt disables the use of %zu and %td in formatting strings
+in pcre2test. They were already automatically disabled for VC and older C
+compilers.
+
+6. Some changes related to callouts in pcre2grep:
+
+   (a) Support for running an external program under VMS has been added, in
+       addition to Windows and fork() support.
+
+   (b) --disable-pcre2grep-callout-fork restricts the callout support in
+       to the inbuilt echo facility.
+
+
+Version 10.32 10-September-2018
+-------------------------------
+
+This is another mainly bugfix and tidying release with a few minor
+enhancements. These are the main ones:
+
+1. pcre2grep now supports the inclusion of binary zeros in patterns that are
+read from files via the -f option.
+
+2. ./configure now supports --enable-jit=auto, which automatically enables JIT
+if the hardware supports it.
+
+3. In pcre2_dfa_match(), internal recursive calls no longer use the stack for
+local workspace and local ovectors. Instead, an initial block of stack is
+reserved, but if this is insufficient, heap memory is used. The heap limit
+parameter now applies to pcre2_dfa_match().
+
+4. Updated to Unicode version 11.0.0.
+
+5. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported.
+
+6. Added support for \N{U+dddd}, but only in Unicode mode.
+
+7. Added support for (?^) to unset all imnsx options.
+
+
 Version 10.31 12-February-2018
 ------------------------------
 
@@ -31,7 +191,7 @@ remembering backtracking positions. This makes --disable-stack-for-recursion a
 NOOP. The new implementation allows backtracking into recursive group calls in
 patterns, making it more compatible with Perl, and also fixes some other
 previously hard-to-do issues. For patterns that have a lot of backtracking, the
-heap is now used, and there is explicit limit on the amount, settable by
+heap is now used, and there is an explicit limit on the amount, settable by
 pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). The "recursion limit" is retained,
 but is renamed as "depth limit" (though the old names remain for
 compatibility).
@@ -53,7 +213,7 @@ also supported.
 
 5. Additional compile options in the compile context are now available, and the
 first two are: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES and
-PCRE2_EXTRA_BAD_ESCAPE_IS LITERAL.
+PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.
 
 6. The newline type PCRE2_NEWLINE_NUL is now available.
 

+ 46 - 23
regex.mod/pcre/NON-AUTOTOOLS-BUILD

@@ -10,6 +10,7 @@ This document contains the following sections:
   Calling conventions in Windows environments
   Comments about Win32 builds
   Building PCRE2 on Windows with CMake
+  Building PCRE2 on Windows with Visual Studio
   Testing with RunTest.bat
   Building PCRE2 on native z/OS and z/VM
 
@@ -39,7 +40,11 @@ GENERIC INSTRUCTIONS FOR THE PCRE2 C LIBRARY
 
 The following are generic instructions for building the PCRE2 C library "by
 hand". If you are going to use CMake, this section does not apply to you; you
-can skip ahead to the CMake section.
+can skip ahead to the CMake section. Note that the settings concerned with
+8-bit, 16-bit, and 32-bit code units relate to the type of data string that
+PCRE2 processes. They are NOT referring to the underlying operating system bit
+width. You do not have to do anything special to compile in a 64-bit
+environment, for example.
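To illustrate the distinction, the sketch below computes how many bytes a subject string occupies for each code unit width. It is plain C with hypothetical names (code_unit8, code_unit16, code_unit32, subject_bytes), not part of PCRE2. The result depends only on the code unit width chosen, never on whether the program is built for a 32-bit or a 64-bit system.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the three code unit types; the names are hypothetical. */
typedef uint8_t  code_unit8;
typedef uint16_t code_unit16;
typedef uint32_t code_unit32;

/* Bytes occupied by a subject string of `length` code units, for a code
   unit width of 8, 16, or 32 bits. Returns 0 for any other width. */
size_t subject_bytes(size_t length, int code_unit_width)
{
  switch (code_unit_width) {
    case 8:  return length * sizeof(code_unit8);
    case 16: return length * sizeof(code_unit16);
    case 32: return length * sizeof(code_unit32);
    default: return 0;
  }
}
```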
 
  (1) Copy or rename the file src/config.h.generic as src/config.h, and edit the
      macro settings that it contains to whatever is appropriate for your
@@ -47,7 +52,7 @@ can skip ahead to the CMake section.
      macro to specify what character(s) you want to be interpreted as line
      terminators by default.
 
-     When you compile any of the PCRE2 modules, you must specify
+     When you subsequently compile any of the PCRE2 modules, you must specify
      -DHAVE_CONFIG_H to your compiler so that src/config.h is included in the
      sources.
 
@@ -61,6 +66,11 @@ can skip ahead to the CMake section.
      new release, you are strongly advised to review src/config.h.generic
      before re-using what you had previously.
 
+     Note also that the src/config.h.generic file is created from a config.h
+     that was generated by Autotools, which automatically includes settings of
+     a number of macros that are not actually used by PCRE2 (for example,
+     HAVE_MEMORY_H).
+
  (2) Copy or rename the file src/pcre2.h.generic as src/pcre2.h.
 
  (3) EITHER:
@@ -68,23 +78,23 @@ can skip ahead to the CMake section.
        src/pcre2_chartables.c.
 
      OR:
-       Compile src/dftables.c as a stand-alone program (using -DHAVE_CONFIG_H
-       if you have set up src/config.h), and then run it with the single
-       argument "src/pcre2_chartables.c". This generates a set of standard
-       character tables and writes them to that file. The tables are generated
-       using the default C locale for your system. If you want to use a locale
-       that is specified by LC_xxx environment variables, add the -L option to
-       the dftables command. You must use this method if you are building on a
-       system that uses EBCDIC code.
+       Compile src/pcre2_dftables.c as a stand-alone program (using
+       -DHAVE_CONFIG_H if you have set up src/config.h), and then run it with
+       the single argument "src/pcre2_chartables.c". This generates a set of
+       standard character tables and writes them to that file. The tables are
+       generated using the default C locale for your system. If you want to use
+       a locale that is specified by LC_xxx environment variables, add the -L
+       option to the pcre2_dftables command. You must use this method if you
+       are building on a system that uses EBCDIC code.
 
      The tables in src/pcre2_chartables.c are defaults. The caller of PCRE2 can
      specify alternative tables at run time.
 
- (4) For an 8-bit library, compile the following source files from the src
-     directory, setting -DPCRE2_CODE_UNIT_WIDTH=8 as a compiler option. Also
-     set -DHAVE_CONFIG_H if you have set up src/config.h with your
-     configuration, or else use other -D settings to change the configuration
-     as required.
+ (4) For a library that supports 8-bit code units in the character strings that
+     it processes, compile the following source files from the src directory,
+     setting -DPCRE2_CODE_UNIT_WIDTH=8 as a compiler option. Also set
+     -DHAVE_CONFIG_H if you have set up src/config.h with your configuration,
+     or else use other -D settings to change the configuration as required.
 
        pcre2_auto_possess.c
        pcre2_chartables.c
@@ -103,6 +113,7 @@ can skip ahead to the CMake section.
        pcre2_newline.c
        pcre2_ord2utf.c
        pcre2_pattern_info.c
+       pcre2_script_run.c
        pcre2_serialize.c
        pcre2_string_utils.c
        pcre2_study.c
@@ -126,7 +137,7 @@ can skip ahead to the CMake section.
      src/pcre2_jit_match.c and src/pcre2_jit_misc.c, so you should not compile
      these yourself.
 
-     Not also that the pcre2_fuzzsupport.c file contains special code that is
+     Note also that the pcre2_fuzzsupport.c file contains special code that is
      useful to those who want to run fuzzing tests on the PCRE2 library. Unless
      you are doing that, you can ignore it.
 
@@ -135,9 +146,9 @@ can skip ahead to the CMake section.
      If your system has static and shared libraries, you may have to do this
      once for each type.
 
- (6) If you want to build a 16-bit library or 32-bit library (as well as, or
-     instead of the 8-bit library) just supply 16 or 32 as the value of
-     -DPCRE2_CODE_UNIT_WIDTH when you are compiling.
+ (6) If you want to build a library that supports 16-bit or 32-bit code units
+     (as well as, or instead of, the 8-bit library), just supply 16 or 32 as
+     the value of -DPCRE2_CODE_UNIT_WIDTH when you are compiling.
 
  (7) If you want to build the POSIX wrapper functions (which apply only to the
      8-bit library), ensure that you have the src/pcre2posix.h file and then
@@ -185,7 +196,7 @@ can skip ahead to the CMake section.
 
 STACK SIZE IN WINDOWS ENVIRONMENTS
 
-Prior to release 10.30 the default system stack size of 1Mb in some Windows
+Prior to release 10.30 the default system stack size of 1MiB in some Windows
 environments caused issues with some tests. This should no longer be the case
 for 10.30 and later releases.
 
@@ -330,6 +341,18 @@ cache can be deleted by selecting "File > Delete Cache".
     available for review in Testing\Temporary under your build dir.
 
 
+BUILDING PCRE2 ON WINDOWS WITH VISUAL STUDIO
+
+The code currently cannot be compiled without an inttypes.h header, which is
+available only with Visual Studio 2013 or newer. However, this portable and
+permissively-licensed implementation of the stdint.h header could be used as an
+alternative:
+
+  http://www.azillionmonkeys.com/qed/pstdint.h
+
+Just rename it and drop it into the top level of the build tree.
+
+
 TESTING WITH RUNTEST.BAT
 
 If configured with CMake, building the test project ("make test" or building
@@ -382,6 +405,6 @@ Everything in that location, source and executable, is in EBCDIC and native
 z/OS file formats. The port provides an API for LE languages such as COBOL and
 for the z/OS and z/VM versions of the Rexx languages.
 
-===============================
-Last Updated: 13 September 2017
-===============================
+===========================
+Last Updated: 28 April 2021
+===========================

+ 1 - 1
regex.mod/pcre/PrepareRelease

@@ -190,7 +190,7 @@ files="\
   libpcre2-16.pc.in \
   libpcre2-32.pc.in \
   libpcre2-posix.pc.in \
-  src/dftables.c \
+  src/pcre2_dftables.c \
   src/pcre2.h.in \
   src/pcre2_auto_possess.c \
   src/pcre2_compile.c \

+ 107 - 85
regex.mod/pcre/README

@@ -1,19 +1,23 @@
 README file for PCRE2 (Perl-compatible regular expression library)
 ------------------------------------------------------------------
 
-PCRE2 is a re-working of the original PCRE library to provide an entirely new
-API. The latest release of PCRE2 is always available in three alternative
-formats from:
+PCRE2 is a re-working of the original PCRE1 library to provide an entirely new
+API. Since its initial release in 2015, there has been further development of
+the code and it now differs from PCRE1 in more than just the API. There are new
+features, and the internals have been improved. The original PCRE1 library is
+now obsolete and no longer maintained. The latest release of PCRE2 is available
+in .tar.gz, .tar.bz2, or .zip form from this GitHub repository:
 
-  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.gz
-  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.bz2
-  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.zip
+https://github.com/PhilipHazel/pcre2/releases
 
-There is a mailing list for discussion about the development of PCRE (both the
-original and new APIs) at pcre-dev@exim.org. You can access the archives and
-subscribe or manage your subscription here:
+There is a mailing list for discussion about the development of PCRE2 at
+pcre2-dev@googlegroups.com. You can subscribe by sending an email to
+pcre2-dev+subscribe@googlegroups.com.
 
-   https://lists.exim.org/mailman/listinfo/pcre-dev
+You can access the archives and also subscribe or manage your subscription
+here:
+
+https://groups.google.com/pcre2-dev
 
 Please read the NEWS file if you are upgrading from a previous release. The
 contents of this README file are:
@@ -39,13 +43,13 @@ The PCRE2 APIs
 PCRE2 is written in C, and it has its own API. There are three sets of
 functions, one for the 8-bit library, which processes strings of bytes, one for
 the 16-bit library, which processes strings of 16-bit values, and one for the
-32-bit library, which processes strings of 32-bit values. There are no C++
-wrappers.
+32-bit library, which processes strings of 32-bit values. Unlike PCRE1, there
+are no C++ wrappers.
 
 The distribution does contain a set of C wrapper functions for the 8-bit
 library that are based on the POSIX regular expression API (see the pcre2posix
-man page). These can be found in a library called libpcre2-posix. Note that
-this just provides a POSIX calling interface to PCRE2; the regular expressions
+man page). These are built into a library called libpcre2-posix. Note that this
+just provides a POSIX calling interface to PCRE2; the regular expressions
 themselves still follow Perl syntax and semantics. The POSIX API is restricted,
 and does not give full access to all of PCRE2's facilities.
 
@@ -53,20 +57,8 @@ The header file for the POSIX-style functions is called pcre2posix.h. The
 official POSIX name is regex.h, but I did not want to risk possible problems
 with existing files of that name by distributing it that way. To use PCRE2 with
 an existing program that uses the POSIX API, pcre2posix.h will have to be
-renamed or pointed at by a link.
-
-If you are using the POSIX interface to PCRE2 and there is already a POSIX
-regex library installed on your system, as well as worrying about the regex.h
-header file (as mentioned above), you must also take care when linking programs
-to ensure that they link with PCRE2's libpcre2-posix library. Otherwise they
-may pick up the POSIX functions of the same name from the other library.
-
-One way of avoiding this confusion is to compile PCRE2 with the addition of
--Dregcomp=PCRE2regcomp (and similarly for the other POSIX functions) to the
-compiler flags (CFLAGS if you are using "configure" -- see below). This has the
-effect of renaming the functions so that the names no longer clash. Of course,
-you have to do the same thing for your applications, or write them using the
-new names.
+renamed or pointed at by a link (or the program modified, of course). See the
+pcre2posix documentation for more details.
 
 
 Documentation for PCRE2
@@ -171,10 +163,14 @@ library. They are also documented in the pcre2build man page.
   give large performance improvements on certain platforms, add --enable-jit to
   the "configure" command. This support is available only for certain hardware
   architectures. If you try to enable it on an unsupported architecture, there
-  will be a compile time error. If you are running under SELinux you may also
-  want to add --enable-jit-sealloc, which enables the use of an execmem
-  allocator in JIT that is compatible with SELinux. This has no effect if JIT
-  is not enabled.
+  will be a compile time error. If in doubt, use --enable-jit=auto, which
+  enables JIT only if the current hardware is supported.
+
+. If you are enabling JIT in an SELinux environment, you may also want to add
+  --enable-jit-sealloc, which enables the use of an executable memory allocator
+  that is compatible with SELinux. Warning: this allocator is experimental!
+  It does not support the fork() operation and may crash when no disk space is
+  available. This option has no effect if JIT is disabled.
 
 . If you do not want to make use of the default support for UTF-8 Unicode
   character strings in the 8-bit library, UTF-16 Unicode character strings in
@@ -239,9 +235,11 @@ library. They are also documented in the pcre2build man page.
   discussion in the pcre2api man page (search for pcre2_set_match_limit).
 
 . There is a separate counter that limits the depth of nested backtracking
-  during a matching process, which indirectly limits the amount of heap memory
-  that is used. This also has a default of ten million, which is essentially
-  "unlimited". You can change the default by setting, for example,
+  (pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
+  matching process, which indirectly limits the amount of heap memory that is
+  used, and in the case of pcre2_dfa_match() the amount of stack as well. This
+  counter also has a default of ten million, which is essentially "unlimited".
+  You can change the default by setting, for example,
 
   --with-match-limit-depth=5000
 
@@ -249,16 +247,17 @@ library. They are also documented in the pcre2build man page.
   pcre2_set_depth_limit).
 
 . You can also set an explicit limit on the amount of heap memory used by
-  the pcre2_match() interpreter:
+  the pcre2_match() and pcre2_dfa_match() interpreters:
 
   --with-heap-limit=500
 
-  The units are kilobytes. This limit does not apply when the JIT optimization
-  (which has its own memory control features) is used. There is more discussion
-  on the pcre2api man page (search for pcre2_set_heap_limit).
+  The units are kibibytes (units of 1024 bytes). This limit does not apply when
+  the JIT optimization (which has its own memory control features) is used.
+  There is more discussion on the pcre2api man page (search for
+  pcre2_set_heap_limit).
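Because the units are kibibytes, a setting of 500 corresponds to 500 * 1024 = 512000 bytes. The sketch below shows the conversion and a limit check in plain C. The helper names kib_to_bytes and within_heap_limit are hypothetical and are not PCRE2 functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts a limit given in kibibytes to bytes, saturating at SIZE_MAX. */
size_t kib_to_bytes(uint32_t limit_kib)
{
  if (limit_kib > SIZE_MAX / 1024) return SIZE_MAX;
  return (size_t)limit_kib * 1024;
}

/* Returns 1 if a request for `needed` bytes fits within the limit given in
   kibibytes, and 0 otherwise. */
int within_heap_limit(size_t needed, uint32_t limit_kib)
{
  return needed <= kib_to_bytes(limit_kib);
}
```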
 
 . In the 8-bit library, the default maximum compiled pattern size is around
-  64K bytes. You can increase this by adding --with-link-size=3 to the
+  64 kibibytes. You can increase this by adding --with-link-size=3 to the
   "configure" command. PCRE2 then uses three bytes instead of two for offsets
   to different parts of the compiled pattern. In the 16-bit library,
   --with-link-size=3 is the same as --with-link-size=4, which (in both
@@ -272,9 +271,9 @@ library. They are also documented in the pcre2build man page.
 
   --enable-rebuild-chartables
 
-  a program called dftables is compiled and run in the default C locale when
-  you obey "make". It builds a source file called pcre2_chartables.c. If you do
-  not specify this option, pcre2_chartables.c is created as a copy of
+  a program called pcre2_dftables is compiled and run in the default C locale
+  when you obey "make". It builds a source file called pcre2_chartables.c. If
+  you do not specify this option, pcre2_chartables.c is created as a copy of
   pcre2_chartables.c.dist. See "Character tables" below for further
   information.
 
@@ -300,8 +299,8 @@ library. They are also documented in the pcre2build man page.
   unaddressable. This allows it to detect invalid memory accesses, and is
   mostly useful for debugging PCRE2 itself.
 
-. In environments where the gcc compiler is used and lcov version 1.6 or above
-  is installed, if you specify
+. In environments where the gcc compiler is used and lcov is installed, if you
+  specify
 
   --enable-coverage
 
@@ -315,10 +314,14 @@ library. They are also documented in the pcre2build man page.
 . When JIT support is enabled, pcre2grep automatically makes use of it, unless
   you add --disable-pcre2grep-jit to the "configure" command.
 
-. On non-Windows sytems there is support for calling external scripts during
-  matching in the pcre2grep command via PCRE2's callout facility with string
-  arguments. This support can be disabled by adding --disable-pcre2grep-callout
-  to the "configure" command.
+. There is support for calling external programs during matching in the
+  pcre2grep command, using PCRE2's callout facility with string arguments. This
+  support can be disabled by adding --disable-pcre2grep-callout to the
+  "configure" command. There are two kinds of callout: one that generates
+  output from inbuilt code, and another that calls an external program. The
+  latter has special support for Windows and VMS; otherwise it assumes the
+  existence of the fork() function. This facility can be disabled by adding
+  --disable-pcre2grep-callout-fork to the "configure" command.
 
 . The pcre2grep program currently supports only 8-bit data files, and so
   requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
@@ -371,6 +374,15 @@ library. They are also documented in the pcre2build man page.
   tgetflag, or tgoto, this is the problem, and linking with the ncurses library
   should fix it.
 
+. The C99 standard defines formatting modifiers z and t for size_t and
+  ptrdiff_t values, respectively. By default, PCRE2 uses these modifiers in
+  environments other than Microsoft Visual Studio versions earlier than 2013
+  when __STDC_VERSION__ is defined and has a value greater than or equal to
+  199901L (indicating C99). However, there is at least one environment that
+  claims to be C99 but does not support these modifiers. If
+  --disable-percent-zt is specified, no use is made of the z or t modifiers.
+  Instead of %td or %zu, %lu is used, with a cast for size_t values.
+
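A minimal sketch of the two formatting strategies for size_t values follows. The macro names NO_PERCENT_ZT, SIZE_FORMAT, and SIZE_CAST and the function format_size are hypothetical stand-ins chosen for this illustration. They are not claimed to be the names PCRE2 uses internally.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical switch: define NO_PERCENT_ZT to take the fallback path, in
   the spirit of --disable-percent-zt. */
#ifdef NO_PERCENT_ZT
#define SIZE_FORMAT "%lu"
#define SIZE_CAST   (unsigned long)
#else
#define SIZE_FORMAT "%zu"
#define SIZE_CAST
#endif

/* Writes value as decimal text into buf and returns what snprintf returns,
   that is, the number of characters the full text needs. */
int format_size(char *buf, size_t buflen, size_t value)
{
  return snprintf(buf, buflen, SIZE_FORMAT, SIZE_CAST value);
}
```

The cast matters on the fallback path because size_t is not guaranteed to have the same width as unsigned long.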
 . There is a special option called --enable-fuzz-support for use by people who
   want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
   library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
@@ -538,11 +550,11 @@ Cross-compiling using autotools
 
 You can specify CC and CFLAGS in the normal way to the "configure" command, in
 order to cross-compile PCRE2 for some other host. However, you should NOT
-specify --enable-rebuild-chartables, because if you do, the dftables.c source
-file is compiled and run on the local host, in order to generate the inbuilt
-character tables (the pcre2_chartables.c file). This will probably not work,
-because dftables.c needs to be compiled with the local compiler, not the cross
-compiler.
+specify --enable-rebuild-chartables, because if you do, the pcre2_dftables.c
+source file is compiled and run on the local host, in order to generate the
+inbuilt character tables (the pcre2_chartables.c file). This will probably not
+work, because pcre2_dftables.c needs to be compiled with the local compiler,
+not the cross compiler.
 
 When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
 created by making a copy of pcre2_chartables.c.dist, which is a default set of
@@ -550,17 +562,18 @@ tables that assumes ASCII code. Cross-compiling with the default tables should
 not be a problem.
 
 If you need to modify the character tables when cross-compiling, you should
-move pcre2_chartables.c.dist out of the way, then compile dftables.c by hand
-and run it on the local host to make a new version of pcre2_chartables.c.dist.
-Then when you cross-compile PCRE2 this new version of the tables will be used.
+move pcre2_chartables.c.dist out of the way, then compile pcre2_dftables.c by
+hand and run it on the local host to make a new version of
+pcre2_chartables.c.dist. See the pcre2build section "Creating character tables
+at build time" for more details.
 
 
 Making new tarballs
 -------------------
 
-The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
-zip formats. The command "make distcheck" does the same, but then does a trial
-build of the new distribution to ensure that it works.
+The command "make dist" creates two PCRE2 tarballs, in tar.gz and zip formats.
+The command "make distcheck" does the same, but then does a trial build of the
+new distribution to ensure that it works.
 
 If you have modified any of the man page sources in the doc directory, you
 should first run the PrepareRelease script before making a distribution. This
@@ -711,8 +724,8 @@ compile context.
 The source file called pcre2_chartables.c contains the default set of tables.
 By default, this is created as a copy of pcre2_chartables.c.dist, which
 contains tables for ASCII coding. However, if --enable-rebuild-chartables is
-specified for ./configure, a different version of pcre2_chartables.c is built
-by the program dftables (compiled from dftables.c), which uses the ANSI C
+specified for ./configure, a new version of pcre2_chartables.c is built by the
+program pcre2_dftables (compiled from pcre2_dftables.c), which uses the ANSI C
 character handling functions such as isalnum(), isalpha(), isupper(),
 islower(), etc. to build the table sources. This means that the default C
 locale that is set for your system will control the contents of these default
@@ -722,32 +735,40 @@ file does not get automatically re-generated. The best way to do this is to
 move pcre2_chartables.c.dist out of the way and replace it with your customized
 tables.
 
-When the dftables program is run as a result of --enable-rebuild-chartables,
-it uses the default C locale that is set on your system. It does not pay
-attention to the LC_xxx environment variables. In other words, it uses the
-system's default locale rather than whatever the compiling user happens to have
-set. If you really do want to build a source set of character tables in a
-locale that is specified by the LC_xxx variables, you can run the dftables
-program by hand with the -L option. For example:
+When the pcre2_dftables program is run as a result of specifying
+--enable-rebuild-chartables, it uses the default C locale that is set on your
+system. It does not pay attention to the LC_xxx environment variables. In other
+words, it uses the system's default locale rather than whatever the compiling
+user happens to have set. If you really do want to build a source set of
+character tables in a locale that is specified by the LC_xxx variables, you can
+run the pcre2_dftables program by hand with the -L option. For example:
 
-  ./dftables -L pcre2_chartables.c.special
+  ./pcre2_dftables -L pcre2_chartables.c.special
 
-The first two 256-byte tables provide lower casing and case flipping functions,
-respectively. The next table consists of three 32-byte bit maps which identify
-digits, "word" characters, and white space, respectively. These are used when
-building 32-byte bit maps that represent character classes for code points less
-than 256. The final 256-byte table has bits indicating various character types,
-as follows:
+The second argument names the file where the source code for the tables is
+written. The first two 256-byte tables provide lower casing and case flipping
+functions, respectively. The next table consists of a number of 32-byte bit
+maps which identify certain character classes such as digits, "word"
+characters, white space, etc. These are used when building 32-byte bit maps
+that represent character classes for code points less than 256. The final
+256-byte table has bits indicating various character types, as follows:
 
     1   white space character
     2   letter
-    4   decimal digit
-    8   hexadecimal digit
+    4   lower case letter
+    8   decimal digit
    16   alphanumeric or '_'
-  128   regular expression metacharacter or binary zero
 
-You should not alter the set of characters that contain the 128 bit, as that
-will cause PCRE2 to malfunction.
+You can also specify -b (with or without -L) when running pcre2_dftables. This
+causes the tables to be written in binary instead of as source code. A set of
+binary tables can be loaded into memory by an application and passed to
+pcre2_compile() in the same way as tables created dynamically by calling
+pcre2_maketables(). The tables are just a string of bytes, independent of
+hardware characteristics such as endianness. This means they can be bundled
+with an application that runs in different environments, to ensure consistent
+behaviour.
+
+See also the pcre2build section "Creating character tables at build time".
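As a rough illustration of how tables of this kind can be derived from the C character handling functions, the sketch below fills a lower-casing table, a case-flipping table, and a character-type table that uses the bit values listed above. It is a simplified stand-in written for this note, not the pcre2_dftables source. The enum names and the build_tables function are hypothetical, and the 32-byte bit maps are omitted.

```c
#include <ctype.h>
#include <stdint.h>

/* Bit values as listed above; the names are hypothetical. */
enum {
  CT_SPACE    = 1,   /* white space character */
  CT_LETTER   = 2,   /* letter */
  CT_LCLETTER = 4,   /* lower case letter */
  CT_DIGIT    = 8,   /* decimal digit */
  CT_WORD     = 16   /* alphanumeric or '_' */
};

/* Fills the lower-casing, case-flipping, and character-type tables using
   whatever C locale is current when the function is called. */
void build_tables(uint8_t lower[256], uint8_t flip[256], uint8_t types[256])
{
  for (int c = 0; c < 256; c++) {
    uint8_t bits = 0;

    lower[c] = (uint8_t)tolower(c);
    flip[c]  = (uint8_t)(islower(c) ? toupper(c) : tolower(c));

    if (isspace(c)) bits |= CT_SPACE;
    if (isalpha(c)) bits |= CT_LETTER;
    if (islower(c)) bits |= CT_LCLETTER;
    if (isdigit(c)) bits |= CT_DIGIT;
    if (isalnum(c) || c == '_') bits |= CT_WORD;
    types[c] = bits;
  }
}
```

A program that never calls setlocale() runs in the "C" locale, which corresponds to the default behaviour described above. Calling setlocale(LC_ALL, "") first would be the analogue of the -L option.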
 
 
 File manifest
@@ -758,7 +779,7 @@ The distribution should contain the files listed below.
 (A) Source files for the PCRE2 library functions and their headers are found in
 (A) Source files for the PCRE2 library functions and their headers are found in
     the src directory:
 
-  src/dftables.c           auxiliary program for building pcre2_chartables.c
+  src/pcre2_dftables.c     auxiliary program for building pcre2_chartables.c
                            when --enable-rebuild-chartables is specified
 
   src/pcre2_chartables.c.dist  a default set of character tables that assume
@@ -784,6 +805,7 @@ The distribution should contain the files listed below.
   src/pcre2_newline.c      )
   src/pcre2_ord2utf.c      )
   src/pcre2_pattern_info.c )
+  src/pcre2_script_run.c   )
   src/pcre2_serialize.c    )
   src/pcre2_string_utils.c )
   src/pcre2_study.c        )
@@ -881,6 +903,6 @@ The distribution should contain the files listed below.
                           )   environments
 
 Philip Hazel
-Email local part: ph10
-Email domain: cam.ac.uk
-Last updated: 12 September 2017
+Email local part: Philip.Hazel
+Email domain: gmail.com
+Last updated: 29 October 2021
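
Reviewer note on the README hunk above: the revised character-type table assigns one bit per property (1, 2, 4, 8, 16). A minimal POSIX shell sketch of decoding such a flag byte is given here. The function name `describe_ctype` and the sample values are illustrative assumptions, not values read from PCRE2's generated tables.

```sh
#!/bin/sh
# Decode a character-type flag byte using the bit values listed in the
# updated README. describe_ctype is a hypothetical helper for illustration.
describe_ctype() {
  flags=$1
  out=""
  if [ $(( flags & 1 )) -ne 0 ]; then out="$out space"; fi
  if [ $(( flags & 2 )) -ne 0 ]; then out="$out letter"; fi
  if [ $(( flags & 4 )) -ne 0 ]; then out="$out lower"; fi
  if [ $(( flags & 8 )) -ne 0 ]; then out="$out digit"; fi
  if [ $(( flags & 16 )) -ne 0 ]; then out="$out word"; fi
  # Strip the leading separator before printing.
  echo "${out# }"
}

describe_ctype 22   # assumed sample: 2 + 4 + 16
describe_ctype 24   # assumed sample: 8 + 16
```

Because each property has its own bit, a single byte can carry several properties at once and each one is tested with a bitwise AND.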

+ 102 - 34
regex.mod/pcre/RunGrepTest

@@ -4,6 +4,12 @@
 # itself. What we are checking here is the file handling and options that are
 # supported by pcre2grep. This script must be run in the build directory.
 
+# CODING CONVENTIONS:
+# * Put printf arguments in single, not double quotes to avoid unwanted
+#     escaping.
+# * Use \0 for binary zero in printf, not \x0, for the benefit of older
+#     versions (and use octal for other special values).
+
 # Set the C locale, so that sort(1) behaves predictably.
 
 LC_ALL=C
@@ -164,11 +170,11 @@ echo "---------------------------- Test 14 -----------------------------" >>test
 echo "RC=$?" >>testtrygrep
 
 echo "---------------------------- Test 15 -----------------------------" >>testtrygrep
-(cd $srcdir; $valgrind $vjs $pcre2grep 'abc^*' ./testdata/grepinput) 2>>testtrygrep >>testtrygrep
+(cd $srcdir; $valgrind $vjs $pcre2grep 'abc^*' ./testdata/grepinput) >>testtrygrep 2>&1
 echo "RC=$?" >>testtrygrep
 
 echo "---------------------------- Test 16 -----------------------------" >>testtrygrep
-(cd $srcdir; $valgrind $vjs $pcre2grep abc ./testdata/grepinput ./testdata/nonexistfile) 2>>testtrygrep >>testtrygrep
+(cd $srcdir; $valgrind $vjs $pcre2grep abc ./testdata/grepinput ./testdata/nonexistfile) >>testtrygrep 2>&1
 echo "RC=$?" >>testtrygrep
 
 echo "---------------------------- Test 17 -----------------------------" >>testtrygrep
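
Reviewer note: Tests 15 and 16 in this hunk switch from `2>>testtrygrep >>testtrygrep` to `>>testtrygrep 2>&1`. The sketch below shows the second form on a throwaway command. With this form stderr is duplicated onto the already-redirected stdout, so both streams go through one open file rather than two separate append-mode opens. The temporary file and the `emit` function are assumptions made for the example.

```sh
#!/bin/sh
log=$(mktemp)

# Hypothetical command that writes one line to each stream.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

# Redirect stdout first, then point stderr at the same descriptor.
emit >>"$log" 2>&1
echo "RC=$?" >>"$log"

cat "$log"
rm -f "$log"
```

The order matters: writing `2>&1` before `>>"$log"` would duplicate stderr onto the original stdout instead of the log file.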
@@ -290,7 +296,7 @@ echo "---------------------------- Test 45 ------------------------------" >>tes
 echo "RC=$?" >>testtrygrep
 
 echo "---------------------------- Test 46 ------------------------------" >>testtrygrep
-(cd $srcdir; $valgrind $vjs $pcre2grep -eabc -e '(unclosed' ./testdata/grepinput) 2>>testtrygrep >>testtrygrep
+(cd $srcdir; $valgrind $vjs $pcre2grep -eabc -e '(unclosed' ./testdata/grepinput) >>testtrygrep 2>&1
 echo "RC=$?" >>testtrygrep
 
 echo "---------------------------- Test 47 ------------------------------" >>testtrygrep
@@ -552,7 +558,7 @@ echo "RC=$?" >>testtrygrep
 echo "---------------------------- Test 107 -----------------------------" >>testtrygrep
 echo "a" >testtemp1grep
 echo "aaaaa" >>testtemp1grep
-(cd $srcdir; $valgrind $vjs $pcre2grep  --line-offsets '(?<=\Ka)' $builddir/testtemp1grep) >>testtrygrep 2>&1
+(cd $srcdir; $valgrind $vjs $pcre2grep  --line-offsets --allow-lookaround-bsk '(?<=\Ka)' $builddir/testtemp1grep) >>testtrygrep 2>&1
 echo "RC=$?" >>testtrygrep
 
 echo "---------------------------- Test 108 ------------------------------" >>testtrygrep
@@ -600,7 +606,7 @@ echo "---------------------------- Test 118 -----------------------------" >>tes
 echo "RC=$?" >>testtrygrep
 
 echo "---------------------------- Test 119 -----------------------------" >>testtrygrep
-printf "123\n456\n789\n---abc\ndef\nxyz\n---\n" >testNinputgrep
+printf '123\n456\n789\n---abc\ndef\nxyz\n---\n' >testNinputgrep
 $valgrind $vjs $pcre2grep -Mo '(\n|[^-])*---' testNinputgrep >>testtrygrep
 echo "RC=$?" >>testtrygrep
 
@@ -631,16 +637,49 @@ echo "RC=$?" >>testtrygrep
 echo "RC=$?" >>testtrygrep
 
 echo "---------------------------- Test 125 -----------------------------" >>testtrygrep
-printf "abcd\n" >testNinputgrep
-$valgrind $vjs $pcre2grep --colour=always '(?<=\K.)' testNinputgrep >>testtrygrep
+printf 'abcd\n' >testNinputgrep
+$valgrind $vjs $pcre2grep --colour=always --allow-lookaround-bsk '(?<=\K.)' testNinputgrep >>testtrygrep
+echo "RC=$?" >>testtrygrep
+$valgrind $vjs $pcre2grep --colour=always --allow-lookaround-bsk '(?=.\K)' testNinputgrep >>testtrygrep
+echo "RC=$?" >>testtrygrep
+$valgrind $vjs $pcre2grep --colour=always --allow-lookaround-bsk '(?<=\K[ac])' testNinputgrep >>testtrygrep
+echo "RC=$?" >>testtrygrep
+$valgrind $vjs $pcre2grep --colour=always --allow-lookaround-bsk '(?=[ac]\K)' testNinputgrep >>testtrygrep
+echo "RC=$?" >>testtrygrep
+
+echo "---------------------------- Test 126 -----------------------------" >>testtrygrep
+printf 'Next line pattern has binary zero\nABC\0XYZ\n' >testtemp1grep
+printf 'ABC\0XYZ\nABCDEF\nDEFABC\n' >testtemp2grep
+$valgrind $vjs $pcre2grep -a -f testtemp1grep testtemp2grep >>testtrygrep
+echo "RC=$?" >>testtrygrep
+
+echo "---------------------------- Test 127 -----------------------------" >>testtrygrep
+(cd $srcdir; $valgrind $vjs $pcre2grep -o --om-capture=0 'pattern()()()()' testdata/grepinput) >>testtrygrep
 echo "RC=$?" >>testtrygrep
-$valgrind $vjs $pcre2grep --colour=always '(?=.\K)' testNinputgrep >>testtrygrep
+
+echo "---------------------------- Test 128 -----------------------------" >>testtrygrep
+(cd $srcdir; $valgrind $vjs $pcre2grep -o1 --om-capture=0 'pattern()()()()' testdata/grepinput) >>testtrygrep 2>&1
 echo "RC=$?" >>testtrygrep
-$valgrind $vjs $pcre2grep --colour=always '(?<=\K[ac])' testNinputgrep >>testtrygrep
+
+echo "---------------------------- Test 129 -----------------------------" >>testtrygrep
+(cd $srcdir; $valgrind $vjs $pcre2grep -m 2 'fox' testdata/grepinput) >>testtrygrep 2>&1
 echo "RC=$?" >>testtrygrep
-$valgrind $vjs $pcre2grep --colour=always '(?=[ac]\K)' testNinputgrep >>testtrygrep
+
+echo "---------------------------- Test 130 -----------------------------" >>testtrygrep
+(cd $srcdir; $valgrind $vjs $pcre2grep -o -m2 'fox' testdata/grepinput) >>testtrygrep 2>&1
 echo "RC=$?" >>testtrygrep
 
+echo "---------------------------- Test 131 -----------------------------" >>testtrygrep
+(cd $srcdir; $valgrind $vjs $pcre2grep -oc -m2 'fox' testdata/grepinput) >>testtrygrep 2>&1
+echo "RC=$?" >>testtrygrep
+
+echo "---------------------------- Test 132 -----------------------------" >>testtrygrep
+(cd $srcdir; $valgrind $vjs $pcre2grep -m1 -A3 '^match'; echo '---'; head -1) <$srcdir/testdata/grepinput >>testtrygrep 2>&1
+echo "RC=$?" >>testtrygrep
+
+echo "---------------------------- Test 133 -----------------------------" >>testtrygrep
+(cd $srcdir; $valgrind $vjs $pcre2grep -m1 -O '=$x{41}$x423$o{103}$o1045=' 'fox') <$srcdir/testdata/grepinputv >>testtrygrep 2>&1
+echo "RC=$?" >>testtrygrep
 
 # Now compare the results.
 
@@ -662,7 +701,21 @@ if [ $utf8 -ne 0 ] ; then
   echo "RC=$?" >>testtrygrep
 
   echo "---------------------------- Test U3 ------------------------------" >>testtrygrep
-  (cd $srcdir; $valgrind $vjs $pcre2grep --line-offsets -u --newline=any '(?<=\K\x{17f})' ./testdata/grepinput8) >>testtrygrep
+  (cd $srcdir; $valgrind $vjs $pcre2grep --line-offsets -u --newline=any --allow-lookaround-bsk '(?<=\K\x{17f})' ./testdata/grepinput8) >>testtrygrep
+  echo "RC=$?" >>testtrygrep
+
+  echo "---------------------------- Test U4 ------------------------------" >>testtrygrep
+  printf 'A\341\200\200\200CD\342\200\200Z\n' >testtemp1grep
+  (cd $srcdir; $valgrind $vjs $pcre2grep -u -o '....' $builddir/testtemp1grep) >>testtrygrep 2>&1
+  echo "RC=$?" >>testtrygrep
+
+  echo "---------------------------- Test U5 ------------------------------" >>testtrygrep
+  printf 'A\341\200\200\200CD\342\200\200Z\n' >testtemp1grep
+  (cd $srcdir; $valgrind $vjs $pcre2grep -U -o '....' $builddir/testtemp1grep) >>testtrygrep
+  echo "RC=$?" >>testtrygrep
+
+  echo "---------------------------- Test U6 -----------------------------" >>testtrygrep
+  (cd $srcdir; $valgrind $vjs $pcre2grep -u -m1 -O '=$x{1d3}$o{744}=' 'fox') <$srcdir/testdata/grepinputv >>testtrygrep 2>&1
   echo "RC=$?" >>testtrygrep
 
   $cf $srcdir/testdata/grepoutput8 testtrygrep
@@ -681,55 +734,70 @@ fi
 # starts with a hyphen. These tests are run in the build directory.
 
 echo "Testing pcre2grep newline settings"
-printf "abc\rdef\r\nghi\njkl" >testNinputgrep
+printf 'abc\rdef\r\nghi\njkl' >testNinputgrep
 
-printf "%c--------------------------- Test N1 ------------------------------\r\n" - >testtrygrep
+printf '%c--------------------------- Test N1 ------------------------------\r\n' - >testtrygrep
 $valgrind $vjs $pcre2grep -n -N CR "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep
 
-printf "%c--------------------------- Test N2 ------------------------------\r\n" - >>testtrygrep
+printf '%c--------------------------- Test N2 ------------------------------\r\n' - >>testtrygrep
 $valgrind $vjs $pcre2grep -n --newline=crlf "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep
 
-printf "%c--------------------------- Test N3 ------------------------------\r\n" - >>testtrygrep
+printf '%c--------------------------- Test N3 ------------------------------\r\n' - >>testtrygrep
 pattern=`printf 'def\rjkl'`
 $valgrind $vjs $pcre2grep -n --newline=cr -F "$pattern" testNinputgrep >>testtrygrep
 
-printf "%c--------------------------- Test N4 ------------------------------\r\n" - >>testtrygrep
+printf '%c--------------------------- Test N4 ------------------------------\r\n' - >>testtrygrep
 $valgrind $vjs $pcre2grep -n --newline=crlf -F -f $srcdir/testdata/greppatN4 testNinputgrep >>testtrygrep
 
-printf "%c--------------------------- Test N5 ------------------------------\r\n" - >>testtrygrep
+printf '%c--------------------------- Test N5 ------------------------------\r\n' - >>testtrygrep
 $valgrind $vjs $pcre2grep -n --newline=any "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep
 
-printf "%c--------------------------- Test N6 ------------------------------\r\n" - >>testtrygrep
+printf '%c--------------------------- Test N6 ------------------------------\r\n' - >>testtrygrep
 $valgrind $vjs $pcre2grep -n --newline=anycrlf "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep
 
-# It seems inpossible to handle NUL characters easily in Solaris (aka SunOS).
-# The version of sed explicitly doesn't like them. For the moment, we just
-# don't run this test under SunOS. Fudge the output so that the comparison
-# works. A similar problem has also been reported for MacOS (Darwin).
-
-printf "%c--------------------------- Test N7 ------------------------------\r\n" - >>testtrygrep
-uname=`uname`
-if [ "$uname" != "SunOS" -a "$uname" != "Darwin" ] ; then
-  printf "abc\0def" >testNinputgrep
-  $valgrind $vjs $pcre2grep -na --newline=nul "^(abc|def)" testNinputgrep | sed 's/\x00/ZERO/' >>testtrygrep
-  echo "" >>testtrygrep
+# This next test involves NUL characters. It seems impossible to handle them
+# easily in many operating systems. An earlier version of this script used sed
+# to translate NUL into the string ZERO, but this didn't work on Solaris (aka
+# SunOS), where the version of sed explicitly doesn't like them, and also MacOS
+# (Darwin), OpenBSD, FreeBSD, NetBSD, and some Linux distributions like Alpine,
+# even when using GNU sed. A user suggested using tr instead, which
+# necessitates translating to a single character (@). However, on (some
+# versions of?) Solaris, the normal "tr" cannot handle binary zeros, but if
+# /usr/xpg4/bin/tr is available, it can do so, so test for that.
+
+if [ -x /usr/xpg4/bin/tr ] ; then
+  tr=/usr/xpg4/bin/tr
 else
-  echo '1:abcZERO2:def' >>testtrygrep
+  tr=tr
 fi
 
+printf '%c--------------------------- Test N7 ------------------------------\r\n' - >>testtrygrep
+printf 'abc\0def' >testNinputgrep
+$valgrind $vjs $pcre2grep -na --newline=nul "^(abc|def)" testNinputgrep | $tr '\000' '@' >>testtrygrep
+echo "" >>testtrygrep
+
 $cf $srcdir/testdata/grepoutputN testtrygrep
 if [ $? != 0 ] ; then exit 1; fi
 
-# If pcre2grep supports script callouts, run some tests on them.
+# If pcre2grep supports script callouts, run some tests on them. It is possible
+# to restrict these callouts to the non-fork case, either for security, or for
+# environments that do not support fork(). This is handled by comparing to a
+# different output.
 
-if $valgrind $vjs $pcre2grep --help | $valgrind $vjs $pcre2grep -q 'Callout scripts in patterns are supported'; then
+if $valgrind $vjs $pcre2grep --help | $valgrind $vjs $pcre2grep -q 'callout scripts in patterns are supported'; then
   echo "Testing pcre2grep script callouts"
   $valgrind $vjs $pcre2grep '(T)(..(.))(?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4) ($14) ($0)")()' $srcdir/testdata/grepinputv >testtrygrep
   $valgrind $vjs $pcre2grep '(T)(..(.))()()()()()()()(..)(?C"/bin/echo|Arg1: [$11] [${11}]")' $srcdir/testdata/grepinputv >>testtrygrep
   $valgrind $vjs $pcre2grep '(T)(?C"|$0:$1$n")' $srcdir/testdata/grepinputv >>testtrygrep
   $valgrind $vjs $pcre2grep '(T)(?C"|$1$n")(*F)' $srcdir/testdata/grepinputv >>testtrygrep
-  # The above has no newline, which 'diff -ub' ignores, so add one.
-  $cf $srcdir/testdata/grepoutputC testtrygrep
+  $valgrind $vjs $pcre2grep -m1 '(T)(?C"|$0:$1:$x{41}$o{101}$n")' $srcdir/testdata/grepinputv >>testtrygrep
+
+  if $valgrind $vjs $pcre2grep --help | $valgrind $vjs $pcre2grep -q 'Non-fork callout scripts in patterns are supported'; then
+    $cf $srcdir/testdata/grepoutputCN testtrygrep
+  else
+    $cf $srcdir/testdata/grepoutputC testtrygrep
+  fi
+
   if [ $? != 0 ] ; then exit 1; fi
 else
   echo "Script callouts are not supported"
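
Reviewer note: the coding conventions added at the top of RunGrepTest and the Test N7 rewrite can be tried in isolation. The sketch below writes a binary zero with `printf`, using a single-quoted format and the octal `\0` escape, then maps it to `@` with `tr`, preferring `/usr/xpg4/bin/tr` when it exists, as the script does. It leaves pcre2grep out entirely, and the sample data is only a stand-in.

```sh
#!/bin/sh
# Prefer the XPG4 tr when it exists, as RunGrepTest does.
if [ -x /usr/xpg4/bin/tr ] ; then
  tr=/usr/xpg4/bin/tr
else
  tr=tr
fi

# Single-quoted format; \0 is the octal escape for binary zero.
printf 'abc\0def\n' | $tr '\000' '@'
```

Translating to a single visible character keeps the expected-output file free of NUL bytes, which is what lets the comparison step work on plain text.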

+ 7 - 2
regex.mod/pcre/RunGrepTest.bat

@@ -653,14 +653,19 @@ if ERRORLEVEL 1 exit /b 1
 
 :: If pcre2grep supports script callouts, run some tests on them.
 
-%pcre2grep% --help | %pcre2grep% -q "Callout scripts in patterns are supported"
+%pcre2grep% --help | %pcre2grep% -q "callout scripts in patterns are supported"
 if %ERRORLEVEL% equ 0 (
   echo Testing pcre2grep script callouts
   %pcre2grep% "(T)(..(.))(?C'cmd|/c echo|Arg1: [$1] [$2] [$3]|Arg2: ^$|${1}^$| ($4) ($14) ($0)')()" %srcdir%/testdata/grepinputv >testtrygrep
   %pcre2grep% "(T)(..(.))()()()()()()()(..)(?C'cmd|/c echo|Arg1: [$11] [${11}]')" %srcdir%/testdata/grepinputv >>testtrygrep
   %pcre2grep% "(T)(?C'|$0:$1$n')" %srcdir%/testdata/grepinputv >>testtrygrep
   %pcre2grep% "(T)(?C'|$1$n')(*F)" %srcdir%/testdata/grepinputv >>testtrygrep
-  %cf% %srcdir%\testdata\grepoutputC testtrygrep %cfout%
+  %pcre2grep% --help | %pcre2grep% -q "Non-script callout scripts in patterns are supported"
+  if %ERRORLEVEL% equ 0 (
+    %cf% %srcdir%\testdata\grepoutputCN testtrygrep %cfout%
+  ) else (
+    %cf% %srcdir%\testdata\grepoutputC testtrygrep %cfout%
+  )
   if ERRORLEVEL 1 exit /b 1
 ) else (
   echo Script callouts are not supported
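
Reviewer note: both RunGrepTest and RunGrepTest.bat now probe the `--help` text twice, first for any callout support and then for the restricted variant, and choose the expected-output file accordingly. In this commit the second probe string differs between the two scripts ("Non-fork" in the shell script, "Non-script" in the batch file). The POSIX shell sketch below follows the shell script. The function `pick_expected` and the help strings passed to it are assumptions for illustration, not the actual pcre2grep help output.

```sh
#!/bin/sh
# pick_expected is a hypothetical helper: given some help text, print the
# name of the expected-output file the test script would compare against.
pick_expected() {
  help_text=$1
  if printf '%s\n' "$help_text" | grep -q 'callout scripts in patterns are supported'; then
    if printf '%s\n' "$help_text" | grep -q 'Non-fork callout scripts in patterns are supported'; then
      echo grepoutputCN
    else
      echo grepoutputC
    fi
  else
    echo none
  fi
}

# Assumed sample help lines.
pick_expected 'All callout scripts in patterns are supported'
pick_expected 'Non-fork callout scripts in patterns are supported'
pick_expected 'Callout scripts are not supported'
```

The first probe is case-sensitive and starts at lower-case "callout", so it can match a line regardless of which word precedes it.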

+ 11 - 6
regex.mod/pcre/RunTest

@@ -319,7 +319,7 @@ fi
 # If it is possible to set the system stack size and -bigstack was given,
 # set up a large stack.
 
-$sim ./pcre2test -S 1 /dev/null /dev/null
+$sim ./pcre2test -S 64 /dev/null /dev/null
 if [ $? -eq 0 -a "$bigstack" != "" ] ; then
   setstack="-S 64"
 else
@@ -493,15 +493,20 @@ for bmode in "$test8" "$test16" "$test32"; do
     done
   fi
 
-  # PCRE2 tests that are not Perl-compatible: API, errors, internals
+  # PCRE2 tests that are not Perl-compatible: API, errors, internals. We copy
+  # the testbtables file to the current directory for use by this test.
 
   if [ $do2 = yes ] ; then
     echo $title2 "(excluding UTF-$bits)"
+    cp $testdata/testbtables .
     for opt in "" $jitopt; do
       $sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput2 testtry
-      if [ $? = 0 ] ; then
-        $sim $valgrind ${opt:+$vjs} ./pcre2test -q $bmode $opt -error -65,-62,-2,-1,0,100,101,191,200 >>testtry
+      saverc=$?
+      if [ $saverc = 0 ] ; then
+        $sim $valgrind ${opt:+$vjs} ./pcre2test -q $bmode $opt -error -70,-62,-2,-1,0,100,101,191,200 >>testtry
         checkresult $? 2 "$opt"
+      else
+        checkresult $saverc 2 "$opt"
       fi
     done
   fi
@@ -843,7 +848,7 @@ for bmode in "$test8" "$test16" "$test32"; do
     checkresult $? 24 ""
   fi
 
-  # UTF pattern converson tests
+  # UTF pattern conversion tests
 
   if [ "$do25" = yes ] ; then
     echo $title25
@@ -859,6 +864,6 @@ for bmode in "$test8" "$test16" "$test32"; do
 done
 
 # Clean up local working files
-rm -f testSinput test3input testsaved1 testsaved2 test3output test3outputA test3outputB teststdout teststderr testtry
+rm -f testbtables testSinput test3input testsaved1 testsaved2 test3output test3outputA test3outputB teststdout teststderr testtry
 
 # End
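
Reviewer note: the RunTest change to test 2 saves `$?` directly after the first pcre2test run, so a failure of that run now reaches `checkresult` through the new `else` branch. Previously a non-zero status skipped the block without calling `checkresult`. A minimal sketch of the pattern follows, with `first_step`, `second_step`, and `report` as hypothetical stand-ins. It captures the status with `|| saverc=$?` rather than a separate `saverc=$?` line, a small variation so the sketch also runs under `set -e`.

```sh
#!/bin/sh
# first_step, second_step and report are hypothetical stand-ins for the two
# pcre2test runs and for checkresult.
first_step()  { return "$1"; }
second_step() { return 0; }
report()      { echo "result=$1"; }

run_pair() {
  saverc=0
  # Capture the status at once; any later command would overwrite $?.
  first_step "$1" || saverc=$?
  if [ $saverc = 0 ] ; then
    second_step
    report $?
  else
    report $saverc
  fi
}

run_pair 0
run_pair 3
```

Saving the status in a variable is what makes it available in the `else` branch, where `$?` would already reflect the `[` test.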

+ 525 - 523
regex.mod/pcre/RunTest.bat

@@ -1,523 +1,525 @@
-@echo off
-@rem
-@rem MS Windows batch file to run pcre2test on testfiles with the correct
-@rem options. This file must use CRLF linebreaks to function properly,
-@rem and requires both pcre2test and pcre2grep.
-@rem
-@rem ------------------------ HISTORY ----------------------------------
-@rem This file was originally contributed to PCRE1 by Ralf Junker, and touched
-@rem up by Daniel Richard G. Tests 10-12 added by Philip H.
-@rem Philip H also changed test 3 to use "wintest" files.
-@rem
-@rem Updated by Tom Fortmann to support explicit test numbers on the command
-@rem line. Added argument validation and added error reporting.
-@rem
-@rem Sheri Pierce added logic to skip feature dependent tests
-@rem tests 4 5 7 10 12 14 19 and 22 require Unicode support
-@rem 8 requires Unicode and link size 2
-@rem 16 requires absence of jit support
-@rem 17 requires presence of jit support
-@rem Sheri P also added override tests for study and jit testing
-@rem Zoltan Herczeg added libpcre16 support
-@rem Zoltan Herczeg added libpcre32 support
-@rem -------------------------------------------------------------------
-@rem
-@rem The file was converted for PCRE2 by PH, February 2015.
-@rem Updated for new test 14 (moving others up a number), August 2015.
-@rem Tidied and updated for new tests 21, 22, 23 by PH, October 2015.
-@rem PH added missing "set type" for test 22, April 2016.
-
-
-setlocal enabledelayedexpansion
-if [%srcdir%]==[] (
-if exist testdata\ set srcdir=.)
-if [%srcdir%]==[] (
-if exist ..\testdata\ set srcdir=..)
-if [%srcdir%]==[] (
-if exist ..\..\testdata\ set srcdir=..\..)
-if NOT exist %srcdir%\testdata\ (
-Error: echo distribution testdata folder not found!
-call :conferror
-exit /b 1
-goto :eof
-)
-
-if [%pcre2test%]==[] set pcre2test=.\pcre2test.exe
-
-echo source dir is %srcdir%
-echo pcre2test=%pcre2test%
-
-if NOT exist %pcre2test% (
-echo Error: %pcre2test% not found!
-echo.
-call :conferror
-exit /b 1
-)
-
-%pcre2test% -C linksize >NUL
-set link_size=%ERRORLEVEL%
-%pcre2test% -C pcre2-8 >NUL
-set support8=%ERRORLEVEL%
-%pcre2test% -C pcre2-16 >NUL
-set support16=%ERRORLEVEL%
-%pcre2test% -C pcre2-32 >NUL
-set support32=%ERRORLEVEL%
-%pcre2test% -C unicode >NUL
-set unicode=%ERRORLEVEL%
-%pcre2test% -C jit >NUL
-set jit=%ERRORLEVEL%
-%pcre2test% -C backslash-C >NUL
-set supportBSC=%ERRORLEVEL%
-
-if %support8% EQU 1 (
-if not exist testout8 md testout8
-if not exist testoutjit8 md testoutjit8
-)
-
-if %support16% EQU 1 (
-if not exist testout16 md testout16
-if not exist testoutjit16 md testoutjit16
-)
-
-if %support16% EQU 1 (
-if not exist testout32 md testout32
-if not exist testoutjit32 md testoutjit32
-)
-
-set do1=no
-set do2=no
-set do3=no
-set do4=no
-set do5=no
-set do6=no
-set do7=no
-set do8=no
-set do9=no
-set do10=no
-set do11=no
-set do12=no
-set do13=no
-set do14=no
-set do15=no
-set do16=no
-set do17=no
-set do18=no
-set do19=no
-set do20=no
-set do21=no
-set do22=no
-set do23=no
-set all=yes
-
-for %%a in (%*) do (
-  set valid=no
-  for %%v in (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23) do if %%v == %%a set valid=yes
-  if "!valid!" == "yes" (
-    set do%%a=yes
-    set all=no
-) else (
-    echo Invalid test number - %%a!
-        echo Usage %0 [ test_number ] ...
-        echo Where test_number is one or more optional test numbers 1 through 23, default is all tests.
-        exit /b 1
-)
-)
-set failed="no"
-
-if "%all%" == "yes" (
-  set do1=yes
-  set do2=yes
-  set do3=yes
-  set do4=yes
-  set do5=yes
-  set do6=yes
-  set do7=yes
-  set do8=yes
-  set do9=yes
-  set do10=yes
-  set do11=yes
-  set do12=yes
-  set do13=yes
-  set do14=yes
-  set do15=yes
-  set do16=yes
-  set do17=yes
-  set do18=yes
-  set do19=yes
-  set do20=yes
-  set do21=yes
-  set do22=yes
-  set do23=yes
-)
-
-@echo RunTest.bat's pcre2test output is written to newly created subfolders
-@echo named testout{8,16,32} and testoutjit{8,16,32}.
-@echo.
-
-set mode=
-set bits=8
-
-:nextMode
-if "%mode%" == "" (
-  if %support8% EQU 0 goto modeSkip
-  echo.
-  echo ---- Testing 8-bit library ----
-  echo.
-)
-if "%mode%" == "-16" (
-  if %support16% EQU 0 goto modeSkip
-  echo.
-  echo ---- Testing 16-bit library ----
-  echo.
-)
-if "%mode%" == "-32" (
-  if %support32% EQU 0 goto modeSkip
-  echo.
-  echo ---- Testing 32-bit library ----
-  echo.
-)
-if "%do1%" == "yes" call :do1
-if "%do2%" == "yes" call :do2
-if "%do3%" == "yes" call :do3
-if "%do4%" == "yes" call :do4
-if "%do5%" == "yes" call :do5
-if "%do6%" == "yes" call :do6
-if "%do7%" == "yes" call :do7
-if "%do8%" == "yes" call :do8
-if "%do9%" == "yes" call :do9
-if "%do10%" == "yes" call :do10
-if "%do11%" == "yes" call :do11
-if "%do12%" == "yes" call :do12
-if "%do13%" == "yes" call :do13
-if "%do14%" == "yes" call :do14
-if "%do15%" == "yes" call :do15
-if "%do16%" == "yes" call :do16
-if "%do17%" == "yes" call :do17
-if "%do18%" == "yes" call :do18
-if "%do19%" == "yes" call :do19
-if "%do20%" == "yes" call :do20
-if "%do21%" == "yes" call :do21
-if "%do22%" == "yes" call :do22
-if "%do23%" == "yes" call :do23
-:modeSkip
-if "%mode%" == "" (
-  set mode=-16
-  set bits=16
-  goto nextMode
-)
-if "%mode%" == "-16" (
-  set mode=-32
-  set bits=32
-  goto nextMode
-)
-
-@rem If mode is -32, testing is finished
-if %failed% == "yes" (
-echo In above output, one or more of the various tests failed!
-exit /b 1
-)
-echo All OK
-goto :eof
-
-:runsub
-@rem Function to execute pcre2test and compare the output
-@rem Arguments are as follows:
-@rem
-@rem       1 = test number
-@rem       2 = outputdir
-@rem       3 = test name use double quotes
-@rem   4 - 9 = pcre2test options
-
-if [%1] == [] (
-  echo Missing test number argument!
-  exit /b 1
-)
-
-if [%2] == [] (
-  echo Missing outputdir!
-  exit /b 1
-)
-
-if [%3] == [] (
-  echo Missing test name argument!
-  exit /b 1
-)
-
-if %1 == 8 (
-  set outnum=8-%bits%-%link_size%
-) else (
-  set outnum=%1
-)
-set testinput=testinput%1
-set testoutput=testoutput%outnum%
-if exist %srcdir%\testdata\win%testinput% (
-  set testinput=wintestinput%1
-  set testoutput=wintestoutput%outnum%
-)
-
-echo Test %1: %3
-%pcre2test% %mode% %4 %5 %6 %7 %8 %9 %srcdir%\testdata\%testinput% >%2%bits%\%testoutput%
-if errorlevel 1 (
-  echo.          failed executing command-line:
-  echo.            %pcre2test% %mode% %4 %5 %6 %7 %8 %9 %srcdir%\testdata\%testinput% ^>%2%bits%\%testoutput%
-  set failed="yes"
-  goto :eof
-) else if [%1]==[2] (
-  %pcre2test% %mode% %4 %5 %6 %7 %8 %9 -error -63,-62,-2,-1,0,100,188,189,190,191 >>%2%bits%\%testoutput%
-)
-
-set type=
-if [%1]==[11] (
-  set type=-%bits%
-)
-if [%1]==[12] (
-  set type=-%bits%
-)
-if [%1]==[14] (
-  set type=-%bits%
-)
-if [%1]==[22] (
-  set type=-%bits%
-)
-
-fc /n %srcdir%\testdata\%testoutput%%type% %2%bits%\%testoutput% >NUL
-
-if errorlevel 1 (
-  echo.          failed comparison: fc /n %srcdir%\testdata\%testoutput% %2%bits%\%testoutput%
-  if [%1]==[3] (
-    echo.
-    echo ** Test 3 failure usually means french locale is not
-    echo ** available on the system, rather than a bug or problem with PCRE2.
-    echo.
-    goto :eof
-)
-
-  set failed="yes"
-  goto :eof
-)
-
-echo.          Passed.
-goto :eof
-
-:do1
-call :runsub 1 testout "Main non-UTF, non-UCP functionality (Compatible with Perl >= 5.10)" -q
-if %jit% EQU 1 call :runsub 1 testoutjit "Test with JIT Override" -q -jit
-goto :eof
-
-:do2
-  call :runsub 2 testout "API, errors, internals, and non-Perl stuff" -q
-  if %jit% EQU 1 call :runsub 2 testoutjit "Test with JIT Override" -q -jit
-goto :eof
-
-:do3
-  call :runsub 3 testout "Locale-specific features" -q
-  if %jit% EQU 1 call :runsub 3 testoutjit "Test with JIT Override" -q -jit
-goto :eof
-
-:do4
-if %unicode% EQU 0 (
-  echo Test 4 Skipped due to absence of Unicode support.
-  goto :eof
-)
-  call :runsub 4 testout "UTF-%bits% and Unicode property support - (Compatible with Perl >= 5.10)" -q
-  if %jit% EQU 1 call :runsub 4 testoutjit "Test with JIT Override" -q -jit
-goto :eof
-
-:do5
-if %unicode% EQU 0 (
-  echo Test 5 Skipped due to absence of Unicode support.
-  goto :eof
-)
-  call :runsub 5 testout "API, internals, and non-Perl stuff for UTF-%bits% and UCP" -q
-  if %jit% EQU 1 call :runsub 5 testoutjit "Test with JIT Override" -q -jit
-goto :eof
-
-:do6
-  call :runsub 6 testout "DFA matching main non-UTF, non-UCP functionality" -q
-goto :eof
-
-:do7
-if %unicode% EQU 0 (
-  echo Test 7 Skipped due to absence of Unicode support.
-  goto :eof
-)
-  call :runsub 7 testout "DFA matching with UTF-%bits% and Unicode property support" -q
-  goto :eof
-
-:do8
-if NOT %link_size% EQU 2 (
-  echo Test 8 Skipped because link size is not 2.
-  goto :eof
-)
-if %unicode% EQU 0 (
-  echo Test 8 Skipped due to absence of Unicode support.
-  goto :eof
-)
-  call :runsub 8 testout "Internal offsets and code size tests" -q
-goto :eof
-
-:do9
-if NOT %bits% EQU 8 (
-  echo Test 9 Skipped when running 16/32-bit tests.
-  goto :eof
-)
-  call :runsub 9 testout "Specials for the basic 8-bit library" -q
-  if %jit% EQU 1 call :runsub 9 testoutjit "Test with JIT Override" -q -jit
-goto :eof
-
-:do10
-if NOT %bits% EQU 8 (
-  echo Test 10 Skipped when running 16/32-bit tests.
-  goto :eof
-)
-if %unicode% EQU 0 (
-  echo Test 10 Skipped due to absence of Unicode support.
-  goto :eof
-)
-  call :runsub 10 testout "Specials for the 8-bit library with Unicode support" -q
-  if %jit% EQU 1 call :runsub 10 testoutjit "Test with JIT Override" -q -jit
-goto :eof
-
-:do11
-if %bits% EQU 8 (
-  echo Test 11 Skipped when running 8-bit tests.
-  goto :eof
-)
-  call :runsub 11 testout "Specials for the basic 16/32-bit library" -q
-  if %jit% EQU 1 call :runsub 11 testoutjit "Test with JIT Override" -q -jit
-goto :eof
-
-:do12
-if %bits% EQU 8 (
-  echo Test 12 Skipped when running 8-bit tests.
-  goto :eof
-)
-if %unicode% EQU 0 (
-  echo Test 12 Skipped due to absence of Unicode support.
-  goto :eof
-)
-  call :runsub 12 testout "Specials for the 16/32-bit library with Unicode support" -q
-  if %jit% EQU 1 call :runsub 12 testoutjit "Test with JIT Override" -q -jit
-goto :eof
-
-:do13
-if %bits% EQU 8 (
-  echo Test 13 Skipped when running 8-bit tests.
-  goto :eof
-)
-  call :runsub 13 testout "DFA specials for the basic 16/32-bit library" -q
-goto :eof
-
-:do14
-if %unicode% EQU 0 (
-  echo Test 14 Skipped due to absence of Unicode support.
-  goto :eof
-)
-  call :runsub 14 testout "DFA specials for UTF and UCP support" -q
-  goto :eof
-
-:do15
-call :runsub 15 testout "Non-JIT limits and other non_JIT tests" -q
-goto :eof
-
-:do16
-if %jit% EQU 1 (
-  echo Test 16 Skipped due to presence of JIT support.
-  goto :eof
-)
-  call :runsub 16 testout "JIT-specific features when JIT is not available" -q
-goto :eof
-
-:do17
-if %jit% EQU 0 (
-  echo Test 17 Skipped due to absence of JIT support.
-  goto :eof
-)
-  call :runsub 17 testout "JIT-specific features when JIT is available" -q
-goto :eof
-
-:do18
-if %bits% EQU 16 (
-  echo Test 18 Skipped when running 16-bit tests.
-  goto :eof
-)
-if %bits% EQU 32 (
-  echo Test 18 Skipped when running 32-bit tests.
-  goto :eof
-)
-  call :runsub 18 testout "POSIX interface, excluding UTF-8 and UCP" -q
-goto :eof
-
-:do19
-if %bits% EQU 16 (
-  echo Test 19 Skipped when running 16-bit tests.
-  goto :eof
-)
-if %bits% EQU 32 (
-  echo Test 19 Skipped when running 32-bit tests.
-  goto :eof
-)
-if %unicode% EQU 0 (
-  echo Test 19 Skipped due to absence of Unicode support.
-  goto :eof
-)
-  call :runsub 19 testout "POSIX interface with UTF-8 and UCP" -q
-goto :eof
-
-:do20
-call :runsub 20 testout "Serialization tests" -q
-goto :eof
-
-:do21
-if %supportBSC% EQU 0 (
-  echo Test 21 Skipped due to absence of backslash-C support.
-  goto :eof
-)
-  call :runsub 21 testout "Backslash-C tests without UTF" -q
-  call :runsub 21 testout "Backslash-C tests without UTF (DFA)" -q -dfa
-  if %jit% EQU 1 call :runsub 21 testoutjit "Test with JIT Override" -q -jit
-goto :eof
-
-:do22
-if %supportBSC% EQU 0 (
-  echo Test 22 Skipped due to absence of backslash-C support.
-  goto :eof
-)
-if %unicode% EQU 0 (
-  echo Test 22 Skipped due to absence of Unicode support.
-  goto :eof
-)
-  call :runsub 22 testout "Backslash-C tests with UTF" -q
-  if %jit% EQU 1 call :runsub 22 testoutjit "Test with JIT Override" -q -jit
-goto :eof
-
-:do23
-if %supportBSC% EQU 1 (
-  echo Test 23 Skipped due to presence of backslash-C support.
-  goto :eof
-)
-  call :runsub 23 testout "Backslash-C disabled test" -q
-goto :eof
-
-:conferror
-@echo.
-@echo Either your build is incomplete or you have a configuration error.
-@echo.
-@echo If configured with cmake and executed via "make test" or the MSVC "RUN_TESTS"
-@echo project, pcre2_test.bat defines variables and automatically calls RunTest.bat.
-@echo For manual testing of all available features, after configuring with cmake
-@echo and building, you can run the built pcre2_test.bat. For best results with
-@echo cmake builds and tests avoid directories with full path names that include
-@echo spaces for source or build.
-@echo.
-@echo Otherwise, if the build dir is in a subdir of the source dir, testdata needed
-@echo for input and verification should be found automatically when (from the
-@echo location of the built exes) you call RunTest.bat. By default RunTest.bat
-@echo runs all tests compatible with the linked pcre2 library but it can be given
-@echo a test number as an argument.
-@echo.
-@echo If the build dir is not under the source dir you can either copy your exes
-@echo to the source folder or copy RunTest.bat and the testdata folder to the
-@echo location of your built exes and then run RunTest.bat.
-@echo.
-goto :eof
+@echo off
+@rem
+@rem MS Windows batch file to run pcre2test on testfiles with the correct
+@rem options. This file must use CRLF linebreaks to function properly,
+@rem and requires both pcre2test and pcre2grep.
+@rem
+@rem ------------------------ HISTORY ----------------------------------
+@rem This file was originally contributed to PCRE1 by Ralf Junker, and touched
+@rem up by Daniel Richard G. Tests 10-12 added by Philip H.
+@rem Philip H also changed test 3 to use "wintest" files.
+@rem
+@rem Updated by Tom Fortmann to support explicit test numbers on the command
+@rem line. Added argument validation and added error reporting.
+@rem
+@rem Sheri Pierce added logic to skip feature dependent tests
+@rem tests 4 5 7 10 12 14 19 and 22 require Unicode support
+@rem 8 requires Unicode and link size 2
+@rem 16 requires absence of jit support
+@rem 17 requires presence of jit support
+@rem Sheri P also added override tests for study and jit testing
+@rem Zoltan Herczeg added libpcre16 support
+@rem Zoltan Herczeg added libpcre32 support
+@rem -------------------------------------------------------------------
+@rem
+@rem The file was converted for PCRE2 by PH, February 2015.
+@rem Updated for new test 14 (moving others up a number), August 2015.
+@rem Tidied and updated for new tests 21, 22, 23 by PH, October 2015.
+@rem PH added missing "set type" for test 22, April 2016.
+@rem PH added copy command for new testbtables file, November 2020
+
+
+setlocal enabledelayedexpansion
+if [%srcdir%]==[] (
+if exist testdata\ set srcdir=.)
+if [%srcdir%]==[] (
+if exist ..\testdata\ set srcdir=..)
+if [%srcdir%]==[] (
+if exist ..\..\testdata\ set srcdir=..\..)
+if NOT exist %srcdir%\testdata\ (
+echo Error: distribution testdata folder not found!
+call :conferror
+exit /b 1
+goto :eof
+)
+
+if [%pcre2test%]==[] set pcre2test=.\pcre2test.exe
+
+echo source dir is %srcdir%
+echo pcre2test=%pcre2test%
+
+if NOT exist %pcre2test% (
+echo Error: %pcre2test% not found!
+echo.
+call :conferror
+exit /b 1
+)
+
+%pcre2test% -C linksize >NUL
+set link_size=%ERRORLEVEL%
+%pcre2test% -C pcre2-8 >NUL
+set support8=%ERRORLEVEL%
+%pcre2test% -C pcre2-16 >NUL
+set support16=%ERRORLEVEL%
+%pcre2test% -C pcre2-32 >NUL
+set support32=%ERRORLEVEL%
+%pcre2test% -C unicode >NUL
+set unicode=%ERRORLEVEL%
+%pcre2test% -C jit >NUL
+set jit=%ERRORLEVEL%
+%pcre2test% -C backslash-C >NUL
+set supportBSC=%ERRORLEVEL%
+
+if %support8% EQU 1 (
+if not exist testout8 md testout8
+if not exist testoutjit8 md testoutjit8
+)
+
+if %support16% EQU 1 (
+if not exist testout16 md testout16
+if not exist testoutjit16 md testoutjit16
+)
+
+if %support32% EQU 1 (
+if not exist testout32 md testout32
+if not exist testoutjit32 md testoutjit32
+)
+
+set do1=no
+set do2=no
+set do3=no
+set do4=no
+set do5=no
+set do6=no
+set do7=no
+set do8=no
+set do9=no
+set do10=no
+set do11=no
+set do12=no
+set do13=no
+set do14=no
+set do15=no
+set do16=no
+set do17=no
+set do18=no
+set do19=no
+set do20=no
+set do21=no
+set do22=no
+set do23=no
+set all=yes
+
+for %%a in (%*) do (
+  set valid=no
+  for %%v in (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23) do if %%v == %%a set valid=yes
+  if "!valid!" == "yes" (
+    set do%%a=yes
+    set all=no
+) else (
+    echo Invalid test number - %%a!
+        echo Usage %0 [ test_number ] ...
+        echo Where test_number is one or more optional test numbers 1 through 23, default is all tests.
+        exit /b 1
+)
+)
+set failed="no"
+
+if "%all%" == "yes" (
+  set do1=yes
+  set do2=yes
+  set do3=yes
+  set do4=yes
+  set do5=yes
+  set do6=yes
+  set do7=yes
+  set do8=yes
+  set do9=yes
+  set do10=yes
+  set do11=yes
+  set do12=yes
+  set do13=yes
+  set do14=yes
+  set do15=yes
+  set do16=yes
+  set do17=yes
+  set do18=yes
+  set do19=yes
+  set do20=yes
+  set do21=yes
+  set do22=yes
+  set do23=yes
+)
+
+@echo RunTest.bat's pcre2test output is written to newly created subfolders
+@echo named testout{8,16,32} and testoutjit{8,16,32}.
+@echo.
+
+set mode=
+set bits=8
+
+:nextMode
+if "%mode%" == "" (
+  if %support8% EQU 0 goto modeSkip
+  echo.
+  echo ---- Testing 8-bit library ----
+  echo.
+)
+if "%mode%" == "-16" (
+  if %support16% EQU 0 goto modeSkip
+  echo.
+  echo ---- Testing 16-bit library ----
+  echo.
+)
+if "%mode%" == "-32" (
+  if %support32% EQU 0 goto modeSkip
+  echo.
+  echo ---- Testing 32-bit library ----
+  echo.
+)
+if "%do1%" == "yes" call :do1
+if "%do2%" == "yes" call :do2
+if "%do3%" == "yes" call :do3
+if "%do4%" == "yes" call :do4
+if "%do5%" == "yes" call :do5
+if "%do6%" == "yes" call :do6
+if "%do7%" == "yes" call :do7
+if "%do8%" == "yes" call :do8
+if "%do9%" == "yes" call :do9
+if "%do10%" == "yes" call :do10
+if "%do11%" == "yes" call :do11
+if "%do12%" == "yes" call :do12
+if "%do13%" == "yes" call :do13
+if "%do14%" == "yes" call :do14
+if "%do15%" == "yes" call :do15
+if "%do16%" == "yes" call :do16
+if "%do17%" == "yes" call :do17
+if "%do18%" == "yes" call :do18
+if "%do19%" == "yes" call :do19
+if "%do20%" == "yes" call :do20
+if "%do21%" == "yes" call :do21
+if "%do22%" == "yes" call :do22
+if "%do23%" == "yes" call :do23
+:modeSkip
+if "%mode%" == "" (
+  set mode=-16
+  set bits=16
+  goto nextMode
+)
+if "%mode%" == "-16" (
+  set mode=-32
+  set bits=32
+  goto nextMode
+)
+
+@rem If mode is -32, testing is finished
+if %failed% == "yes" (
+echo In above output, one or more of the various tests failed!
+exit /b 1
+)
+echo All OK
+goto :eof
+
+:runsub
+@rem Function to execute pcre2test and compare the output
+@rem Arguments are as follows:
+@rem
+@rem       1 = test number
+@rem       2 = outputdir
+@rem       3 = test name use double quotes
+@rem   4 - 9 = pcre2test options
+
+if [%1] == [] (
+  echo Missing test number argument!
+  exit /b 1
+)
+
+if [%2] == [] (
+  echo Missing outputdir!
+  exit /b 1
+)
+
+if [%3] == [] (
+  echo Missing test name argument!
+  exit /b 1
+)
+
+if %1 == 8 (
+  set outnum=8-%bits%-%link_size%
+) else (
+  set outnum=%1
+)
+set testinput=testinput%1
+set testoutput=testoutput%outnum%
+if exist %srcdir%\testdata\win%testinput% (
+  set testinput=wintestinput%1
+  set testoutput=wintestoutput%outnum%
+)
+
+echo Test %1: %3
+%pcre2test% %mode% %4 %5 %6 %7 %8 %9 %srcdir%\testdata\%testinput% >%2%bits%\%testoutput%
+if errorlevel 1 (
+  echo.          failed executing command-line:
+  echo.            %pcre2test% %mode% %4 %5 %6 %7 %8 %9 %srcdir%\testdata\%testinput% ^>%2%bits%\%testoutput%
+  set failed="yes"
+  goto :eof
+) else if [%1]==[2] (
+  %pcre2test% %mode% %4 %5 %6 %7 %8 %9 -error -70,-62,-2,-1,0,100,101,191,200 >>%2%bits%\%testoutput%
+)
+
+set type=
+if [%1]==[11] (
+  set type=-%bits%
+)
+if [%1]==[12] (
+  set type=-%bits%
+)
+if [%1]==[14] (
+  set type=-%bits%
+)
+if [%1]==[22] (
+  set type=-%bits%
+)
+
+fc /n %srcdir%\testdata\%testoutput%%type% %2%bits%\%testoutput% >NUL
+
+if errorlevel 1 (
+  echo.          failed comparison: fc /n %srcdir%\testdata\%testoutput% %2%bits%\%testoutput%
+  if [%1]==[3] (
+    echo.
+    echo ** Test 3 failure usually means French locale is not
+    echo ** available on the system, rather than a bug or problem with PCRE2.
+    echo.
+    goto :eof
+)
+
+  set failed="yes"
+  goto :eof
+)
+
+echo.          Passed.
+goto :eof
+
+:do1
+call :runsub 1 testout "Main non-UTF, non-UCP functionality (Compatible with Perl >= 5.10)" -q
+if %jit% EQU 1 call :runsub 1 testoutjit "Test with JIT Override" -q -jit
+goto :eof
+
+:do2
+  copy /y %srcdir%\testdata\testbtables testbtables
 
+  call :runsub 2 testout "API, errors, internals, and non-Perl stuff" -q
+  if %jit% EQU 1 call :runsub 2 testoutjit "Test with JIT Override" -q -jit
+goto :eof
+
+:do3
+  call :runsub 3 testout "Locale-specific features" -q
+  if %jit% EQU 1 call :runsub 3 testoutjit "Test with JIT Override" -q -jit
+goto :eof
+
+:do4
+if %unicode% EQU 0 (
+  echo Test 4 Skipped due to absence of Unicode support.
+  goto :eof
+)
+  call :runsub 4 testout "UTF-%bits% and Unicode property support - (Compatible with Perl >= 5.10)" -q
+  if %jit% EQU 1 call :runsub 4 testoutjit "Test with JIT Override" -q -jit
+goto :eof
+
+:do5
+if %unicode% EQU 0 (
+  echo Test 5 Skipped due to absence of Unicode support.
+  goto :eof
+)
+  call :runsub 5 testout "API, internals, and non-Perl stuff for UTF-%bits% and UCP" -q
+  if %jit% EQU 1 call :runsub 5 testoutjit "Test with JIT Override" -q -jit
+goto :eof
+
+:do6
+  call :runsub 6 testout "DFA matching main non-UTF, non-UCP functionality" -q
+goto :eof
+
+:do7
+if %unicode% EQU 0 (
+  echo Test 7 Skipped due to absence of Unicode support.
+  goto :eof
+)
+  call :runsub 7 testout "DFA matching with UTF-%bits% and Unicode property support" -q
+  goto :eof
+
+:do8
+if NOT %link_size% EQU 2 (
+  echo Test 8 Skipped because link size is not 2.
+  goto :eof
+)
+if %unicode% EQU 0 (
+  echo Test 8 Skipped due to absence of Unicode support.
+  goto :eof
+)
+  call :runsub 8 testout "Internal offsets and code size tests" -q
+goto :eof
+
+:do9
+if NOT %bits% EQU 8 (
+  echo Test 9 Skipped when running 16/32-bit tests.
+  goto :eof
+)
+  call :runsub 9 testout "Specials for the basic 8-bit library" -q
+  if %jit% EQU 1 call :runsub 9 testoutjit "Test with JIT Override" -q -jit
+goto :eof
+
+:do10
+if NOT %bits% EQU 8 (
+  echo Test 10 Skipped when running 16/32-bit tests.
+  goto :eof
+)
+if %unicode% EQU 0 (
+  echo Test 10 Skipped due to absence of Unicode support.
+  goto :eof
+)
+  call :runsub 10 testout "Specials for the 8-bit library with Unicode support" -q
+  if %jit% EQU 1 call :runsub 10 testoutjit "Test with JIT Override" -q -jit
+goto :eof
+
+:do11
+if %bits% EQU 8 (
+  echo Test 11 Skipped when running 8-bit tests.
+  goto :eof
+)
+  call :runsub 11 testout "Specials for the basic 16/32-bit library" -q
+  if %jit% EQU 1 call :runsub 11 testoutjit "Test with JIT Override" -q -jit
+goto :eof
+
+:do12
+if %bits% EQU 8 (
+  echo Test 12 Skipped when running 8-bit tests.
+  goto :eof
+)
+if %unicode% EQU 0 (
+  echo Test 12 Skipped due to absence of Unicode support.
+  goto :eof
+)
+  call :runsub 12 testout "Specials for the 16/32-bit library with Unicode support" -q
+  if %jit% EQU 1 call :runsub 12 testoutjit "Test with JIT Override" -q -jit
+goto :eof
+
+:do13
+if %bits% EQU 8 (
+  echo Test 13 Skipped when running 8-bit tests.
+  goto :eof
+)
+  call :runsub 13 testout "DFA specials for the basic 16/32-bit library" -q
+goto :eof
+
+:do14
+if %unicode% EQU 0 (
+  echo Test 14 Skipped due to absence of Unicode support.
+  goto :eof
+)
+  call :runsub 14 testout "DFA specials for UTF and UCP support" -q
+  goto :eof
+
+:do15
+call :runsub 15 testout "Non-JIT limits and other non_JIT tests" -q
+goto :eof
+
+:do16
+if %jit% EQU 1 (
+  echo Test 16 Skipped due to presence of JIT support.
+  goto :eof
+)
+  call :runsub 16 testout "JIT-specific features when JIT is not available" -q
+goto :eof
+
+:do17
+if %jit% EQU 0 (
+  echo Test 17 Skipped due to absence of JIT support.
+  goto :eof
+)
+  call :runsub 17 testout "JIT-specific features when JIT is available" -q
+goto :eof
+
+:do18
+if %bits% EQU 16 (
+  echo Test 18 Skipped when running 16-bit tests.
+  goto :eof
+)
+if %bits% EQU 32 (
+  echo Test 18 Skipped when running 32-bit tests.
+  goto :eof
+)
+  call :runsub 18 testout "POSIX interface, excluding UTF-8 and UCP" -q
+goto :eof
+
+:do19
+if %bits% EQU 16 (
+  echo Test 19 Skipped when running 16-bit tests.
+  goto :eof
+)
+if %bits% EQU 32 (
+  echo Test 19 Skipped when running 32-bit tests.
+  goto :eof
+)
+if %unicode% EQU 0 (
+  echo Test 19 Skipped due to absence of Unicode support.
+  goto :eof
+)
+  call :runsub 19 testout "POSIX interface with UTF-8 and UCP" -q
+goto :eof
+
+:do20
+call :runsub 20 testout "Serialization tests" -q
+goto :eof
+
+:do21
+if %supportBSC% EQU 0 (
+  echo Test 21 Skipped due to absence of backslash-C support.
+  goto :eof
+)
+  call :runsub 21 testout "Backslash-C tests without UTF" -q
+  call :runsub 21 testout "Backslash-C tests without UTF (DFA)" -q -dfa
+  if %jit% EQU 1 call :runsub 21 testoutjit "Test with JIT Override" -q -jit
+goto :eof
+
+:do22
+if %supportBSC% EQU 0 (
+  echo Test 22 Skipped due to absence of backslash-C support.
+  goto :eof
+)
+if %unicode% EQU 0 (
+  echo Test 22 Skipped due to absence of Unicode support.
+  goto :eof
+)
+  call :runsub 22 testout "Backslash-C tests with UTF" -q
+  if %jit% EQU 1 call :runsub 22 testoutjit "Test with JIT Override" -q -jit
+goto :eof
+
+:do23
+if %supportBSC% EQU 1 (
+  echo Test 23 Skipped due to presence of backslash-C support.
+  goto :eof
+)
+  call :runsub 23 testout "Backslash-C disabled test" -q
+goto :eof
+
+:conferror
+@echo.
+@echo Either your build is incomplete or you have a configuration error.
+@echo.
+@echo If configured with cmake and executed via "make test" or the MSVC "RUN_TESTS"
+@echo project, pcre2_test.bat defines variables and automatically calls RunTest.bat.
+@echo For manual testing of all available features, after configuring with cmake
+@echo and building, you can run the built pcre2_test.bat. For best results with
+@echo cmake builds and tests avoid directories with full path names that include
+@echo spaces for source or build.
+@echo.
+@echo Otherwise, if the build dir is in a subdir of the source dir, testdata needed
+@echo for input and verification should be found automatically when (from the
+@echo location of the built exes) you call RunTest.bat. By default RunTest.bat
+@echo runs all tests compatible with the linked pcre2 library but it can be given
+@echo a test number as an argument.
+@echo.
+@echo If the build dir is not under the source dir you can either copy your exes
+@echo to the source folder or copy RunTest.bat and the testdata folder to the
+@echo location of your built exes and then run RunTest.bat.
+@echo.
+goto :eof
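
The skip rules applied by the `:do1` through `:do23` labels in RunTest.bat can be summarised in a short sketch. The Python below is illustrative only and is not part of this commit: `Features`, `skip_reason`, and `runnable_tests` are hypothetical names, and the field values stand in for the exit codes the script reads from `pcre2test -C`.

```python
from dataclasses import dataclass
from typing import List, Optional

# Tests that the batch file skips when Unicode support is absent.
NEEDS_UNICODE = {4, 5, 7, 8, 10, 12, 14, 19, 22}


@dataclass(frozen=True)
class Features:
    """Hypothetical stand-in for the values RunTest.bat gets from pcre2test -C."""
    bits: int = 8             # library width under test: 8, 16 or 32
    link_size: int = 2        # result of "-C linksize"
    unicode: bool = True      # result of "-C unicode"
    jit: bool = False         # result of "-C jit"
    backslash_c: bool = True  # result of "-C backslash-C"


def skip_reason(test: int, f: Features) -> Optional[str]:
    """Return None if the test should run, otherwise a short skip reason."""
    if test == 8 and f.link_size != 2:
        return "link size is not 2"
    if test in {9, 10, 18, 19} and f.bits != 8:
        return "only run for the 8-bit library"
    if test in {11, 12, 13} and f.bits == 8:
        return "only run for the 16/32-bit libraries"
    if test in {21, 22} and not f.backslash_c:
        return "absence of backslash-C support"
    if test == 23 and f.backslash_c:
        return "presence of backslash-C support"
    if test == 16 and f.jit:
        return "presence of JIT support"
    if test == 17 and not f.jit:
        return "absence of JIT support"
    if test in NEEDS_UNICODE and not f.unicode:
        return "absence of Unicode support"
    return None


def runnable_tests(f: Features) -> List[int]:
    """List the test numbers (1 through 23) that would not be skipped."""
    return [t for t in range(1, 24) if skip_reason(t, f) is None]


if __name__ == "__main__":
    print(runnable_tests(Features()))
```

Keeping the rules in one function makes the ordering visible: as in the batch file, the width, link-size, and backslash-C checks are reported before the Unicode check.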

+ 185 - 119
regex.mod/pcre/aclocal.m4

@@ -1,6 +1,6 @@
-# generated automatically by aclocal 1.15.1 -*- Autoconf -*-
+# generated automatically by aclocal 1.16.5 -*- Autoconf -*-
 
-# Copyright (C) 1996-2017 Free Software Foundation, Inc.
+# Copyright (C) 1996-2021 Free Software Foundation, Inc.
 
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -14,14 +14,14 @@
 m4_ifndef([AC_CONFIG_MACRO_DIRS], [m4_defun([_AM_CONFIG_MACRO_DIRS], [])m4_defun([AC_CONFIG_MACRO_DIRS], [_AM_CONFIG_MACRO_DIRS($@)])])
 m4_ifndef([AC_AUTOCONF_VERSION],
   [m4_copy([m4_PACKAGE_VERSION], [AC_AUTOCONF_VERSION])])dnl
-m4_if(m4_defn([AC_AUTOCONF_VERSION]), [2.69],,
-[m4_warning([this file was generated for autoconf 2.69.
+m4_if(m4_defn([AC_AUTOCONF_VERSION]), [2.71],,
+[m4_warning([this file was generated for autoconf 2.71.
 You have another version of autoconf.  It may work, but is not guaranteed to.
 If you have problems, you may need to regenerate the build system entirely.
 To do so, use the procedure documented by the package, typically 'autoreconf'.])])
 
 # pkg.m4 - Macros to locate and utilise pkg-config.   -*- Autoconf -*-
-# serial 12 (pkg-config-0.29.2)
+# serial 11 (pkg-config-0.29.1)
 
 dnl Copyright © 2004 Scott James Remnant <[email protected]>.
 dnl Copyright © 2012-2015 Dan Nicholson <[email protected]>
@@ -63,7 +63,7 @@ dnl
 dnl See the "Since" comment for each macro you use to see what version
 dnl of the macros you require.
 m4_defun([PKG_PREREQ],
-[m4_define([PKG_MACROS_VERSION], [0.29.2])
+[m4_define([PKG_MACROS_VERSION], [0.29.1])
 m4_if(m4_version_compare(PKG_MACROS_VERSION, [$1]), -1,
     [m4_fatal([pkg.m4 version $1 or higher is required but ]PKG_MACROS_VERSION[ found])])
 ])dnl PKG_PREREQ
@@ -164,7 +164,7 @@ AC_ARG_VAR([$1][_CFLAGS], [C compiler flags for $1, overriding pkg-config])dnl
 AC_ARG_VAR([$1][_LIBS], [linker flags for $1, overriding pkg-config])dnl
 
 pkg_failed=no
-AC_MSG_CHECKING([for $2])
+AC_MSG_CHECKING([for $1])
 
 _PKG_CONFIG([$1][_CFLAGS], [cflags], [$2])
 _PKG_CONFIG([$1][_LIBS], [libs], [$2])
@@ -174,11 +174,11 @@ and $1[]_LIBS to avoid the need to call pkg-config.
 See the pkg-config man page for more details.])
 
 if test $pkg_failed = yes; then
-        AC_MSG_RESULT([no])
+   	AC_MSG_RESULT([no])
         _PKG_SHORT_ERRORS_SUPPORTED
         if test $_pkg_short_errors_supported = yes; then
 	        $1[]_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "$2" 2>&1`
-        else
+        else 
 	        $1[]_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "$2" 2>&1`
         fi
 	# Put the nasty error message in config.log where it belongs
@@ -195,7 +195,7 @@ installed software in a non-standard prefix.
 _PKG_TEXT])[]dnl
         ])
 elif test $pkg_failed = untried; then
-        AC_MSG_RESULT([no])
+     	AC_MSG_RESULT([no])
 	m4_default([$4], [AC_MSG_FAILURE(
 [The pkg-config script could not be found or is too old.  Make sure it
 is in your PATH or set the PKG_CONFIG environment variable to the full
@@ -296,7 +296,75 @@ AS_VAR_COPY([$1], [pkg_cv_][$1])
 AS_VAR_IF([$1], [""], [$5], [$4])dnl
 ])dnl PKG_CHECK_VAR
 
-# Copyright (C) 2002-2017 Free Software Foundation, Inc.
+dnl PKG_WITH_MODULES(VARIABLE-PREFIX, MODULES,
+dnl   [ACTION-IF-FOUND],[ACTION-IF-NOT-FOUND],
+dnl   [DESCRIPTION], [DEFAULT])
+dnl ------------------------------------------
+dnl
+dnl Prepare a "--with-" configure option using the lowercase
+dnl [VARIABLE-PREFIX] name, merging the behaviour of AC_ARG_WITH and
+dnl PKG_CHECK_MODULES in a single macro.
+AC_DEFUN([PKG_WITH_MODULES],
+[
+m4_pushdef([with_arg], m4_tolower([$1]))
+
+m4_pushdef([description],
+           [m4_default([$5], [build with ]with_arg[ support])])
+
+m4_pushdef([def_arg], [m4_default([$6], [auto])])
+m4_pushdef([def_action_if_found], [AS_TR_SH([with_]with_arg)=yes])
+m4_pushdef([def_action_if_not_found], [AS_TR_SH([with_]with_arg)=no])
+
+m4_case(def_arg,
+            [yes],[m4_pushdef([with_without], [--without-]with_arg)],
+            [m4_pushdef([with_without],[--with-]with_arg)])
+
+AC_ARG_WITH(with_arg,
+     AS_HELP_STRING(with_without, description[ @<:@default=]def_arg[@:>@]),,
+    [AS_TR_SH([with_]with_arg)=def_arg])
+
+AS_CASE([$AS_TR_SH([with_]with_arg)],
+            [yes],[PKG_CHECK_MODULES([$1],[$2],$3,$4)],
+            [auto],[PKG_CHECK_MODULES([$1],[$2],
+                                        [m4_n([def_action_if_found]) $3],
+                                        [m4_n([def_action_if_not_found]) $4])])
+
+m4_popdef([with_arg])
+m4_popdef([description])
+m4_popdef([def_arg])
+
+])dnl PKG_WITH_MODULES
+
+dnl PKG_HAVE_WITH_MODULES(VARIABLE-PREFIX, MODULES,
+dnl   [DESCRIPTION], [DEFAULT])
+dnl -----------------------------------------------
+dnl
+dnl Convenience macro to trigger AM_CONDITIONAL after PKG_WITH_MODULES
+dnl check. HAVE_[VARIABLE-PREFIX] is exported as make variable.
+AC_DEFUN([PKG_HAVE_WITH_MODULES],
+[
+PKG_WITH_MODULES([$1],[$2],,,[$3],[$4])
+
+AM_CONDITIONAL([HAVE_][$1],
+               [test "$AS_TR_SH([with_]m4_tolower([$1]))" = "yes"])
+])dnl PKG_HAVE_WITH_MODULES
+
+dnl PKG_HAVE_DEFINE_WITH_MODULES(VARIABLE-PREFIX, MODULES,
+dnl   [DESCRIPTION], [DEFAULT])
+dnl ------------------------------------------------------
+dnl
+dnl Convenience macro to run AM_CONDITIONAL and AC_DEFINE after
+dnl PKG_WITH_MODULES check. HAVE_[VARIABLE-PREFIX] is exported as make
+dnl and preprocessor variable.
+AC_DEFUN([PKG_HAVE_DEFINE_WITH_MODULES],
+[
+PKG_HAVE_WITH_MODULES([$1],[$2],[$3],[$4])
+
+AS_IF([test "$AS_TR_SH([with_]m4_tolower([$1]))" = "yes"],
+        [AC_DEFINE([HAVE_][$1], 1, [Enable ]m4_tolower([$1])[ support])])
+])dnl PKG_HAVE_DEFINE_WITH_MODULES
+
+# Copyright (C) 2002-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -308,10 +376,10 @@ AS_VAR_IF([$1], [""], [$5], [$4])dnl
 # generated from the m4 files accompanying Automake X.Y.
 # (This private macro should not be called outside this file.)
 AC_DEFUN([AM_AUTOMAKE_VERSION],
-[am__api_version='1.15'
+[am__api_version='1.16'
 dnl Some users find AM_AUTOMAKE_VERSION and mistake it for a way to
 dnl require some minimum version.  Point them to the right macro.
-m4_if([$1], [1.15.1], [],
+m4_if([$1], [1.16.5], [],
       [AC_FATAL([Do not call $0, use AM_INIT_AUTOMAKE([$1]).])])dnl
 ])
 
@@ -327,12 +395,12 @@ m4_define([_AM_AUTOCONF_VERSION], [])
 # Call AM_AUTOMAKE_VERSION and AM_AUTOMAKE_VERSION so they can be traced.
 # This function is AC_REQUIREd by AM_INIT_AUTOMAKE.
 AC_DEFUN([AM_SET_CURRENT_AUTOMAKE_VERSION],
-[AM_AUTOMAKE_VERSION([1.15.1])dnl
+[AM_AUTOMAKE_VERSION([1.16.5])dnl
 m4_ifndef([AC_AUTOCONF_VERSION],
   [m4_copy([m4_PACKAGE_VERSION], [AC_AUTOCONF_VERSION])])dnl
 _AM_AUTOCONF_VERSION(m4_defn([AC_AUTOCONF_VERSION]))])
 
-# Copyright (C) 2011-2017 Free Software Foundation, Inc.
+# Copyright (C) 2011-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -394,7 +462,7 @@ AC_SUBST([AR])dnl
 
 # AM_AUX_DIR_EXPAND                                         -*- Autoconf -*-
 
-# Copyright (C) 2001-2017 Free Software Foundation, Inc.
+# Copyright (C) 2001-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -446,7 +514,7 @@ am_aux_dir=`cd "$ac_aux_dir" && pwd`
 
 # AM_CONDITIONAL                                            -*- Autoconf -*-
 
-# Copyright (C) 1997-2017 Free Software Foundation, Inc.
+# Copyright (C) 1997-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -477,7 +545,7 @@ AC_CONFIG_COMMANDS_PRE(
 Usually this means the macro was only invoked conditionally.]])
 fi])])
 
-# Copyright (C) 1999-2017 Free Software Foundation, Inc.
+# Copyright (C) 1999-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -668,13 +736,12 @@ _AM_SUBST_NOTMAKE([am__nodep])dnl
 
 # Generate code to set up dependency tracking.              -*- Autoconf -*-
 
-# Copyright (C) 1999-2017 Free Software Foundation, Inc.
+# Copyright (C) 1999-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
 # with or without modifications, as long as this notice is preserved.
 
-
 # _AM_OUTPUT_DEPENDENCY_COMMANDS
 # ------------------------------
 AC_DEFUN([_AM_OUTPUT_DEPENDENCY_COMMANDS],
@@ -682,49 +749,43 @@ AC_DEFUN([_AM_OUTPUT_DEPENDENCY_COMMANDS],
   # Older Autoconf quotes --file arguments for eval, but not when files
   # are listed without --file.  Let's play safe and only enable the eval
   # if we detect the quoting.
-  case $CONFIG_FILES in
-  *\'*) eval set x "$CONFIG_FILES" ;;
-  *)   set x $CONFIG_FILES ;;
-  esac
+  # TODO: see whether this extra hack can be removed once we start
+  # requiring Autoconf 2.70 or later.
+  AS_CASE([$CONFIG_FILES],
+          [*\'*], [eval set x "$CONFIG_FILES"],
+          [*], [set x $CONFIG_FILES])
   shift
-  for mf
+  # Used to flag and report bootstrapping failures.
+  am_rc=0
+  for am_mf
   do
     # Strip MF so we end up with the name of the file.
-    mf=`echo "$mf" | sed -e 's/:.*$//'`
-    # Check whether this is an Automake generated Makefile or not.
-    # We used to match only the files named 'Makefile.in', but
-    # some people rename them; so instead we look at the file content.
-    # Grep'ing the first line is not enough: some people post-process
-    # each Makefile.in and add a new line on top of each file to say so.
-    # Grep'ing the whole file is not good either: AIX grep has a line
+    am_mf=`AS_ECHO(["$am_mf"]) | sed -e 's/:.*$//'`
+    # Check whether this is an Automake generated Makefile which includes
+    # dependency-tracking related rules and includes.
+    # Grep'ing the whole file directly is not great: AIX grep has a line
     # limit of 2048, but all sed's we know understand at least 4000.
-    if sed -n 's,^#.*generated by automake.*,X,p' "$mf" | grep X >/dev/null 2>&1; then
-      dirpart=`AS_DIRNAME("$mf")`
-    else
-      continue
-    fi
-    # Extract the definition of DEPDIR, am__include, and am__quote
-    # from the Makefile without running 'make'.
-    DEPDIR=`sed -n 's/^DEPDIR = //p' < "$mf"`
-    test -z "$DEPDIR" && continue
-    am__include=`sed -n 's/^am__include = //p' < "$mf"`
-    test -z "$am__include" && continue
-    am__quote=`sed -n 's/^am__quote = //p' < "$mf"`
-    # Find all dependency output files, they are included files with
-    # $(DEPDIR) in their names.  We invoke sed twice because it is the
-    # simplest approach to changing $(DEPDIR) to its actual value in the
-    # expansion.
-    for file in `sed -n "
-      s/^$am__include $am__quote\(.*(DEPDIR).*\)$am__quote"'$/\1/p' <"$mf" | \
-	 sed -e 's/\$(DEPDIR)/'"$DEPDIR"'/g'`; do
-      # Make sure the directory exists.
-      test -f "$dirpart/$file" && continue
-      fdir=`AS_DIRNAME(["$file"])`
-      AS_MKDIR_P([$dirpart/$fdir])
-      # echo "creating $dirpart/$file"
-      echo '# dummy' > "$dirpart/$file"
-    done
+    sed -n 's,^am--depfiles:.*,X,p' "$am_mf" | grep X >/dev/null 2>&1 \
+      || continue
+    am_dirpart=`AS_DIRNAME(["$am_mf"])`
+    am_filepart=`AS_BASENAME(["$am_mf"])`
+    AM_RUN_LOG([cd "$am_dirpart" \
+      && sed -e '/# am--include-marker/d' "$am_filepart" \
+        | $MAKE -f - am--depfiles]) || am_rc=$?
   done
+  if test $am_rc -ne 0; then
+    AC_MSG_FAILURE([Something went wrong bootstrapping makefile fragments
+    for automatic dependency tracking.  If GNU make was not used, consider
+    re-running the configure script with MAKE="gmake" (or whatever is
+    necessary).  You can also try re-running configure with the
+    '--disable-dependency-tracking' option to at least be able to build
+    the package (albeit without support for automatic dependency tracking).])
+  fi
+  AS_UNSET([am_dirpart])
+  AS_UNSET([am_filepart])
+  AS_UNSET([am_mf])
+  AS_UNSET([am_rc])
+  rm -f conftest-deps.mk
 }
 ])# _AM_OUTPUT_DEPENDENCY_COMMANDS
 
@@ -733,18 +794,17 @@ AC_DEFUN([_AM_OUTPUT_DEPENDENCY_COMMANDS],
 # -----------------------------
 # This macro should only be invoked once -- use via AC_REQUIRE.
 #
-# This code is only required when automatic dependency tracking
-# is enabled.  FIXME.  This creates each '.P' file that we will
-# need in order to bootstrap the dependency handling code.
+# This code is only required when automatic dependency tracking is enabled.
+# This creates each '.Po' and '.Plo' makefile fragment that we'll need in
+# order to bootstrap the dependency handling code.
 AC_DEFUN([AM_OUTPUT_DEPENDENCY_COMMANDS],
 [AC_CONFIG_COMMANDS([depfiles],
      [test x"$AMDEP_TRUE" != x"" || _AM_OUTPUT_DEPENDENCY_COMMANDS],
-     [AMDEP_TRUE="$AMDEP_TRUE" ac_aux_dir="$ac_aux_dir"])
-])
+     [AMDEP_TRUE="$AMDEP_TRUE" MAKE="${MAKE-make}"])])
 
 # Do all the work for Automake.                             -*- Autoconf -*-
 
-# Copyright (C) 1996-2017 Free Software Foundation, Inc.
+# Copyright (C) 1996-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -772,6 +832,10 @@ m4_defn([AC_PROG_CC])
 # release and drop the old call support.
 AC_DEFUN([AM_INIT_AUTOMAKE],
 [AC_PREREQ([2.65])dnl
+m4_ifdef([_$0_ALREADY_INIT],
+  [m4_fatal([$0 expanded multiple times
+]m4_defn([_$0_ALREADY_INIT]))],
+  [m4_define([_$0_ALREADY_INIT], m4_expansion_stack)])dnl
 dnl Autoconf wants to disallow AM_ names.  We explicitly allow
 dnl the ones we care about.
 m4_pattern_allow([^AM_[A-Z]+FLAGS$])dnl
@@ -808,7 +872,7 @@ m4_ifval([$3], [_AM_SET_OPTION([no-define])])dnl
 [_AM_SET_OPTIONS([$1])dnl
 dnl Diagnose old-style AC_INIT with new-style AM_AUTOMAKE_INIT.
 m4_if(
-  m4_ifdef([AC_PACKAGE_NAME], [ok]):m4_ifdef([AC_PACKAGE_VERSION], [ok]),
+  m4_ifset([AC_PACKAGE_NAME], [ok]):m4_ifset([AC_PACKAGE_VERSION], [ok]),
   [ok:ok],,
   [m4_fatal([AC_INIT should be called with package and version arguments])])dnl
  AC_SUBST([PACKAGE], ['AC_PACKAGE_TARNAME'])dnl
@@ -831,8 +895,8 @@ AC_REQUIRE([AM_PROG_INSTALL_STRIP])dnl
 AC_REQUIRE([AC_PROG_MKDIR_P])dnl
 # For better backward compatibility.  To be removed once Automake 1.9.x
 # dies out for good.  For more background, see:
-# <http://lists.gnu.org/archive/html/automake/2012-07/msg00001.html>
-# <http://lists.gnu.org/archive/html/automake/2012-07/msg00014.html>
+# <https://lists.gnu.org/archive/html/automake/2012-07/msg00001.html>
+# <https://lists.gnu.org/archive/html/automake/2012-07/msg00014.html>
 AC_SUBST([mkdir_p], ['$(MKDIR_P)'])
 # We need awk for the "check" target (and possibly the TAP driver).  The
 # system "awk" is bad on some platforms.
@@ -860,6 +924,20 @@ AC_PROVIDE_IFELSE([AC_PROG_OBJCXX],
 		  [m4_define([AC_PROG_OBJCXX],
 			     m4_defn([AC_PROG_OBJCXX])[_AM_DEPENDENCIES([OBJCXX])])])dnl
 ])
+# Variables for tags utilities; see am/tags.am
+if test -z "$CTAGS"; then
+  CTAGS=ctags
+fi
+AC_SUBST([CTAGS])
+if test -z "$ETAGS"; then
+  ETAGS=etags
+fi
+AC_SUBST([ETAGS])
+if test -z "$CSCOPE"; then
+  CSCOPE=cscope
+fi
+AC_SUBST([CSCOPE])
+
 AC_REQUIRE([AM_SILENT_RULES])dnl
 dnl The testsuite driver may need to know about EXEEXT, so add the
 dnl 'am__EXEEXT' conditional if _AM_COMPILER_EXEEXT was seen.  This
@@ -899,7 +977,7 @@ END
 Aborting the configuration process, to ensure you take notice of the issue.
 
 You can download and install GNU coreutils to get an 'rm' implementation
-that behaves properly: <http://www.gnu.org/software/coreutils/>.
+that behaves properly: <https://www.gnu.org/software/coreutils/>.
 
 If you want to complete the configuration process using your problematic
 'rm' anyway, export the environment variable ACCEPT_INFERIOR_RM_PROGRAM
@@ -941,7 +1019,7 @@ for _am_header in $config_headers :; do
 done
 echo "timestamp for $_am_arg" >`AS_DIRNAME(["$_am_arg"])`/stamp-h[]$_am_stamp_count])
 
-# Copyright (C) 2001-2017 Free Software Foundation, Inc.
+# Copyright (C) 2001-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -962,7 +1040,7 @@ if test x"${install_sh+set}" != xset; then
 fi
 AC_SUBST([install_sh])])
 
-# Copyright (C) 2003-2017 Free Software Foundation, Inc.
+# Copyright (C) 2003-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -983,7 +1061,7 @@ AC_SUBST([am__leading_dot])])
 
 # Check to see how 'make' treats includes.	            -*- Autoconf -*-
 
-# Copyright (C) 2001-2017 Free Software Foundation, Inc.
+# Copyright (C) 2001-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -991,49 +1069,42 @@ AC_SUBST([am__leading_dot])])
 
 # AM_MAKE_INCLUDE()
 # -----------------
-# Check to see how make treats includes.
+# Check whether make has an 'include' directive that can support all
+# the idioms we need for our automatic dependency tracking code.
 AC_DEFUN([AM_MAKE_INCLUDE],
-[am_make=${MAKE-make}
-cat > confinc << 'END'
+[AC_MSG_CHECKING([whether ${MAKE-make} supports the include directive])
+cat > confinc.mk << 'END'
 am__doit:
-	@echo this is the am__doit target
+	@echo this is the am__doit target >confinc.out
 .PHONY: am__doit
 END
-# If we don't find an include directive, just comment out the code.
-AC_MSG_CHECKING([for style of include used by $am_make])
 am__include="#"
 am__quote=
-_am_result=none
-# First try GNU make style include.
-echo "include confinc" > confmf
-# Ignore all kinds of additional output from 'make'.
-case `$am_make -s -f confmf 2> /dev/null` in #(
-*the\ am__doit\ target*)
-  am__include=include
-  am__quote=
-  _am_result=GNU
-  ;;
-esac
-# Now try BSD make style include.
-if test "$am__include" = "#"; then
-   echo '.include "confinc"' > confmf
-   case `$am_make -s -f confmf 2> /dev/null` in #(
-   *the\ am__doit\ target*)
-     am__include=.include
-     am__quote="\""
-     _am_result=BSD
-     ;;
-   esac
-fi
-AC_SUBST([am__include])
-AC_SUBST([am__quote])
-AC_MSG_RESULT([$_am_result])
-rm -f confinc confmf
-])
+# BSD make does it like this.
+echo '.include "confinc.mk" # ignored' > confmf.BSD
+# Other make implementations (GNU, Solaris 10, AIX) do it like this.
+echo 'include confinc.mk # ignored' > confmf.GNU
+_am_result=no
+for s in GNU BSD; do
+  AM_RUN_LOG([${MAKE-make} -f confmf.$s && cat confinc.out])
+  AS_CASE([$?:`cat confinc.out 2>/dev/null`],
+      ['0:this is the am__doit target'],
+      [AS_CASE([$s],
+          [BSD], [am__include='.include' am__quote='"'],
+          [am__include='include' am__quote=''])])
+  if test "$am__include" != "#"; then
+    _am_result="yes ($s style)"
+    break
+  fi
+done
+rm -f confinc.* confmf.*
+AC_MSG_RESULT([${_am_result}])
+AC_SUBST([am__include])])
+AC_SUBST([am__quote])])
 
 # Fake the existence of programs that GNU maintainers use.  -*- Autoconf -*-
 
-# Copyright (C) 1997-2017 Free Software Foundation, Inc.
+# Copyright (C) 1997-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -1054,12 +1125,7 @@ AC_DEFUN([AM_MISSING_HAS_RUN],
 [AC_REQUIRE([AM_AUX_DIR_EXPAND])dnl
 AC_REQUIRE_AUX_FILE([missing])dnl
 if test x"${MISSING+set}" != xset; then
-  case $am_aux_dir in
-  *\ * | *\	*)
-    MISSING="\${SHELL} \"$am_aux_dir/missing\"" ;;
-  *)
-    MISSING="\${SHELL} $am_aux_dir/missing" ;;
-  esac
+  MISSING="\${SHELL} '$am_aux_dir/missing'"
 fi
 # Use eval to expand $SHELL
 if eval "$MISSING --is-lightweight"; then
@@ -1072,7 +1138,7 @@ fi
 
 # Helper functions for option handling.                     -*- Autoconf -*-
 
-# Copyright (C) 2001-2017 Free Software Foundation, Inc.
+# Copyright (C) 2001-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -1101,7 +1167,7 @@ AC_DEFUN([_AM_SET_OPTIONS],
 AC_DEFUN([_AM_IF_OPTION],
 [m4_ifset(_AM_MANGLE_OPTION([$1]), [$2], [$3])])
 
-# Copyright (C) 1999-2017 Free Software Foundation, Inc.
+# Copyright (C) 1999-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -1148,7 +1214,7 @@ AC_LANG_POP([C])])
 # For backward compatibility.
 AC_DEFUN_ONCE([AM_PROG_CC_C_O], [AC_REQUIRE([AC_PROG_CC])])
 
-# Copyright (C) 2001-2017 Free Software Foundation, Inc.
+# Copyright (C) 2001-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -1167,7 +1233,7 @@ AC_DEFUN([AM_RUN_LOG],
 
 # Check to make sure that the build environment is sane.    -*- Autoconf -*-
 
-# Copyright (C) 1996-2017 Free Software Foundation, Inc.
+# Copyright (C) 1996-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -1248,7 +1314,7 @@ AC_CONFIG_COMMANDS_PRE(
 rm -f conftest.file
 ])
 
-# Copyright (C) 2009-2017 Free Software Foundation, Inc.
+# Copyright (C) 2009-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -1308,7 +1374,7 @@ AC_SUBST([AM_BACKSLASH])dnl
 _AM_SUBST_NOTMAKE([AM_BACKSLASH])dnl
 ])
 
-# Copyright (C) 2001-2017 Free Software Foundation, Inc.
+# Copyright (C) 2001-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -1336,7 +1402,7 @@ fi
 INSTALL_STRIP_PROGRAM="\$(install_sh) -c -s"
 AC_SUBST([INSTALL_STRIP_PROGRAM])])
 
-# Copyright (C) 2006-2017 Free Software Foundation, Inc.
+# Copyright (C) 2006-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -1355,7 +1421,7 @@ AC_DEFUN([AM_SUBST_NOTMAKE], [_AM_SUBST_NOTMAKE($@)])
 
 # Check how to create a tarball.                            -*- Autoconf -*-
 
-# Copyright (C) 2004-2017 Free Software Foundation, Inc.
+# Copyright (C) 2004-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
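
The reworked AM_MAKE_INCLUDE macro in the aclocal.m4 hunks above decides which include syntax the make program accepts by running it against two tiny makefiles and checking for a marker file. A minimal shell sketch of that probe, outside Autoconf, might look like the following. The function name probe_make_include and the use of mktemp -d are assumptions made for illustration; the file names and the marker string come from the diff.

```shell
# Sketch of the include-directive probe (probe_make_include is a hypothetical name).
probe_make_include() {
  dir=$(mktemp -d) || return 1
  (
    cd "$dir" || exit 1
    # The included fragment: its only target writes a marker file.
    printf 'am__doit:\n\t@echo this is the am__doit target >confinc.out\n.PHONY: am__doit\n' > confinc.mk
    # BSD make spelling.
    echo '.include "confinc.mk" # ignored' > confmf.BSD
    # GNU, Solaris 10 and AIX make spelling.
    echo 'include confinc.mk # ignored' > confmf.GNU
    result=no
    for s in GNU BSD; do
      # Unlike the macro, clear the marker between attempts so a stale
      # file cannot produce a false positive.
      rm -f confinc.out
      if ${MAKE-make} -f "confmf.$s" >/dev/null 2>&1 &&
         [ "$(cat confinc.out 2>/dev/null)" = 'this is the am__doit target' ]; then
        result="yes ($s style)"
        break
      fi
    done
    echo "$result"
  )
  rm -rf "$dir"
}

probe_make_include
```

With no usable make on PATH the sketch prints "no"; otherwise it reports the first style that worked, in the same "yes (GNU style)" form the macro passes to AC_MSG_RESULT.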

+ 10 - 9
regex.mod/pcre/ar-lib

@@ -2,9 +2,9 @@
 # Wrapper for Microsoft lib.exe
 
 me=ar-lib
-scriptversion=2012-03-01.08; # UTC
+scriptversion=2019-07-04.01; # UTC
 
-# Copyright (C) 2010-2017 Free Software Foundation, Inc.
+# Copyright (C) 2010-2021 Free Software Foundation, Inc.
 # Written by Peter Rosin <[email protected]>.
 #
 # This program is free software; you can redistribute it and/or modify
@@ -18,7 +18,7 @@ scriptversion=2012-03-01.08; # UTC
 # GNU General Public License for more details.
 #
 # You should have received a copy of the GNU General Public License
-# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+# along with this program.  If not, see <https://www.gnu.org/licenses/>.
 
 # As a special exception to the GNU General Public License, if you
 # distribute this file as part of a program that contains a
@@ -53,7 +53,7 @@ func_file_conv ()
 	  MINGW*)
 	    file_conv=mingw
 	    ;;
-	  CYGWIN*)
+	  CYGWIN* | MSYS*)
 	    file_conv=cygwin
 	    ;;
 	  *)
@@ -65,7 +65,7 @@ func_file_conv ()
 	mingw)
 	  file=`cmd //C echo "$file " | sed -e 's/"\(.*\) " *$/\1/'`
 	  ;;
-	cygwin)
+	cygwin | msys)
 	  file=`cygpath -m "$file" || echo "$file"`
 	  ;;
 	wine)
@@ -224,10 +224,11 @@ elif test -n "$extract"; then
       esac
     done
   else
-    $AR -NOLOGO -LIST "$archive" | sed -e 's/\\/\\\\/g' | while read member
-    do
-      $AR -NOLOGO -EXTRACT:"$member" "$archive" || exit $?
-    done
+    $AR -NOLOGO -LIST "$archive" | tr -d '\r' | sed -e 's/\\/\\\\/g' \
+      | while read member
+        do
+          $AR -NOLOGO -EXTRACT:"$member" "$archive" || exit $?
+        done
   fi
 
 elif test -n "$quick$replace"; then
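
The extraction loop in ar-lib now pipes the member list through tr -d '\r'. The likely reason (an inference from the diff, which does not state it) is that lib.exe prints CRLF line endings, and read would otherwise keep the trailing carriage return in each member name. A small sketch with simulated output; list_members and the member names are hypothetical stand-ins for the real `$AR -NOLOGO -LIST "$archive"` call:

```shell
# Hypothetical stand-in for `$AR -NOLOGO -LIST "$archive"`:
# two member names terminated by CRLF.
list_members() {
  printf 'foo.obj\r\nbar.obj\r\n'
}

# Names as the old loop would read them: the carriage return stays attached.
raw=$(list_members | while read member; do printf '[%s]' "$member"; done)

# Names as the new loop reads them: carriage returns are stripped first.
clean=$(list_members | tr -d '\r' | while read member; do printf '[%s]' "$member"; done)

printf '%s\n' "$clean"   # prints [foo.obj][bar.obj]
```

In the old form each name passed to -EXTRACT: would carry an invisible trailing character, which is hard to spot in build logs.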

+ 15 - 0
regex.mod/pcre/cmake/pcre2-config-version.cmake.in

@@ -0,0 +1,15 @@
+set(PACKAGE_VERSION_MAJOR @PCRE2_MAJOR@)
+set(PACKAGE_VERSION_MINOR @PCRE2_MINOR@)
+set(PACKAGE_VERSION_PATCH 0)
+set(PACKAGE_VERSION @PCRE2_MAJOR@.@PCRE2_MINOR@.0)
+
+# Check whether the requested PACKAGE_FIND_VERSION is compatible
+if(PACKAGE_VERSION VERSION_LESS PACKAGE_FIND_VERSION OR
+   PACKAGE_VERSION_MAJOR GREATER PACKAGE_FIND_VERSION_MAJOR)
+  set(PACKAGE_VERSION_COMPATIBLE FALSE)
+else()
+  set(PACKAGE_VERSION_COMPATIBLE TRUE)
+  if(PACKAGE_VERSION VERSION_EQUAL PACKAGE_FIND_VERSION)
+    set(PACKAGE_VERSION_EXACT TRUE)
+  endif()
+endif()
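
The compatibility rule in pcre2-config-version.cmake.in accepts an installed version only if it is not older than the requested one and its major number does not exceed the requested major. The same rule can be sketched in shell to see which requests pass. The helper names ver_field and pcre2_version_check are hypothetical, and the sketch assumes that missing version components count as 0.

```shell
# Print the Nth dot-separated field of a version string, or 0 if absent.
ver_field() {
  printf '%s\n' "$1" | awk -F. -v n="$2" '{ v = ($n == "" ? 0 : $n); print v + 0 }'
}

# Print "exact", "compatible" or "incompatible" for INSTALLED vs REQUESTED.
pcre2_version_check() {
  installed=$1
  requested=$2
  cmp=eq
  # Compare major, minor and patch in turn; stop at the first difference.
  for n in 1 2 3; do
    a=$(ver_field "$installed" "$n")
    b=$(ver_field "$requested" "$n")
    if [ "$a" -lt "$b" ]; then cmp=lt; break; fi
    if [ "$a" -gt "$b" ]; then cmp=gt; break; fi
  done
  imaj=$(ver_field "$installed" 1)
  rmaj=$(ver_field "$requested" 1)
  # Mirrors: VERSION_LESS requested OR installed major GREATER requested major.
  if [ "$cmp" = lt ] || [ "$imaj" -gt "$rmaj" ]; then
    echo incompatible
  elif [ "$cmp" = eq ]; then
    echo exact
  else
    echo compatible
  fi
}

pcre2_version_check 10.39.0 10.31   # compatible
pcre2_version_check 10.39.0 10.39   # exact
pcre2_version_check 10.39.0 10.40   # incompatible
pcre2_version_check 11.0.0 10.39    # incompatible
```

The last case shows the effect of the major-number clause: a newer major release is rejected even though it is not older than the request.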

+ 145 - 0
regex.mod/pcre/cmake/pcre2-config.cmake.in

@@ -0,0 +1,145 @@
+# pcre2-config.cmake
+# ----------------
+#
+# Finds the PCRE2 library, specify the starting search path in PCRE2_ROOT.
+#
+# Static vs. shared
+# -----------------
+# To make use of the static library instead of the shared one, one needs
+# to set the variable PCRE2_USE_STATIC_LIBS to ON before calling find_package.
+# Example:
+#   set(PCRE2_USE_STATIC_LIBS ON)
+#   find_package(PCRE2 CONFIG COMPONENTS 8BIT)
+#
+# This will define the following variables:
+#
+#   PCRE2_FOUND   - True if the system has the PCRE2 library.
+#   PCRE2_VERSION - The version of the PCRE2 library which was found.
+#
+# and the following imported targets:
+#
+#   PCRE2::8BIT  - The 8 bit PCRE2 library.
+#   PCRE2::16BIT - The 16 bit PCRE2 library.
+#   PCRE2::32BIT - The 32 bit PCRE2 library.
+#   PCRE2::POSIX - The POSIX PCRE2 library.
+
+set(PCRE2_NON_STANDARD_LIB_PREFIX @NON_STANDARD_LIB_PREFIX@)
+set(PCRE2_NON_STANDARD_LIB_SUFFIX @NON_STANDARD_LIB_SUFFIX@)
+set(PCRE2_8BIT_NAME pcre2-8)
+set(PCRE2_16BIT_NAME pcre2-16)
+set(PCRE2_32BIT_NAME pcre2-32)
+set(PCRE2_POSIX_NAME pcre2-posix)
+find_path(PCRE2_INCLUDE_DIR NAMES pcre2.h DOC "PCRE2 include directory")
+if (PCRE2_USE_STATIC_LIBS)
+  if (MSVC)
+    set(PCRE2_8BIT_NAME pcre2-8-static)
+    set(PCRE2_16BIT_NAME pcre2-16-static)
+    set(PCRE2_32BIT_NAME pcre2-32-static)
+    set(PCRE2_POSIX_NAME pcre2-posix-static)
+  endif ()
+
+  set(PCRE2_PREFIX ${CMAKE_STATIC_LIBRARY_PREFIX})
+  set(PCRE2_SUFFIX ${CMAKE_STATIC_LIBRARY_SUFFIX})
+else ()
+  set(PCRE2_PREFIX ${CMAKE_SHARED_LIBRARY_PREFIX})
+  if (MINGW AND PCRE2_NON_STANDARD_LIB_PREFIX)
+    set(PCRE2_PREFIX "")
+  endif ()
+
+  set(PCRE2_SUFFIX ${CMAKE_SHARED_LIBRARY_SUFFIX})
+  if (MINGW AND PCRE2_NON_STANDARD_LIB_SUFFIX)
+    set(PCRE2_SUFFIX "-0.dll")
+  endif ()
+endif ()
+find_library(PCRE2_8BIT_LIBRARY NAMES ${PCRE2_PREFIX}${PCRE2_8BIT_NAME}${PCRE2_SUFFIX} ${PCRE2_PREFIX}${PCRE2_8BIT_NAME}d${PCRE2_SUFFIX} DOC "8 bit PCRE2 library")
+find_library(PCRE2_16BIT_LIBRARY NAMES ${PCRE2_PREFIX}${PCRE2_16BIT_NAME}${PCRE2_SUFFIX} ${PCRE2_PREFIX}${PCRE2_8BIT_NAME}d${PCRE2_SUFFIX} DOC "16 bit PCRE2 library")
+find_library(PCRE2_32BIT_LIBRARY NAMES ${PCRE2_PREFIX}${PCRE2_32BIT_NAME}${PCRE2_SUFFIX} ${PCRE2_PREFIX}${PCRE2_8BIT_NAME}d${PCRE2_SUFFIX} DOC "32 bit PCRE2 library")
+find_library(PCRE2_POSIX_LIBRARY NAMES ${PCRE2_PREFIX}${PCRE2_POSIX_NAME}${PCRE2_SUFFIX} ${PCRE2_PREFIX}${PCRE2_8BIT_NAME}d${PCRE2_SUFFIX} DOC "8 bit POSIX PCRE2 library")
+unset(PCRE2_NON_STANDARD_LIB_PREFIX)
+unset(PCRE2_NON_STANDARD_LIB_SUFFIX)
+unset(PCRE2_8BIT_NAME)
+unset(PCRE2_16BIT_NAME)
+unset(PCRE2_32BIT_NAME)
+unset(PCRE2_POSIX_NAME)
+
+# Set version
+if (PCRE2_INCLUDE_DIR)
+  set(PCRE2_VERSION "@PCRE2_MAJOR@.@PCRE2_MINOR@.0")
+endif ()
+
+# Which components have been found.
+if (PCRE2_8BIT_LIBRARY)
+  set(PCRE2_8BIT_FOUND TRUE)
+endif ()
+if (PCRE2_16BIT_LIBRARY)
+  set(PCRE2_16BIT_FOUND TRUE)
+endif ()
+if (PCRE2_32BIT_LIBRARY)
+  set(PCRE2_32BIT_FOUND TRUE)
+endif ()
+if (PCRE2_POSIX_LIBRARY)
+  set(PCRE2_POSIX_FOUND TRUE)
+endif ()
+
+# Check if at least one component has been specified.
+list(LENGTH PCRE2_FIND_COMPONENTS PCRE2_NCOMPONENTS)
+if (PCRE2_NCOMPONENTS LESS 1)
+  message(FATAL_ERROR "No components have been specified. This is not allowed. Please, specify at least one component.")
+endif ()
+unset(PCRE2_NCOMPONENTS)
+
+# When POSIX component has been specified make sure that also 8BIT component is specified.
+set(PCRE2_8BIT_COMPONENT FALSE)
+set(PCRE2_POSIX_COMPONENT FALSE)
+foreach(component ${PCRE2_FIND_COMPONENTS})
+  if (component STREQUAL "8BIT")
+    set(PCRE2_8BIT_COMPONENT TRUE)
+  elseif (component STREQUAL "POSIX")
+    set(PCRE2_POSIX_COMPONENT TRUE)
+  endif ()
+endforeach()
+
+if (PCRE2_POSIX_COMPONENT AND NOT PCRE2_8BIT_COMPONENT)
+  message(FATAL_ERROR "The component POSIX is specified while the 8BIT one is not. This is not allowed. Please, also specify the 8BIT component.")
+endif()
+unset(PCRE2_8BIT_COMPONENT)
+unset(PCRE2_POSIX_COMPONENT)
+
+include(FindPackageHandleStandardArgs)
+set(${CMAKE_FIND_PACKAGE_NAME}_CONFIG "${CMAKE_CURRENT_LIST_FILE}")
+find_package_handle_standard_args(PCRE2
+  FOUND_VAR PCRE2_FOUND
+  REQUIRED_VARS PCRE2_INCLUDE_DIR
+  HANDLE_COMPONENTS
+  VERSION_VAR PCRE2_VERSION
+  CONFIG_MODE
+)
+
+set(PCRE2_LIBRARIES)
+if (PCRE2_FOUND)
+  foreach(component ${PCRE2_FIND_COMPONENTS})
+    if (PCRE2_USE_STATIC_LIBS)
+      add_library(PCRE2::${component} STATIC IMPORTED)
+      target_compile_definitions(PCRE2::${component} INTERFACE PCRE2_STATIC)
+    else ()
+      add_library(PCRE2::${component} SHARED IMPORTED)
+    endif ()
+    set_target_properties(PCRE2::${component} PROPERTIES
+      IMPORTED_LOCATION "${PCRE2_${component}_LIBRARY}"
+      INTERFACE_INCLUDE_DIRECTORIES "${PCRE2_INCLUDE_DIR}"
+    )
+    if (component STREQUAL "POSIX")
+      set_target_properties(PCRE2::${component} PROPERTIES
+        INTERFACE_LINK_LIBRARIES "PCRE2::8BIT"
+        LINK_LIBRARIES "PCRE2::8BIT"
+      )
+    endif ()
+
+    set(PCRE2_LIBRARIES ${PCRE2_LIBRARIES} ${PCRE2_${component}_LIBRARY})
+    mark_as_advanced(PCRE2_${component}_LIBRARY)
+  endforeach()
+endif ()
+
+mark_as_advanced(
+  PCRE2_INCLUDE_DIR
+)

+ 6 - 6
regex.mod/pcre/compile

@@ -1,9 +1,9 @@
 #! /bin/sh
 # Wrapper for compilers which do not understand '-c -o'.
 
-scriptversion=2016-01-11.22; # UTC
+scriptversion=2018-03-07.03; # UTC
 
-# Copyright (C) 1999-2017 Free Software Foundation, Inc.
+# Copyright (C) 1999-2021 Free Software Foundation, Inc.
 # Written by Tom Tromey <[email protected]>.
 #
 # This program is free software; you can redistribute it and/or modify
@@ -17,7 +17,7 @@ scriptversion=2016-01-11.22; # UTC
 # GNU General Public License for more details.
 #
 # You should have received a copy of the GNU General Public License
-# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+# along with this program.  If not, see <https://www.gnu.org/licenses/>.
 
 # As a special exception to the GNU General Public License, if you
 # distribute this file as part of a program that contains a
@@ -53,7 +53,7 @@ func_file_conv ()
 	  MINGW*)
 	    file_conv=mingw
 	    ;;
-	  CYGWIN*)
+	  CYGWIN* | MSYS*)
 	    file_conv=cygwin
 	    ;;
 	  *)
@@ -67,7 +67,7 @@ func_file_conv ()
 	mingw/*)
 	  file=`cmd //C echo "$file " | sed -e 's/"\(.*\) " *$/\1/'`
 	  ;;
-	cygwin/*)
+	cygwin/* | msys/*)
 	  file=`cygpath -m "$file" || echo "$file"`
 	  ;;
 	wine/*)
@@ -340,7 +340,7 @@ exit $ret
 # Local Variables:
 # mode: shell-script
 # sh-indentation: 2
-# eval: (add-hook 'write-file-hooks 'time-stamp)
+# eval: (add-hook 'before-save-hook 'time-stamp)
 # time-stamp-start: "scriptversion="
 # time-stamp-format: "%:y-%02m-%02d.%02H"
 # time-stamp-time-zone: "UTC0"

+ 7 - 4
regex.mod/pcre/config-cmake.h.in

@@ -1,8 +1,7 @@
 /* config.h for CMake builds */
 
+#cmakedefine HAVE_ATTRIBUTE_UNINITIALIZED 1
 #cmakedefine HAVE_DIRENT_H 1
-#cmakedefine HAVE_INTTYPES_H 1    
-#cmakedefine HAVE_STDINT_H 1                                                   
 #cmakedefine HAVE_STRERROR 1
 #cmakedefine HAVE_SYS_STAT_H 1
 #cmakedefine HAVE_SYS_TYPES_H 1
@@ -10,14 +9,16 @@
 #cmakedefine HAVE_WINDOWS_H 1
 
 #cmakedefine HAVE_BCOPY 1
+#cmakedefine HAVE_MEMFD_CREATE 1
 #cmakedefine HAVE_MEMMOVE 1
-
-#cmakedefine PCRE2_STATIC 1
+#cmakedefine HAVE_SECURE_GETENV 1
+#cmakedefine HAVE_STRERROR 1
 
 #cmakedefine SUPPORT_PCRE2_8 1
 #cmakedefine SUPPORT_PCRE2_16 1
 #cmakedefine SUPPORT_PCRE2_32 1
 #cmakedefine PCRE2_DEBUG 1
+#cmakedefine DISABLE_PERCENT_ZT 1
 
 #cmakedefine SUPPORT_LIBBZ2 1
 #cmakedefine SUPPORT_LIBEDIT 1
@@ -27,6 +28,8 @@
 #cmakedefine SUPPORT_JIT 1
 #cmakedefine SLJIT_PROT_EXECUTABLE_ALLOCATOR 1
 #cmakedefine SUPPORT_PCRE2GREP_JIT 1
+#cmakedefine SUPPORT_PCRE2GREP_CALLOUT 1
+#cmakedefine SUPPORT_PCRE2GREP_CALLOUT_FORK 1
 #cmakedefine SUPPORT_UNICODE 1
 #cmakedefine SUPPORT_VALGRIND 1
 

Diff suppressed because it is too large
+ 432 - 375
regex.mod/pcre/config.guess


Diff suppressed because it is too large
+ 1008 - 1070
regex.mod/pcre/config.sub


Diff suppressed because it is too large
+ 217 - 336
regex.mod/pcre/configure


+ 134 - 41
regex.mod/pcre/configure.ac

@@ -9,21 +9,21 @@ dnl The PCRE2_PRERELEASE feature is for identifying release candidates. It might
 dnl be defined as -RC2, for example. For real releases, it should be empty.
 
 m4_define(pcre2_major, [10])
-m4_define(pcre2_minor, [31])
+m4_define(pcre2_minor, [39])
 m4_define(pcre2_prerelease, [])
-m4_define(pcre2_date, [2018-02-12])
+m4_define(pcre2_date, [2021-10-29])
+
+# Libtool shared library interface versions (current:revision:age)
+m4_define(libpcre2_8_version,     [10:4:10])
+m4_define(libpcre2_16_version,    [10:4:10])
+m4_define(libpcre2_32_version,    [10:4:10])
+m4_define(libpcre2_posix_version, [3:1:0])
 
 # NOTE: The CMakeLists.txt file searches for the above variables in the first
 # 50 lines of this file. Please update that if the variables above are moved.
 
-# Libtool shared library interface versions (current:revision:age)
-m4_define(libpcre2_8_version,     [7:0:7])
-m4_define(libpcre2_16_version,    [7:0:7])
-m4_define(libpcre2_32_version,    [7:0:7])
-m4_define(libpcre2_posix_version, [2:0:0])
-
-AC_PREREQ(2.57)
-AC_INIT(PCRE2, pcre2_major.pcre2_minor[]pcre2_prerelease, , pcre2)
+AC_PREREQ([2.60])
+AC_INIT([PCRE2],pcre2_major.pcre2_minor[]pcre2_prerelease,[],[pcre2])
 AC_CONFIG_SRCDIR([src/pcre2.h.in])
 AM_INIT_AUTOMAKE([dist-bzip2 dist-zip])
 m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])])
@@ -64,14 +64,31 @@ m4_ifdef([AM_PROG_AR], [AM_PROG_AR])
 AC_TYPE_INT64_T
 
 AC_PROG_INSTALL
-AC_LIBTOOL_WIN32_DLL
-LT_INIT
+LT_INIT([win32-dll])
 AC_PROG_LN_S
 
 # Check for GCC visibility feature
 
 PCRE2_VISIBILITY
 
+# Check for Clang __attribute__((uninitialized)) feature
+
+AC_MSG_CHECKING([for __attribute__((uninitialized))])
+AC_LANG_PUSH([C])
+tmp_CFLAGS=$CFLAGS
+CFLAGS="$CFLAGS -Werror"
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM(,
+                   [[char buf[128] __attribute__((uninitialized));(void)buf]])],
+                   [pcre2_cc_cv_attribute_uninitialized=yes],
+                   [pcre2_cc_cv_attribute_uninitialized=no])
+AC_MSG_RESULT([$pcre2_cc_cv_attribute_uninitialized])
+if test "$pcre2_cc_cv_attribute_uninitialized" = yes; then
+  AC_DEFINE([HAVE_ATTRIBUTE_UNINITIALIZED], 1, [Define this if your compiler
+             supports __attribute__((uninitialized))])
+fi
+CFLAGS=$tmp_CFLAGS
+AC_LANG_POP([C])
+
 # Versioning
 
 PCRE2_MAJOR="pcre2_major"
@@ -131,7 +148,7 @@ AC_ARG_ENABLE(pcre2-32,
               , enable_pcre2_32=unset)
 AC_SUBST(enable_pcre2_32)
 
-# Handle --dnable-debug (disabled by default)
+# Handle --enable-debug (disabled by default)
 AC_ARG_ENABLE(debug,
               AS_HELP_STRING([--enable-debug],
                              [enable debugging code]),
@@ -143,11 +160,33 @@ AC_ARG_ENABLE(jit,
                              [enable Just-In-Time compiling support]),
               , enable_jit=no)
 
-# Handle --enable-jit-sealloc (disabled by default)
-AC_ARG_ENABLE(jit-sealloc,
-              AS_HELP_STRING([--enable-jit-sealloc],
-                             [enable SELinux compatible execmem allocator in JIT]),
-              , enable_jit_sealloc=no)
+# This code enables JIT if the hardware supports it.
+if test "$enable_jit" = "auto"; then
+  AC_LANG(C)
+  SAVE_CPPFLAGS=$CPPFLAGS
+  CPPFLAGS=-I$srcdir
+  AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
+  #define SLJIT_CONFIG_AUTO 1
+  #include "src/sljit/sljitConfigInternal.h"
+  #if (defined SLJIT_CONFIG_UNSUPPORTED && SLJIT_CONFIG_UNSUPPORTED)
+  #error unsupported
+  #endif]])], enable_jit=yes, enable_jit=no)
+  CPPFLAGS=$SAVE_CPPFLAGS
+  echo checking for JIT support on this hardware... $enable_jit
+fi
+
+# Handle --enable-jit-sealloc (disabled by default and only experimental)
+case $host_os in
+  linux* | netbsd*)
+    AC_ARG_ENABLE(jit-sealloc,
+      AS_HELP_STRING([--enable-jit-sealloc],
+        [enable SELinux compatible execmem allocator in JIT (experimental)]),
+        ,enable_jit_sealloc=no)
+    ;;
+  *)
+    enable_jit_sealloc=unsupported
+    ;;
+esac
 
 # Handle --disable-pcre2grep-jit (enabled by default)
 AC_ARG_ENABLE(pcre2grep-jit,
@@ -161,6 +200,12 @@ AC_ARG_ENABLE(pcre2grep-callout,
                              [disable callout script support in pcre2grep]),
               , enable_pcre2grep_callout=yes)
 
+# Handle --disable-pcre2grep-callout-fork (enabled by default)
+AC_ARG_ENABLE(pcre2grep-callout-fork,
+              AS_HELP_STRING([--disable-pcre2grep-callout-fork],
+                             [disable callout script fork support in pcre2grep]),
+              , enable_pcre2grep_callout_fork=yes)
+
 # Handle --enable-rebuild-chartables
 AC_ARG_ENABLE(rebuild-chartables,
               AS_HELP_STRING([--enable-rebuild-chartables],
@@ -276,7 +321,7 @@ AC_ARG_WITH(parens-nest-limit,
 # Handle --with-heap-limit
 AC_ARG_WITH(heap-limit,
             AS_HELP_STRING([--with-heap-limit=N],
-                           [default limit on heap memory (kilobytes, default=20000000)]),
+                           [default limit on heap memory (kibibytes, default=20000000)]),
             , with_heap_limit=20000000)
 
 # Handle --with-match-limit=N
@@ -331,6 +376,12 @@ AC_ARG_ENABLE(stack-for-recursion,,
 #                              [don't use stack recursion when matching]),
 #               , enable_stack_for_recursion=yes)
 
+# Handle --disable-percent_zt (set as "auto" by default)
+AC_ARG_ENABLE(percent-zt,
+              AS_HELP_STRING([--disable-percent-zt],
+                             [disable the use of z and t formatting modifiers]),
+              , enable_percent_zt=auto)
+
 # Set the default value for pcre2-8
 if test "x$enable_pcre2_8" = "xunset"
 then
@@ -372,7 +423,7 @@ case "$enable_newline" in
   anycrlf) ac_pcre2_newline_value=5 ;;
   nul)     ac_pcre2_newline_value=6 ;;
   *)
-  AC_MSG_ERROR([invalid argument \"$enable_newline\" to --enable-newline option])
+  AC_MSG_ERROR([invalid argument "$enable_newline" to --enable-newline option])
   ;;
 esac
 
@@ -401,7 +452,7 @@ fi
 case "$with_link_size" in
   2|3|4) ;;
   *)
-  AC_MSG_ERROR([invalid argument \"$with_link_size\" to --with-link-size option])
+  AC_MSG_ERROR([invalid argument "$with_link_size" to --with-link-size option])
   ;;
 esac
 
@@ -423,10 +474,10 @@ to set the macro values. In this case, you do not have to set -DHAVE_CONFIG_H,
 but if you do, default values will be taken from config.h for non-boolean
 macros that are not defined on the command line.
 
-Boolean macros such as HAVE_STDLIB_H and SUPPORT_PCRE2_8 should either be defined
-(conventionally to 1) for TRUE, and not defined at all for FALSE. All such
-macros are listed as a commented #undef in config.h.generic. Macros such as
-MATCH_LIMIT, whose actual value is relevant, have defaults defined, but are
+Boolean macros such as HAVE_STDLIB_H and SUPPORT_PCRE2_8 should either be
+defined (conventionally to 1) for TRUE, and not defined at all for FALSE. All
+such macros are listed as a commented #undef in config.h.generic. Macros such
+as MATCH_LIMIT, whose actual value is relevant, have defaults defined, but are
 surrounded by #ifndef/#endif lines so that the value can be overridden by -D.
 
 PCRE2 uses memmove() if HAVE_MEMMOVE is defined; otherwise it uses bcopy() if
@@ -434,7 +485,6 @@ HAVE_BCOPY is defined. If your system has neither bcopy() nor memmove(), make
 sure both macros are undefined; an emulation function will then be used. */])
 
 # Checks for header files.
-AC_HEADER_STDC
 AC_CHECK_HEADERS(limits.h sys/types.h sys/stat.h dirent.h)
 AC_CHECK_HEADERS([windows.h], [HAVE_WINDOWS_H=1])
 AC_CHECK_HEADERS([sys/wait.h], [HAVE_SYS_WAIT_H=1])
@@ -462,7 +512,7 @@ AC_TYPE_SIZE_T
 
 # Checks for library functions.
 
-AC_CHECK_FUNCS(bcopy memmove strerror mkostemp secure_getenv)
+AC_CHECK_FUNCS(bcopy memfd_create memmove mkostemp realpath secure_getenv strerror)
 
 # Check for the availability of libz (aka zlib)
 
@@ -575,6 +625,14 @@ if test "$enable_debug" = "yes"; then
     Define to any value to include debugging code.])
 fi
 
+if test "$enable_percent_zt" = "no"; then
+  AC_DEFINE([DISABLE_PERCENT_ZT], [], [
+    Define to any value to disable the use of the z and t modifiers in
+    formatting settings such as %zu or %td (this is rarely needed).])
+else
+  enable_percent_zt=auto
+fi
+
 # Unless running under Windows, JIT support requires pthreads.
 
 if test "$enable_jit" = "yes"; then
@@ -604,13 +662,21 @@ if test "$enable_pcre2grep_jit" = "yes"; then
 fi
 
 if test "$enable_pcre2grep_callout" = "yes"; then
-  if test "$HAVE_WINDOWS_H" != "1"; then
-    if test "$HAVE_SYS_WAIT_H" != "1"; then
-      AC_MSG_ERROR([Callout script support needs sys/wait.h.])
+  if test "$enable_pcre2grep_callout_fork" = "yes"; then
+    if test "$HAVE_WINDOWS_H" != "1"; then
+      if test "$HAVE_SYS_WAIT_H" != "1"; then
+        AC_MSG_ERROR([Callout script support needs sys/wait.h.])
+      fi
     fi
+    AC_DEFINE([SUPPORT_PCRE2GREP_CALLOUT_FORK], [], [
+      Define to any value to enable fork support in pcre2grep callout scripts.
+      This will have no effect unless SUPPORT_PCRE2GREP_CALLOUT is also
+      defined.])
   fi
   AC_DEFINE([SUPPORT_PCRE2GREP_CALLOUT], [], [
     Define to any value to enable callout script support in pcre2grep.])
+else
+  enable_pcre2grep_callout_fork="no"
 fi
 
 if test "$enable_unicode" = "yes"; then
@@ -694,8 +760,8 @@ fi
 AC_DEFINE_UNQUOTED([LINK_SIZE], [$with_link_size], [
   The value of LINK_SIZE determines the number of bytes used to store
   links as offsets within the compiled regex. The default is 2, which
-  allows for compiled patterns up to 64K long. This covers the vast
-  majority of cases. However, PCRE2 can also be compiled to use 3 or 4
+  allows for compiled patterns up to 65535 code units long. This covers the
+  vast majority of cases. However, PCRE2 can also be compiled to use 3 or 4
   bytes instead. This allows for longer patterns in extreme cases.])
 
 AC_DEFINE_UNQUOTED([PARENS_NEST_LIMIT], [$with_parens_nest_limit], [
@@ -706,10 +772,11 @@ AC_DEFINE_UNQUOTED([PARENS_NEST_LIMIT], [$with_parens_nest_limit], [
 AC_DEFINE_UNQUOTED([MATCH_LIMIT], [$with_match_limit], [
   The value of MATCH_LIMIT determines the default number of times the
   pcre2_match() function can record a backtrack position during a single
-  matching attempt. There is a runtime interface for setting a different limit.
-  The limit exists in order to catch runaway regular expressions that take for
-  ever to determine that they do not match. The default is set very large so
-  that it does not accidentally catch legitimate cases.])
+  matching attempt. The value is also used to limit a loop counter in
+  pcre2_dfa_match(). There is a runtime interface for setting a different
+  limit. The limit exists in order to catch runaway regular expressions that
+  take for ever to determine that they do not match. The default is set very
+  large so that it does not accidentally catch legitimate cases.])
 
 # --with-match-limit-recursion is an obsolete synonym for --with-match-limit-depth
 
@@ -733,11 +800,15 @@ AC_DEFINE_UNQUOTED([MATCH_LIMIT_DEPTH], [$with_match_limit_depth], [
   the maximum amount of heap memory that is used. The value of
   MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it must
   be less than the value of MATCH_LIMIT. The default is to use the same value
-  as MATCH_LIMIT. There is a runtime method for setting a different limit.])
+  as MATCH_LIMIT. There is a runtime method for setting a different limit. In
+  the case of pcre2_dfa_match(), this limit controls the depth of the internal
+  nested function calls that are used for pattern recursions, lookarounds, and
+  atomic groups.])
 
 AC_DEFINE_UNQUOTED([HEAP_LIMIT], [$with_heap_limit], [
-  This limits the amount of memory that pcre2_match() may use while matching
-  a pattern. The value is in kilobytes.])
+  This limits the amount of memory that may be used while matching
+  a pattern. It applies to both pcre2_match() and pcre2_dfa_match(). It does
+  not apply to JIT matching. The value is in kibibytes (units of 1024 bytes).])
 
 AC_DEFINE([MAX_NAME_SIZE], [32], [
   This limit is parameterized just in case anybody ever wants to
@@ -817,7 +888,7 @@ AC_SUBST(EXTRA_LIBPCRE2_POSIX_LDFLAGS)
 
 # When we run 'make distcheck', use these arguments. Turning off compiler
 # optimization makes it run faster.
-DISTCHECK_CONFIGURE_FLAGS="CFLAGS='' CXXFLAGS='' --enable-pcre2-16 --enable-pcre2-32 --enable-jit --enable-utf"
+DISTCHECK_CONFIGURE_FLAGS="CFLAGS='' CXXFLAGS='' --enable-pcre2-16 --enable-pcre2-32 --enable-jit"
 AC_SUBST(DISTCHECK_CONFIGURE_FLAGS)
 
 # Check that, if --enable-pcre2grep-libz or --enable-pcre2grep-libbz2 is
@@ -933,7 +1004,27 @@ fi # enable_coverage
 
 AM_CONDITIONAL([WITH_GCOV],[test "x$enable_coverage" = "xyes"])
 
+AC_MSG_CHECKING([whether Intel CET is enabled])
+AC_LANG_PUSH([C])
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM(,
+                   [[#ifndef __CET__
+# error CET is not enabled
+#endif]])],
+                   [pcre2_cc_cv_intel_cet_enabled=yes],
+                   [pcre2_cc_cv_intel_cet_enabled=no])
+AC_MSG_RESULT([$pcre2_cc_cv_intel_cet_enabled])
+if test "$pcre2_cc_cv_intel_cet_enabled" = yes; then
+  CET_CFLAGS="-mshstk"
+  AC_SUBST([CET_CFLAGS])
+fi
+AC_LANG_POP([C])
+
+# LIB_POSTFIX is used by CMakeLists.txt for Windows debug builds.
+# Pass empty LIB_POSTFIX to *.pc files and pcre2-config here.
+AC_SUBST(LIB_POSTFIX)
+
 # Produce these files, in addition to config.h.
+
 AC_CONFIG_FILES(
 	Makefile
 	libpcre2-8.pc
@@ -1000,13 +1091,14 @@ $PACKAGE-$VERSION configuration summary:
     Rebuild char tables ................ : ${enable_rebuild_chartables}
     Internal link size ................. : ${with_link_size}
     Nested parentheses limit ........... : ${with_parens_nest_limit}
-    Heap limit ......................... : ${with_heap_limit} kilobytes
+    Heap limit ......................... : ${with_heap_limit} kibibytes
     Match limit ........................ : ${with_match_limit}
     Match depth limit .................. : ${with_match_limit_depth}
     Build shared libs .................. : ${enable_shared}
     Build static libs .................. : ${enable_static}
     Use JIT in pcre2grep ............... : ${enable_pcre2grep_jit}
     Enable callouts in pcre2grep ....... : ${enable_pcre2grep_callout}
+    Enable fork in pcre2grep callouts .. : ${enable_pcre2grep_callout_fork}
     Initial buffer size for pcre2grep .. : ${with_pcre2grep_bufsize}
     Maximum buffer size for pcre2grep .. : ${with_pcre2grep_max_bufsize}
     Link pcre2grep with libz ........... : ${enable_pcre2grep_libz}
@@ -1016,6 +1108,7 @@ $PACKAGE-$VERSION configuration summary:
     Valgrind support ................... : ${enable_valgrind}
     Code coverage ...................... : ${enable_coverage}
     Fuzzer support ..................... : ${enable_fuzz_support}
+    Use %zu and %td .................... : ${enable_percent_zt}
 
 EOF
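
Reviewer note on the libtool version bump in configure.ac above (7:0:7 to 10:4:10 for the three main libraries, 2:0:0 to 3:1:0 for the POSIX wrapper). The following is a minimal sketch, assuming libtool's Linux/ELF naming convention, in which the installed file suffix is (current - age).age.revision; other platforms map the triple differently. The function name `libtool_linux_suffix` is hypothetical and is not part of PCRE2 or libtool.

```shell
#!/bin/sh
# Map a libtool "current:revision:age" triple to the shared-library version
# suffix used on Linux/ELF: (current - age).age.revision
# (assumption: Linux naming only; the function name is illustrative)
libtool_linux_suffix() {
  old_ifs=$IFS
  IFS=:
  set -- $1            # split the triple on ':' into $1 $2 $3
  IFS=$old_ifs
  current=$1
  revision=$2
  age=$3
  echo "$((current - age)).$age.$revision"
}

echo "libpcre2-8.so.$(libtool_linux_suffix 10:4:10)"
echo "libpcre2-posix.so.$(libtool_linux_suffix 3:1:0)"
```

The first component (current - age) is the soname major number. Under this convention it stays at 0 for the 8-, 16-, and 32-bit libraries, while the POSIX wrapper moves from 2 to 3, so programs linked against the old libpcre2-posix need relinking.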
 
 

+ 4 - 4
regex.mod/pcre/depcomp

@@ -1,9 +1,9 @@
 #! /bin/sh
 # depcomp - compile a program generating dependencies as side-effects
 
-scriptversion=2016-01-11.22; # UTC
+scriptversion=2018-03-07.03; # UTC
 
-# Copyright (C) 1999-2017 Free Software Foundation, Inc.
+# Copyright (C) 1999-2021 Free Software Foundation, Inc.
 
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -16,7 +16,7 @@ scriptversion=2016-01-11.22; # UTC
 # GNU General Public License for more details.
 
 # You should have received a copy of the GNU General Public License
-# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+# along with this program.  If not, see <https://www.gnu.org/licenses/>.
 
 # As a special exception to the GNU General Public License, if you
 # distribute this file as part of a program that contains a
@@ -783,7 +783,7 @@ exit 0
 # Local Variables:
 # mode: shell-script
 # sh-indentation: 2
-# eval: (add-hook 'write-file-hooks 'time-stamp)
+# eval: (add-hook 'before-save-hook 'time-stamp)
 # time-stamp-start: "scriptversion="
 # time-stamp-format: "%:y-%02m-%02d.%02H"
 # time-stamp-time-zone: "UTC0"

+ 46 - 23
regex.mod/pcre/doc/html/NON-AUTOTOOLS-BUILD.txt

@@ -10,6 +10,7 @@ This document contains the following sections:
   Calling conventions in Windows environments
   Comments about Win32 builds
   Building PCRE2 on Windows with CMake
+  Building PCRE2 on Windows with Visual Studio
   Testing with RunTest.bat
   Building PCRE2 on native z/OS and z/VM
 
@@ -39,7 +40,11 @@ GENERIC INSTRUCTIONS FOR THE PCRE2 C LIBRARY
 
 The following are generic instructions for building the PCRE2 C library "by
 hand". If you are going to use CMake, this section does not apply to you; you
-can skip ahead to the CMake section.
+can skip ahead to the CMake section. Note that the settings concerned with
+8-bit, 16-bit, and 32-bit code units relate to the type of data string that
+PCRE2 processes. They are NOT referring to the underlying operating system bit
+width. You do not have to do anything special to compile in a 64-bit
+environment, for example.
 
  (1) Copy or rename the file src/config.h.generic as src/config.h, and edit the
      macro settings that it contains to whatever is appropriate for your
@@ -47,7 +52,7 @@ can skip ahead to the CMake section.
      macro to specify what character(s) you want to be interpreted as line
      terminators by default.
 
-     When you compile any of the PCRE2 modules, you must specify
+     When you subsequently compile any of the PCRE2 modules, you must specify
      -DHAVE_CONFIG_H to your compiler so that src/config.h is included in the
      sources.
 
@@ -61,6 +66,11 @@ can skip ahead to the CMake section.
      new release, you are strongly advised to review src/config.h.generic
      before re-using what you had previously.
 
+     Note also that the src/config.h.generic file is created from a config.h
+     that was generated by Autotools, which automatically includes settings of
+     a number of macros that are not actually used by PCRE2 (for example,
+     HAVE_MEMORY_H).
+
  (2) Copy or rename the file src/pcre2.h.generic as src/pcre2.h.
 
  (3) EITHER:
@@ -68,23 +78,23 @@ can skip ahead to the CMake section.
        src/pcre2_chartables.c.
 
      OR:
-       Compile src/dftables.c as a stand-alone program (using -DHAVE_CONFIG_H
-       if you have set up src/config.h), and then run it with the single
-       argument "src/pcre2_chartables.c". This generates a set of standard
-       character tables and writes them to that file. The tables are generated
-       using the default C locale for your system. If you want to use a locale
-       that is specified by LC_xxx environment variables, add the -L option to
-       the dftables command. You must use this method if you are building on a
-       system that uses EBCDIC code.
+       Compile src/pcre2_dftables.c as a stand-alone program (using
+       -DHAVE_CONFIG_H if you have set up src/config.h), and then run it with
+       the single argument "src/pcre2_chartables.c". This generates a set of
+       standard character tables and writes them to that file. The tables are
+       generated using the default C locale for your system. If you want to use
+       a locale that is specified by LC_xxx environment variables, add the -L
+       option to the pcre2_dftables command. You must use this method if you
+       are building on a system that uses EBCDIC code.
 
      The tables in src/pcre2_chartables.c are defaults. The caller of PCRE2 can
      specify alternative tables at run time.
 
- (4) For an 8-bit library, compile the following source files from the src
-     directory, setting -DPCRE2_CODE_UNIT_WIDTH=8 as a compiler option. Also
-     set -DHAVE_CONFIG_H if you have set up src/config.h with your
-     configuration, or else use other -D settings to change the configuration
-     as required.
+ (4) For a library that supports 8-bit code units in the character strings that
+     it processes, compile the following source files from the src directory,
+     setting -DPCRE2_CODE_UNIT_WIDTH=8 as a compiler option. Also set
+     -DHAVE_CONFIG_H if you have set up src/config.h with your configuration,
+     or else use other -D settings to change the configuration as required.
 
        pcre2_auto_possess.c
        pcre2_chartables.c
@@ -103,6 +113,7 @@ can skip ahead to the CMake section.
        pcre2_newline.c
        pcre2_ord2utf.c
        pcre2_pattern_info.c
+       pcre2_script_run.c
        pcre2_serialize.c
        pcre2_string_utils.c
        pcre2_study.c
@@ -126,7 +137,7 @@ can skip ahead to the CMake section.
      src/pcre2_jit_match.c and src/pcre2_jit_misc.c, so you should not compile
      these yourself.
 
-     Not also that the pcre2_fuzzsupport.c file contains special code that is
+     Note also that the pcre2_fuzzsupport.c file contains special code that is
      useful to those who want to run fuzzing tests on the PCRE2 library. Unless
      you are doing that, you can ignore it.
 
@@ -135,9 +146,9 @@ can skip ahead to the CMake section.
      If your system has static and shared libraries, you may have to do this
      once for each type.
 
- (6) If you want to build a 16-bit library or 32-bit library (as well as, or
-     instead of the 8-bit library) just supply 16 or 32 as the value of
-     -DPCRE2_CODE_UNIT_WIDTH when you are compiling.
+ (6) If you want to build a library that supports 16-bit or 32-bit code units,
+     (as well as, or instead of the 8-bit library) just supply 16 or 32 as the
+     value of -DPCRE2_CODE_UNIT_WIDTH when you are compiling.
 
  (7) If you want to build the POSIX wrapper functions (which apply only to the
      8-bit library), ensure that you have the src/pcre2posix.h file and then
@@ -185,7 +196,7 @@ can skip ahead to the CMake section.
 
 STACK SIZE IN WINDOWS ENVIRONMENTS
 
-Prior to release 10.30 the default system stack size of 1Mb in some Windows
+Prior to release 10.30 the default system stack size of 1MiB in some Windows
 environments caused issues with some tests. This should no longer be the case
 for 10.30 and later releases.
 
@@ -330,6 +341,18 @@ cache can be deleted by selecting "File > Delete Cache".
     available for review in Testing\Temporary under your build dir.
 
 
+BUILDING PCRE2 ON WINDOWS WITH VISUAL STUDIO
+
+The code currently cannot be compiled without an inttypes.h header, which is
+available only with Visual Studio 2013 or newer. However, this portable and
+permissively-licensed implementation of the stdint.h header could be used as an
+alternative:
+
+  http://www.azillionmonkeys.com/qed/pstdint.h
+
+Just rename it and drop it into the top level of the build tree.
+
+
 TESTING WITH RUNTEST.BAT
 
 If configured with CMake, building the test project ("make test" or building
@@ -382,6 +405,6 @@ Everything in that location, source and executable, is in EBCDIC and native
 z/OS file formats. The port provides an API for LE languages such as COBOL and
 for the z/OS and z/VM versions of the Rexx languages.
 
-===============================
-Last Updated: 13 September 2017
-===============================
+===========================
+Last Updated: 28 April 2021
+===========================
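
Reviewer note: steps (1), (2), and (4) of the generic by-hand instructions in the NON-AUTOTOOLS-BUILD.txt diff above can be sketched as a dry run. The script only prints commands and executes nothing. The compiler name `cc`, the `-Isrc` include path, the object file naming, and the function name `print_pcre2_build` are assumptions for illustration; the three source files passed in are only a subset of the list in step (4), and step (3) (providing src/pcre2_chartables.c) is not covered.

```shell
#!/bin/sh
# Dry run: print, but do not execute, by-hand build commands for one
# code unit width (8, 16, or 32). Usage: print_pcre2_build WIDTH NAME...
# Assumptions: compiler "cc", include path "-Isrc", object naming scheme.
print_pcre2_build() {
  width=$1
  shift
  # Steps (1) and (2): start from the generic headers.
  echo "cp src/config.h.generic src/config.h"
  echo "cp src/pcre2.h.generic src/pcre2.h"
  # Step (4): compile each module with the width and config macros set.
  for name in "$@"; do
    echo "cc -DHAVE_CONFIG_H -DPCRE2_CODE_UNIT_WIDTH=$width -Isrc -c src/$name.c -o $name-$width.o"
  done
}

print_pcre2_build 8 pcre2_auto_possess pcre2_chartables pcre2_script_run
```

Calling the function again with 16 or 32 as the first argument corresponds to step (6): the same sources are compiled once per code unit width, which is why the width is part of the object file name in this sketch.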

+ 107 - 85
regex.mod/pcre/doc/html/README.txt

@@ -1,19 +1,23 @@
 README file for PCRE2 (Perl-compatible regular expression library)
 ------------------------------------------------------------------
 
-PCRE2 is a re-working of the original PCRE library to provide an entirely new
-API. The latest release of PCRE2 is always available in three alternative
-formats from:
+PCRE2 is a re-working of the original PCRE1 library to provide an entirely new
+API. Since its initial release in 2015, there has been further development of
+the code and it now differs from PCRE1 in more than just the API. There are new
+features, and the internals have been improved. The original PCRE1 library is
+now obsolete and no longer maintained. The latest release of PCRE2 is available
+in .tar.gz, tar.bz2, or .zip form from this GitHub repository:
 
-  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.gz
-  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.bz2
-  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.zip
+https://github.com/PhilipHazel/pcre2/releases
 
-There is a mailing list for discussion about the development of PCRE (both the
-original and new APIs) at [email protected]. You can access the archives and
-subscribe or manage your subscription here:
+There is a mailing list for discussion about the development of PCRE2 at
+[email protected]. You can subscribe by sending an email to
+[email protected].
 
-   https://lists.exim.org/mailman/listinfo/pcre-dev
+You can access the archives and also subscribe or manage your subscription
+here:
+
+https://groups.google.com/pcre2-dev
 
 Please read the NEWS file if you are upgrading from a previous release. The
 contents of this README file are:
@@ -39,13 +43,13 @@ The PCRE2 APIs
 PCRE2 is written in C, and it has its own API. There are three sets of
 functions, one for the 8-bit library, which processes strings of bytes, one for
 the 16-bit library, which processes strings of 16-bit values, and one for the
-32-bit library, which processes strings of 32-bit values. There are no C++
-wrappers.
+32-bit library, which processes strings of 32-bit values. Unlike PCRE1, there
+are no C++ wrappers.
 
 
 The distribution does contain a set of C wrapper functions for the 8-bit
 The distribution does contain a set of C wrapper functions for the 8-bit
 library that are based on the POSIX regular expression API (see the pcre2posix
 library that are based on the POSIX regular expression API (see the pcre2posix
-man page). These can be found in a library called libpcre2-posix. Note that
-this just provides a POSIX calling interface to PCRE2; the regular expressions
+man page). These are built into a library called libpcre2-posix. Note that this
+just provides a POSIX calling interface to PCRE2; the regular expressions
 themselves still follow Perl syntax and semantics. The POSIX API is restricted,
 and does not give full access to all of PCRE2's facilities.
 
@@ -53,20 +57,8 @@ The header file for the POSIX-style functions is called pcre2posix.h. The
 official POSIX name is regex.h, but I did not want to risk possible problems
 with existing files of that name by distributing it that way. To use PCRE2 with
 an existing program that uses the POSIX API, pcre2posix.h will have to be
-renamed or pointed at by a link.
-
-If you are using the POSIX interface to PCRE2 and there is already a POSIX
-regex library installed on your system, as well as worrying about the regex.h
-header file (as mentioned above), you must also take care when linking programs
-to ensure that they link with PCRE2's libpcre2-posix library. Otherwise they
-may pick up the POSIX functions of the same name from the other library.
-
-One way of avoiding this confusion is to compile PCRE2 with the addition of
--Dregcomp=PCRE2regcomp (and similarly for the other POSIX functions) to the
-compiler flags (CFLAGS if you are using "configure" -- see below). This has the
-effect of renaming the functions so that the names no longer clash. Of course,
-you have to do the same thing for your applications, or write them using the
-new names.
+renamed or pointed at by a link (or the program modified, of course). See the
+pcre2posix documentation for more details.
 
 
 Documentation for PCRE2
@@ -171,10 +163,14 @@ library. They are also documented in the pcre2build man page.
   give large performance improvements on certain platforms, add --enable-jit to
   the "configure" command. This support is available only for certain hardware
   architectures. If you try to enable it on an unsupported architecture, there
-  will be a compile time error. If you are running under SELinux you may also
-  want to add --enable-jit-sealloc, which enables the use of an execmem
-  allocator in JIT that is compatible with SELinux. This has no effect if JIT
-  is not enabled.
+  will be a compile time error. If in doubt, use --enable-jit=auto, which
+  enables JIT only if the current hardware is supported.
+
+. If you are enabling JIT under SELinux environment you may also want to add
+  --enable-jit-sealloc, which enables the use of an executable memory allocator
+  that is compatible with SELinux. Warning: this allocator is experimental!
+  It does not support fork() operation and may crash when no disk space is
+  available. This option has no effect if JIT is disabled.
 
 . If you do not want to make use of the default support for UTF-8 Unicode
   character strings in the 8-bit library, UTF-16 Unicode character strings in
@@ -239,9 +235,11 @@ library. They are also documented in the pcre2build man page.
   discussion in the pcre2api man page (search for pcre2_set_match_limit).
 
 . There is a separate counter that limits the depth of nested backtracking
-  during a matching process, which indirectly limits the amount of heap memory
-  that is used. This also has a default of ten million, which is essentially
-  "unlimited". You can change the default by setting, for example,
+  (pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
+  matching process, which indirectly limits the amount of heap memory that is
+  used, and in the case of pcre2_dfa_match() the amount of stack as well. This
+  counter also has a default of ten million, which is essentially "unlimited".
+  You can change the default by setting, for example,
 
   --with-match-limit-depth=5000
 
@@ -249,16 +247,17 @@ library. They are also documented in the pcre2build man page.
   pcre2_set_depth_limit).
 
 . You can also set an explicit limit on the amount of heap memory used by
-  the pcre2_match() interpreter:
+  the pcre2_match() and pcre2_dfa_match() interpreters:
 
   --with-heap-limit=500
 
-  The units are kilobytes. This limit does not apply when the JIT optimization
-  (which has its own memory control features) is used. There is more discussion
-  on the pcre2api man page (search for pcre2_set_heap_limit).
+  The units are kibibytes (units of 1024 bytes). This limit does not apply when
+  the JIT optimization (which has its own memory control features) is used.
+  There is more discussion on the pcre2api man page (search for
+  pcre2_set_heap_limit).
 
 . In the 8-bit library, the default maximum compiled pattern size is around
-  64K bytes. You can increase this by adding --with-link-size=3 to the
+  64 kibibytes. You can increase this by adding --with-link-size=3 to the
   "configure" command. PCRE2 then uses three bytes instead of two for offsets
   "configure" command. PCRE2 then uses three bytes instead of two for offsets
   to different parts of the compiled pattern. In the 16-bit library,
   --with-link-size=3 is the same as --with-link-size=4, which (in both
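
Note on the heap-limit and link-size hunk above: a minimal C sketch of the arithmetic behind the two numbers it mentions. The function names offset_range and heap_limit_bytes are hypothetical and are not part of PCRE2. The README gives the real pattern-size limit only as "around" 64 kibibytes, so the values computed here should be read as upper bounds implied by the offset width, not as exact limits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of PCRE2: the number of distinct values that
   fit in an offset stored in link_size bytes, that is, 256 to the power
   link_size. */
uint64_t offset_range(unsigned link_size)
{
  assert(link_size >= 1 && link_size <= 4);
  return (uint64_t)1 << (8 * link_size);
}

/* Hypothetical helper: convert a --with-heap-limit value, which the README
   says is in kibibytes (units of 1024 bytes), to bytes. */
uint64_t heap_limit_bytes(uint64_t kibibytes)
{
  return kibibytes * 1024;
}
```

Under these assumptions, offset_range(2) is 65536, which matches the "around 64 kibibytes" default, and heap_limit_bytes(500) is 512000 for the --with-heap-limit=500 example.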
@@ -272,9 +271,9 @@ library. They are also documented in the pcre2build man page.
 
   --enable-rebuild-chartables
 
-  a program called dftables is compiled and run in the default C locale when
-  you obey "make". It builds a source file called pcre2_chartables.c. If you do
-  not specify this option, pcre2_chartables.c is created as a copy of
+  a program called pcre2_dftables is compiled and run in the default C locale
+  when you obey "make". It builds a source file called pcre2_chartables.c. If
+  you do not specify this option, pcre2_chartables.c is created as a copy of
   pcre2_chartables.c.dist. See "Character tables" below for further
   information.
 
@@ -300,8 +299,8 @@ library. They are also documented in the pcre2build man page.
   unaddressable. This allows it to detect invalid memory accesses, and is
   mostly useful for debugging PCRE2 itself.
 
-. In environments where the gcc compiler is used and lcov version 1.6 or above
-  is installed, if you specify
+. In environments where the gcc compiler is used and lcov is installed, if you
+  specify
 
   --enable-coverage
 
@@ -315,10 +314,14 @@ library. They are also documented in the pcre2build man page.
 . When JIT support is enabled, pcre2grep automatically makes use of it, unless
   you add --disable-pcre2grep-jit to the "configure" command.
 
-. On non-Windows sytems there is support for calling external scripts during
-  matching in the pcre2grep command via PCRE2's callout facility with string
-  arguments. This support can be disabled by adding --disable-pcre2grep-callout
-  to the "configure" command.
+. There is support for calling external programs during matching in the
+  pcre2grep command, using PCRE2's callout facility with string arguments. This
+  support can be disabled by adding --disable-pcre2grep-callout to the
+  "configure" command. There are two kinds of callout: one that generates
+  output from inbuilt code, and another that calls an external program. The
+  latter has special support for Windows and VMS; otherwise it assumes the
+  existence of the fork() function. This facility can be disabled by adding
+  --disable-pcre2grep-callout-fork to the "configure" command.
 
 . The pcre2grep program currently supports only 8-bit data files, and so
   requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
@@ -371,6 +374,15 @@ library. They are also documented in the pcre2build man page.
   tgetflag, or tgoto, this is the problem, and linking with the ncurses library
   should fix it.
 
+. The C99 standard defines formatting modifiers z and t for size_t and
+  ptrdiff_t values, respectively. By default, PCRE2 uses these modifiers in
+  environments other than Microsoft Visual Studio versions earlier than 2013
+  when __STDC_VERSION__ is defined and has a value greater than or equal to
+  199901L (indicating C99). However, there is at least one environment that
+  claims to be C99 but does not support these modifiers. If
+  --disable-percent-zt is specified, no use is made of the z or t modifiers.
+  Instead of %td or %zu, %lu is used, with a cast for size_t values.
+
 . There is a special option called --enable-fuzz-support for use by people who
   want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
   library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
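
Note on the --disable-percent-zt item added in the hunk above: a small C sketch contrasting the C99 z and t length modifiers with the %lu fallback the README describes. The function names are hypothetical and this is not PCRE2's own code. The fallback casts both values to unsigned long, which assumes they fit in that type and that the pointer difference is not negative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* C99 form: z is the length modifier for size_t, t for ptrdiff_t. */
int format_c99(char *buf, size_t buflen, size_t n, ptrdiff_t d)
{
  assert(buf != NULL);
  return snprintf(buf, buflen, "%zu %td", n, d);
}

/* Fallback form for environments without z and t support: use %lu with
   explicit casts. Assumes d is non-negative and both values fit in an
   unsigned long. */
int format_fallback(char *buf, size_t buflen, size_t n, ptrdiff_t d)
{
  assert(buf != NULL && d >= 0);
  return snprintf(buf, buflen, "%lu %lu", (unsigned long)n, (unsigned long)d);
}
```

For small non-negative values the two functions write the same text, which is why the fallback is a workable substitute where z and t are unavailable.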
@@ -538,11 +550,11 @@ Cross-compiling using autotools
 
 You can specify CC and CFLAGS in the normal way to the "configure" command, in
 order to cross-compile PCRE2 for some other host. However, you should NOT
-specify --enable-rebuild-chartables, because if you do, the dftables.c source
-file is compiled and run on the local host, in order to generate the inbuilt
-character tables (the pcre2_chartables.c file). This will probably not work,
-because dftables.c needs to be compiled with the local compiler, not the cross
-compiler.
+specify --enable-rebuild-chartables, because if you do, the pcre2_dftables.c
+source file is compiled and run on the local host, in order to generate the
+inbuilt character tables (the pcre2_chartables.c file). This will probably not
+work, because pcre2_dftables.c needs to be compiled with the local compiler,
+not the cross compiler.
 
 When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
 created by making a copy of pcre2_chartables.c.dist, which is a default set of
@@ -550,17 +562,18 @@ tables that assumes ASCII code. Cross-compiling with the default tables should
 not be a problem.
 
 If you need to modify the character tables when cross-compiling, you should
-move pcre2_chartables.c.dist out of the way, then compile dftables.c by hand
-and run it on the local host to make a new version of pcre2_chartables.c.dist.
-Then when you cross-compile PCRE2 this new version of the tables will be used.
+move pcre2_chartables.c.dist out of the way, then compile pcre2_dftables.c by
+hand and run it on the local host to make a new version of
+pcre2_chartables.c.dist. See the pcre2build section "Creating character tables
+at build time" for more details.
 
 
 Making new tarballs
 -------------------
 
-The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
-zip formats. The command "make distcheck" does the same, but then does a trial
-build of the new distribution to ensure that it works.
+The command "make dist" creates two PCRE2 tarballs, in tar.gz and zip formats.
+The command "make distcheck" does the same, but then does a trial build of the
+new distribution to ensure that it works.
 
 If you have modified any of the man page sources in the doc directory, you
 should first run the PrepareRelease script before making a distribution. This
@@ -711,8 +724,8 @@ compile context.
 The source file called pcre2_chartables.c contains the default set of tables.
 By default, this is created as a copy of pcre2_chartables.c.dist, which
 contains tables for ASCII coding. However, if --enable-rebuild-chartables is
-specified for ./configure, a different version of pcre2_chartables.c is built
-by the program dftables (compiled from dftables.c), which uses the ANSI C
+specified for ./configure, a new version of pcre2_chartables.c is built by the
+program pcre2_dftables (compiled from pcre2_dftables.c), which uses the ANSI C
 character handling functions such as isalnum(), isalpha(), isupper(),
 islower(), etc. to build the table sources. This means that the default C
 locale that is set for your system will control the contents of these default
@@ -722,32 +735,40 @@ file does not get automatically re-generated. The best way to do this is to
 move pcre2_chartables.c.dist out of the way and replace it with your customized
 tables.
 
-When the dftables program is run as a result of --enable-rebuild-chartables,
-it uses the default C locale that is set on your system. It does not pay
-attention to the LC_xxx environment variables. In other words, it uses the
-system's default locale rather than whatever the compiling user happens to have
-set. If you really do want to build a source set of character tables in a
-locale that is specified by the LC_xxx variables, you can run the dftables
-program by hand with the -L option. For example:
+When the pcre2_dftables program is run as a result of specifying
+--enable-rebuild-chartables, it uses the default C locale that is set on your
+system. It does not pay attention to the LC_xxx environment variables. In other
+words, it uses the system's default locale rather than whatever the compiling
+user happens to have set. If you really do want to build a source set of
+character tables in a locale that is specified by the LC_xxx variables, you can
+run the pcre2_dftables program by hand with the -L option. For example:
 
-  ./dftables -L pcre2_chartables.c.special
+  ./pcre2_dftables -L pcre2_chartables.c.special
 
-The first two 256-byte tables provide lower casing and case flipping functions,
-respectively. The next table consists of three 32-byte bit maps which identify
-digits, "word" characters, and white space, respectively. These are used when
-building 32-byte bit maps that represent character classes for code points less
-than 256. The final 256-byte table has bits indicating various character types,
-as follows:
+The second argument names the file where the source code for the tables is
+written. The first two 256-byte tables provide lower casing and case flipping
+functions, respectively. The next table consists of a number of 32-byte bit
+maps which identify certain character classes such as digits, "word"
+characters, white space, etc. These are used when building 32-byte bit maps
+that represent character classes for code points less than 256. The final
+256-byte table has bits indicating various character types, as follows:
 
     1   white space character
     2   letter
-    4   decimal digit
-    8   hexadecimal digit
+    4   lower case letter
+    8   decimal digit
    16   alphanumeric or '_'
-  128   regular expression metacharacter or binary zero
 
-You should not alter the set of characters that contain the 128 bit, as that
-will cause PCRE2 to malfunction.
+You can also specify -b (with or without -L) when running pcre2_dftables. This
+causes the tables to be written in binary instead of as source code. A set of
+binary tables can be loaded into memory by an application and passed to
+pcre2_compile() in the same way as tables created dynamically by calling
+pcre2_maketables(). The tables are just a string of bytes, independent of
+hardware characteristics such as endianness. This means they can be bundled
+with an application that runs in different environments, to ensure consistent
+behaviour.
+
+See also the pcre2build section "Creating character tables at build time".
 
 
 File manifest
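
Note on the character-tables hunk above: a minimal C sketch of how tables of this shape can be derived from the ANSI C character handling functions. It builds the lower-casing table, the case-flipping table, and the final character-type table using the bit values listed in the new README text (1, 2, 4, 8, 16). The enum names and the function build_tables are hypothetical, the 32-byte bit maps are left out, and this is not the pcre2_dftables source. Because the sketch never calls setlocale(), it runs in the default "C" locale, which mirrors the README's remark that the LC_xxx variables are ignored unless -L is given.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical names for the character-type bits listed in the README. */
enum {
  SK_SPACE  = 1,   /* white space character */
  SK_LETTER = 2,   /* letter */
  SK_LOWER  = 4,   /* lower case letter */
  SK_DIGIT  = 8,   /* decimal digit */
  SK_WORD   = 16   /* alphanumeric or '_' */
};

/* Fill three 256-byte tables from the current C locale: lower casing,
   case flipping, and character-type flags. */
void build_tables(unsigned char lower[256], unsigned char flip[256],
                  unsigned char types[256])
{
  assert(lower != NULL && flip != NULL && types != NULL);
  for (int c = 0; c < 256; c++)
    {
    unsigned char bits = 0;
    lower[c] = (unsigned char)tolower(c);
    flip[c] = (unsigned char)(islower(c) ? toupper(c) : tolower(c));
    if (isspace(c)) bits |= SK_SPACE;
    if (isalpha(c)) bits |= SK_LETTER;
    if (islower(c)) bits |= SK_LOWER;
    if (isdigit(c)) bits |= SK_DIGIT;
    if (isalnum(c) || c == '_') bits |= SK_WORD;
    types[c] = bits;
    }
}
```

In the "C" locale this gives, for example, a type value of 22 (2 + 4 + 16) for 'a' and 24 (8 + 16) for '7'. Such tables are plain bytes, which is consistent with the README's point that the binary form written by -b does not depend on endianness.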
@@ -758,7 +779,7 @@ The distribution should contain the files listed below.
 (A) Source files for the PCRE2 library functions and their headers are found in
     the src directory:
 
-  src/dftables.c           auxiliary program for building pcre2_chartables.c
+  src/pcre2_dftables.c     auxiliary program for building pcre2_chartables.c
                            when --enable-rebuild-chartables is specified
 
   src/pcre2_chartables.c.dist  a default set of character tables that assume
@@ -784,6 +805,7 @@ The distribution should contain the files listed below.
   src/pcre2_newline.c      )
   src/pcre2_ord2utf.c      )
   src/pcre2_pattern_info.c )
+  src/pcre2_script_run.c   )
   src/pcre2_serialize.c    )
   src/pcre2_string_utils.c )
   src/pcre2_study.c        )
@@ -881,6 +903,6 @@ The distribution should contain the files listed below.
                           )   environments
 
 Philip Hazel
-Email local part: ph10
-Email domain: cam.ac.uk
-Last updated: 12 September 2017
+Email local part: Philip.Hazel
+Email domain: gmail.com
+Last updated: 29 October 2021

+ 7 - 1
regex.mod/pcre/doc/html/index.html

@@ -141,11 +141,14 @@ in the library.
     <td>&nbsp;&nbsp;Free a general context</td></tr>
 
 <tr><td><a href="pcre2_get_error_message.html">pcre2_get_error_message</a></td>
-    <td>&nbsp;&nbsp;Free study data</td></tr>
+    <td>&nbsp;&nbsp;Get textual error message for error number</td></tr>
 
 <tr><td><a href="pcre2_get_mark.html">pcre2_get_mark</a></td>
     <td>&nbsp;&nbsp;Get a (*MARK) name</td></tr>
 
+<tr><td><a href="pcre2_get_match_data_size.html">pcre2_get_match_data_size</a></td>
+    <td>&nbsp;&nbsp;Get the size of a match data block</td></tr>
+
 <tr><td><a href="pcre2_get_ovector_count.html">pcre2_get_ovector_count</a></td>
     <td>&nbsp;&nbsp;Get the ovector count</td></tr>
 
@@ -176,6 +179,9 @@ in the library.
 <tr><td><a href="pcre2_maketables.html">pcre2_maketables</a></td>
     <td>&nbsp;&nbsp;Build character tables in current locale</td></tr>
 
+<tr><td><a href="pcre2_maketables_free.html">pcre2_maketables_free</a></td>
+    <td>&nbsp;&nbsp;Free character tables</td></tr>
+
 <tr><td><a href="pcre2_match.html">pcre2_match</a></td>
     <td>&nbsp;&nbsp;Match a compiled pattern to a subject string
     (Perl compatible)</td></tr>

+ 29 - 13
regex.mod/pcre/doc/html/pcre2.html

@@ -23,16 +23,30 @@ please consult the man page, in case the conversion went wrong.
 <P>
 PCRE2 is the name used for a revised API for the PCRE library, which is a set
 of functions, written in C, that implement regular expression pattern matching
-using the same syntax and semantics as Perl, with just a few differences. Some
-features that appeared in Python and the original PCRE before they appeared in
-Perl are also available using the Python syntax. There is also some support for
-one or two .NET and Oniguruma syntax items, and there are options for
-requesting some minor changes that give better ECMAScript (aka JavaScript)
-compatibility.
+using the same syntax and semantics as Perl, with just a few differences. After
+nearly two decades, the limitations of the original API were making development
+increasingly difficult. The new API is more extensible, and it was simplified
+by abolishing the separate "study" optimizing function; in PCRE2, patterns are
+automatically optimized where possible. Since forking from PCRE1, the code has
+been extensively refactored and new features introduced. The old library is now
+obsolete and is no longer maintained.
+</P>
+<P>
+As well as Perl-style regular expression patterns, some features that appeared
+in Python and the original PCRE before they appeared in Perl are available
+using the Python syntax. There is also some support for one or two .NET and
+Oniguruma syntax items, and there are options for requesting some minor changes
+that give better ECMAScript (aka JavaScript) compatibility.
+</P>
+<P>
+The source code for PCRE2 can be compiled to support strings of 8-bit, 16-bit,
+or 32-bit code units, which means that up to three separate libraries may be
+installed, one for each code unit size. The size of code unit is not related to
+the bit size of the underlying hardware. In a 64-bit environment that also
+supports 32-bit applications, versions of PCRE2 that are compiled in both
+64-bit and 32-bit modes may be needed.
 </P>
 <P>
-The source code for PCRE2 can be compiled to support 8-bit, 16-bit, or 32-bit
-code units, which means that up to three separate libraries may be installed.
 The original work to extend PCRE to 16-bit and 32-bit code units was done by
 Zoltan Herczeg and Christian Persch, respectively. In all three cases, strings
 can be interpreted either as one character per code unit, or as UTF-encoded
@@ -155,8 +169,9 @@ listing), and the short pages for individual functions, are concatenated in
   pcre2-config       show PCRE2 installation configuration information
   pcre2api           details of PCRE2's native C API
   pcre2build         building PCRE2
-  pcre2callout       details of the callout feature
+  pcre2callout       details of the pattern callout feature
   pcre2compat        discussion of Perl compatibility
+  pcre2convert       details of pattern conversion functions
   pcre2demo          a demonstration C program that uses PCRE2
   pcre2grep          description of the <b>pcre2grep</b> command (8-bit only)
   pcre2jit           discussion of just-in-time optimization support
@@ -167,6 +182,7 @@ listing), and the short pages for individual functions, are concatenated in
   pcre2perform       discussion of performance issues
   pcre2posix         the POSIX-compatible C API for the 8-bit library
   pcre2sample        discussion of the pcre2demo program
+  pcre2serialize     details of pattern serialization
   pcre2syntax        quick syntax reference
   pcre2test          description of the <b>pcre2test</b> command
   pcre2unicode       discussion of Unicode and UTF support
@@ -178,20 +194,20 @@ function, listing its arguments and results.
 <P>
 Philip Hazel
 <br>
-University Computing Service
+Retired from University Computing Service
 <br>
 Cambridge, England.
 <br>
 </P>
 <P>
 Putting an actual email address here is a spam magnet. If you want to email me,
-use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
+use my two names separated by a dot at gmail.com.
 </P>
 <br><a name="SEC5" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 01 April 2017
+Last updated: 27 August 2021
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2021 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.

+ 2 - 1
regex.mod/pcre/doc/html/pcre2_code_free.html

@@ -25,7 +25,8 @@ SYNOPSIS
 DESCRIPTION
 </b><br>
 <P>
-This function frees the memory used for a compiled pattern, including any
+If <i>code</i> is NULL, this function does nothing. Otherwise, <i>code</i> must
+point to a compiled pattern. This function frees its memory, including any
 memory used by the JIT compiler. If the compiled pattern was created by a call
 to <b>pcre2_code_copy_with_tables()</b>, the memory for the character tables is
 also freed.

+ 8 - 2
regex.mod/pcre/doc/html/pcre2_compile.html

@@ -65,7 +65,8 @@ The option bits are:
   PCRE2_EXTENDED           Ignore white space and # comments
   PCRE2_FIRSTLINE          Force matching to be before newline
   PCRE2_LITERAL            Pattern characters are all literal
-  PCRE2_MATCH_UNSET_BACKREF  Match unset back references
+  PCRE2_MATCH_INVALID_UTF  Enable support for matching invalid UTF
+  PCRE2_MATCH_UNSET_BACKREF  Match unset backreferences
   PCRE2_MULTILINE          ^ and $ match newlines within data
   PCRE2_NEVER_BACKSLASH_C  Lock out the use of \C in patterns
   PCRE2_NEVER_UCP          Lock out PCRE2_UCP, e.g. via (*UCP)
@@ -86,7 +87,12 @@ PCRE2 must be built with Unicode support (the default) in order to use
 PCRE2_UTF, PCRE2_UCP and related options.
 </P>
 <P>
-The yield of the function is a pointer to a private data structure that
+Additional options may be set in the compile context via the
+<a href="pcre2_set_compile_extra_options.html"><b>pcre2_set_compile_extra_options</b></a>
+function.
+</P>
+<P>
+The yield of this function is a pointer to a private data structure that
 contains the compiled pattern, or NULL if an error was detected.
 </P>
 <P>

+ 2 - 1
regex.mod/pcre/doc/html/pcre2_compile_context_free.html

@@ -27,7 +27,8 @@ DESCRIPTION
 <P>
 This function frees the memory occupied by a compile context, using the memory
 freeing function from the general context with which it was created, or
-<b>free()</b> if that was not set.
+<b>free()</b> if that was not set. If the argument is NULL, the function returns
+immediately without doing anything.
 </P>
 <P>
 There is a complete description of the PCRE2 native API in the
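
Note on the NULL-handling change documented in the hunk above (and repeated for several other *_free functions in this commit): a generic C sketch of the pattern being described, in which an object remembers the memory-freeing function it was created with, falls back to free() when none was set, and treats a NULL argument as a no-op. All type and function names here (sketch_memctl, sketch_context, counting_free and so on) are hypothetical illustrations and are not PCRE2's internal structures or API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical memory-control record; free_fn == NULL means "use free()". */
typedef struct {
  void (*free_fn)(void *block, void *user_data);
  void *user_data;
} sketch_memctl;

/* Hypothetical context object that remembers how it must be freed. */
typedef struct {
  sketch_memctl memctl;
  int payload;
} sketch_context;

/* Example custom free function: counts calls through user_data. */
void counting_free(void *block, void *user_data)
{
  assert(user_data != NULL);
  ++*(int *)user_data;
  free(block);
}

/* Create a context; mc may be NULL. For simplicity the sketch always
   allocates with malloc(). */
sketch_context *sketch_context_create(const sketch_memctl *mc)
{
  sketch_context *ctx = malloc(sizeof *ctx);
  if (ctx == NULL) return NULL;
  ctx->memctl.free_fn = (mc != NULL) ? mc->free_fn : NULL;
  ctx->memctl.user_data = (mc != NULL) ? mc->user_data : NULL;
  ctx->payload = 0;
  return ctx;
}

/* Free a context. A NULL argument returns immediately without doing
   anything. */
void sketch_context_free(sketch_context *ctx)
{
  if (ctx == NULL) return;
  if (ctx->memctl.free_fn != NULL)
    ctx->memctl.free_fn(ctx, ctx->memctl.user_data);
  else
    free(ctx);
}
```

The practical benefit of the documented behaviour is that cleanup code can call the free function unconditionally, including on error paths where the pointer was never assigned.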

+ 2 - 1
regex.mod/pcre/doc/html/pcre2_convert_context_free.html

@@ -28,7 +28,8 @@ DESCRIPTION
 This function is part of an experimental set of pattern conversion functions.
 It frees the memory occupied by a convert context, using the memory
 freeing function from the general context with which it was created, or
-<b>free()</b> if that was not set.
+<b>free()</b> if that was not set. If the argument is NULL, the function returns
+immediately without doing anything.
 </P>
 <P>
 The pattern conversion functions are described in the

+ 2 - 1
regex.mod/pcre/doc/html/pcre2_converted_pattern_free.html

@@ -28,7 +28,8 @@ DESCRIPTION
 This function is part of an experimental set of pattern conversion functions.
 It frees the memory occupied by a converted pattern that was obtained by
 calling <b>pcre2_pattern_convert()</b> with arguments that caused it to place
-the converted pattern into newly obtained heap memory.
+the converted pattern into newly obtained heap memory. If the argument is NULL,
+the function returns immediately without doing anything.
 </P>
 <P>
 The pattern conversion functions are described in the

+ 12 - 4
regex.mod/pcre/doc/html/pcre2_dfa_match.html

@@ -45,12 +45,20 @@ just once (except when processing lookaround assertions). This function is
   <i>workspace</i>    Points to a vector of ints used as working space
   <i>wscount</i>      Number of elements in the vector
 </pre>
-For <b>pcre2_dfa_match()</b>, a match context is needed only if you want to set
-up a callout function or specify the match and/or the recursion depth limits.
-The <i>length</i> and <i>startoffset</i> values are code units, not characters.
-The options are:
+The size of the output vector needed to contain all the results depends on the
+number of simultaneous matches, not on the number of parentheses in the
+pattern. Using <b>pcre2_match_data_create_from_pattern()</b> to create the match
+data block is therefore not advisable when using this function.
+</P>
+<P>
+A match context is needed only if you want to set up a callout function or
+specify the heap limit or the match or the recursion depth limits. The
+<i>length</i> and <i>startoffset</i> values are code units, not characters. The
+options are:
 <pre>
   PCRE2_ANCHORED          Match only at the first position
+  PCRE2_COPY_MATCHED_SUBJECT
+                          On success, make a private subject copy
   PCRE2_ENDANCHORED       Pattern can match only at end of subject
   PCRE2_NOTBOL            Subject is not the beginning of a line
   PCRE2_NOTEOL            Subject is not the end of a line

+ 2 - 1
regex.mod/pcre/doc/html/pcre2_general_context_free.html

@@ -26,7 +26,8 @@ DESCRIPTION
 </b><br>
 <P>
 This function frees the memory occupied by a general context, using the memory
-freeing function within the context, if set.
+freeing function within the context, if set. If the argument is NULL, the
+function returns immediately without doing anything.
 </P>
 <P>
 There is a complete description of the PCRE2 native API in the

+ 39 - 0
regex.mod/pcre/doc/html/pcre2_get_match_data_size.html

@@ -0,0 +1,39 @@
+<html>
+<head>
+<title>pcre2_get_match_data_size specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre2_get_match_data_size man page</h1>
+<p>
+Return to the <a href="index.html">PCRE2 index page</a>.
+</p>
+<p>
+This page is part of the PCRE2 HTML documentation. It was generated
+automatically from the original man page. If there is any nonsense in it,
+please consult the man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre2.h&#62;</b>
+</P>
+<P>
+<b>PCRE2_SIZE pcre2_get_match_data_size(pcre2_match_data *<i>match_data</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function returns the size, in bytes, of the match data block that is its
+argument.
+</P>
+<P>
+There is a complete description of the PCRE2 native API in the
+<a href="pcre2api.html"><b>pcre2api</b></a>
+page and a description of the POSIX API in the
+<a href="pcre2posix.html"><b>pcre2posix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE2 index page</a>.
+</p>

+ 8 - 1
regex.mod/pcre/doc/html/pcre2_jit_compile.html

@@ -41,9 +41,16 @@ bits:
   PCRE2_JIT_PARTIAL_SOFT  compile code for soft partial matching
   PCRE2_JIT_PARTIAL_HARD  compile code for hard partial matching
 </pre>
+There is also an obsolete option called PCRE2_JIT_INVALID_UTF, which has been
+superseded by the <b>pcre2_compile()</b> option PCRE2_MATCH_INVALID_UTF. The old
+option is deprecated and may be removed in the future.
+</P>
+<P>
 The yield of the function is 0 for success, or a negative error code otherwise.
 In particular, PCRE2_ERROR_JIT_BADOPTION is returned if JIT is not supported or
-if an unknown bit is set in <i>options</i>.
+if an unknown bit is set in <i>options</i>. The function can also return
+PCRE2_ERROR_NOMEMORY if JIT is unable to allocate executable memory for the
+compiler, even if it was because of a system security restriction.
 </P>
 <P>
 There is a complete description of the PCRE2 native API in the

+ 1 - 1
regex.mod/pcre/doc/html/pcre2_jit_free_unused_memory.html

@@ -29,7 +29,7 @@ This function frees unused JIT executable memory. The argument is a general
 context, for custom memory management, or NULL for standard memory management.
 JIT memory allocation retains some memory in order to improve future JIT
 compilation speed. In low memory conditions,
-\fBpcre2_jit_free_unused_memory()\fB can be used to cause this memory to be
+<b>pcre2_jit_free_unused_memory()</b> can be used to cause this memory to be
 freed.
 </P>
 <P>

+ 3 - 1
regex.mod/pcre/doc/html/pcre2_jit_match.html

@@ -33,7 +33,9 @@ processed by the JIT compiler against a given subject string, using a matching
 algorithm that is similar to Perl's. It is a "fast path" interface to JIT, and
 it bypasses some of the sanity checks that <b>pcre2_match()</b> applies.
 Its arguments are exactly the same as for
-<a href="pcre2_match.html"><b>pcre2_match()</b>.</a>
+<a href="pcre2_match.html"><b>pcre2_match()</b>,</a>
+except that the subject string must be specified with a length;
+PCRE2_ZERO_TERMINATED is not supported.
 </P>
 <P>
 The supported options are PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,

+ 8 - 3
regex.mod/pcre/doc/html/pcre2_jit_stack_assign.html

@@ -38,7 +38,11 @@ passed to a matching function. The arguments of this function are:
 </PRE>
 </P>
 <P>
-If <i>callback</i> is NULL and <i>callback_data</i> is NULL, an internal 32K
+If <i>mcontext</i> is NULL, the function returns immediately, without doing
+anything.
+</P>
+<P>
+If <i>callback</i> is NULL and <i>callback_data</i> is NULL, an internal 32KiB
 block on the machine stack is used.
 </P>
 <P>
@@ -49,8 +53,9 @@ If <i>callback</i> is NULL and <i>callback_data</i> is not NULL,
 <P>
 If <i>callback</i> is not NULL, it is called with <i>callback_data</i> as an
 argument at the start of matching, in order to set up a JIT stack. If the
-result is NULL, the internal 32K stack is used; otherwise the return value must
-be a valid JIT stack, the result of calling <b>pcre2_jit_stack_create()</b>.
+result is NULL, the internal 32KiB stack is used; otherwise the return value
+must be a valid JIT stack, the result of calling
+<b>pcre2_jit_stack_create()</b>.
 </P>
 <P>
 You may safely use the same JIT stack for multiple patterns, as long as they

+ 2 - 2
regex.mod/pcre/doc/html/pcre2_jit_stack_create.html

@@ -33,8 +33,8 @@ context, for memory allocation functions, or NULL for standard memory
 allocation. The result can be passed to the JIT run-time code by calling
 <b>pcre2_jit_stack_assign()</b> to associate the stack with a compiled pattern,
 which can then be processed by <b>pcre2_match()</b> or <b>pcre2_jit_match()</b>.
-A maximum stack size of 512K to 1M should be more than enough for any pattern.
-For more details, see the
+A maximum stack size of 512KiB to 1MiB should be more than enough for any
+pattern. For more details, see the
 <a href="pcre2jit.html"><b>pcre2jit</b></a>
 page.
 </P>

+ 3 - 2
regex.mod/pcre/doc/html/pcre2_jit_stack_free.html

@@ -26,8 +26,9 @@ DESCRIPTION
 </b><br>
 <P>
 This function is used to free a JIT stack that was created by
-<b>pcre2_jit_stack_create()</b> when it is no longer needed. For more details,
-see the
+<b>pcre2_jit_stack_create()</b> when it is no longer needed. If the argument is
+NULL, the function returns immediately without doing anything. For more
+details, see the
 <a href="pcre2jit.html"><b>pcre2jit</b></a>
 page.
 </P>

+ 1 - 1
regex.mod/pcre/doc/html/pcre2_maketables.html

@@ -19,7 +19,7 @@ SYNOPSIS
 <b>#include &#60;pcre2.h&#62;</b>
 </P>
 <P>
-<b>const unsigned char *pcre2_maketables(pcre2_general_context *<i>gcontext</i>);</b>
+<b>const uint8_t *pcre2_maketables(pcre2_general_context *<i>gcontext</i>);</b>
 </P>
 <br><b>
 DESCRIPTION

+ 44 - 0
regex.mod/pcre/doc/html/pcre2_maketables_free.html

@@ -0,0 +1,44 @@
+<html>
+<head>
+<title>pcre2_maketables_free specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre2_maketables_free man page</h1>
+<p>
+Return to the <a href="index.html">PCRE2 index page</a>.
+</p>
+<p>
+This page is part of the PCRE2 HTML documentation. It was generated
+automatically from the original man page. If there is any nonsense in it,
+please consult the man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre2.h&#62;</b>
+</P>
+<P>
+<b>void pcre2_maketables_free(pcre2_general_context *<i>gcontext</i>,</b>
+<b>  const uint8_t *<i>tables</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function discards a set of character tables that were created by a call
+to
+<a href="pcre2_maketables.html"><b>pcre2_maketables()</b>.</a>
+</P>
+<P>
+The <i>gcontext</i> parameter should match what was used in that call to
+account for any custom allocators that might be in use; if it is NULL
+the system <b>free()</b> is used.
+</P>
+<P>
+There is a complete description of the PCRE2 native API in the
+<a href="pcre2api.html"><b>pcre2api</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE2 index page</a>.
+</p>

+ 5 - 3
regex.mod/pcre/doc/html/pcre2_match.html

@@ -55,11 +55,13 @@ A match context is needed only if you want to:
   Change the backtracking depth limit
   Set custom memory management specifically for the match
 </pre>
-The <i>length</i> and <i>startoffset</i> values are code
-units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
-subject that is terminated by a binary zero code unit. The options are:
+The <i>length</i> and <i>startoffset</i> values are code units, not characters.
+The length may be given as PCRE2_ZERO_TERMINATED for a subject that is
+terminated by a binary zero code unit. The options are:
 <pre>
   PCRE2_ANCHORED          Match only at the first position
+  PCRE2_COPY_MATCHED_SUBJECT
+                          On success, make a private subject copy
   PCRE2_ENDANCHORED       Pattern can match only at end of subject
   PCRE2_NOTBOL            Subject string is not the beginning of a line
   PCRE2_NOTEOL            Subject string is not the end of a line

+ 2 - 1
regex.mod/pcre/doc/html/pcre2_match_context_free.html

@@ -27,7 +27,8 @@ DESCRIPTION
 <P>
 This function frees the memory occupied by a match context, using the memory
 freeing function from the general context with which it was created, or
-<b>free()</b> if that was not set.
+<b>free()</b> if that was not set. If the argument is NULL, the function returns
+immediately without doing anything.
 </P>
 <P>
 There is a complete description of the PCRE2 native API in the

+ 3 - 2
regex.mod/pcre/doc/html/pcre2_match_data_create.html

@@ -30,8 +30,9 @@ This function creates a new match data block, which is used for holding the
 result of a match. The first argument specifies the number of pairs of offsets
 that are required. These form the "output vector" (ovector) within the match
 data block, and are used to identify the matched string and any captured
-substrings. There is always one pair of offsets; if <b>ovecsize</b> is zero, it
-is treated as one.
+substrings when matching with <b>pcre2_match()</b>, or a number of different
+matches at the same point when used with <b>pcre2_dfa_match()</b>. There is
+always one pair of offsets; if <b>ovecsize</b> is zero, it is treated as one.
 </P>
 <P>
 The second argument points to a general context, for custom memory management,

+ 9 - 6
regex.mod/pcre/doc/html/pcre2_match_data_create_from_pattern.html

@@ -26,12 +26,15 @@ SYNOPSIS
 DESCRIPTION
 </b><br>
 <P>
-This function creates a new match data block, which is used for holding the
-result of a match. The first argument points to a compiled pattern. The number
-of capturing parentheses within the pattern is used to compute the number of
-pairs of offsets that are required in the match data block. These form the
-"output vector" (ovector) within the match data block, and are used to identify
-the matched string and any captured substrings.
+This function creates a new match data block for holding the result of a match.
+The first argument points to a compiled pattern. The number of capturing
+parentheses within the pattern is used to compute the number of pairs of
+offsets that are required in the match data block. These form the "output
+vector" (ovector) within the match data block, and are used to identify the
+matched string and any captured substrings when matching with
+<b>pcre2_match()</b>. If you are using <b>pcre2_dfa_match()</b>, which uses the
+output vector in a different way, you should use <b>pcre2_match_data_create()</b>
+instead of this function.
 </P>
 <P>
 The second argument points to a general context, for custom memory management,

+ 9 - 3
regex.mod/pcre/doc/html/pcre2_match_data_free.html

@@ -25,9 +25,15 @@ SYNOPSIS
 DESCRIPTION
 </b><br>
 <P>
-This function frees the memory occupied by a match data block, using the memory
-freeing function from the general context or compiled pattern with which it was
-created, or <b>free()</b> if that was not set.
+If <i>match_data</i> is NULL, this function does nothing. Otherwise,
+<i>match_data</i> must point to a match data block, which this function frees,
+using the memory freeing function from the general context or compiled pattern
+with which it was created, or <b>free()</b> if that was not set.
+</P>
+<P>
+If the PCRE2_COPY_MATCHED_SUBJECT option was used for a successful match using this
+match data block, the copy of the subject that was remembered with the block is
+also freed.
 </P>
 <P>
 There is a complete description of the PCRE2 native API in the

+ 3 - 2
regex.mod/pcre/doc/html/pcre2_pattern_info.html

@@ -19,7 +19,8 @@ SYNOPSIS
 <b>#include &#60;pcre2.h&#62;</b>
 </P>
 <P>
-<b>int pcre2_pattern_info(const pcre2 *<i>code</i>, uint32_t <i>what</i>, void *<i>where</i>);</b>
+<b>int pcre2_pattern_info(const pcre2_code *<i>code</i>, uint32_t <i>what</i>,</b>
+<b>   void *<i>where</i>);</b>
 </P>
 <br><b>
 DESCRIPTION
@@ -36,7 +37,7 @@ request are as follows:
 <pre>
   PCRE2_INFO_ALLOPTIONS      Final options after compiling
   PCRE2_INFO_ARGOPTIONS      Options passed to <b>pcre2_compile()</b>
-  PCRE2_INFO_BACKREFMAX      Number of highest back reference
+  PCRE2_INFO_BACKREFMAX      Number of highest backreference
   PCRE2_INFO_BSR             What \R matches:
                                PCRE2_BSR_UNICODE: Unicode line endings
                                PCRE2_BSR_ANYCRLF: CR, LF, or CRLF only

+ 6 - 3
regex.mod/pcre/doc/html/pcre2_serialize_decode.html

@@ -28,7 +28,10 @@ DESCRIPTION
 </b><br>
 <P>
 This function decodes a serialized set of compiled patterns back into a list of
-individual patterns. Its arguments are:
+individual patterns. This is possible only on a host that is running the same
+version of PCRE2, with the same code unit width, and the host must also have
+the same endianness, pointer width and PCRE2_SIZE type. The arguments for
+<b>pcre2_serialize_decode()</b> are:
 <pre>
   <i>codes</i>            pointer to a vector in which to build the list
   <i>number_of_codes</i>  number of slots in the vector
@@ -54,8 +57,8 @@ on a system with different endianness.
 <P>
 There is a complete description of the PCRE2 native API in the
 <a href="pcre2api.html"><b>pcre2api</b></a>
-page and a description of the POSIX API in the
-<a href="pcre2posix.html"><b>pcre2posix</b></a>
+page and a description of the serialization functions in the
+<a href="pcre2serialize.html"><b>pcre2serialize</b></a>
 page.
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.

+ 8 - 3
regex.mod/pcre/doc/html/pcre2_serialize_encode.html

@@ -28,7 +28,12 @@ DESCRIPTION
 </b><br>
 <P>
 This function encodes a list of compiled patterns into a byte stream that can
-be saved on disc or elsewhere. Its arguments are:
+be saved on disc or elsewhere. Note that this is not an abstract format like
+Java or .NET. Conversion of the byte stream back into usable compiled patterns
+can only happen on a host that is running the same version of PCRE2, with the
+same code unit width, and the host must also have the same endianness, pointer
+width and PCRE2_SIZE type. The arguments for <b>pcre2_serialize_encode()</b>
+are:
 <pre>
   <i>codes</i>             pointer to a vector containing the list
   <i>number_of_codes</i>   number of slots in the vector
@@ -53,8 +58,8 @@ that a slot in the vector does not point to a compiled pattern.
 <P>
 There is a complete description of the PCRE2 native API in the
 <a href="pcre2api.html"><b>pcre2api</b></a>
-page and a description of the POSIX API in the
-<a href="pcre2posix.html"><b>pcre2posix</b></a>
+page and a description of the serialization functions in the
+<a href="pcre2serialize.html"><b>pcre2serialize</b></a>
 page.
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
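
Because the serialized byte stream is only usable on a host with the same PCRE2 version, code unit width, endianness, pointer width and PCRE2_SIZE type, an application that stores such streams may want to record those properties next to the data and check them before decoding. The following is a rough sketch of that idea, not something PCRE2 provides: `host_fingerprint` and its functions are hypothetical names, the version string and code unit width are supplied by the caller, and PCRE2_SIZE is assumed to be `size_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header an application could store in front of a serialized
   blob. It is not part of PCRE2. */
typedef struct {
    uint8_t little_endian;  /* 1 on little-endian hosts, 0 on big-endian */
    uint8_t pointer_bytes;  /* sizeof(void *) */
    uint8_t size_bytes;     /* sizeof(size_t); assumes PCRE2_SIZE is size_t */
    uint8_t code_unit_bits; /* 8, 16 or 32, supplied by the caller */
    char version[32];       /* library version string, supplied by the caller */
} host_fingerprint;

/* Detect byte order by inspecting the first byte of a known 16-bit value. */
static int host_is_little_endian(void) {
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

void host_fingerprint_init(host_fingerprint *fp, unsigned code_unit_bits,
                           const char *version) {
    assert(fp != NULL && version != NULL);
    memset(fp, 0, sizeof *fp); /* also guarantees NUL termination below */
    fp->little_endian = (uint8_t)host_is_little_endian();
    fp->pointer_bytes = (uint8_t)sizeof(void *);
    fp->size_bytes = (uint8_t)sizeof(size_t);
    fp->code_unit_bits = (uint8_t)code_unit_bits;
    strncpy(fp->version, version, sizeof fp->version - 1);
}

/* Returns 1 when every recorded property agrees, 0 otherwise. */
int host_fingerprint_matches(const host_fingerprint *a,
                             const host_fingerprint *b) {
    return a->little_endian == b->little_endian
        && a->pointer_bytes == b->pointer_bytes
        && a->size_bytes == b->size_bytes
        && a->code_unit_bits == b->code_unit_bits
        && strcmp(a->version, b->version) == 0;
}
```

A loader would build a fingerprint for the running host, compare it with the stored one, and recompile the patterns from source when they differ rather than attempt to decode.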

+ 4 - 3
regex.mod/pcre/doc/html/pcre2_serialize_free.html

@@ -27,13 +27,14 @@ DESCRIPTION
 <P>
 This function frees the memory that was obtained by
 <b>pcre2_serialize_encode()</b> to hold a serialized byte stream. The argument
-must point to such a byte stream.
+must point to such a byte stream or be NULL, in which case the function returns
+without doing anything.
 </P>
 <P>
 There is a complete description of the PCRE2 native API in the
 <a href="pcre2api.html"><b>pcre2api</b></a>
-page and a description of the POSIX API in the
-<a href="pcre2posix.html"><b>pcre2posix</b></a>
+page and a description of the serialization functions in the
+<a href="pcre2serialize.html"><b>pcre2serialize</b></a>
 page.
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.

+ 2 - 2
regex.mod/pcre/doc/html/pcre2_serialize_get_number_of_codes.html

@@ -41,8 +41,8 @@ on a system with different endianness.
 <P>
 There is a complete description of the PCRE2 native API in the
 <a href="pcre2api.html"><b>pcre2api</b></a>
-page and a description of the POSIX API in the
-<a href="pcre2posix.html"><b>pcre2posix</b></a>
+page and a description of the serialization functions in the
+<a href="pcre2serialize.html"><b>pcre2serialize</b></a>
 page.
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.

+ 7 - 4
regex.mod/pcre/doc/html/pcre2_set_character_tables.html

@@ -20,16 +20,19 @@ SYNOPSIS
 </P>
 <P>
 <b>int pcre2_set_character_tables(pcre2_compile_context *<i>ccontext</i>,</b>
-<b>  const unsigned char *<i>tables</i>);</b>
+<b>  const uint8_t *<i>tables</i>);</b>
 </P>
 <br><b>
 DESCRIPTION
 </b><br>
 <P>
 This function sets a pointer to custom character tables within a compile
-context. The second argument must be the result of a call to
-<b>pcre2_maketables()</b> or NULL to request the default tables. The result is
-always zero.
+context. The second argument must point to a set of PCRE2 character tables or
+be NULL to request the default tables. The result is always zero. Character
+tables can be created by calling <b>pcre2_maketables()</b> or by running the
+<b>pcre2_dftables</b> maintenance command in binary mode (see the
+<a href="pcre2build.html"><b>pcre2build</b></a>
+documentation).
 </P>
 <P>
 There is a complete description of the PCRE2 native API in the

+ 5 - 2
regex.mod/pcre/doc/html/pcre2_set_compile_extra_options.html

@@ -20,7 +20,7 @@ SYNOPSIS
 </P>
 <P>
 <b>int pcre2_set_compile_extra_options(pcre2_compile_context *<i>ccontext</i>,</b>
-<b>  PCRE2_SIZE <i>extra_options</i>);</b>
+<b>  uint32_t <i>extra_options</i>);</b>
 </P>
 <br><b>
 DESCRIPTION
@@ -30,8 +30,11 @@ This function sets additional option bits for <b>pcre2_compile()</b> that are
 housed in a compile context. It completely replaces all the bits. The extra
 options are:
 <pre>
-  PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES  Allow \x{df800} to \x{dfff} in UTF-8 and UTF-32 modes
+  PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK     Allow \K in lookarounds
+  PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES  Allow \x{d800} to \x{dfff}
+                                         in UTF-8 and UTF-32 modes
+  PCRE2_EXTRA_ALT_BSUX                 Extended alternate \u, \U, and \x handling
   PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL    Treat all invalid escapes as a literal following character
+  PCRE2_EXTRA_ESCAPED_CR_IS_LF         Interpret \r as \n
   PCRE2_EXTRA_MATCH_LINE               Pattern matches whole lines
   PCRE2_EXTRA_MATCH_WORD               Pattern matches "words"
 </pre>
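
The statement that the setter "completely replaces all the bits" means that a second call does not add to the first one. The sketch below models only that behaviour with a hypothetical context type and made-up flag values (`DEMO_EXTRA_*`, `demo_compile_context` and `demo_set_extra_options` are assumptions for illustration; the real constants and types come from pcre2.h).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only. */
#define DEMO_EXTRA_MATCH_LINE 0x1u
#define DEMO_EXTRA_MATCH_WORD 0x2u

/* Hypothetical stand-in for a compile context. */
typedef struct {
    uint32_t extra_options;
} demo_compile_context;

/* Models a setter with replace semantics: the stored bits are overwritten,
   never OR-ed with the previous value. The result is always zero. */
int demo_set_extra_options(demo_compile_context *ctx, uint32_t options) {
    assert(ctx != NULL);
    ctx->extra_options = options;
    return 0;
}
```

Under these semantics, a caller that wants several extra options has to pass the bitwise OR of all of them in a single call, and has to keep track of the combined value itself if options are added later.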

+ 1 - 1
regex.mod/pcre/doc/html/pcre2_set_glob_separator.html

@@ -28,7 +28,7 @@ DESCRIPTION
 <P>
 This function is part of an experimental set of pattern conversion functions.
 It sets the component separator character that is used when converting globs.
-The second argument must one of the characters forward slash, backslash, or
+The second argument must be one of the characters forward slash, backslash, or
 dot. The default is backslash when running under Windows, otherwise forward
 slash. The result of the function is zero for success or PCRE2_ERROR_BADDATA if
 the second argument is invalid.

+ 43 - 0
regex.mod/pcre/doc/html/pcre2_set_substitute_callout.html

@@ -0,0 +1,43 @@
+<html>
+<head>
+<title>pcre2_set_substitute_callout specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre2_set_substitute_callout man page</h1>
+<p>
+Return to the <a href="index.html">PCRE2 index page</a>.
+</p>
+<p>
+This page is part of the PCRE2 HTML documentation. It was generated
+automatically from the original man page. If there is any nonsense in it,
+please consult the man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre2.h&#62;</b>
+</P>
+<P>
+<b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
+<b>  int (*<i>callout_function</i>)(pcre2_substitute_callout_block *),</b>
+<b>  void *<i>callout_data</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function sets the substitute callout fields in a match context (the first
+argument). The second argument specifies a callout function, and the third
+argument is an opaque data item that is passed to it. The result of this
+function is always zero.
+</P>
+<P>
+There is a complete description of the PCRE2 native API in the
+<a href="pcre2api.html"><b>pcre2api</b></a>
+page and a description of the POSIX API in the
+<a href="pcre2posix.html"><b>pcre2posix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE2 index page</a>.
+</p>

+ 23 - 6
regex.mod/pcre/doc/html/pcre2_substitute.html

@@ -48,8 +48,8 @@ Its arguments are:
   <i>outlengthptr</i>  Points to the length of the output buffer
 </pre>
 A match data block is needed only if you want to inspect the data from the
-match that is returned in that block. A match context is needed only if you
-want to:
+final match that is returned in that block or if PCRE2_SUBSTITUTE_MATCHED is
+set. A match context is needed only if you want to:
 <pre>
   Set up a callout function
   Set a matching offset limit
@@ -57,9 +57,14 @@ want to:
   Change the backtracking depth limit
   Set custom memory management in the match context
 </pre>
-The <i>length</i>, <i>startoffset</i> and <i>rlength</i> values are code
-units, not characters, as is the contents of the variable pointed at by
-<i>outlengthptr</i>, which is updated to the actual length of the new string.
+The <i>length</i>, <i>startoffset</i> and <i>rlength</i> values are code units,
+not characters, as is the contents of the variable pointed at by
+<i>outlengthptr</i>. This variable must contain the length of the output buffer
+when the function is called. If the function is successful, the value is
+changed to the length of the new string, excluding the trailing zero that is
+automatically added.
+</P>
+<P>
 The subject and replacement lengths can be given as PCRE2_ZERO_TERMINATED for
 zero-terminated strings. The options are:
 <pre>
@@ -74,12 +79,24 @@ zero-terminated strings. The options are:
                               PCRE2_UTF was set at compile time)
   PCRE2_SUBSTITUTE_EXTENDED  Do extended replacement processing
   PCRE2_SUBSTITUTE_GLOBAL    Replace all occurrences in the subject
+  PCRE2_SUBSTITUTE_LITERAL   The replacement string is literal
+  PCRE2_SUBSTITUTE_MATCHED   Use pre-existing match data for 1st match
   PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  If overflow, compute needed length
+  PCRE2_SUBSTITUTE_REPLACEMENT_ONLY  Return only replacement string(s)
   PCRE2_SUBSTITUTE_UNKNOWN_UNSET  Treat unknown group as unset
   PCRE2_SUBSTITUTE_UNSET_EMPTY  Simple unset insert = empty string
 </pre>
+If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_EXTENDED,
+PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY are ignored.
+</P>
+<P>
+If PCRE2_SUBSTITUTE_MATCHED is set, <i>match_data</i> must be non-zero; its
+contents must be the result of a call to <b>pcre2_match()</b> using the same
+pattern and subject.
+</P>
+<P>
 The function returns the number of substitutions, which may be zero if there
-were no matches. The result can be greater than one only when
+are no matches. The result may be greater than one only when
 PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a negative error code
 is returned.
 </P>

+ 1 - 1
regex.mod/pcre/doc/html/pcre2_substring_free.html

@@ -28,7 +28,7 @@ DESCRIPTION
 This is a convenience function for freeing the memory obtained by a previous
 call to <b>pcre2_substring_get_byname()</b> or
 <b>pcre2_substring_get_bynumber()</b>. Its only argument is a pointer to the
-string.
+string. If the argument is NULL, the function does nothing.
 </P>
 <P>
 There is a complete description of the PCRE2 native API in the
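
Reviewer note on the hunk above: the NULL-tolerant behaviour added to the pcre2_substring_free() description mirrors the C library's free(). The following is a minimal sketch of the same guard in an application-level release helper. The names release_substring, copy_substring and release_count are hypothetical and are not part of PCRE2; the sketch does not call the library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter, present only so the sketch's behaviour is observable. */
static int release_count = 0;

/* Hypothetical helper: frees a heap string and does nothing for NULL,
   mirroring "If the argument is NULL, the function does nothing". */
static void release_substring(char *s)
{
    if (s == NULL)
        return;
    free(s);
    release_count++;
}

/* Hypothetical helper: returns a heap copy of text, or NULL on failure. */
static char *copy_substring(const char *text)
{
    size_t n = strlen(text) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, text, n);
    return p;
}
```

With this guard a caller can release unconditionally on every exit path, including paths where extraction failed and the pointer is still NULL.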

+ 2 - 1
regex.mod/pcre/doc/html/pcre2_substring_list_free.html

@@ -27,7 +27,8 @@ DESCRIPTION
 <P>
 This is a convenience function for freeing the store obtained by a previous
 call to <b>pcre2substring_list_get()</b>. Its only argument is a pointer to
-the list of string pointers.
+the list of string pointers. If the argument is NULL, the function returns
+immediately, without doing anything.
 </P>
 <P>
 There is a complete description of the PCRE2 native API in the

+ 2 - 2
regex.mod/pcre/doc/html/pcre2_substring_nametable_scan.html

@@ -27,8 +27,8 @@ DESCRIPTION
 </b><br>
 <P>
 This convenience function finds, for a compiled pattern, the first and last
-entries for a given name in the table that translates capturing parenthesis
-names into numbers.
+entries for a given name in the table that translates capture group names into
+numbers.
 <pre>
   <i>code</i>    Compiled regular expression
   <i>name</i>    Name whose entries required

File diff suppressed because it is too large
+ 396 - 175
regex.mod/pcre/doc/html/pcre2api.html


+ 101 - 51
regex.mod/pcre/doc/html/pcre2build.html

@@ -33,11 +33,12 @@ please consult the man page, in case the conversion went wrong.
 <li><a name="TOC18" href="#SEC18">INCLUDING DEBUGGING CODE</a>
 <li><a name="TOC19" href="#SEC19">DEBUGGING WITH VALGRIND SUPPORT</a>
 <li><a name="TOC20" href="#SEC20">CODE COVERAGE REPORTING</a>
-<li><a name="TOC21" href="#SEC21">SUPPORT FOR FUZZERS</a>
-<li><a name="TOC22" href="#SEC22">OBSOLETE OPTION</a>
-<li><a name="TOC23" href="#SEC23">SEE ALSO</a>
-<li><a name="TOC24" href="#SEC24">AUTHOR</a>
-<li><a name="TOC25" href="#SEC25">REVISION</a>
+<li><a name="TOC21" href="#SEC21">DISABLING THE Z AND T FORMATTING MODIFIERS</a>
+<li><a name="TOC22" href="#SEC22">SUPPORT FOR FUZZERS</a>
+<li><a name="TOC23" href="#SEC23">OBSOLETE OPTION</a>
+<li><a name="TOC24" href="#SEC24">SEE ALSO</a>
+<li><a name="TOC25" href="#SEC25">AUTHOR</a>
+<li><a name="TOC26" href="#SEC26">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">BUILDING PCRE2</a><br>
 <P>
@@ -82,7 +83,8 @@ The following sections include descriptions of "on/off" options whose names
 begin with --enable or --disable. Because of the way that <b>configure</b>
 works, --enable and --disable always come in pairs, so the complementary option
 always exists as well, but as it specifies the default, it is not described.
-Options that specify values have names that start with --with.
+Options that specify values have names that start with --with. At the end of a
+<b>configure</b> run, a summary of the configuration is output.
 </P>
 <br><a name="SEC3" href="#TOC1">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
 <P>
@@ -126,7 +128,7 @@ To build it without Unicode support, add
   --disable-unicode
 </pre>
 to the <b>configure</b> command. This setting applies to all three libraries. It
-is not possible to build one library with Unicode support, and another without,
+is not possible to build one library with Unicode support and another without
 in the same configuration.
 </P>
 <P>
@@ -170,8 +172,15 @@ Just-in-time (JIT) compiler support is included in the build by specifying
   --enable-jit
 </pre>
 This support is available only for certain hardware architectures. If this
-option is set for an unsupported architecture, a building error occurs. If you
-are running under SELinux you may also want to add
+option is set for an unsupported architecture, a building error occurs.
+If in doubt, use
+<pre>
+  --enable-jit=auto
+</pre>
+which enables JIT only if the current hardware is supported. You can check
+if JIT is enabled in the configuration summary that is output at the end of a
+<b>configure</b> run. If you are enabling JIT under SELinux you may also want to
+add
 <pre>
   --enable-jit-sealloc
 </pre>
@@ -179,11 +188,11 @@ which enables the use of an execmem allocator in JIT that is compatible with
 SELinux. This has no effect if JIT is not enabled. See the
 <a href="pcre2jit.html"><b>pcre2jit</b></a>
 documentation for a discussion of JIT usage. When JIT support is enabled,
-pcre2grep automatically makes use of it, unless you add
+<b>pcre2grep</b> automatically makes use of it, unless you add
 <pre>
   --disable-pcre2grep-jit
 </pre>
-to the "configure" command.
+to the <b>configure</b> command.
 </P>
 <br><a name="SEC8" href="#TOC1">NEWLINE RECOGNITION</a><br>
 <P>
@@ -219,7 +228,7 @@ separator, U+2028), and PS (paragraph separator, U+2029). The final option is
 <pre>
   --enable-newline-is-nul
 </pre>
-which causes NUL (binary zero) is set as the default line-ending character.
+which causes NUL (binary zero) to be set as the default line-ending character.
 </P>
 <P>
 Whatever default line ending convention is selected when PCRE2 is built can be
@@ -244,10 +253,10 @@ Within a compiled pattern, offset values are used to point from one part to
 another (for example, from an opening parenthesis to an alternation
 metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values
 are used for these offsets, leading to a maximum size for a compiled pattern of
-around 64K code units. This is sufficient to handle all but the most gigantic
-patterns. Nevertheless, some people do want to process truly enormous patterns,
-so it is possible to compile PCRE2 to use three-byte or four-byte offsets by
-adding a setting such as
+around 64 thousand code units. This is sufficient to handle all but the most
+gigantic patterns. Nevertheless, some people do want to process truly enormous
+patterns, so it is possible to compile PCRE2 to use three-byte or four-byte
+offsets by adding a setting such as
 <pre>
   --with-link-size=3
 </pre>
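
Reviewer note on the hunk above: the "around 64 thousand code units" figure follows from the width of the offset. The sketch below is only back-of-envelope arithmetic for how many distinct values an offset of a given byte width can hold. The function name link_size_capacity is hypothetical, and the real library limits may be lower than this upper bound.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of distinct values representable by an
   offset of link_size bytes. Returns 0 for widths outside 2..4, which
   are the only settings the text mentions. */
static uint64_t link_size_capacity(unsigned link_size)
{
    if (link_size < 2 || link_size > 4)
        return 0;
    return (uint64_t)1 << (8 * link_size);
}
```

For the default width of 2 this gives 65536, which matches the "around 64 thousand" wording; 3 gives 16777216 and 4 gives 4294967296.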
@@ -274,22 +283,23 @@ to the <b>configure</b> command. This setting also applies to the
 counting is done differently).
 </P>
 <P>
-The <b>pcre2_match()</b> function starts out using a 20K vector on the system
+The <b>pcre2_match()</b> function starts out using a 20KiB vector on the system
 stack to record backtracking points. The more nested backtracking points there
 are (that is, the deeper the search tree), the more memory is needed. If the
 initial vector is not large enough, heap memory is used, up to a certain limit,
-which is specified in kilobytes. The limit can be changed at run time, as
-described in the
+which is specified in kibibytes (units of 1024 bytes). The limit can be changed
+at run time, as described in the
 <a href="pcre2api.html"><b>pcre2api</b></a>
 documentation. The default limit (in effect unlimited) is 20 million. You can
 change this by a setting such as
 <pre>
   --with-heap-limit=500
 </pre>
-which limits the amount of heap to 500 kilobytes. This limit applies only to
-interpretive matching in pcre2_match(). It does not apply when JIT (which has
-its own memory arrangements) is used, nor does it apply to
-<b>pcre2_dfa_match()</b>.
+which limits the amount of heap to 500 KiB. This limit applies only to
+interpretive matching in <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, which
+may also use the heap for internal workspace when processing complicated
+patterns. This limit does not apply when JIT (which has its own memory
+arrangements) is used.
 </P>
 <P>
 You can also explicitly limit the depth of nested backtracking in the
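
Reviewer note on the hunk above: the switch from "kilobytes" to "kibibytes (units of 1024 bytes)" changes what the example value means in bytes. A tiny sketch of the conversion, using a hypothetical helper name kib_to_bytes:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts a limit expressed in kibibytes
   (units of 1024 bytes) to bytes. */
static uint64_t kib_to_bytes(uint64_t kib)
{
    return kib * 1024u;
}
```

Under this reading, --with-heap-limit=500 corresponds to 512000 bytes rather than 500000, and the default of 20 million corresponds to 20480000000 bytes.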
@@ -311,7 +321,7 @@ As well as applying to <b>pcre2_match()</b>, the depth limit also controls
 the depth of recursive function calls in <b>pcre2_dfa_match()</b>. These are
 used for lookaround assertions, atomic groups, and recursion within patterns.
 The limit does not apply to JIT matching.
-</P>
+<a name="createtables"></a></P>
 <br><a name="SEC12" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
 <P>
 PCRE2 uses fixed tables for processing characters whose code points are less
@@ -322,12 +332,34 @@ only. If you add
   --enable-rebuild-chartables
 </pre>
 to the <b>configure</b> command, the distributed tables are no longer used.
-Instead, a program called <b>dftables</b> is compiled and run. This outputs the
-source for new set of tables, created in the default locale of your C run-time
-system. This method of replacing the tables does not work if you are cross
-compiling, because <b>dftables</b> is run on the local host. If you need to
-create alternative tables when cross compiling, you will have to do so "by
-hand".
+Instead, a program called <b>pcre2_dftables</b> is compiled and run. This
+outputs the source for new set of tables, created in the default locale of your
+C run-time system. This method of replacing the tables does not work if you are
+cross compiling, because <b>pcre2_dftables</b> needs to be run on the local
+host and therefore not compiled with the cross compiler.
+</P>
+<P>
+If you need to create alternative tables when cross compiling, you will have to
+do so "by hand". There may also be other reasons for creating tables manually.
+To cause <b>pcre2_dftables</b> to be built on the local host, run a normal
+compiling command, and then run the program with the output file as its
+argument, for example:
+<pre>
+  cc src/pcre2_dftables.c -o pcre2_dftables
+  ./pcre2_dftables src/pcre2_chartables.c
+</pre>
+This builds the tables in the default locale of the local host. If you want to
+specify a locale, you must use the -L option:
+<pre>
+  LC_ALL=fr_FR ./pcre2_dftables -L src/pcre2_chartables.c
+</pre>
+You can also specify -b (with or without -L). This causes the tables to be
+written in binary instead of as source code. A set of binary tables can be
+loaded into memory by an application and passed to <b>pcre2_compile()</b> in the
+same way as tables created by calling <b>pcre2_maketables()</b>. The tables are
+just a string of bytes, independent of hardware characteristics such as
+endianness. This means they can be bundled with an application that runs in
+different environments, to ensure consistent behaviour.
 </P>
 <br><a name="SEC13" href="#TOC1">USING EBCDIC CODE</a><br>
 <P>
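
Reviewer note on the character-tables hunk above: the new text says binary tables written with -b "can be loaded into memory by an application". A minimal sketch of that loading step in plain C follows. The function name load_tables_file and the file name in the usage note are hypothetical; the sketch only reads bytes and does not call PCRE2. As far as I know, the resulting buffer would then be attached to a compile context with pcre2_set_character_tables(), which is how tables from pcre2_maketables() are supplied, but treat that step as an assumption to check against pcre2api.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole file into a newly allocated buffer.
   On success returns the buffer and stores its length in *len_out; the
   caller frees it. Returns NULL if the file cannot be opened or read. */
static uint8_t *load_tables_file(const char *path, size_t *len_out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    uint8_t *buf = NULL;
    size_t len = 0, cap = 0;

    for (;;) {
        if (len == cap) {                      /* grow before reading more */
            size_t new_cap = cap ? cap * 2 : 2048;
            uint8_t *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                fclose(f);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        size_t got = fread(buf + len, 1, cap - len, f);
        len += got;
        if (got == 0)                          /* end of file or read error */
            break;
    }

    if (ferror(f)) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    *len_out = len;
    return buf;
}
```

A caller might write `size_t n; uint8_t *tables = load_tables_file("tables.bin", &n);` and keep the buffer alive for as long as any pattern compiled with it is in use.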
@@ -366,12 +398,15 @@ environment.
 </P>
 <br><a name="SEC14" href="#TOC1">PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS</a><br>
 <P>
-By default, on non-Windows systems, <b>pcre2grep</b> supports the use of
-callouts with string arguments within the patterns it is matching, in order to
-run external scripts. For details, see the
+By default <b>pcre2grep</b> supports the use of callouts with string arguments
+within the patterns it is matching. There are two kinds: one that generates
+output using local code, and another that calls an external program or script.
+If --disable-pcre2grep-callout-fork is added to the <b>configure</b> command,
+only the first kind of callout is supported; if --disable-pcre2grep-callout is
+used, all callouts are completely ignored. For more details of <b>pcre2grep</b>
+callouts, see the
 <a href="pcre2grep.html"><b>pcre2grep</b></a>
-documentation. This support can be disabled by adding
---disable-pcre2grep-callout to the <b>configure</b> command.
+documentation.
 </P>
 <br><a name="SEC15" href="#TOC1">PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>
 <P>
@@ -390,18 +425,18 @@ they are not.
 <P>
 <b>pcre2grep</b> uses an internal buffer to hold a "window" on the file it is
 scanning, in order to be able to output "before" and "after" lines when it
-finds a match. The starting size of the buffer is controlled by a parameter
-whose default value is 20K. The buffer itself is three times this size, but
-because of the way it is used for holding "before" lines, the longest line that
-is guaranteed to be processable is the parameter size. If a longer line is
-encountered, <b>pcre2grep</b> automatically expands the buffer, up to a
-specified maximum size, whose default is 1M or the starting size, whichever is
-the larger. You can change the default parameter values by adding, for example,
+finds a match. The default starting size of the buffer is 20KiB. The buffer
+itself is three times this size, but because of the way it is used for holding
+"before" lines, the longest line that is guaranteed to be processable is the
+notional buffer size. If a longer line is encountered, <b>pcre2grep</b>
+automatically expands the buffer, up to a specified maximum size, whose default
+is 1MiB or the starting size, whichever is the larger. You can change the
+default parameter values by adding, for example,
 <pre>
   --with-pcre2grep-bufsize=51200
   --with-pcre2grep-max-bufsize=2097152
 </pre>
-to the <b>configure</b> command. The caller of \fPpcre2grep\fP can override
+to the <b>configure</b> command. The caller of <b>pcre2grep</b> can override
 these values by using --buffer-size and --max-buffer-size on the command line.
 </P>
 <br><a name="SEC17" href="#TOC1">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a><br>
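
Reviewer note on the pcre2grep buffer hunk above: the revised paragraph states two rules that are easy to misread, namely that the real buffer is three times the configured size and that the default maximum is the larger of 1MiB and the starting size. The sketch below restates them as arithmetic. The function names are hypothetical and this is not pcre2grep's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the allocated buffer is three times the
   configured (notional) size, as the text describes. */
static size_t grep_actual_buffer(size_t bufsize)
{
    return 3 * bufsize;
}

/* Hypothetical helper: default maximum is 1 MiB or the starting size,
   whichever is the larger. */
static size_t grep_default_max(size_t bufsize)
{
    const size_t one_mib = (size_t)1024 * 1024;
    return bufsize > one_mib ? bufsize : one_mib;
}
```

With the example setting --with-pcre2grep-bufsize=51200, the allocation would be 153600 bytes while the longest line guaranteed to be processable stays at 51200.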
@@ -514,7 +549,22 @@ This cleans all coverage data including the generated coverage report. For more
 information about code coverage, see the <b>gcov</b> and <b>lcov</b>
 documentation.
 </P>
-<br><a name="SEC21" href="#TOC1">SUPPORT FOR FUZZERS</a><br>
+<br><a name="SEC21" href="#TOC1">DISABLING THE Z AND T FORMATTING MODIFIERS</a><br>
+<P>
+The C99 standard defines formatting modifiers z and t for size_t and
+ptrdiff_t values, respectively. By default, PCRE2 uses these modifiers in
+environments other than old versions of Microsoft Visual Studio when
+__STDC_VERSION__ is defined and has a value greater than or equal to 199901L
+(indicating support for C99).
+However, there is at least one environment that claims to be C99 but does not
+support these modifiers. If
+<pre>
+  --disable-percent-zt
+</pre>
+is specified, no use is made of the z or t modifiers. Instead of %td or %zu,
+a suitable format is used depending in the size of long for the platform.
+</P>
+<br><a name="SEC22" href="#TOC1">SUPPORT FOR FUZZERS</a><br>
 <P>
 There is a special option for use by people who want to run fuzzing tests on
 PCRE2:
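
Reviewer note on the new "DISABLING THE Z AND T FORMATTING MODIFIERS" section in the hunk above: the text says a format based on the size of long replaces %zu and %td. The sketch below shows the general idea of such a fallback. The function names are hypothetical, the sketch assumes the values fit in long or unsigned long, and it is not the code PCRE2 itself uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats a size_t without the z modifier by
   widening to unsigned long (assumes the value fits). */
static int format_size_no_z(char *buf, size_t buflen, size_t value)
{
    return snprintf(buf, buflen, "%lu", (unsigned long)value);
}

/* Hypothetical helper: formats a ptrdiff_t without the t modifier by
   widening to long (assumes the value fits). */
static int format_ptrdiff_no_t(char *buf, size_t buflen, ptrdiff_t value)
{
    return snprintf(buf, buflen, "%ld", (long)value);
}
```

On platforms where long is narrower than size_t, a wider type and format would be needed, which is presumably why the text says the choice depends on the size of long.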
@@ -533,12 +583,12 @@ generated from the string.
 Setting --enable-fuzz-support also causes a binary called <b>pcre2fuzzcheck</b>
 to be created. This is normally run under valgrind or used when PCRE2 is
 compiled with address sanitizing enabled. It calls the fuzzing function and
-outputs information about it is doing. The input strings are specified by
+outputs information about what it is doing. The input strings are specified by
 arguments: if an argument starts with "=" the rest of it is a literal input
 string. Otherwise, it is assumed to be a file name, and the contents of the
 file are the test string.
 </P>
-<br><a name="SEC22" href="#TOC1">OBSOLETE OPTION</a><br>
+<br><a name="SEC23" href="#TOC1">OBSOLETE OPTION</a><br>
 <P>
 In versions of PCRE2 prior to 10.30, there were two ways of handling
 backtracking in the <b>pcre2_match()</b> function. The default was to use the
@@ -550,11 +600,11 @@ was set, memory on the heap was used. From release 10.30 onwards this has
 changed (the stack is no longer used) and this option now does nothing except
 give a warning.
 </P>
-<br><a name="SEC23" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC24" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcre2api</b>(3), <b>pcre2-config</b>(3).
 </P>
-<br><a name="SEC24" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC25" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
@@ -563,11 +613,11 @@ University Computing Service
 Cambridge, England.
 <br>
 </P>
-<br><a name="SEC25" href="#TOC1">REVISION</a><br>
+<br><a name="SEC26" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 18 July 2017
+Last updated: 20 March 2020
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2020 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.

+ 19 - 9
regex.mod/pcre/doc/html/pcre2callout.html

@@ -44,6 +44,14 @@ a match context (see <b>pcre2_set_callout()</b> in the
 documentation).
 </P>
 <P>
+When using the <b>pcre2_substitute()</b> function, an additional callout feature
+is available. This does a callout after each change to the subject string and
+is described in the
+<a href="pcre2api.html"><b>pcre2api</b></a>
+documentation; the rest of this document is concerned with callouts during
+pattern matching.
+</P>
+<P>
 Within a regular expression, (?C&#60;arg&#62;) indicates a point at which the external
 function is to be called. Different callout points can be identified by putting
 a number less than 256 after the letter C. The default value is zero.
@@ -143,7 +151,7 @@ branch, automatic anchoring occurs if all branches are anchorable.
 </P>
 <P>
 This optimization is disabled, however, if .* is in an atomic group or if there
-is a back reference to the capturing group in which it appears. It is also
+is a backreference to the capture group in which it appears. It is also
 disabled if the pattern contains (*PRUNE) or (*SKIP). However, the presence of
 callouts does not affect it.
 </P>
@@ -310,10 +318,12 @@ PCRE2_UNSET.
 </P>
 <P>
 For DFA matching, the <i>offset_vector</i> field points to the ovector that was
-passed to the matching function in the match data block, but it holds no useful
-information at callout time because <b>pcre2_dfa_match()</b> does not support
-substring capturing. The value of <i>capture_top</i> is always 1 and the value
-of <i>capture_last</i> is always 0 for DFA matching.
+passed to the matching function in the match data block for callouts at the top
+level, but to an internal ovector during the processing of pattern recursions,
+lookarounds, and atomic groups. However, these ovectors hold no useful
+information because <b>pcre2_dfa_match()</b> does not support substring
+capturing. The value of <i>capture_top</i> is always 1 and the value of
+<i>capture_last</i> is always 0 for DFA matching.
 </P>
 <P>
 The <i>subject</i> and <i>subject_length</i> fields contain copies of the values
@@ -344,8 +354,8 @@ callout before an assertion such as (?=ab) the length is 3. For an an
 alternation bar or a closing parenthesis, the length is one, unless a closing
 parenthesis is followed by a quantifier, in which case its length is included.
 (This changed in release 10.23. In earlier releases, before an opening
-parenthesis the length was that of the entire subpattern, and before an
-alternation bar or a closing parenthesis the length was zero.)
+parenthesis the length was that of the entire group, and before an alternation
+bar or a closing parenthesis the length was zero.)
 </P>
 <P>
 The <i>pattern_position</i> and <i>next_item_length</i> fields are intended to
@@ -461,9 +471,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC8" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 22 December 2017
+Last updated: 03 February 2019
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2019 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.

+ 84 - 65
regex.mod/pcre/doc/html/pcre2compat.html

@@ -16,10 +16,10 @@ please consult the man page, in case the conversion went wrong.
 DIFFERENCES BETWEEN PCRE2 AND PERL
 </b><br>
 <P>
-This document describes the differences in the ways that PCRE2 and Perl handle
-regular expressions. The differences described here are with respect to Perl
-versions 5.26, but as both Perl and PCRE2 are continually changing, the
-information may sometimes be out of date.
+This document describes some of the differences in the ways that PCRE2 and Perl
+handle regular expressions. The differences described here are with respect to
+Perl version 5.32.0, but as both Perl and PCRE2 are continually changing, the
+information may at times be out of date.
 </P>
 <P>
 1. PCRE2 has only a subset of Perl's Unicode support. Details of what it does
@@ -31,54 +31,66 @@ page.
 2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
 they do not mean what you might think. For example, (?!a){3} does not assert
 that the next three characters are not "a". It just asserts that the next
-character is not "a" three times (in principle: PCRE2 optimizes this to run the
+character is not "a" three times (in principle; PCRE2 optimizes this to run the
 assertion just once). Perl allows some repeat quantifiers on other assertions,
-for example, \b* (but not \b{3}), but these do not seem to have any use.
+for example, \b* (but not \b{3}, though oddly it does allow ^{3}), but these
+do not seem to have any use. PCRE2 does not allow any kind of quantifier on
+non-lookaround assertions.
 </P>
 <P>
-3. Capturing subpatterns that occur inside negative lookaround assertions are
-counted, but their entries in the offsets vector are set only when a negative
-assertion is a condition that has a matching branch (that is, the condition is
-false).
+3. Capture groups that occur inside negative lookaround assertions are counted,
+but their entries in the offsets vector are set only when a negative assertion
+is a condition that has a matching branch (that is, the condition is false).
+Perl may set such capture groups in other circumstances.
 </P>
 <P>
-4. The following Perl escape sequences are not supported: \l, \u, \L,
-\U, and \N when followed by a character name or Unicode value. (\N on its
-own, matching a non-newline character, is supported.) In fact these are
+4. The following Perl escape sequences are not supported: \F, \l, \L, \u,
+\U, and \N when followed by a character name. \N on its own, matching a
+non-newline character, and \N{U+dd..}, matching a Unicode code point, are
+supported. The escapes that modify the case of following letters are
 implemented by Perl's general string-handling and are not part of its pattern
 matching engine. If any of these are encountered by PCRE2, an error is
-generated by default. However, if the PCRE2_ALT_BSUX option is set,
-\U and \u are interpreted as ECMAScript interprets them.
+generated by default. However, if either of the PCRE2_ALT_BSUX or
+PCRE2_EXTRA_ALT_BSUX options is set, \U and \u are interpreted as ECMAScript
+interprets them.
 </P>
 <P>
 5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
 built with Unicode support (the default). The properties that can be tested
 with \p and \P are limited to the general category properties such as Lu and
 Nd, script names such as Greek or Han, and the derived properties Any and L&.
-PCRE2 does support the Cs (surrogate) property, which Perl does not; the Perl
-documentation says "Because Perl hides the need for the user to understand the
-internal representation of Unicode characters, there is no need to implement
-the somewhat messy concept of surrogates."
+Both PCRE2 and Perl support the Cs (surrogate) property, but in PCRE2 its use
+is limited. See the
+<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
+documentation for details. The long synonyms for property names that Perl
+supports (such as \p{Letter}) are not supported by PCRE2, nor is it permitted
+to prefix any of these properties with "Is".
 </P>
 <P>
-6. PCRE2 does support the \Q...\E escape for quoting substrings. Characters
-in between are treated as literals. This is slightly different from Perl in
-that $ and @ are also handled as literals inside the quotes. In Perl, they
-cause variable interpolation (but of course PCRE2 does not have variables).
-Note the following examples:
+6. PCRE2 supports the \Q...\E escape for quoting substrings. Characters
+in between are treated as literals. However, this is slightly different from
+Perl in that $ and @ are also handled as literals inside the quotes. In Perl,
+they cause variable interpolation (but of course PCRE2 does not have
+variables). Also, Perl does "double-quotish backslash interpolation" on any
+backslashes between \Q and \E which, its documentation says, "may lead to
+confusing results". PCRE2 treats a backslash between \Q and \E just like any
+other character. Note the following examples:
 <pre>
-    Pattern            PCRE2 matches      Perl matches
+    Pattern            PCRE2 matches     Perl matches
 
     \Qabc$xyz\E        abc$xyz           abc followed by the contents of $xyz
     \Qabc\$xyz\E       abc\$xyz          abc\$xyz
     \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz
+    \QA\B\E            A\B               A\B
+    \Q\\E              \                 \\E
 </pre>
-The \Q...\E sequence is recognized both inside and outside character classes.
+The \Q...\E sequence is recognized both inside and outside character classes
+by both PCRE2 and Perl.
 </P>
 <P>
 7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
-constructions. However, there is support PCRE2's "callout" feature, which
-allows an external function to be called during pattern matching. See the
+constructions. However, PCRE2 does have a "callout" feature, which allows an
+external function to be called during pattern matching. See the
 <a href="pcre2callout.html"><b>pcre2callout</b></a>
 documentation for details.
 </P>
@@ -88,12 +100,12 @@ to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking
 into subroutine calls is now supported, as in Perl.
 </P>
 <P>
-9. If any of the backtracking control verbs are used in a subpattern that is
-called as a subroutine (whether or not recursively), their effect is confined
-to that subpattern; it does not extend to the surrounding pattern. This is not
-always the case in Perl. In particular, if (*THEN) is present in a group that
-is called as a subroutine, its action is limited to that group, even if the
-group does not contain any | characters. Note that such subpatterns are
+9. In PCRE2, if any of the backtracking control verbs are used in a group that
+is called as a subroutine (whether or not recursively), their effect is
+confined to that group; it does not extend to the surrounding pattern. This is
+not always the case in Perl. In particular, if (*THEN) is present in a group
+that is called as a subroutine, its action is limited to that group, even if
+the group does not contain any | characters. Note that such groups are
 processed as anchored at the point where they are tested.
 </P>
 <P>
@@ -104,59 +116,61 @@ triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the
 same as PCRE2, but there are cases where it differs.
 </P>
 <P>
-11. Most backtracking verbs in assertions have their normal actions. They are
-not confined to the assertion.
-</P>
-<P>
-12. There are some differences that are concerned with the settings of captured
+11. There are some differences that are concerned with the settings of captured
 strings when part of a pattern is repeated. For example, matching "aba" against
 the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to
 "b".
 </P>
 <P>
-13. PCRE2's handling of duplicate subpattern numbers and duplicate subpattern
-names is not as general as Perl's. This is a consequence of the fact the PCRE2
-works internally just with numbers, using an external table to translate
-between numbers and names. In particular, a pattern such as (?|(?&#60;a&#62;A)|(?&#60;b&#62;B),
-where the two capturing parentheses have the same number but different names,
-is not supported, and causes an error at compile time. If it were allowed, it
-would not be possible to distinguish which parentheses matched, because both
-names map to capturing subpattern number 1. To avoid this confusing situation,
-an error is given at compile time.
+12. PCRE2's handling of duplicate capture group numbers and names is not as
+general as Perl's. This is a consequence of the fact the PCRE2 works internally
+just with numbers, using an external table to translate between numbers and
+names. In particular, a pattern such as (?|(?&#60;a&#62;A)|(?&#60;b&#62;B)), where the two
+capture groups have the same number but different names, is not supported, and
+causes an error at compile time. If it were allowed, it would not be possible
+to distinguish which group matched, because both names map to capture group
+number 1. To avoid this confusing situation, an error is given at compile time.
 </P>
 <P>
-14. Perl used to recognize comments in some places that PCRE2 does not, for
-example, between the ( and ? at the start of a subpattern. If the /x modifier
-is set, Perl allowed white space between ( and ? though the latest Perls give
-an error (for a while it was just deprecated). There may still be some cases
-where Perl behaves differently.
+13. Perl used to recognize comments in some places that PCRE2 does not, for
+example, between the ( and ? at the start of a group. If the /x modifier is
+set, Perl allowed white space between ( and ? though the latest Perls give an
+error (for a while it was just deprecated). There may still be some cases where
+Perl behaves differently.
 </P>
 <P>
-15. Perl, when in warning mode, gives warnings for character classes such as
+14. Perl, when in warning mode, gives warnings for character classes such as
 [A-\d] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE2 has no
 warning features, so it gives an error in these cases because they are almost
 certainly user mistakes.
 </P>
 <P>
-16. In PCRE2, the upper/lower case character properties Lu and Ll are not
+15. In PCRE2, the upper/lower case character properties Lu and Ll are not
 affected when case-independent matching is specified. For example, \p{Lu}
 always matches an upper case letter. I think Perl has changed in this respect;
-in the release at the time of writing (5.24), \p{Lu} and \p{Ll} match all
+in the release at the time of writing (5.32), \p{Lu} and \p{Ll} match all
 letters, regardless of case, when case independence is specified.
 </P>
 <P>
+16. From release 5.32.0, Perl locks out the use of \K in lookaround
+assertions. From release 10.38 PCRE2 does the same by default. However, there
+is an option for re-enabling the previous behaviour. When this option is set,
+\K is acted on when it occurs in positive assertions, but is ignored in
+negative assertions.
+</P>
+<P>
 17. PCRE2 provides some extensions to the Perl regular expression facilities.
-Perl 5.10 includes new features that are not in earlier versions of Perl, some
+Perl 5.10 included new features that were not in earlier versions of Perl, some
 of which (such as named parentheses) were in PCRE2 for some time before. This
-list is with respect to Perl 5.26:
+list is with respect to Perl 5.32:
 <br>
 <br>
 (a) Although lookbehind assertions in PCRE2 must match fixed length strings,
-each alternative branch of a lookbehind assertion can match a different length
-of string. Perl requires them all to have the same length.
+each alternative toplevel branch of a lookbehind assertion can match a
+different length of string. Perl requires them all to have the same length.
 <br>
 <br>
-(b) From PCRE2 10.23, back references to groups of fixed length are supported
+(b) From PCRE2 10.23, backreferences to groups of fixed length are supported
 in lookbehinds, provided that there is no possibility of referencing a
 non-unique number or name. Perl does not support backreferences in lookbehinds.
 <br>
@@ -198,8 +212,13 @@ different way and is not Perl-compatible.
 <br>
 <br>
 (l) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT) at
-the start of a pattern that set overall options that cannot be changed within
+the start of a pattern. These set overall options that cannot be changed within
 the pattern.
+<br>
+<br>
+(m) PCRE2 supports non-atomic positive lookaround assertions. This is an
+extension to the lookaround facilities. The default, Perl-compatible
+lookarounds are atomic.
 </P>
 <P>
 18. The Perl /a modifier restricts /d numbers to pure ascii, and the /aa
@@ -220,7 +239,7 @@ AUTHOR
 <P>
 Philip Hazel
 <br>
-University Computing Service
+Retired from University Computing Service
 <br>
 Cambridge, England.
 <br>
@@ -229,9 +248,9 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 18 April 2017
+Last updated: 30 August 2021
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2021 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
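
The \Q...\E rules in item 6 of the diff above suggest one way to embed an arbitrary literal in a PCRE2 pattern. The following is a minimal sketch, not code from PCRE2 or from this commit: `quote_for_pcre2` is a hypothetical helper name, it assumes 8-bit code units and a NUL-terminated literal, and it relies only on the documented behaviour that PCRE2 treats a backslash between \Q and \E as an ordinary character. The one sequence that cannot sit inside a quoted run is \E itself, so each occurrence is spliced out as `\E\\E\Q`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of PCRE2): wraps a NUL-terminated literal in
   \Q...\E for use inside a PCRE2 pattern. An embedded \E would end the quoted
   run early, so each one is replaced by the 7 characters \E\\E\Q. The caller
   frees the result. Returns NULL if allocation fails. */
char *quote_for_pcre2(const char *literal)
{
  static const char splice[] = "\\E\\\\E\\Q";      /* \E\\E\Q */
  const size_t splice_len = sizeof(splice) - 1;
  size_t hits = 0;
  const char *p;
  const char *hit;
  char *out;
  char *q;

  assert(literal != NULL);

  /* Count the \E sequences so the output can be sized exactly. */
  for (p = literal; (hit = strstr(p, "\\E")) != NULL; p = hit + 2) hits++;

  /* \Q + literal + (splice_len - 2) extra per hit + \E + NUL */
  out = malloc(2 + strlen(literal) + hits * (splice_len - 2) + 2 + 1);
  if (out == NULL) return NULL;

  q = out;
  memcpy(q, "\\Q", 2);
  q += 2;
  for (p = literal; (hit = strstr(p, "\\E")) != NULL; p = hit + 2)
    {
    memcpy(q, p, (size_t)(hit - p));   /* text before this \E */
    q += hit - p;
    memcpy(q, splice, splice_len);     /* close, escape, reopen */
    q += splice_len;
    }
  strcpy(q, p);                        /* remainder after the last \E */
  strcat(q, "\\E");
  return out;
}
```

For the literal `abc$xyz` this yields `\Qabc$xyz\E`, matching the first table row. The same string handed to Perl would behave differently, as the table shows, so the helper is only appropriate for PCRE2. Literals containing binary zeros would need a length-based variant.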

+ 6 - 5
regex.mod/pcre/doc/html/pcre2convert.html

@@ -105,7 +105,8 @@ If <b>buffer</b> points to a NULL pointer, an output buffer is obtained using
 the allocator in the context or <b>malloc()</b> if no context is supplied. A
 pointer to this buffer is placed in the variable to which <b>buffer</b> points.
 When no longer needed the output buffer must be freed by calling
-<b>pcre2_converted_pattern_free()</b>.
+<b>pcre2_converted_pattern_free()</b>. If this function is called with a NULL
+argument, it returns immediately without doing anything.
 </P>
 <P>
 If <b>buffer</b> points to a non-NULL pointer, <b>blength</b> must be set to the
@@ -140,8 +141,8 @@ permitted to match separator characters, but the double-star (**) feature
 </P>
 <P>
 PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR matches globs with wildcards allowed to
-match separator characters. PCRE2_GLOB_NO_STARSTAR matches globs with the
-double-star feature disabled. These options may be given together.
+match separator characters. PCRE2_CONVERT_GLOB_NO_STARSTAR matches globs with
+the double-star feature disabled. These options may be given together.
 </P>
 <br><a name="SEC5" href="#TOC1">CONVERTING POSIX PATTERNS</a><br>
 <P>
@@ -181,9 +182,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC7" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 12 July 2017
+Last updated: 28 June 2018
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2018 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.

+ 27 - 18
regex.mod/pcre/doc/html/pcre2demo.html

@@ -104,12 +104,11 @@ uint32_t newline;
 
 PCRE2_SIZE erroroffset;
 PCRE2_SIZE *ovector;
+PCRE2_SIZE subject_length;
 
-size_t subject_length;
 pcre2_match_data *match_data;
 
 
 
 
-
 /**************************************************************************
 * First, sort out the command line. There is only one possible option at  *
 * the moment, "-g" to request repeated matching to find all occurrences,  *
@@ -138,12 +137,14 @@ if (argc - i != 2)
   return 1;
   }
 
 
-/* As pattern and subject are char arguments, they can be straightforwardly
-cast to PCRE2_SPTR as we are working in 8-bit code units. */
+/* Pattern and subject are char arguments, so they can be straightforwardly
+cast to PCRE2_SPTR because we are working in 8-bit code units. The subject
+length is cast to PCRE2_SIZE for completeness, though PCRE2_SIZE is in fact
+defined to be size_t. */
 
 pattern = (PCRE2_SPTR)argv[i];
 subject = (PCRE2_SPTR)argv[i+1];
-subject_length = strlen((char *)subject);
+subject_length = (PCRE2_SIZE)strlen((char *)subject);
 
 
 /*************************************************************************
@@ -172,17 +173,22 @@ if (re == NULL)
 
 
 /*************************************************************************
-* If the compilation succeeded, we call PCRE again, in order to do a     *
+* If the compilation succeeded, we call PCRE2 again, in order to do a    *
 * pattern match against the subject string. This does just ONE match. If *
 * further matching is needed, it will be done below. Before running the  *
-* match we must set up a match_data block for holding the result.        *
+* match we must set up a match_data block for holding the result. Using  *
+* pcre2_match_data_create_from_pattern() ensures that the block is       *
+* exactly the right size for the number of capturing parentheses in the  *
+* pattern. If you need to know the actual size of a match_data block as  *
+* a number of bytes, you can find it like this:                          *
+*                                                                        *
+* PCRE2_SIZE match_data_size = pcre2_get_match_data_size(match_data);    *
 *************************************************************************/
 
 
-/* Using this function ensures that the block is exactly the right size for
-the number of capturing parentheses in the pattern. */
-
 match_data = pcre2_match_data_create_from_pattern(re, NULL);
 
 
+/* Now run the match. */
+
 rc = pcre2_match(
   re,                   /* the compiled pattern */
   subject,              /* the subject string */
@@ -205,12 +211,12 @@ if (rc &lt; 0)
     default: printf("Matching error %d\n", rc); break;
     }
   pcre2_match_data_free(match_data);   /* Release memory used for the match */
-  pcre2_code_free(re);                 /* data and the compiled pattern. */
+  pcre2_code_free(re);                 /*   data and the compiled pattern. */
   return 1;
   }
 
 
-/* Match succeded. Get a pointer to the output vector, where string offsets are
-stored. */
+/* Match succeeded. Get a pointer to the output vector, where string offsets
+are stored. */
 
 ovector = pcre2_get_ovector_pointer(match_data);
 printf("Match succeeded at offset %d\n", (int)ovector[0]);
@@ -228,9 +234,12 @@ pcre2_match_data_create_from_pattern() above. */
 if (rc == 0)
   printf("ovector was not big enough for all the captured substrings\n");
 
 
-/* We must guard against patterns such as /(?=.\K)/ that use \K in an assertion
-to set the start of a match later than its end. In this demonstration program,
-we just detect this case and give up. */
+/* Since release 10.38 PCRE2 has locked out the use of \K in lookaround
+assertions. However, there is an option to re-enable the old behaviour. If that
+is set, it is possible to run patterns such as /(?=.\K)/ that use \K in an
+assertion to set the start of a match later than its end. In this demonstration
+program, we show how to detect this case, but it shouldn't arise because the
+option is never set. */
 
 if (ovector[0] &gt; ovector[1])
   {
@@ -249,7 +258,7 @@ application you might want to do things other than print them. */
 for (i = 0; i &lt; rc; i++)
   {
   PCRE2_SPTR substring_start = subject + ovector[2*i];
-  size_t substring_length = ovector[2*i+1] - ovector[2*i];
+  PCRE2_SIZE substring_length = ovector[2*i+1] - ovector[2*i];
   printf("%2d: %.*s\n", i, (int)substring_length, (char *)substring_start);
   }
 
 
@@ -447,7 +456,7 @@ for (;;)
     return 1;
     }
 
 
-  /* Match succeded */
+  /* Match succeeded */
 
 
   printf("\nMatch succeeded again at offset %d\n", (int)ovector[0]);
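
The \K guard and the substring loop shown in the pcre2demo hunks above can be combined into one small routine. This is a sketch under stated assumptions, not part of pcre2demo: `copy_group` is a hypothetical name, it uses plain `size_t` because the demo's comment notes that PCRE2_SIZE is defined to be size_t, and it treats an all-ones offset as "unset" on the assumption that this mirrors PCRE2_UNSET from pcre2.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of pcre2demo. Copies capture group i out of
   subject into buf (NUL-terminated), reading an ovector-style array of
   start/end offset pairs. Returns the substring length, or -1 if the group
   is unset, the pair is inverted (start after end, which \K inside a
   lookaround can produce when that usage is re-enabled), or buf is too
   small. */
long copy_group(const char *subject, const size_t *ovector, int i,
                char *buf, size_t bufsize)
{
  size_t start, end, len;

  assert(subject != NULL && ovector != NULL && buf != NULL && i >= 0);

  start = ovector[2*i];
  end = ovector[2*i+1];

  if (start == (size_t)-1) return -1;   /* assumed to mirror PCRE2_UNSET */
  if (start > end) return -1;           /* the inverted case the demo detects */

  len = end - start;
  if (len >= bufsize) return -1;        /* no room for the terminating NUL */

  memcpy(buf, subject + start, len);
  buf[len] = '\0';
  return (long)len;
}
```

Returning an error for the inverted pair, rather than swapping the offsets, follows the demo's own choice to detect the case and give up.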
 
 

+ 294 - 190
regex.mod/pcre/doc/html/pcre2grep.html

@@ -17,17 +17,18 @@ please consult the man page, in case the conversion went wrong.
 <li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
 <li><a name="TOC3" href="#SEC3">SUPPORT FOR COMPRESSED FILES</a>
 <li><a name="TOC4" href="#SEC4">BINARY FILES</a>
-<li><a name="TOC5" href="#SEC5">OPTIONS</a>
-<li><a name="TOC6" href="#SEC6">ENVIRONMENT VARIABLES</a>
-<li><a name="TOC7" href="#SEC7">NEWLINES</a>
-<li><a name="TOC8" href="#SEC8">OPTIONS COMPATIBILITY</a>
-<li><a name="TOC9" href="#SEC9">OPTIONS WITH DATA</a>
-<li><a name="TOC10" href="#SEC10">USING PCRE2'S CALLOUT FACILITY</a>
-<li><a name="TOC11" href="#SEC11">MATCHING ERRORS</a>
-<li><a name="TOC12" href="#SEC12">DIAGNOSTICS</a>
-<li><a name="TOC13" href="#SEC13">SEE ALSO</a>
-<li><a name="TOC14" href="#SEC14">AUTHOR</a>
-<li><a name="TOC15" href="#SEC15">REVISION</a>
+<li><a name="TOC5" href="#SEC5">BINARY ZEROS IN PATTERNS</a>
+<li><a name="TOC6" href="#SEC6">OPTIONS</a>
+<li><a name="TOC7" href="#SEC7">ENVIRONMENT VARIABLES</a>
+<li><a name="TOC8" href="#SEC8">NEWLINES</a>
+<li><a name="TOC9" href="#SEC9">OPTIONS COMPATIBILITY</a>
+<li><a name="TOC10" href="#SEC10">OPTIONS WITH DATA</a>
+<li><a name="TOC11" href="#SEC11">USING PCRE2'S CALLOUT FACILITY</a>
+<li><a name="TOC12" href="#SEC12">MATCHING ERRORS</a>
+<li><a name="TOC13" href="#SEC13">DIAGNOSTICS</a>
+<li><a name="TOC14" href="#SEC14">SEE ALSO</a>
+<li><a name="TOC15" href="#SEC15">AUTHOR</a>
+<li><a name="TOC16" href="#SEC16">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
 <P>
@@ -85,9 +86,10 @@ controlled by parameters that can be set by the <b>--buffer-size</b> and
 that is obtained at the start of processing. If an input file contains very
 long lines, a larger buffer may be needed; this is handled by automatically
 extending the buffer, up to the limit specified by <b>--max-buffer-size</b>. The
-default values for these parameters are specified when <b>pcre2grep</b> is
-built, with the default defaults being 20K and 1M respectively. An error occurs
-if a line is too long and the buffer can no longer be expanded.
+default values for these parameters can be set when <b>pcre2grep</b> is
+built; if nothing is specified, the defaults are set to 20KiB and 1MiB
+respectively. An error occurs if a line is too long and the buffer can no
+longer be expanded.
 </P>
 <P>
 The block of memory that is actually used is three times the "buffer size", to
@@ -95,7 +97,7 @@ allow for buffering "before" and "after" lines. If the buffer size is too
 small, fewer than requested "before" and "after" lines may be output.
 </P>
 <P>
-Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
+Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the greater.
 BUFSIZ is defined in <b>&#60;stdio.h&#62;</b>. When there is more than one pattern
 (specified by the use of <b>-e</b> and/or <b>-f</b>), each pattern is applied to
 each line in the order in which they are defined, except that all the <b>-e</b>
@@ -109,8 +111,8 @@ matching substrings, or if <b>--only-matching</b>, <b>--file-offsets</b>, or
 (either shown literally, or as an offset), scanning resumes immediately
 following the match, so that further matches on the same line can be found. If
 there are multiple patterns, they are all tried on the remainder of the line,
-but patterns that follow the one that matched are not tried on the earlier part
-of the line.
+but patterns that follow the one that matched are not tried on the earlier
+matched part of the line.
 </P>
 <P>
 This behaviour means that the order in which multiple patterns are specified
@@ -144,13 +146,18 @@ ignored.
 <br><a name="SEC4" href="#TOC1">BINARY FILES</a><br>
 <P>
 By default, a file that contains a binary zero byte within the first 1024 bytes
-is identified as a binary file, and is processed specially. (GNU grep
-identifies binary files in this manner.) However, if the newline type is
-specified as "nul", that is, the line terminator is a binary zero, the test for
-a binary file is not applied. See the <b>--binary-files</b> option for a means
-of changing the way binary files are handled.
+is identified as a binary file, and is processed specially. However, if the
+newline type is specified as NUL, that is, the line terminator is a binary
+zero, the test for a binary file is not applied. See the <b>--binary-files</b>
+option for a means of changing the way binary files are handled.
 </P>
-<br><a name="SEC5" href="#TOC1">OPTIONS</a><br>
+<br><a name="SEC5" href="#TOC1">BINARY ZEROS IN PATTERNS</a><br>
+<P>
+Patterns passed from the command line are strings that are terminated by a
+binary zero, so cannot contain internal zeros. However, patterns that are read
+from a file via the <b>-f</b> option may contain binary zeros.
+</P>
+<br><a name="SEC6" href="#TOC1">OPTIONS</a><br>
 <P>
 The order in which some of the options appear can affect the output. For
 example, both the <b>-H</b> and <b>-l</b> options affect the printing of file
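
The binary-file test described in the BINARY FILES hunk above (a zero byte within the first 1024 bytes, skipped when the newline type is NUL) can be sketched as follows. This is an illustration of the documented rule only, not pcre2grep's actual code; `looks_binary`, `BINARY_CHECK_LIMIT`, and the `nul_newline` flag are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BINARY_CHECK_LIMIT 1024   /* "the first 1024 bytes", per the text */

/* Hypothetical helper, not pcre2grep's implementation. Returns 1 if the
   leading part of buf contains a zero byte, else 0. nul_newline is nonzero
   when the line terminator is a binary zero, in which case the test is not
   applied at all. */
int looks_binary(const unsigned char *buf, size_t len, int nul_newline)
{
  assert(buf != NULL || len == 0);

  if (nul_newline || len == 0) return 0;
  if (len > BINARY_CHECK_LIMIT) len = BINARY_CHECK_LIMIT;
  return memchr(buf, 0, len) != NULL;
}
```

A zero byte that first appears after the 1024-byte window does not trigger the test in this sketch, which is how the documented wording reads.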
@@ -181,6 +188,12 @@ Treat binary files as text. This is equivalent to
 <b>--binary-files</b>=<i>text</i>.
 </P>
 <P>
+<b>--allow-lookaround-bsk</b>
+PCRE2 now forbids the use of \K in lookarounds by default, in line with Perl.
+This option causes <b>pcre2grep</b> to set the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
+option, which enables this somewhat dangerous usage.
+</P>
+<P>
 <b>-B</b> <i>number</i>, <b>--before-context=</b><i>number</i>
 Output up to <i>number</i> lines of context before each matching line. Fewer
 lines are output if the previous match or the start of the file is within
@@ -355,12 +368,15 @@ files; it does not apply to patterns specified by any of the <b>--include</b> or
 <P>
 <b>-f</b> <i>filename</i>, <b>--file=</b><i>filename</i>
 Read patterns from the file, one per line, and match them against each line of
-input. What constitutes a newline when reading the file is the operating
-system's default. The <b>--newline</b> option has no effect on this option.
-Trailing white space is removed from each line, and blank lines are ignored. An
-empty file contains no patterns and therefore matches nothing. See also the
-comments about multiple patterns versus a single pattern with alternatives in
-the description of <b>-e</b> above.
+input. As is the case with patterns on the command line, no delimiters should
+be used. What constitutes a newline when reading the file is the operating
+system's default interpretation of \n. The <b>--newline</b> option has no
+effect on this option. Trailing white space is removed from each line, and
+blank lines are ignored. An empty file contains no patterns and therefore
+matches nothing. Patterns read from a file in this way may contain binary
+zeros, which are treated as ordinary data characters. See also the comments
+about multiple patterns versus a single pattern with alternatives in the
+description of <b>-e</b> above.
 <br>
 <br>
 If this option is given more than once, all the specified files are read. A
@@ -373,14 +389,15 @@ command line; all arguments are treated as the names of paths to be searched.
 <P>
 <b>--file-list</b>=<i>filename</i>
 Read a list of files and/or directories that are to be scanned from the given
-file, one per line. Trailing white space is removed from each line, and blank
-lines are ignored. These paths are processed before any that are listed on the
-command line. The file name can be given as "-" to refer to the standard input.
-If <b>--file</b> and <b>--file-list</b> are both specified as "-", patterns are
-read first. This is useful only when the standard input is a terminal, from
-which further lines (the list of files) can be read after an end-of-file
-indication. If this option is given more than once, all the specified files are
-read.
+file, one per line. What constitutes a newline when reading the file is the
+operating system's default. Trailing white space is removed from each line, and
+blank lines are ignored. These paths are processed before any that are listed
+on the command line. The file name can be given as "-" to refer to the standard
+input. If <b>--file</b> and <b>--file-list</b> are both specified as "-",
+patterns are read first. This is useful only when the standard input is a
+terminal, from which further lines (the list of files) can be read after an
+end-of-file indication. If this option is given more than once, all the
+specified files are read.
 </P>
 <P>
 <b>--file-offsets</b>
@@ -431,8 +448,8 @@ Ignore upper/lower case distinctions during comparisons.
 <P>
 <b>--include</b>=<i>pattern</i>
 If any <b>--include</b> patterns are specified, the only files that are
-processed are those that match one of the patterns (and do not match an
-<b>--exclude</b> pattern). This option does not affect directories, but it
+processed are those whose names match one of the patterns and do not match an
+<b>--exclude</b> pattern. This option does not affect directories, but it
 applies to all files, whether listed on the command line, obtained from
 <b>--file-list</b>, or by scanning a directory. The pattern is a PCRE2 regular
 expression, and is matched against the final component of the file name, not
@@ -451,8 +468,8 @@ may be given any number of times; all the files are read.
 <P>
 <b>--include-dir</b>=<i>pattern</i>
 If any <b>--include-dir</b> patterns are specified, the only directories that
-are processed are those that match one of the patterns (and do not match an
-<b>--exclude-dir</b> pattern). This applies to all directories, whether listed
+are processed are those whose names match one of the patterns and do not match
+an <b>--exclude-dir</b> pattern. This applies to all directories, whether listed
 on the command line, obtained from <b>--file-list</b>, or by scanning a parent
 directory. The pattern is a PCRE2 regular expression, and is matched against
 the final component of the directory name, not the entire path. The <b>-F</b>,
@@ -475,8 +492,9 @@ a separate line. Searching normally stops as soon as a matching line is found
 in a file. However, if the <b>-c</b> (count) option is also used, matching
 continues in order to obtain the correct count, and those files that have at
 least one match are listed along with their counts. Using this option with
-<b>-c</b> is a way of suppressing the listing of files with no matches. This
-opeion overrides any previous <b>-H</b>, <b>-h</b>, or <b>-L</b> options.
+<b>-c</b> is a way of suppressing the listing of files with no matches that
+occurs with <b>-c</b> on its own. This option overrides any previous <b>-H</b>,
+<b>-h</b>, or <b>-L</b> options.
 </P>
 <P>
 <b>--label</b>=<i>name</i>
@@ -489,13 +507,13 @@ short form for this option.
 When this option is given, non-compressed input is read and processed line by
 line, and the output is flushed after each write. By default, input is read in
 large chunks, unless <b>pcre2grep</b> can determine that it is reading from a
-terminal (which is currently possible only in Unix-like environments). Output
-to terminal is normally automatically flushed by the operating system. This
-option can be useful when the input or output is attached to a pipe and you do
-not want <b>pcre2grep</b> to buffer up large amounts of data. However, its use
-will affect performance, and the <b>-M</b> (multiline) option ceases to work.
-When input is from a compressed .gz or .bz2 file, <b>--line-buffered</b> is
-ignored.
+terminal, which is currently possible only in Unix-like environments or
+Windows. Output to a terminal is normally flushed automatically by the operating
+system. This option can be useful when the input or output is attached to a
+pipe and you do not want <b>pcre2grep</b> to buffer up large amounts of data.
+However, its use will affect performance, and the <b>-M</b> (multiline) option
+ceases to work. When input is from a compressed .gz or .bz2 file,
+<b>--line-buffered</b> is ignored.
 </P>
 <P>
 <b>--line-offsets</b>
@@ -516,6 +534,49 @@ locale is specified, the PCRE2 library's default (usually the "C" locale) is
 used. There is no short form for this option.
 </P>
 <P>
+<b>-M</b>, <b>--multiline</b>
+Allow patterns to match more than one line. When this option is set, the PCRE2
+library is called in "multiline" mode. This allows a matched string to extend
+past the end of a line and continue on one or more subsequent lines. Patterns
+used with <b>-M</b> may usefully contain literal newline characters and internal
+occurrences of ^ and $ characters. The output for a successful match may
+consist of more than one line. The first line is the line in which the match
+started, and the last line is the line in which the match ended. If the matched
+string ends with a newline sequence, the output ends at the end of that line.
+If <b>-v</b> is set, none of the lines in a multi-line match are output. Once a
+match has been handled, scanning restarts at the beginning of the line after
+the one in which the match ended.
+<br>
+<br>
+The newline sequence that separates multiple lines must be matched as part of
+the pattern. For example, to find the phrase "regular expression" in a file
+where "regular" might be at the end of a line and "expression" at the start of
+the next line, you could use this command:
+<pre>
+  pcre2grep -M 'regular\s+expression' &#60;file&#62;
+</pre>
+The \s escape sequence matches any white space character, including newlines,
+and is followed by + so as to match trailing white space on the first line as
+well as possibly handling a two-character newline sequence.
+<br>
+<br>
+There is a limit to the number of lines that can be matched, imposed by the way
+that <b>pcre2grep</b> buffers the input file as it scans it. With a sufficiently
+large processing buffer, this should not be a problem, but the <b>-M</b> option
+does not work when input is read line by line (see <b>--line-buffered</b>).
+</P>
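As a rough illustration of why the example pattern above uses \s+, the following sketch uses Python's standard re module as a stand-in for PCRE2. This is an assumption made for illustration only: it is not pcre2grep itself, and the sample strings are invented. It shows the same pattern matching on one line, across an LF break, and across a CRLF break with trailing white space on the first line.

```python
import re

# Stand-in for the pcre2grep -M example; Python's re module is used here
# only for illustration and is not the PCRE2 engine.
PATTERN = re.compile(r"regular\s+expression")

def spans_lines(text):
    """Return the matched string if the phrase is found, else None."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

same_line = spans_lines("a regular expression here")
lf_break = spans_lines("ends with regular\nexpression starts")
crlf_break = spans_lines("ends with regular \r\nexpression starts")
no_match = spans_lines("regular-expression")
```

Because \s+ consumes one or more white space characters, the two-character CRLF sequence and any trailing spaces are absorbed by the same part of the pattern.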
+<P>
+<b>-m</b> <i>number</i>, <b>--max-count</b>=<i>number</i>
+Stop processing after finding <i>number</i> matching lines, or non-matching
+lines if <b>-v</b> is also set. Any trailing context lines are output after the
+final match. In multiline mode, each multiline match counts as just one line
+for this purpose. If this limit is reached when reading the standard input from
+a regular file, the file is left positioned just after the last matching line.
+If <b>-c</b> is also set, the count that is output is never greater than
+<i>number</i>. This option has no effect if used with <b>-L</b>, <b>-l</b>, or
+<b>-q</b>, or when just checking for a match in a binary file.
+</P>
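The stopping rule described for --max-count can be pictured with a short Python sketch. The function name grep_max_count and its parameters are hypothetical, Python's re module stands in for PCRE2, and context lines and multiline matches are not modelled.

```python
import re

def grep_max_count(lines, pattern, max_count, invert=False):
    """Collect selected lines, stopping once max_count have been found.

    With invert=True the non-matching lines are the ones selected,
    mirroring the interaction with -v described above.
    """
    regex = re.compile(pattern)
    selected = []
    for line in lines:
        if len(selected) >= max_count:
            break  # limit reached: the remaining input is not examined
        matched = regex.search(line) is not None
        if matched != invert:
            selected.append(line)
    return selected
```

For instance, with the lines "a1", "b", "a2", "a3" and the pattern "a", a limit of 2 selects only "a1" and "a2"; the final line is never examined.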
+<P>
 <b>--match-limit</b>=<i>number</i>
 Processing some regular expression patterns may take a very long time to search
 for all possible matching strings. Others may require a very large amount of
@@ -530,11 +591,11 @@ counter that is incremented each time around its main processing loop. If the
 value set by <b>--match-limit</b> is reached, an error occurs.
 <br>
 <br>
-The <b>--heap-limit</b> option specifies, as a number of kilobytes, the amount
-of heap memory that may be used for matching. Heap memory is needed only if
-matching the pattern requires a significant number of nested backtracking
-points to be remembered. This parameter can be set to zero to forbid the use of
-heap memory altogether.
+The <b>--heap-limit</b> option specifies, as a number of kibibytes (units of
+1024 bytes), the amount of heap memory that may be used for matching. Heap
+memory is needed only if matching the pattern requires a significant number of
+nested backtracking points to be remembered. This parameter can be set to zero
+to forbid the use of heap memory altogether.
 <br>
 <br>
 The <b>--depth-limit</b> option limits the depth of nested backtracking points,
@@ -545,69 +606,44 @@ limit acts varies from pattern to pattern. This limit is of use only if it is
 set smaller than <b>--match-limit</b>.
 <br>
 <br>
-There are no short forms for these options. The default settings are specified
-when the PCRE2 library is compiled, with the default defaults being very large
-and so effectively unlimited.
+There are no short forms for these options. The default limits can be set
+when the PCRE2 library is compiled; if they are not specified, the defaults
+are very large and so effectively unlimited.
 </P>
 <P>
-\fB--max-buffer-size=<i>number</i>
+<b>--max-buffer-size</b>=<i>number</i>
 This limits the expansion of the processing buffer, whose initial size can be
 set by <b>--buffer-size</b>. The maximum buffer size is silently forced to be no
 smaller than the starting buffer size.
 </P>
 <P>
-<b>-M</b>, <b>--multiline</b>
-Allow patterns to match more than one line. When this option is set, the PCRE2
-library is called in "multiline" mode. This allows a matched string to extend
-past the end of a line and continue on one or more subsequent lines. Patterns
-used with <b>-M</b> may usefully contain literal newline characters and internal
-occurrences of ^ and $ characters. The output for a successful match may
-consist of more than one line. The first line is the line in which the match
-started, and the last line is the line in which the match ended. If the matched
-string ends with a newline sequence, the output ends at the end of that line.
-If <b>-v</b> is set, none of the lines in a multi-line match are output. Once a
-match has been handled, scanning restarts at the beginning of the line after
-the one in which the match ended.
-<br>
-<br>
-The newline sequence that separates multiple lines must be matched as part of
-the pattern. For example, to find the phrase "regular expression" in a file
-where "regular" might be at the end of a line and "expression" at the start of
-the next line, you could use this command:
+<b>-N</b> <i>newline-type</i>, <b>--newline</b>=<i>newline-type</i>
+Six different conventions for indicating the ends of lines in scanned files are
+supported. For example:
 <pre>
-  pcre2grep -M 'regular\s+expression' &#60;file&#62;
+  pcre2grep -N CRLF 'some pattern' &#60;file&#62;
 </pre>
-The \s escape sequence matches any white space character, including newlines,
-and is followed by + so as to match trailing white space on the first line as
-well as possibly handling a two-character newline sequence.
-<br>
-<br>
-There is a limit to the number of lines that can be matched, imposed by the way
-that <b>pcre2grep</b> buffers the input file as it scans it. With a sufficiently
-large processing buffer, this should not be a problem, but the <b>-M</b> option
-does not work when input is read line by line (see \fP--line-buffered\fP.)
-</P>
-<P>
-<b>-N</b> <i>newline-type</i>, <b>--newline</b>=<i>newline-type</i>
-The PCRE2 library supports five different conventions for indicating
-the ends of lines. They are the single-character sequences CR (carriage return)
-and LF (linefeed), the two-character sequence CRLF, an "anycrlf" convention,
-which recognizes any of the preceding three types, and an "any" convention, in
-which any Unicode line ending sequence is assumed to end a line. The Unicode
-sequences are the three just mentioned, plus VT (vertical tab, U+000B), FF
-(form feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and
-PS (paragraph separator, U+2029).
+The newline type may be specified in upper, lower, or mixed case. If the
+newline type is NUL, lines are separated by binary zero characters. The other
+types are the single-character sequences CR (carriage return) and LF
+(linefeed), the two-character sequence CRLF, an "anycrlf" type, which
+recognizes any of the preceding three types, and an "any" type, for which any
+Unicode line ending sequence is assumed to end a line. The Unicode sequences
+are the three just mentioned, plus VT (vertical tab, U+000B), FF (form feed,
+U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
+(paragraph separator, U+2029).
 <br>
 <br>
 When the PCRE2 library is built, a default line-ending sequence is specified.
 This is normally the standard sequence for the operating system. Unless
 otherwise specified by this option, <b>pcre2grep</b> uses the library's default.
-The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This
-makes it possible to use <b>pcre2grep</b> to scan files that have come from
-other environments without having to modify their line endings. If the data
-that is being scanned does not agree with the convention set by this option,
-<b>pcre2grep</b> may behave in strange ways. Note that this option does not
-apply to files specified by the <b>-f</b>, <b>--exclude-from</b>, or
+<br>
+<br>
+This option makes it possible to use <b>pcre2grep</b> to scan files that have
+come from other environments without having to modify their line endings. If
+the data that is being scanned does not agree with the convention set by this
+option, <b>pcre2grep</b> may behave in strange ways. Note that this option does
+not apply to files specified by the <b>-f</b>, <b>--exclude-from</b>, or
 <b>--include-from</b> options, which are expected to use the operating system's
 standard newline sequence.
 </P>
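The newline types listed above can be pictured as different line-splitting rules. The sketch below is an illustrative assumption written with Python's re module; the table NEWLINE_PATTERNS and the function split_lines are hypothetical names, not part of pcre2grep. The last case in the usage note shows the kind of surprise that can arise when the data does not agree with the chosen convention.

```python
import re

# Line terminators for each newline type, written as Python regexes.
# The keys follow the option values described above; the mapping itself
# is an illustrative assumption, not pcre2grep's implementation.
NEWLINE_PATTERNS = {
    "CR": r"\r",
    "LF": r"\n",
    "CRLF": r"\r\n",
    "ANYCRLF": r"\r\n|\r|\n",
    "ANY": r"\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]",
    "NUL": r"\x00",
}

def split_lines(text, newline_type):
    """Split text into lines; the type name may be in any case."""
    pattern = NEWLINE_PATTERNS[newline_type.upper()]
    parts = re.split(pattern, text)
    # A trailing terminator leaves an empty final piece; drop it.
    if parts and parts[-1] == "":
        parts.pop()
    return parts
```

Splitting CRLF data with the LF rule leaves a stray carriage return at the end of each line, so a pattern anchored with $ would no longer see the line end where the author expected it.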
@@ -629,29 +665,41 @@ It should never be needed in normal use.
 </P>
 <P>
 <b>-O</b> <i>text</i>, <b>--output</b>=<i>text</i>
-When there is a match, instead of outputting the whole line that matched,
-output just the given text. This option is mutually exclusive with
-<b>--only-matching</b>, <b>--file-offsets</b>, and <b>--line-offsets</b>. Escape
-sequences starting with a dollar character may be used to insert the contents
-of the matched part of the line and/or captured substrings into the text.
+When there is a match, instead of outputting the line that matched, output just
+the text specified in this option, followed by an operating-system standard
+newline. In this mode, no context is shown. That is, the <b>-A</b>, <b>-B</b>,
+and <b>-C</b> options are ignored. The <b>--newline</b> option has no effect on
+this option, which is mutually exclusive with <b>--only-matching</b>,
+<b>--file-offsets</b>, and <b>--line-offsets</b>. However, like
+<b>--only-matching</b>, if there is more than one match in a line, each of them
+causes a line of output.
+<br>
+<br>
+Escape sequences starting with a dollar character may be used to insert the
+contents of the matched part of the line and/or captured substrings into the
+text.
 <br>
 <br>
-$&#60;digits&#62; or ${&#60;digits&#62;} is replaced by the captured
-substring of the given decimal number; zero substitutes the whole match. If
-the number is greater than the number of capturing substrings, or if the
-capture is unset, the replacement is empty.
+$&#60;digits&#62; or ${&#60;digits&#62;} is replaced by the captured substring of the given
+decimal number; zero substitutes the whole match. If the number is greater than
+the number of capturing substrings, or if the capture is unset, the replacement
+is empty.
 <br>
 <br>
 $a is replaced by bell; $b by backspace; $e by escape; $f by form feed; $n by
 newline; $r by carriage return; $t by tab; $v by vertical tab.
 <br>
 <br>
-$o&#60;digits&#62; is replaced by the character represented by the given octal
-number; up to three digits are processed.
+$o&#60;digits&#62; or $o{&#60;digits&#62;} is replaced by the character whose code point is the
+given octal number. In the first form, up to three octal digits are processed.
+When more digits are needed in Unicode mode to specify a wide character, the
+second form must be used.
 <br>
 <br>
-$x&#60;digits&#62; is replaced by the character represented by the given hexadecimal
-number; up to two digits are processed.
+$x&#60;digits&#62; or $x{&#60;digits&#62;} is replaced by the character represented by the
+given hexadecimal number. In the first form, up to two hexadecimal digits are
+processed. When more digits are needed in Unicode mode to specify a wide
+character, the second form must be used.
 <br>
 <br>
 Any other character is substituted by itself. In particular, $$ is replaced by
@@ -673,20 +721,32 @@ otherwise empty line. This option is mutually exclusive with <b>--output</b>,
 <P>
 <b>-o</b><i>number</i>, <b>--only-matching</b>=<i>number</i>
 Show only the part of the line that matched the capturing parentheses of the
-given number. Up to 32 capturing parentheses are supported, and -o0 is
-equivalent to <b>-o</b> without a number. Because these options can be given
-without an argument (see above), if an argument is present, it must be given in
-the same shell item, for example, -o3 or --only-matching=2. The comments given
-for the non-argument case above also apply to this option. If the specified
-capturing parentheses do not exist in the pattern, or were not set in the
-match, nothing is output unless the file name or line number are being output.
+given number. Up to 50 capturing parentheses are supported by default. This
+limit can be changed via the <b>--om-capture</b> option. A pattern may contain
+any number of capturing parentheses, but only those whose number is within the
+limit can be accessed by <b>-o</b>. An error occurs if the number specified by
+<b>-o</b> is greater than the limit.
+<br>
+<br>
+-o0 is the same as <b>-o</b> without a number. Because these options can be
+given without an argument (see above), if an argument is present, it must be
+given in the same shell item, for example, -o3 or --only-matching=2. The
+comments given for the non-argument case above also apply to this option. If
+the specified capturing parentheses do not exist in the pattern, or were not
+set in the match, nothing is output unless the file name or line number is
+being output.
 <br>
 <br>
 If this option is given multiple times, multiple substrings are output for each
 match, in the order the options are given, and all on one line. For example,
 -o3 -o1 -o3 causes the substrings matched by capturing parentheses 3 and 1 and
 then 3 again to be output. By default, there is no separator (but see the next
-option).
+but one option).
+</P>
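The effect of giving -o several times, as in the -o3 -o1 -o3 example above, can be sketched in Python. The function only_matching is a hypothetical name, Python's re module stands in for PCRE2, and only the first match in a line is handled, which is a simplification.

```python
import re

def only_matching(line, pattern, group_numbers, separator=""):
    """Join the requested capture groups of the first match in a line.

    Returns None when the line does not match at all.
    """
    m = re.search(pattern, line)
    if m is None:
        return None
    pieces = []
    for n in group_numbers:
        # A group that does not exist, or was not set, contributes nothing.
        value = m.group(n) if n <= m.re.groups else None
        pieces.append(value or "")
    return separator.join(pieces)
```

With the pattern (.)(..(.)) and the line "abcde", the group list [3, 1, 3] yields "dad"; passing a separator plays the role of the --om-separator option.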
+<P>
+<b>--om-capture</b>=<i>number</i>
+Set the number of capturing parentheses that can be accessed by <b>-o</b>. The
+default is 50.
 </P>
 <P>
 <b>--om-separator</b>=<i>text</i>
@@ -708,7 +768,8 @@ option to "recurse".
 </P>
 <P>
 <b>--recursion-limit</b>=<i>number</i>
-See <b>--match-limit</b> above.
+This is an obsolete synonym for <b>--depth-limit</b>. See <b>--match-limit</b>
+above for details.
 </P>
 <P>
 <b>-s</b>, <b>--no-messages</b>
@@ -729,11 +790,23 @@ ignored when used with <b>-L</b> (list files without matches), because the grand
 total would always be zero.
 </P>
 <P>
-<b>-u</b>, <b>--utf-8</b>
+<b>-u</b>, <b>--utf</b>
 Operate in UTF-8 mode. This option is available only if PCRE2 has been compiled
 with UTF-8 support. All patterns (including those for any <b>--exclude</b> and
-<b>--include</b> options) and all subject lines that are scanned must be valid
-strings of UTF-8 characters.
+<b>--include</b> options) and all lines that are scanned must be valid strings
+of UTF-8 characters. If an invalid UTF-8 string is encountered, an error
+occurs.
+</P>
+<P>
+<b>-U</b>, <b>--utf-allow-invalid</b>
+As <b>--utf</b>, but in addition subject lines may contain invalid UTF-8 code
+unit sequences. These can never form part of any pattern match. Patterns
+themselves, however, must still be valid UTF-8 strings. This facility allows
+valid UTF-8 strings to be sought within arbitrary byte sequences in executable
+or other binary files. For more details about matching in non-valid UTF-8
+strings, see the
+<a href="pcre2unicode.html"><b>pcre2unicode</b>(3)</a>
+documentation.
 </P>
 <P>
 <b>-V</b>, <b>--version</b>
@@ -744,7 +817,9 @@ ignored.
 <P>
 <b>-v</b>, <b>--invert-match</b>
 Invert the sense of the match, so that lines which do <i>not</i> match any of
-the patterns are the ones that are found.
+the patterns are the ones that are found. When this option is set, options such
+as <b>--only-matching</b> and <b>--output</b>, which specify parts of a match
+that are to be output, are ignored.
 </P>
 <P>
 <b>-w</b>, <b>--word-regex</b>, <b>--word-regexp</b>
@@ -764,27 +839,39 @@ pattern and ")$" at the end. This option applies only to the patterns that are
 matched against the contents of files; it does not apply to patterns specified
 by any of the <b>--include</b> or <b>--exclude</b> options.
 </P>
-<br><a name="SEC6" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
+<br><a name="SEC7" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
 <P>
 The environment variables <b>LC_ALL</b> and <b>LC_CTYPE</b> are examined, in that
 order, for a locale. The first one that is set is used. This can be overridden
 by the <b>--locale</b> option. If no locale is set, the PCRE2 library's default
 (usually the "C" locale) is used.
 </P>
-<br><a name="SEC7" href="#TOC1">NEWLINES</a><br>
+<br><a name="SEC8" href="#TOC1">NEWLINES</a><br>
 <P>
 The <b>-N</b> (<b>--newline</b>) option allows <b>pcre2grep</b> to scan files with
-different newline conventions from the default. Any parts of the input files
-that are written to the standard output are copied identically, with whatever
-newline sequences they have in the input. However, the setting of this option
-does not affect the interpretation of files specified by the <b>-f</b>,
-<b>--exclude-from</b>, or <b>--include-from</b> options, which are assumed to use
-the operating system's standard newline sequence, nor does it affect the way in
-which <b>pcre2grep</b> writes informational messages to the standard error and
-output streams. For these it uses the string "\n" to indicate newlines,
-relying on the C I/O library to convert this to an appropriate sequence.
-</P>
-<br><a name="SEC8" href="#TOC1">OPTIONS COMPATIBILITY</a><br>
+newline conventions that differ from the default. This option affects only the
+way scanned files are processed. It does not affect the interpretation of files
+specified by the <b>-f</b>, <b>--file-list</b>, <b>--exclude-from</b>, or
+<b>--include-from</b> options.
+</P>
+<P>
+Any parts of the scanned input files that are written to the standard output
+are copied with whatever newline sequences they have in the input. However, if
+the final line of a file is output, and it does not end with a newline
+sequence, a newline sequence is added. If the newline setting is CR, LF, CRLF
+or NUL, that line ending is output; for the other settings (ANYCRLF or ANY) a
+single NL is used.
+</P>
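The rule for terminating an unterminated final line can be written down as a small table. The Python sketch below follows the description above; the names FINAL_TERMINATOR, LINE_ENDINGS and terminate_final_line are hypothetical, and "NL" is taken to mean a linefeed, which is an assumption.

```python
# Terminator appended when the last line of a file lacks one. "NL" in
# the text above is assumed here to mean linefeed ("\n").
FINAL_TERMINATOR = {
    "CR": "\r", "LF": "\n", "CRLF": "\r\n", "NUL": "\x00",
    "ANYCRLF": "\n", "ANY": "\n",
}

# Sequences that count as a line ending under each newline setting.
LINE_ENDINGS = {
    "CR": ("\r",),
    "LF": ("\n",),
    "CRLF": ("\r\n",),
    "NUL": ("\x00",),
    "ANYCRLF": ("\r\n", "\r", "\n"),
    "ANY": ("\r\n", "\r", "\n", "\x0b", "\x0c", "\x85", "\u2028", "\u2029"),
}

def terminate_final_line(line, newline_type):
    """Return the line as it would be written, adding an ending if needed."""
    kind = newline_type.upper()
    if line.endswith(LINE_ENDINGS[kind]):
        return line  # already terminated: copied unchanged
    return line + FINAL_TERMINATOR[kind]
```

Lines that already end with a recognized sequence are copied unchanged, which matches the statement that output keeps whatever newline sequences the input has.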
+<P>
+The newline setting does not affect the way in which <b>pcre2grep</b> writes
+newlines in informational messages to the standard output and error streams.
+Under Windows, the standard output is set to be binary, so that "\r\n" at the
+ends of output lines that are copied from the input is not converted to
+"\r\r\n" by the C I/O library. This means that any messages written to the
+standard output must end with "\r\n". For all other operating systems, and
+for all messages to the standard error stream, "\n" is used.
+</P>
+<br><a name="SEC9" href="#TOC1">OPTIONS COMPATIBILITY</a><br>
 <P>
 Many of the short and long forms of <b>pcre2grep</b>'s options are the same
 as in the GNU <b>grep</b> program. Any long option of the form
@@ -793,9 +880,9 @@ as in the GNU <b>grep</b> program. Any long option of the form
 <b>--file-offsets</b>, <b>--heap-limit</b>, <b>--include-dir</b>,
 <b>--line-offsets</b>, <b>--locale</b>, <b>--match-limit</b>, <b>-M</b>,
 <b>--multiline</b>, <b>-N</b>, <b>--newline</b>, <b>--om-separator</b>,
-<b>--output</b>, <b>-u</b>, and <b>--utf-8</b> options are specific to
-<b>pcre2grep</b>, as is the use of the <b>--only-matching</b> option with a
-capturing parentheses number.
+<b>--output</b>, <b>-u</b>, <b>--utf</b>, <b>-U</b>, and <b>--utf-allow-invalid</b>
+options are specific to <b>pcre2grep</b>, as is the use of the
+<b>--only-matching</b> option with a capturing parentheses number.
 </P>
 <P>
 Although most of the common options work the same way, a few are different in
@@ -804,7 +891,7 @@ for GNU <b>grep</b>, but a regular expression for <b>pcre2grep</b>. If both the
 <b>-c</b> and <b>-l</b> options are given, GNU grep lists only file names,
 without counts, but <b>pcre2grep</b> gives the counts as well.
 </P>
-<br><a name="SEC9" href="#TOC1">OPTIONS WITH DATA</a><br>
+<br><a name="SEC10" href="#TOC1">OPTIONS WITH DATA</a><br>
 <P>
 There are four different ways in which an option with data can be specified.
 If a short form option is used, the data may follow immediately, or (with one
@@ -836,14 +923,16 @@ The exceptions to the above are the <b>--colour</b> (or <b>--color</b>) and
 options does have data, it must be given in the first form, using an equals
 character. Otherwise <b>pcre2grep</b> will assume that it has no data.
 </P>
-<br><a name="SEC10" href="#TOC1">USING PCRE2'S CALLOUT FACILITY</a><br>
+<br><a name="SEC11" href="#TOC1">USING PCRE2'S CALLOUT FACILITY</a><br>
 <P>
 <b>pcre2grep</b> has, by default, support for calling external programs or
 scripts or echoing specific strings during matching by making use of PCRE2's
-callout facility. However, this support can be disabled when <b>pcre2grep</b> is
-built. You can find out whether your binary has support for callouts by running
-it with the <b>--help</b> option. If the support is not enabled, all callouts in
-patterns are ignored by <b>pcre2grep</b>.
+callout facility. However, this support can be completely or partially disabled
+when <b>pcre2grep</b> is built. You can find out whether your binary has support
+for callouts by running it with the <b>--help</b> option. If callout support is
+completely disabled, all callouts in patterns are ignored by <b>pcre2grep</b>.
+If the facility is partially disabled, calling external programs is not
+supported, and callouts that request it are ignored.
 </P>
 <P>
 A callout in a PCRE2 pattern is of the form (?C&#60;arg&#62;) where the argument is
@@ -853,9 +942,39 @@ documentation for details). Numbered callouts are ignored by <b>pcre2grep</b>;
 only callouts with string arguments are useful.
 </P>
 <br><b>
+Echoing a specific string
+</b><br>
+<P>
+Starting the callout string with a pipe character invokes an echoing facility
+that avoids calling an external program or script. This facility is always
+available, provided that callouts were not completely disabled when
+<b>pcre2grep</b> was built. The rest of the callout string is processed as a
+zero-terminated string, which means it should not contain any internal binary
+zeros. It is written to the output, having first been passed through the same
+escape processing as text from the <b>--output</b> (<b>-O</b>) option (see
+above). However, $0 cannot be used to insert a matched substring because the
+match is still in progress. Instead, the single character '0' is inserted. Any
+syntax errors in the string (for example, a dollar not followed by another
+character) cause the callout to be ignored. No terminator is added to the
+output string, so if you want a newline, you must include it explicitly using
+the escape $n. For example:
+<pre>
+  pcre2grep '(.)(..(.))(?C"|[$1] [$2] [$3]$n")' &#60;some file&#62;
+</pre>
+Matching continues normally after the string is output. If you want to see only
+the callout output but not any output from an actual match, you should end the
+pattern with (*FAIL).
+</P>
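The escape processing applied to an echoed callout string can be sketched as follows. This is a simplified Python model under stated assumptions: expand_callout is a hypothetical name, only $<digits>, ${<digits>}, the single-letter escapes, and $$ are handled, the $o and $x forms are deliberately omitted, and Python's re module stands in for PCRE2.

```python
import re

SIMPLE_ESCAPES = {"a": "\a", "b": "\b", "e": "\x1b", "f": "\f",
                  "n": "\n", "r": "\r", "t": "\t", "v": "\v"}

def expand_callout(text, match):
    """Expand a subset of the dollar escapes in a callout string.

    Returns None for a dollar with nothing after it, modelling the
    'callout is ignored' case. The $o and $x forms are not handled.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "$":
            out.append(ch)
            i += 1
            continue
        i += 1
        if i >= len(text):
            return None  # dollar not followed by another character
        m = re.match(r"\{(\d+)\}|(\d+)", text[i:])
        if m:
            number = int(m.group(1) or m.group(2))
            if number == 0:
                out.append("0")  # match still in progress: literal '0'
            elif number <= match.re.groups:
                out.append(match.group(number) or "")
            i += m.end()
        elif text[i] in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[text[i]])
            i += 1
        else:
            out.append(text[i])  # for example, $$ gives a single dollar
            i += 1
    return "".join(out)
```

Applied to the callout string from the example above, with the pattern (.)(..(.)) matched against "abcde", the sketch produces "[a] [bcd] [d]" followed by a newline.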
+<br><b>
 Calling external programs or scripts
 </b><br>
 <P>
+This facility can be independently disabled when <b>pcre2grep</b> is built. It
+is supported for Windows, where a call to <b>_spawnvp()</b> is used, for VMS,
+where <b>lib$spawn()</b> is used, and for any Unix-like environment where
+<b>fork()</b> and <b>execv()</b> are available.
+</P>
+<P>
 If the callout string does not start with a pipe (vertical bar) character, it
 is parsed into a list of substrings separated by pipe characters. The first
 substring must be an executable name, with the following substrings specifying
@@ -864,14 +983,11 @@ arguments:
   executable_name|arg1|arg2|...
 </pre>
 Any substring (including the executable name) may contain escape sequences
-started by a dollar character: $&#60;digits&#62; or ${&#60;digits&#62;} is replaced by the
-captured substring of the given decimal number, which must be greater than
-zero. If the number is greater than the number of capturing substrings, or if
-the capture is unset, the replacement is empty.
-</P>
-<P>
-Any other character is substituted by itself. In particular, $$ is replaced by
-a single dollar and $| is replaced by a pipe character. Here is an example:
+started by a dollar character. These are the same as for the <b>--output</b>
+(<b>-O</b>) option documented above, except that $0 cannot insert the matched
+string because the match is still in progress. Instead, the character '0'
+is inserted. If you need a literal dollar or pipe character in any
+substring, use $$ or $| respectively. Here is an example:
 <pre>
   echo -e "abcde\n12345" | pcre2grep \
     '(?x)(.)(..(.))
@@ -884,29 +1000,16 @@ a single dollar and $| is replaced by a pipe character. Here is an example:
     Arg1: [1] [234] [4] Arg2: |1| ()
     12345
 </pre>
-The parameters for the <b>execv()</b> system call that is used to run the
-program or script are zero-terminated strings. This means that binary zero
-characters in the callout argument will cause premature termination of their
-substrings, and therefore should not be present. Any syntax errors in the
-string (for example, a dollar not followed by another character) cause the
-callout to be ignored. If running the program fails for any reason (including
-the non-existence of the executable), a local matching failure occurs and the
-matcher backtracks in the normal way.
-</P>
-<br><b>
-Echoing a specific string
-</b><br>
-<P>
-If the callout string starts with a pipe (vertical bar) character, the rest of
-the string is written to the output, having been passed through the same escape
-processing as text from the --output option. This provides a simple echoing
-facility that avoids calling an external program or script. No terminator is
-added to the string, so if you want a newline, you must include it explicitly.
-Matching continues normally after the string is output. If you want to see only
-the callout output but not any output from an actual match, you should end the
-relevant pattern with (*FAIL).
+The parameters for the system call that is used to run the program or script
+are zero-terminated strings. This means that binary zero characters in the
+callout argument will cause premature termination of their substrings, and
+therefore should not be present. Any syntax errors in the string (for example,
+a dollar not followed by another character) cause the callout to be ignored.
+If running the program fails for any reason (including the non-existence of the
+executable), a local matching failure occurs and the matcher backtracks in the
+normal way.
 </P>
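The splitting rules described above can be sketched in C. This is only an illustration, not the pcre2grep source: the helper name `split_callout`, the fixed buffer sizes, and the restriction to single-digit capture numbers are assumptions made for brevity, and only the `$$`, `$|`, `$0` and `$<digit>` escapes are handled. An unset or out-of-range capture is assumed to expand to an empty string.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 16   /* hypothetical limit on the number of substrings */
#define MAX_LEN  128  /* hypothetical limit on each substring, incl. NUL */

/* Hypothetical helper: splits "executable_name|arg1|arg2|..." at unescaped
   pipes and expands a small subset of the dollar escapes. captures[k-1]
   holds capture group k. Returns the number of substrings, or -1 for a
   syntax error (for example a trailing dollar) or when a limit is exceeded,
   in which case the caller would ignore the callout. */
int split_callout(const char *s, const char *const *captures, int ncaptures,
                  char out[MAX_ARGS][MAX_LEN])
{
    int n = 0;
    size_t len = 0;

    for (;; s++) {
        if (*s == '|' || *s == '\0') {       /* end of one substring */
            out[n][len] = '\0';
            n++;
            if (*s == '\0') return n;
            if (n == MAX_ARGS) return -1;
            len = 0;
            continue;
        }

        char single[2] = { 0, 0 };
        const char *text = single;

        if (*s == '$') {
            s++;
            if (*s == '$' || *s == '|' || *s == '0') {
                single[0] = *s;              /* $$, $| and $0 insert one char */
            } else if (isdigit((unsigned char)*s)) {
                int k = *s - '0';
                text = (k <= ncaptures && captures[k - 1] != NULL)
                           ? captures[k - 1] : "";
            } else {
                return -1;                   /* dollar followed by nothing usable */
            }
        } else {
            single[0] = *s;                  /* ordinary character */
        }

        size_t tlen = strlen(text);
        if (len + tlen >= MAX_LEN) return -1;
        memcpy(out[n] + len, text, tlen);
        len += tlen;
    }
}
```

With captures "1" and "234", the string `echo|Arg1: [$1] [$2]` splits into the executable name `echo` and the single argument `Arg1: [1] [234]`. The resulting zero-terminated strings are what would then be handed to the system call that runs the program.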
-<br><a name="SEC11" href="#TOC1">MATCHING ERRORS</a><br>
+<br><a name="SEC12" href="#TOC1">MATCHING ERRORS</a><br>
 <P>
 It is possible to supply a regular expression that takes a very long time to
 fail to match certain lines. Such patterns normally involve nested indefinite
@@ -922,7 +1025,7 @@ overall resource limit. There are also other limits that affect the amount of
 memory used during matching; see the discussion of <b>--heap-limit</b> and
 <b>--depth-limit</b> above.
 </P>
-<br><a name="SEC12" href="#TOC1">DIAGNOSTICS</a><br>
+<br><a name="SEC13" href="#TOC1">DIAGNOSTICS</a><br>
 <P>
 Exit status is 0 if any matches were found, 1 if no matches were found, and 2
 for syntax errors, overlong lines, non-existent or inaccessible files (even if
@@ -934,24 +1037,25 @@ affect the return code.
 When run under VMS, the return code is placed in the symbol PCRE2GREP_RC
 because VMS does not distinguish between exit(0) and exit(1).
 </P>
-<br><a name="SEC13" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC14" href="#TOC1">SEE ALSO</a><br>
 <P>
-<b>pcre2pattern</b>(3), <b>pcre2syntax</b>(3), <b>pcre2callout</b>(3).
+<b>pcre2pattern</b>(3), <b>pcre2syntax</b>(3), <b>pcre2callout</b>(3),
+<b>pcre2unicode</b>(3).
 </P>
-<br><a name="SEC14" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC15" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
-University Computing Service
+Retired from University Computing Service
 <br>
 Cambridge, England.
 <br>
 </P>
-<br><a name="SEC15" href="#TOC1">REVISION</a><br>
+<br><a name="SEC16" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 13 November 2017
+Last updated: 31 August 2021
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2021 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.

+ 82 - 48
regex.mod/pcre/doc/html/pcre2jit.html

@@ -16,16 +16,17 @@ please consult the man page, in case the conversion went wrong.
 <li><a name="TOC1" href="#SEC1">PCRE2 JUST-IN-TIME COMPILER SUPPORT</a>
 <li><a name="TOC2" href="#SEC2">AVAILABILITY OF JIT SUPPORT</a>
 <li><a name="TOC3" href="#SEC3">SIMPLE USE OF JIT</a>
-<li><a name="TOC4" href="#SEC4">UNSUPPORTED OPTIONS AND PATTERN ITEMS</a>
-<li><a name="TOC5" href="#SEC5">RETURN VALUES FROM JIT MATCHING</a>
-<li><a name="TOC6" href="#SEC6">CONTROLLING THE JIT STACK</a>
-<li><a name="TOC7" href="#SEC7">JIT STACK FAQ</a>
-<li><a name="TOC8" href="#SEC8">FREEING JIT SPECULATIVE MEMORY</a>
-<li><a name="TOC9" href="#SEC9">EXAMPLE CODE</a>
-<li><a name="TOC10" href="#SEC10">JIT FAST PATH API</a>
-<li><a name="TOC11" href="#SEC11">SEE ALSO</a>
-<li><a name="TOC12" href="#SEC12">AUTHOR</a>
-<li><a name="TOC13" href="#SEC13">REVISION</a>
+<li><a name="TOC4" href="#SEC4">MATCHING SUBJECTS CONTAINING INVALID UTF</a>
+<li><a name="TOC5" href="#SEC5">UNSUPPORTED OPTIONS AND PATTERN ITEMS</a>
+<li><a name="TOC6" href="#SEC6">RETURN VALUES FROM JIT MATCHING</a>
+<li><a name="TOC7" href="#SEC7">CONTROLLING THE JIT STACK</a>
+<li><a name="TOC8" href="#SEC8">JIT STACK FAQ</a>
+<li><a name="TOC9" href="#SEC9">FREEING JIT SPECULATIVE MEMORY</a>
+<li><a name="TOC10" href="#SEC10">EXAMPLE CODE</a>
+<li><a name="TOC11" href="#SEC11">JIT FAST PATH API</a>
+<li><a name="TOC12" href="#SEC12">SEE ALSO</a>
+<li><a name="TOC13" href="#SEC13">AUTHOR</a>
+<li><a name="TOC14" href="#SEC14">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">PCRE2 JUST-IN-TIME COMPILER SUPPORT</a><br>
 <P>
@@ -53,6 +54,7 @@ platforms:
 <pre>
   ARM 32-bit (v5, v7, and Thumb2)
   ARM 64-bit
+  IBM s390x 64-bit
   Intel x86 32-bit and 64-bit
   MIPS 32-bit and 64-bit
   Power PC 32-bit and 64-bit
@@ -89,7 +91,7 @@ or a negative error code.
 There is a limit to the size of pattern that JIT supports, imposed by the size
 of machine stack that it uses. The exact rules are not documented because they
 may change at any time, in particular, when new optimizations are introduced.
-If a pattern is too big, a call to \fBpcre2_jit_compile()\fB returns
+If a pattern is too big, a call to <b>pcre2_jit_compile()</b> returns
 PCRE2_ERROR_NOMEMORY.
 </P>
 <P>
@@ -144,12 +146,39 @@ support is not available, or the pattern was not processed by
 <b>pcre2_jit_compile()</b>, or the JIT compiler was not able to handle the
 pattern.
 </P>
-<br><a name="SEC4" href="#TOC1">UNSUPPORTED OPTIONS AND PATTERN ITEMS</a><br>
+<br><a name="SEC4" href="#TOC1">MATCHING SUBJECTS CONTAINING INVALID UTF</a><br>
+<P>
+When a pattern is compiled with the PCRE2_UTF option, subject strings are
+normally expected to be a valid sequence of UTF code units. By default, this is
+checked at the start of matching and an error is generated if invalid UTF is
+detected. The PCRE2_NO_UTF_CHECK option can be passed to <b>pcre2_match()</b> to
+skip the check (for improved performance) if you are sure that a subject string
+is valid. If this option is used with an invalid string, the result is
+undefined.
+</P>
+<P>
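To make "a valid sequence of UTF code units" concrete for the 8-bit library, here is a minimal sketch of a UTF-8 well-formedness check. It is not PCRE2's internal check, whose details differ; the function name `utf8_is_valid` is hypothetical. It rejects truncated sequences, stray continuation bytes, overlong encodings, surrogates, and code points above U+10FFFF.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8,
   otherwise 0. */
int utf8_is_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;            /* number of continuation bytes expected */
        unsigned long cp, min;   /* decoded code point, smallest legal value */

        if (c < 0x80) { i++; continue; }                  /* ASCII */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;           /* stray continuation byte or invalid lead */

        if (len - i <= extra) return 0;                   /* truncated */

        for (size_t k = 1; k <= extra; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0;    /* not 10xxxxxx */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }

        if (cp < min) return 0;                           /* overlong form */
        if (cp > 0x10FFFF) return 0;                      /* out of range */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;       /* surrogate */

        i += extra + 1;
    }
    return 1;
}
```

An application that has already run a check of this kind on a subject is the sort of caller for which skipping the library's own check with PCRE2_NO_UTF_CHECK could be reasonable.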
+However, a way of running matches on strings that may contain invalid UTF
+sequences is available. Calling <b>pcre2_compile()</b> with the
+PCRE2_MATCH_INVALID_UTF option has two effects: it tells the interpreter in
+<b>pcre2_match()</b> to support invalid UTF, and, if <b>pcre2_jit_compile()</b>
+is called, the compiled JIT code also supports invalid UTF. Details of how this
+support works, in both the JIT and the interpretive cases, are given in the
+<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
+documentation.
+</P>
+<P>
+There is also an obsolete option for <b>pcre2_jit_compile()</b> called
+PCRE2_JIT_INVALID_UTF, which currently exists only for backward compatibility.
+It is superseded by the <b>pcre2_compile()</b> option PCRE2_MATCH_INVALID_UTF
+and should no longer be used. It may be removed in future.
+</P>
+<br><a name="SEC5" href="#TOC1">UNSUPPORTED OPTIONS AND PATTERN ITEMS</a><br>
 <P>
 The <b>pcre2_match()</b> options that are supported for JIT matching are
-PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
-PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The
-PCRE2_ANCHORED option is not supported at match time.
+PCRE2_COPY_MATCHED_SUBJECT, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
+PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and
+PCRE2_PARTIAL_SOFT. The PCRE2_ANCHORED and PCRE2_ENDANCHORED options are not
+supported at match time.
 </P>
 <P>
 If the PCRE2_NO_JIT option is passed to <b>pcre2_match()</b> it disables the
@@ -160,7 +189,7 @@ The only unsupported pattern items are \C (match a single data unit) when
 running in a UTF mode, and a callout immediately before an assertion condition
 in a conditional group.
 </P>
-<br><a name="SEC5" href="#TOC1">RETURN VALUES FROM JIT MATCHING</a><br>
+<br><a name="SEC6" href="#TOC1">RETURN VALUES FROM JIT MATCHING</a><br>
 <P>
 When a pattern is matched using JIT matching, the return values are the same
 as those given by the interpretive <b>pcre2_match()</b> code, with the addition
@@ -176,10 +205,10 @@ circumstance when JIT is not used, but the details of exactly what is counted
 are not the same. The PCRE2_ERROR_DEPTHLIMIT error code is never returned
 when JIT matching is used.
 <a name="stackcontrol"></a></P>
-<br><a name="SEC6" href="#TOC1">CONTROLLING THE JIT STACK</a><br>
+<br><a name="SEC7" href="#TOC1">CONTROLLING THE JIT STACK</a><br>
 <P>
 When the compiled JIT code runs, it needs a block of memory to use as a stack.
-By default, it uses 32K on the machine stack. However, some large or
+By default, it uses 32KiB on the machine stack. However, some large or
 complicated patterns need more than this. The error PCRE2_ERROR_JIT_STACKLIMIT
 is given when there is not enough stack. Three functions are provided for
 managing blocks of memory for use as JIT stacks. There is further discussion
@@ -193,9 +222,10 @@ are a starting size, a maximum size, and a general context (for memory
 allocation functions, or NULL for standard memory allocation). It returns a
 pointer to an opaque structure of type <b>pcre2_jit_stack</b>, or NULL if there
 is an error. The <b>pcre2_jit_stack_free()</b> function is used to free a stack
-that is no longer needed. (For the technically minded: the address space is
-allocated by mmap or VirtualAlloc.) A maximum stack size of 512K to 1M should
-be more than enough for any pattern.
+that is no longer needed. If its argument is NULL, this function returns
+immediately, without doing anything. (For the technically minded: the address
+space is allocated by mmap or VirtualAlloc.) A maximum stack size of 512KiB to
+1MiB should be more than enough for any pattern.
 </P>
 <P>
 The <b>pcre2_jit_stack_assign()</b> function specifies which stack JIT code
@@ -207,9 +237,10 @@ should use. Its arguments are as follows:
 </pre>
 The first argument is a pointer to a match context. When this is subsequently
 passed to a matching function, its information determines which JIT stack is
-used. There are three cases for the values of the other two options:
+used. If this argument is NULL, the function returns immediately, without doing
+anything. There are three cases for the values of the other two options:
 <pre>
-  (1) If <i>callback</i> is NULL and <i>data</i> is NULL, an internal 32K block
+  (1) If <i>callback</i> is NULL and <i>data</i> is NULL, an internal 32KiB block
       on the machine stack is used. This is the default when a match
       context is created.
 
@@ -220,7 +251,7 @@ used. There are three cases for the values of the other two options:
   (3) If <i>callback</i> is not NULL, it must point to a function that is
       called with <i>data</i> as an argument at the start of matching, in
       order to set up a JIT stack. If the return from the callback
-      function is NULL, the internal 32K stack is used; otherwise the
+      function is NULL, the internal 32KiB stack is used; otherwise the
       return value must be a valid JIT stack, the result of calling
       <b>pcre2_jit_stack_create()</b>.
 </pre>
@@ -256,7 +287,7 @@ inefficient solution, and not recommended.
 This is a suggestion for how a multithreaded program that needs to set up
 non-default JIT stacks might operate:
 <pre>
-  During thread initalization
+  During thread initialization
     thread_local_var = pcre2_jit_stack_create(...)
 
   During thread exit
@@ -267,7 +298,7 @@ non-default JIT stacks might operate:
 </pre>
 All the functions described in this section do nothing if JIT is not available.
 <a name="stackfaq"></a></P>
-<br><a name="SEC7" href="#TOC1">JIT STACK FAQ</a><br>
+<br><a name="SEC8" href="#TOC1">JIT STACK FAQ</a><br>
 <P>
 (1) Why do we need JIT stacks?
 <br>
@@ -286,9 +317,9 @@ we do the recursion in memory.
 Modern operating systems have a nice feature: they can reserve an address space
 instead of allocating memory. We can safely allocate memory pages inside this
 address space, so the stack could grow without moving memory data (this is
-important because of pointers). Thus we can allocate 1M address space, and use
-only a single memory page (usually 4K) if that is enough. However, we can still
-grow up to 1M anytime if needed.
+important because of pointers). Thus we can allocate 1MiB address space, and
+use only a single memory page (usually 4KiB) if that is enough. However, we can
+still grow up to 1MiB anytime if needed.
 </P>
 <P>
 (3) Who "owns" a JIT stack?
@@ -309,12 +340,12 @@ stack through the JIT callback function.
 You can free a JIT stack at any time, as long as it will not be used by
 <b>pcre2_match()</b> again. When you assign the stack to a match context, only a
 pointer is set. There is no reference counting or any other magic. You can free
-compiled patterns, contexts, and stacks in any order, anytime. Just \fIdo
-not\fP call <b>pcre2_match()</b> with a match context pointing to an already
-freed stack, as that will cause SEGFAULT. (Also, do not free a stack currently
-used by <b>pcre2_match()</b> in another thread). You can also replace the stack
-in a context at any time when it is not in use. You should free the previous
-stack before assigning a replacement.
+compiled patterns, contexts, and stacks in any order, anytime.
+Just <i>do not</i> call <b>pcre2_match()</b> with a match context pointing to an
+already freed stack, as that will cause SEGFAULT. (Also, do not free a stack
+currently used by <b>pcre2_match()</b> in another thread). You can also replace
+the stack in a context at any time when it is not in use. You should free the
+previous stack before assigning a replacement.
 </P>
 <P>
 (5) Should I allocate/free a stack every time before/after calling
@@ -328,7 +359,7 @@ list of patterns.
 </P>
 <P>
 (6) OK, the stack is for long term memory allocation. But what happens if a
-pattern causes stack overflow with a stack of 1M? Is that 1M kept until the
+pattern causes stack overflow with a stack of 1MiB? Is that 1MiB kept until the
 stack is freed?
 <br>
 <br>
@@ -346,7 +377,7 @@ stack handling?
 No, thanks to Windows. If POSIX threads were used everywhere, we could throw
 out this complicated API.
 </P>
-<br><a name="SEC8" href="#TOC1">FREEING JIT SPECULATIVE MEMORY</a><br>
+<br><a name="SEC9" href="#TOC1">FREEING JIT SPECULATIVE MEMORY</a><br>
 <P>
 <b>void pcre2_jit_free_unused_memory(pcre2_general_context *<i>gcontext</i>);</b>
 </P>
@@ -358,7 +389,7 @@ all possible memory. You can cause this to happen by calling
 pcre2_jit_free_unused_memory(). Its argument is a general context, for custom
 memory management, or NULL for standard memory management.
 </P>
-<br><a name="SEC9" href="#TOC1">EXAMPLE CODE</a><br>
+<br><a name="SEC10" href="#TOC1">EXAMPLE CODE</a><br>
 <P>
 This is a single-threaded example that specifies a JIT stack without using a
 callback. A real program should include error checking after all the function
@@ -387,7 +418,7 @@ calls.
 
 
 </PRE>
 </P>
-<br><a name="SEC10" href="#TOC1">JIT FAST PATH API</a><br>
+<br><a name="SEC11" href="#TOC1">JIT FAST PATH API</a><br>
 <P>
 Because the API described above falls back to interpreted matching when JIT is
 not available, it is convenient for programs that are written for general use
@@ -400,10 +431,13 @@ processed by <b>pcre2_jit_compile()</b>).
 </P>
 <P>
 The fast path function is called <b>pcre2_jit_match()</b>, and it takes exactly
-the same arguments as <b>pcre2_match()</b>. The return values are also the same,
-plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is
-requested that was not compiled. Unsupported option bits (for example,
-PCRE2_ANCHORED) are ignored, as is the PCRE2_NO_JIT option.
+the same arguments as <b>pcre2_match()</b>. However, the subject string must be
+specified with a length; PCRE2_ZERO_TERMINATED is not supported. Unsupported
+option bits (for example, PCRE2_ANCHORED, PCRE2_ENDANCHORED and
+PCRE2_COPY_MATCHED_SUBJECT) are ignored, as is the PCRE2_NO_JIT option. The
+return values are also the same as for <b>pcre2_match()</b>, plus
+PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is requested
+that was not compiled.
 </P>
 <P>
 When you call <b>pcre2_match()</b>, as well as testing for invalid options, a
@@ -417,11 +451,11 @@ invalid data is passed, the result is undefined.
 Bypassing the sanity checks and the <b>pcre2_match()</b> wrapping can give
 speedups of more than 10%.
 </P>
-<br><a name="SEC11" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC12" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcre2api</b>(3)
 </P>
-<br><a name="SEC12" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC13" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel (FAQ by Zoltan Herczeg)
 <br>
@@ -430,11 +464,11 @@ University Computing Service
 Cambridge, England.
 <br>
 </P>
-<br><a name="SEC13" href="#TOC1">REVISION</a><br>
+<br><a name="SEC14" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 31 March 2017
+Last updated: 23 May 2019
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2019 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.

+ 17 - 17
regex.mod/pcre/doc/html/pcre2limits.html

@@ -20,12 +20,12 @@ There are some size limitations in PCRE2 but it is hoped that they will never
 in practice be relevant.
 </P>
 <P>
-The maximum size of a compiled pattern is approximately 64K code units for the
-8-bit and 16-bit libraries if PCRE2 is compiled with the default internal
-linkage size, which is 2 bytes for these libraries. If you want to process
-regular expressions that are truly enormous, you can compile PCRE2 with an
-internal linkage size of 3 or 4 (when building the 16-bit library, 3 is rounded
-up to 4). See the <b>README</b> file in the source distribution and the
+The maximum size of a compiled pattern is approximately 64 thousand code units
+for the 8-bit and 16-bit libraries if PCRE2 is compiled with the default
+internal linkage size, which is 2 bytes for these libraries. If you want to
+process regular expressions that are truly enormous, you can compile PCRE2 with
+an internal linkage size of 3 or 4 (when building the 16-bit library, 3 is
+rounded up to 4). See the <b>README</b> file in the source distribution and the
 <a href="pcre2build.html"><b>pcre2build</b></a>
 documentation for details. In these cases the limit is substantially larger.
 However, the speed of execution is slower. In the 32-bit library, the internal
@@ -50,17 +50,17 @@ All values in repeating quantifiers must be less than 65536.
 The maximum length of a lookbehind assertion is 65535 characters.
 </P>
 <P>
-There is no limit to the number of parenthesized subpatterns, but there can be
-no more than 65535 capturing subpatterns. There is, however, a limit to the
-depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
-order to limit the amount of system stack used at compile time. The default
-limit can be specified when PCRE2 is built; the default default is 250. An
-application can change this limit by calling pcre2_set_parens_nest_limit() to
-set the limit in a compile context.
+There is no limit to the number of parenthesized groups, but there can be no
+more than 65535 capture groups, and there is a limit to the depth of nesting of
+parenthesized subpatterns of all kinds. This is imposed in order to limit the
+amount of system stack used at compile time. The default limit can be specified
+when PCRE2 is built; if not, the default is set to 250. An application can
+change this limit by calling pcre2_set_parens_nest_limit() to set the limit in
+a compile context.
 </P>
 <P>
-The maximum length of name for a named subpattern is 32 code units, and the
-maximum number of named subpatterns is 10000.
+The maximum length of name for a named capture group is 32 code units, and the
+maximum number of such groups is 10000.
 </P>
 <P>
 The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
@@ -86,9 +86,9 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 30 March 2017
+Last updated: 02 February 2019
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2019 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.

+ 42 - 30
regex.mod/pcre/doc/html/pcre2matching.html

@@ -78,14 +78,15 @@ tried is controlled by the greedy or ungreedy nature of the quantifier.
 If a leaf node is reached, a matching string has been found, and at that point
 the algorithm stops. Thus, if there is more than one possible match, this
 algorithm returns the first one that it finds. Whether this is the shortest,
-the longest, or some intermediate length depends on the way the greedy and
-ungreedy repetition quantifiers are specified in the pattern.
+the longest, or some intermediate length depends on the way the alternations
+and the greedy or ungreedy repetition quantifiers are specified in the
+pattern.
 </P>
 <P>
 Because it ends up with a single path through the tree, it is relatively
 straightforward for this algorithm to keep track of the substrings that are
 matched by portions of the pattern in parentheses. This provides support for
-capturing parentheses and back references.
+capturing parentheses and backreferences.
 </P>
 <br><a name="SEC4" href="#TOC1">THE ALTERNATIVE MATCHING ALGORITHM</a><br>
 <P>
@@ -109,11 +110,17 @@ no more unterminated paths. At this point, terminated paths represent the
 different matching possibilities (if there are none, the match has failed).
 Thus, if there is more than one possible match, this algorithm finds all of
 them, and in particular, it finds the longest. The matches are returned in
-decreasing order of length. There is an option to stop the algorithm after the
-first match (which is necessarily the shortest) is found.
+the output vector in decreasing order of length. There is an option to stop the
+algorithm after the first match (which is necessarily the shortest) is found.
 </P>
 <P>
-Note that all the matches that are found start at the same point in the
+Note that the size of vector needed to contain all the results depends on the
+number of simultaneous matches, not on the number of parentheses in the
+pattern. Using <b>pcre2_match_data_create_from_pattern()</b> to create the match
+data block is therefore not advisable when doing DFA matching.
+</P>
+<P>
+Note also that all the matches that are found start at the same point in the
 subject. If the pattern
 <pre>
   cat(er(pillar)?)?
@@ -134,7 +141,8 @@ do want multiple matches in such cases, either use an ungreedy repeat
 </P>
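The "all matches from one starting point, longest first" behaviour can be illustrated with a toy enumerator for the pattern cat(er(pillar)?)? only. This is not how the DFA matching function is implemented and makes no PCRE2 calls; the function name `all_matches_at` and the fixed three-entry output array are assumptions for the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, hard-wired to cat(er(pillar)?)?: stores the length
   of every match that starts at subject[start], in decreasing order of
   length, and returns how many were found (0 to 3). Matches that start at
   later positions are deliberately not looked for. */
size_t all_matches_at(const char *subject, size_t start, size_t lengths[3])
{
    static const char *const pieces[] = { "cat", "er", "pillar" };
    size_t found[3];
    size_t n = 0;
    size_t pos = start;

    if (start > strlen(subject)) return 0;

    /* Each optional group can only match if everything before it matched. */
    for (size_t i = 0; i < 3; i++) {
        size_t plen = strlen(pieces[i]);
        if (strncmp(subject + pos, pieces[i], plen) != 0) break;
        pos += plen;
        found[n++] = pos - start;    /* a complete match ends here */
    }

    /* Report the longest match first. */
    for (size_t i = 0; i < n; i++) lengths[i] = found[n - 1 - i];
    return n;
}
```

For the subject "the caterpillar catchment" and a start offset of 4, the lengths reported are 11, 5 and 3, corresponding to "caterpillar", "cater" and "cat".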
 <P>
 There are a number of features of PCRE2 regular expressions that are not
-supported by the alternative matching algorithm. They are as follows:
+supported or behave differently in the alternative matching function. Those
+that are not supported cause an error if encountered.
 </P>
 <P>
 1. Because the algorithm finds all possible matches, the greedy or ungreedy
@@ -158,49 +166,49 @@ possibilities, and PCRE2's implementation of this algorithm does not attempt to
 do this. This means that no captured substrings are available.
 </P>
 <P>
-3. Because no substrings are captured, back references within the pattern are
-not supported, and cause errors if encountered.
+3. Because no substrings are captured, backreferences within the pattern are
+not supported.
 </P>
 <P>
 4. For the same reason, conditional expressions that use a backreference as the
 condition or test for a specific group recursion are not supported.
 </P>
 <P>
-5. Because many paths through the tree may be active, the \K escape sequence,
+5. Again for the same reason, script runs are not supported.
+</P>
+<P>
+6. Because many paths through the tree may be active, the \K escape sequence,
 which resets the start of the match when encountered (but may be on some paths
 which resets the start of the match when encountered (but may be on some paths
-and not on others), is not supported. It causes an error if encountered.
+and not on others), is not supported.
 </P>
 </P>
 <P>
 <P>
-6. Callouts are supported, but the value of the <i>capture_top</i> field is
+7. Callouts are supported, but the value of the <i>capture_top</i> field is
 always 1, and the value of the <i>capture_last</i> field is always 0.
 </P>
 <P>
-7. The \C escape sequence, which (in the standard algorithm) always matches a
+8. The \C escape sequence, which (in the standard algorithm) always matches a
 single code unit, even in a UTF mode, is not supported in these modes, because
 the alternative algorithm moves through the subject string one character (not
 code unit) at a time, for all active paths through the tree.
 </P>
 <P>
-8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not
+9. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not
 supported. (*FAIL) is supported, and behaves like a failing negative assertion.
 supported. (*FAIL) is supported, and behaves like a failing negative assertion.
 </P>
 </P>
-<br><a name="SEC5" href="#TOC1">ADVANTAGES OF THE ALTERNATIVE ALGORITHM</a><br>
 <P>
 <P>
-Using the alternative matching algorithm provides the following advantages:
+10. The PCRE2_MATCH_INVALID_UTF option for <b>pcre2_compile()</b> is not
+supported by <b>pcre2_dfa_match()</b>.
 </P>
 </P>
+<br><a name="SEC5" href="#TOC1">ADVANTAGES OF THE ALTERNATIVE ALGORITHM</a><br>
 <P>
 <P>
-1. All possible matches (at a single point in the subject) are automatically
-found, and in particular, the longest match is found. To find more than one
-match using the standard algorithm, you have to do kludgy things with
-callouts.
+The main advantage of the alternative algorithm is that all possible matches
+(at a single point in the subject) are automatically found, and in particular,
+the longest match is found. To find more than one match at the same point using
+the standard algorithm, you have to do kludgy things with callouts.
 </P>
 </P>
 <P>
 <P>
-2. Because the alternative algorithm scans the subject string just once, and
-never needs to backtrack (except for lookbehinds), it is possible to pass very
-long subject strings to the matching function in several pieces, checking for
-partial matching each time. Although it is also possible to do multi-segment
-matching using the standard algorithm, by retaining partially matched
-substrings, it is more complicated. The
+Partial matching is possible with this algorithm, though it has some
+limitations. The
 <a href="pcre2partial.html"><b>pcre2partial</b></a>
 documentation gives details of partial matching and discusses multi-segment
 matching.
@@ -215,26 +223,30 @@ because it has to search for all possible matches, but is also because it is
 less susceptible to optimization.
 </P>
 <P>
-2. Capturing parentheses and back references are not supported.
+2. Capturing parentheses, backreferences, script runs, and matching within
+invalid UTF strings are not supported.
 </P>
 </P>
 <P>
 3. Although atomic groups are supported, their use does not provide the
 performance advantage that it does for the standard algorithm.
 </P>
+<P>
+4. JIT optimization is not supported.
+</P>
 <br><a name="SEC7" href="#TOC1">AUTHOR</a><br>
 <br><a name="SEC7" href="#TOC1">AUTHOR</a><br>
 <P>
 <P>
 Philip Hazel
 Philip Hazel
 <br>
 <br>
-University Computing Service
+Retired from University Computing Service
 <br>
 Cambridge, England.
 <br>
 </P>
 <br><a name="SEC8" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 29 September 2014
+Last updated: 28 August 2021
 <br>
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2021 University of Cambridge.
 <br>
 <br>
 <p>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
 Return to the <a href="index.html">PCRE2 index page</a>.

+ 262 - 319
regex.mod/pcre/doc/html/pcre2partial.html

@@ -14,78 +14,123 @@ please consult the man page, in case the conversion went wrong.
 <br>
 <br>
 <ul>
 <ul>
 <li><a name="TOC1" href="#SEC1">PARTIAL MATCHING IN PCRE2</a>
 <li><a name="TOC1" href="#SEC1">PARTIAL MATCHING IN PCRE2</a>
-<li><a name="TOC2" href="#SEC2">PARTIAL MATCHING USING pcre2_match()</a>
-<li><a name="TOC3" href="#SEC3">PARTIAL MATCHING USING pcre2_dfa_match()</a>
-<li><a name="TOC4" href="#SEC4">PARTIAL MATCHING AND WORD BOUNDARIES</a>
-<li><a name="TOC5" href="#SEC5">EXAMPLE OF PARTIAL MATCHING USING PCRE2TEST</a>
+<li><a name="TOC2" href="#SEC2">REQUIREMENTS FOR A PARTIAL MATCH</a>
+<li><a name="TOC3" href="#SEC3">PARTIAL MATCHING USING pcre2_match()</a>
+<li><a name="TOC4" href="#SEC4">MULTI-SEGMENT MATCHING WITH pcre2_match()</a>
+<li><a name="TOC5" href="#SEC5">PARTIAL MATCHING USING pcre2_dfa_match()</a>
 <li><a name="TOC6" href="#SEC6">MULTI-SEGMENT MATCHING WITH pcre2_dfa_match()</a>
 <li><a name="TOC6" href="#SEC6">MULTI-SEGMENT MATCHING WITH pcre2_dfa_match()</a>
-<li><a name="TOC7" href="#SEC7">MULTI-SEGMENT MATCHING WITH pcre2_match()</a>
-<li><a name="TOC8" href="#SEC8">ISSUES WITH MULTI-SEGMENT MATCHING</a>
-<li><a name="TOC9" href="#SEC9">AUTHOR</a>
-<li><a name="TOC10" href="#SEC10">REVISION</a>
+<li><a name="TOC7" href="#SEC7">AUTHOR</a>
+<li><a name="TOC8" href="#SEC8">REVISION</a>
 </ul>
 </ul>
 <br><a name="SEC1" href="#TOC1">PARTIAL MATCHING IN PCRE2</a><br>
 <br><a name="SEC1" href="#TOC1">PARTIAL MATCHING IN PCRE2</a><br>
 <P>
 <P>
-In normal use of PCRE2, if the subject string that is passed to a matching
-function matches as far as it goes, but is too short to match the entire
-pattern, PCRE2_ERROR_NOMATCH is returned. There are circumstances where it
-might be helpful to distinguish this case from other cases in which there is no
-match.
+In normal use of PCRE2, if there is a match up to the end of a subject string,
+but more characters are needed to match the entire pattern, PCRE2_ERROR_NOMATCH
+is returned, just like any other failing match. There are circumstances where
+it might be helpful to distinguish this "partial match" case.
 </P>
 </P>
 <P>
 <P>
-Consider, for example, an application where a human is required to type in data
-for a field with specific formatting requirements. An example might be a date
-in the form <i>ddmmmyy</i>, defined by this pattern:
-<pre>
-  ^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$
-</pre>
-If the application sees the user's keystrokes one by one, and can check that
-what has been typed so far is potentially valid, it is able to raise an error
-as soon as a mistake is made, by beeping and not reflecting the character that
-has been typed, for example. This immediate feedback is likely to be a better
-user interface than a check that is delayed until the entire string has been
-entered. Partial matching can also be useful when the subject string is very
-long and is not all available at once.
+One example is an application where the subject string is very long, and not
+all available at once. The requirement here is to be able to do the matching
+segment by segment, but special action is needed when a matched substring spans
+the boundary between two segments.
 </P>
 </P>
 <P>
 <P>
-PCRE2 supports partial matching by means of the PCRE2_PARTIAL_SOFT and
-PCRE2_PARTIAL_HARD options, which can be set when calling a matching function.
-The difference between the two options is whether or not a partial match is
-preferred to an alternative complete match, though the details differ between
-the two types of matching function. If both options are set, PCRE2_PARTIAL_HARD
-takes precedence.
+Another example is checking a user input string as it is typed, to ensure that
+it conforms to a required format. Invalid characters can be immediately
+diagnosed and rejected, giving instant feedback.
 </P>
 </P>
 <P>
 <P>
-If you want to use partial matching with just-in-time optimized code, you must
-call <b>pcre2_jit_compile()</b> with one or both of these options:
+Partial matching is a PCRE2-specific feature; it is not Perl-compatible. It is
+requested by setting one of the PCRE2_PARTIAL_HARD or PCRE2_PARTIAL_SOFT
+options when calling a matching function. The difference between the two
+options is whether or not a partial match is preferred to an alternative
+complete match, though the details differ between the two types of matching
+function. If both options are set, PCRE2_PARTIAL_HARD takes precedence.
+</P>
+<P>
+If you want to use partial matching with just-in-time optimized code, as well
+as setting a partial match option for the matching function, you must also call
+<b>pcre2_jit_compile()</b> with one or both of these options:
 <pre>
 <pre>
-  PCRE2_JIT_PARTIAL_SOFT
   PCRE2_JIT_PARTIAL_HARD
   PCRE2_JIT_PARTIAL_HARD
+  PCRE2_JIT_PARTIAL_SOFT
 </pre>
 </pre>
 PCRE2_JIT_COMPLETE should also be set if you are going to run non-partial
 PCRE2_JIT_COMPLETE should also be set if you are going to run non-partial
-matches on the same pattern. If the appropriate JIT mode has not been compiled,
-interpretive matching code is used.
+matches on the same pattern. Separate code is compiled for each mode. If the
+appropriate JIT mode has not been compiled, interpretive matching code is used.
 </P>
 <P>
 Setting a partial matching option disables two of PCRE2's standard
-optimizations. PCRE2 remembers the last literal code unit in a pattern, and
-abandons matching immediately if it is not present in the subject string. This
-optimization cannot be used for a subject string that might match only
-partially. PCRE2 also knows the minimum length of a matching string, and does
+optimization hints. PCRE2 remembers the last literal code unit in a pattern,
+and abandons matching immediately if it is not present in the subject string.
+This optimization cannot be used for a subject string that might match only
+partially. PCRE2 also remembers a minimum length of a matching string, and does
 not bother to run the matching function on shorter strings. This optimization
 is also disabled for partial matching.
 </P>
 </P>
-<br><a name="SEC2" href="#TOC1">PARTIAL MATCHING USING pcre2_match()</a><br>
+<br><a name="SEC2" href="#TOC1">REQUIREMENTS FOR A PARTIAL MATCH</a><br>
+<P>
+A possible partial match occurs during matching when the end of the subject
+string is reached successfully, but either more characters are needed to
+complete the match, or the addition of more characters might change what is
+matched.
+</P>
 <P>
 <P>
-A partial match occurs during a call to <b>pcre2_match()</b> when the end of the
-subject string is reached successfully, but matching cannot continue because
-more characters are needed. However, at least one character in the subject must
-have been inspected. This character need not form part of the final matched
-string; lookbehind assertions and the \K escape sequence provide ways of
-inspecting characters before the start of a matched string. The requirement for
-inspecting at least one character exists because an empty string can always be
-matched; without such a restriction there would always be a partial match of an
-empty string at the end of the subject.
+Example 1: if the pattern is /abc/ and the subject is "ab", more characters are
+definitely needed to complete a match. In this case both hard and soft matching
+options yield a partial match.
+</P>
+<P>
+Example 2: if the pattern is /ab+/ and the subject is "ab", a complete match
+can be found, but the addition of more characters might change what is
+matched. In this case, only PCRE2_PARTIAL_HARD returns a partial match;
+PCRE2_PARTIAL_SOFT returns the complete match.
+</P>
+<P>
+On reaching the end of the subject, when PCRE2_PARTIAL_HARD is set, if the next
+pattern item is \z, \Z, \b, \B, or $ there is always a partial match.
+Otherwise, for both options, the next pattern item must be one that inspects a
+character, and at least one of the following must be true:
+</P>
+<P>
+(1) At least one character has already been inspected. An inspected character
+need not form part of the final matched string; lookbehind assertions and the
+\K escape sequence provide ways of inspecting characters before the start of a
+matched string.
+</P>
+<P>
+(2) The pattern contains one or more lookbehind assertions. This condition
+exists in case there is a lookbehind that inspects characters before the start
+of the match.
+</P>
+<P>
+(3) There is a special case when the whole pattern can match an empty string.
+When the starting point is at the end of the subject, the empty string match is
+a possibility, and if PCRE2_PARTIAL_SOFT is set and neither of the above
+conditions is true, it is returned. However, because adding more characters
+might result in a non-empty match, PCRE2_PARTIAL_HARD returns a partial match,
+which in this case means "there is going to be a match at this point, but until
+some more characters are added, we do not know if it will be an empty string or
+something longer".
+</P>
+<br><a name="SEC3" href="#TOC1">PARTIAL MATCHING USING pcre2_match()</a><br>
+<P>
+When a partial matching option is set, the result of calling
+<b>pcre2_match()</b> can be one of the following:
+</P>
+<P>
+<b>A successful match</b>
+A complete match has been found, starting and ending within this subject.
+</P>
+<P>
+<b>PCRE2_ERROR_NOMATCH</b>
+No match can start anywhere in this subject.
+</P>
+<P>
+<b>PCRE2_ERROR_PARTIAL</b>
+Adding more characters may result in a complete match that uses one or more
+characters from the end of this subject.
 </P>
 </P>
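The three outcomes listed above can be distinguished from the return value of pcre2_match() alone. The following is a minimal sketch in C, assuming the values that pcre2.h documents for PCRE2_ERROR_NOMATCH (-1) and PCRE2_ERROR_PARTIAL (-2). The fallback definitions, the enum, and classify_match_result() are illustrative names and are not part of the PCRE2 API.

```c
/* Fallback definitions so the sketch compiles without pcre2.h.
   Assumption: these mirror the values in the real header. */
#ifndef PCRE2_ERROR_NOMATCH
#define PCRE2_ERROR_NOMATCH (-1)
#endif
#ifndef PCRE2_ERROR_PARTIAL
#define PCRE2_ERROR_PARTIAL (-2)
#endif

/* Hypothetical outcome type for a call made with a partial matching option. */
typedef enum {
    MATCH_COMPLETE,  /* a complete match was found in this subject */
    MATCH_NONE,      /* no match can start anywhere in this subject */
    MATCH_PARTIAL,   /* more characters may complete a match */
    MATCH_FAILURE    /* any other negative code: a genuine error */
} match_outcome;

/* Map the integer returned by pcre2_match() to one of the outcomes above.
   A return of 0 still means a match (the ovector was too small), so every
   non-negative value counts as complete. */
match_outcome classify_match_result(int rc)
{
    if (rc >= 0) return MATCH_COMPLETE;
    if (rc == PCRE2_ERROR_NOMATCH) return MATCH_NONE;
    if (rc == PCRE2_ERROR_PARTIAL) return MATCH_PARTIAL;
    return MATCH_FAILURE;
}
```

A caller would typically request more input only on MATCH_PARTIAL and treat MATCH_FAILURE as a reason to stop.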
 <P>
 <P>
 When a partial match is returned, the first two elements in the ovector point
 When a partial match is returned, the first two elements in the ovector point
@@ -103,54 +148,42 @@ these characters are needed for a subsequent re-match with additional
 characters.
 characters.
 </P>
 </P>
 <P>
 <P>
-What happens when a partial match is identified depends on which of the two
-partial matching options are set.
-</P>
-<br><b>
-PCRE2_PARTIAL_SOFT WITH pcre2_match()
-</b><br>
-<P>
-If PCRE2_PARTIAL_SOFT is set when <b>pcre2_match()</b> identifies a partial
-match, the partial match is remembered, but matching continues as normal, and
-other alternatives in the pattern are tried. If no complete match can be found,
-PCRE2_ERROR_PARTIAL is returned instead of PCRE2_ERROR_NOMATCH.
-</P>
-<P>
-This option is "soft" because it prefers a complete match over a partial match.
-All the various matching items in a pattern behave as if the subject string is
-potentially complete. For example, \z, \Z, and $ match at the end of the
-subject, as normal, and for \b and \B the end of the subject is treated as a
-non-alphanumeric.
-</P>
-<P>
 If there is more than one partial match, the first one that was found provides
 the data that is returned. Consider this pattern:
 <pre>
   /123\w+X|dogY/
 </pre>
-If this is matched against the subject string "abc123dog", both
-alternatives fail to match, but the end of the subject is reached during
-matching, so PCRE2_ERROR_PARTIAL is returned. The offsets are set to 3 and 9,
-identifying "123dog" as the first partial match that was found. (In this
-example, there are two partial matches, because "dog" on its own partially
-matches the second alternative.)
+If this is matched against the subject string "abc123dog", both alternatives
+fail to match, but the end of the subject is reached during matching, so
+PCRE2_ERROR_PARTIAL is returned. The offsets are set to 3 and 9, identifying
+"123dog" as the first partial match. (In this example, there are two partial
+matches, because "dog" on its own partially matches the second alternative.)
 </P>
 </P>
 <br><b>
 <br><b>
-PCRE2_PARTIAL_HARD WITH pcre2_match()
+How a partial match is processed by pcre2_match()
 </b><br>
 </b><br>
 <P>
 <P>
-If PCRE2_PARTIAL_HARD is set for <b>pcre2_match()</b>, PCRE2_ERROR_PARTIAL is
-returned as soon as a partial match is found, without continuing to search for
-possible complete matches. This option is "hard" because it prefers an earlier
-partial match over a later complete match. For this reason, the assumption is
-made that the end of the supplied subject string may not be the true end of the
-available data, and so, if \z, \Z, \b, \B, or $ are encountered at the end
-of the subject, the result is PCRE2_ERROR_PARTIAL, provided that at least one
-character in the subject has been inspected.
+What happens when a partial match is identified depends on which of the two
+partial matching options is set.
+</P>
+<P>
+If PCRE2_PARTIAL_HARD is set, PCRE2_ERROR_PARTIAL is returned as soon as a
+partial match is found, without continuing to search for possible complete
+matches. This option is "hard" because it prefers an earlier partial match over
+a later complete match. For this reason, the assumption is made that the end of
+the supplied subject string is not the true end of the available data, which is
+why \z, \Z, \b, \B, and $ always give a partial match.
+</P>
+<P>
+If PCRE2_PARTIAL_SOFT is set, the partial match is remembered, but matching
+continues as normal, and other alternatives in the pattern are tried. If no
+complete match can be found, PCRE2_ERROR_PARTIAL is returned instead of
+PCRE2_ERROR_NOMATCH. This option is "soft" because it prefers a complete match
+over a partial match. All the various matching items in a pattern behave as if
+the subject string is potentially complete; \z, \Z, and $ match at the end of
+the subject, as normal, and for \b and \B the end of the subject is treated
+as a non-alphanumeric.
 </P>
 </P>
-<br><b>
-Comparing hard and soft partial matching
-</b><br>
 <P>
 <P>
 The difference between the two partial matching options can be illustrated by a
 pattern such as:
@@ -175,245 +208,167 @@ to follow this explanation by thinking of the two patterns like this:
 The second pattern will never match "dogsbody", because it will always find the
 shorter match first.
 </P>
 </P>
-<br><a name="SEC3" href="#TOC1">PARTIAL MATCHING USING pcre2_dfa_match()</a><br>
-<P>
-The DFA functions move along the subject string character by character, without
-backtracking, searching for all possible matches simultaneously. If the end of
-the subject is reached before the end of the pattern, there is the possibility
-of a partial match, again provided that at least one character has been
-inspected.
-</P>
-<P>
-When PCRE2_PARTIAL_SOFT is set, PCRE2_ERROR_PARTIAL is returned only if there
-have been no complete matches. Otherwise, the complete matches are returned.
-However, if PCRE2_PARTIAL_HARD is set, a partial match takes precedence over
-any complete matches. The portion of the string that was matched when the
-longest partial match was found is set as the first matching string.
-</P>
-<P>
-Because the DFA functions always search for all possible matches, and there is
-no difference between greedy and ungreedy repetition, their behaviour is
-different from the standard functions when PCRE2_PARTIAL_HARD is set. Consider
-the string "dog" matched against the ungreedy pattern shown above:
-<pre>
-  /dog(sbody)??/
-</pre>
-Whereas the standard function stops as soon as it finds the complete match for
-"dog", the DFA function also finds the partial match for "dogsbody", and so
-returns that when PCRE2_PARTIAL_HARD is set.
-</P>
-<br><a name="SEC4" href="#TOC1">PARTIAL MATCHING AND WORD BOUNDARIES</a><br>
+<br><b>
+Example of partial matching using pcre2test
+</b><br>
 <P>
 <P>
-If a pattern ends with one of sequences \b or \B, which test for word
-boundaries, partial matching with PCRE2_PARTIAL_SOFT can give counter-intuitive
-results. Consider this pattern:
-<pre>
-  /\bcat\b/
-</pre>
-This matches "cat", provided there is a word boundary at either end. If the
-subject string is "the cat", the comparison of the final "t" with a following
-character cannot take place, so a partial match is found. However, normal
-matching carries on, and \b matches at the end of the subject when the last
-character is a letter, so a complete match is found. The result, therefore, is
-<i>not</i> PCRE2_ERROR_PARTIAL. Using PCRE2_PARTIAL_HARD in this case does yield
-PCRE2_ERROR_PARTIAL, because then the partial match takes precedence.
-</P>
-<br><a name="SEC5" href="#TOC1">EXAMPLE OF PARTIAL MATCHING USING PCRE2TEST</a><br>
-<P>
-If the <b>partial_soft</b> (or <b>ps</b>) modifier is present on a
-<b>pcre2test</b> data line, the PCRE2_PARTIAL_SOFT option is used for the match.
-Here is a run of <b>pcre2test</b> that uses the date example quoted above:
+The <b>pcre2test</b> data modifiers <b>partial_hard</b> (or <b>ph</b>) and
+<b>partial_soft</b> (or <b>ps</b>) set PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT,
+respectively, when calling <b>pcre2_match()</b>. Here is a run of
+<b>pcre2test</b> using a pattern that matches the whole subject in the form of a
+date:
 <pre>
 <pre>
     re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
     re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
-  data&#62; 25jun04\=ps
-   0: 25jun04
-   1: jun
-  data&#62; 25dec3\=ps
+  data&#62; 25dec3\=ph
   Partial match: 25dec3
-  data&#62; 3ju\=ps
+  data&#62; 3ju\=ph
   Partial match: 3ju
   Partial match: 3ju
-  data&#62; 3juj\=ps
-  No match
-  data&#62; j\=ps
+  data&#62; 3juj\=ph
   No match
   No match
 </pre>
 </pre>
-The first data string is matched completely, so <b>pcre2test</b> shows the
-matched substrings. The remaining four strings do not match the complete
-pattern, but the first two are partial matches. Similar output is obtained
-if DFA matching is used.
-</P>
-<P>
-If the <b>partial_hard</b> (or <b>ph</b>) modifier is present on a
-<b>pcre2test</b> data line, the PCRE2_PARTIAL_HARD option is set for the match.
-</P>
-<br><a name="SEC6" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre2_dfa_match()</a><br>
-<P>
-When a partial match has been found using a DFA matching function, it is
-possible to continue the match by providing additional subject data and calling
-the function again with the same compiled regular expression, this time setting
-the PCRE2_DFA_RESTART option. You must pass the same working space as before,
-because this is where details of the previous partial match are stored. Here is
-an example using <b>pcre2test</b>:
+This example gives the same results for both hard and soft partial matching
+options. Here is an example where there is a difference:
 <pre>
 <pre>
     re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
     re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
-  data&#62; 23ja\=dfa,ps
-  Partial match: 23ja
-  data&#62; n05\=dfa,dfa_restart
-   0: n05
+  data&#62; 25jun04\=ps
+   0: 25jun04
+   1: jun
+  data&#62; 25jun04\=ph
+  Partial match: 25jun04
 </pre>
 </pre>
-The first call has "23ja" as the subject, and requests partial matching; the
-second call has "n05" as the subject for the continued (restarted) match.
-Notice that when the match is complete, only the last part is shown; PCRE2 does
-not retain the previously partially-matched string. It is up to the calling
-program to do that if it needs to.
-</P>
-<P>
-That means that, for an unanchored pattern, if a continued match fails, it is
-not possible to try again at a new starting point. All this facility is capable
-of doing is continuing with the previous match attempt. In the previous
-example, if the second set of data is "ug23" the result is no match, even
-though there would be a match for "aug23" if the entire string were given at
-once. Depending on the application, this may or may not be what you want.
-The only way to allow for starting again at the next character is to retain the
-matched part of the subject and try a new complete match.
+With PCRE2_PARTIAL_SOFT, the subject is matched completely. For
+PCRE2_PARTIAL_HARD, however, the subject is assumed not to be complete, so
+there is only a partial match.
 </P>
 </P>
+<br><a name="SEC4" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre2_match()</a><br>
 <P>
 <P>
-You can set the PCRE2_PARTIAL_SOFT or PCRE2_PARTIAL_HARD options with
-PCRE2_DFA_RESTART to continue partial matching over multiple segments. This
-facility can be used to pass very long subject strings to the DFA matching
-functions.
+PCRE was not originally designed with multi-segment matching in mind. However,
+over time, features (including partial matching) that make multi-segment
+matching possible have been added. A very long string can be searched segment
+by segment by calling <b>pcre2_match()</b> repeatedly, with the aim of achieving
+the same results that would happen if the entire string was available for
+searching all the time. Normally, the strings that are being sought are much
+shorter than each individual segment, and are in the middle of very long
+strings, so the pattern is normally not anchored.
 </P>
 </P>
-<br><a name="SEC7" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre2_match()</a><br>
 <P>
 <P>
-Unlike the DFA function, it is not possible to restart the previous match with
-a new segment of data when using <b>pcre2_match()</b>. Instead, new data must be
-added to the previous subject string, and the entire match re-run, starting
-from the point where the partial match occurred. Earlier data can be discarded.
+Special logic must be implemented to handle a matched substring that spans a
+segment boundary. PCRE2_PARTIAL_HARD should be used, because it returns a
+partial match at the end of a segment whenever there is the possibility of
+changing the match by adding more characters. The PCRE2_NOTBOL option should
+also be set for all but the first segment.
 </P>
 </P>
 <P>
 <P>
-It is best to use PCRE2_PARTIAL_HARD in this situation, because it does not
-treat the end of a segment as the end of the subject when matching \z, \Z,
-\b, \B, and $. Consider an unanchored pattern that matches dates:
+When a partial match occurs, the next segment must be added to the current
+subject and the match re-run, using the <i>startoffset</i> argument of
+<b>pcre2_match()</b> to begin at the point where the partial match started.
+For example:
 <pre>
 <pre>
     re&#62; /\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d/
     re&#62; /\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d/
-  data&#62; The date is 23ja\=ph
+  data&#62; ...the date is 23ja\=ph
   Partial match: 23ja
   Partial match: 23ja
+  data&#62; ...the date is 23jan19 and on that day...\=offset=15
+   0: 23jan19
+   1: jan
 </pre>
 </pre>
-At this stage, an application could discard the text preceding "23ja", add on
-text from the next segment, and call the matching function again. Unlike the
-DFA matching function, the entire matching string must always be available,
-and the complete matching process occurs for each call, so more memory and more
-processing time is needed.
-</P>
-<br><a name="SEC8" href="#TOC1">ISSUES WITH MULTI-SEGMENT MATCHING</a><br>
-<P>
-Certain types of pattern may give problems with multi-segment matching,
-whichever matching function is used.
+Note the use of the <b>offset</b> modifier to start the new match where the
+partial match was found. In this example, the next segment was added to the one
+in which the partial match was found. This is the most straightforward
+approach, typically using a memory buffer that is twice the size of each
+segment. After a partial match, the first half of the buffer is discarded, the
+second half is moved to the start of the buffer, and a new segment is added
+before repeating the match as in the example above. After a failed match
+(PCRE2_ERROR_NOMATCH), the entire buffer can be discarded.
+</P>
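The buffer handling described above (discard the older part, move the rest to the front, append a new segment) can be sketched in plain C. This is only an illustration under stated assumptions: slide_and_append() is a hypothetical helper, not a PCRE2 function, and the caller is assumed to subtract the dropped byte count from the saved partial-match offset before passing it as <i>startoffset</i>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE2.
   buf holds len valid bytes and has room for cap bytes in total.
   The first `drop` bytes are discarded, the remaining bytes are moved to
   the start of buf, and seg (seg_len bytes) is appended after them.
   Returns the new length, or (size_t)-1 if the result would not fit. */
size_t slide_and_append(char *buf, size_t len, size_t cap, size_t drop,
                        const char *seg, size_t seg_len)
{
    assert(drop <= len);
    size_t kept = len - drop;
    if (kept + seg_len > cap)
        return (size_t)-1;            /* caller must grow or drop more */
    memmove(buf, buf + drop, kept);   /* regions may overlap: memmove */
    memcpy(buf + kept, seg, seg_len);
    return kept + seg_len;
}
```

With a buffer twice the segment size, `drop` is one segment length after a partial match; an offset of 13 in the old buffer then becomes 13 minus `drop` in the new one.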
+<P>
+If there are memory constraints, you may want to discard text that precedes a
+partial match before adding the next segment. Unfortunately, this is not at
+present straightforward. In cases such as the above, where the pattern does not
+contain any lookbehinds, it is sufficient to retain only the partially matched
+substring. However, if the pattern contains a lookbehind assertion, characters
+that precede the start of the partial match may have been inspected during the
+matching process. When <b>pcre2test</b> displays a partial match, it indicates
+these characters with '&#60;' if the <b>allusedtext</b> modifier is set:
+<pre>
+    re&#62; "(?&#60;=123)abc"
+  data&#62; xx123ab\=ph,allusedtext
+  Partial match: 123ab
+                 &#60;&#60;&#60;
+</pre>
+However, the <b>allusedtext</b> modifier is not available for JIT matching,
+because JIT matching does not record the first (or last) consulted characters.
+For this reason, this information is not available via the API. It is therefore
+not possible in general to obtain the exact number of characters that must be
+retained in order to get the right match result. If you cannot retain the
+entire segment, you must find some heuristic way of choosing.
 </P>
 </P>
 <P>
 <P>
-1. If the pattern contains a test for the beginning of a line, you need to pass
-the PCRE2_NOTBOL option when the subject string for any call does start at the
-beginning of a line. There is also a PCRE2_NOTEOL option, but in practice when
-doing multi-segment matching you should be using PCRE2_PARTIAL_HARD, which
-includes the effect of PCRE2_NOTEOL.
+If you know the approximate length of the matching substrings, you can use that
+to decide how much text to retain. The only lookbehind information that is
+currently available via the API is the length of the longest individual
+lookbehind in a pattern, but this can be misleading if there are nested
+lookbehinds. The value returned by calling <b>pcre2_pattern_info()</b> with the
+PCRE2_INFO_MAXLOOKBEHIND option is the maximum number of characters (not code
+units) that any individual lookbehind moves back when it is processed. A
+pattern such as "(?&#60;=(?&#60;!b)a)" has a maximum lookbehind value of one, but
+inspects two characters before its starting point.
 </P>
 </P>
 <P>
 <P>
-2. If a pattern contains a lookbehind assertion, characters that precede the
-start of the partial match may have been inspected during the matching process.
-When using <b>pcre2_match()</b>, sufficient characters must be retained for the
-next match attempt. You can ensure that enough characters are retained by doing
-the following:
+In a non-UTF or a 32-bit case, moving back is just a subtraction, but in
+UTF-8 or UTF-16 you have to count characters while moving back through the code
+units.
 </P>
 </P>
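The character counting described above for UTF-8 can be sketched as follows. This is a minimal illustration that assumes a valid UTF-8 subject; `utf8_move_back` is a hypothetical helper name, not part of the PCRE2 API.

```c
#include <assert.h>
#include <stddef.h>

/* Step back `nchars` characters from code-unit offset `pos` in a valid
   UTF-8 buffer and return the new offset. Stops at offset 0 if the start
   of the buffer is reached first. Hypothetical helper, not a PCRE2 call. */
static size_t utf8_move_back(const unsigned char *subject, size_t pos,
                             size_t nchars)
{
    assert(subject != NULL);
    while (nchars > 0 && pos > 0) {
        pos--;
        /* Skip continuation bytes (10xxxxxx) to reach the lead byte. */
        while (pos > 0 && (subject[pos] & 0xC0) == 0x80)
            pos--;
        nchars--;
    }
    return pos;
}
```

A caller might pass the ovector[0] offset as `pos` and the PCRE2_INFO_MAXLOOKBEHIND value as `nchars`. As the text notes, that value can understate what nested lookbehinds inspect, so the result is a lower bound on what to retain, not a guarantee.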
+<br><a name="SEC5" href="#TOC1">PARTIAL MATCHING USING pcre2_dfa_match()</a><br>
 <P>
 <P>
-Before doing any matching, find the length of the longest lookbehind in the
-pattern by calling <b>pcre2_pattern_info()</b> with the PCRE2_INFO_MAXLOOKBEHIND
-option. Note that the resulting count is in characters, not code units. After a
-partial match, moving back from the ovector[0] offset in the subject by the
-number of characters given for the maximum lookbehind gets you to the earliest
-character that must be retained. In a non-UTF or a 32-bit situation, moving
-back is just a subtraction, but in UTF-8 or UTF-16 you have to count characters
-while moving back through the code units.
+The DFA function moves along the subject string character by character, without
+backtracking, searching for all possible matches simultaneously. If the end of
+the subject is reached before the end of the pattern, there is the possibility
+of a partial match.
 </P>
 </P>
 <P>
 <P>
-Characters before the point you have now reached can be discarded, and after
-the next segment has been added to what is retained, you should run the next
-match with the <b>startoffset</b> argument set so that the match begins at the
-same point as before.
+When PCRE2_PARTIAL_SOFT is set, PCRE2_ERROR_PARTIAL is returned only if there
+have been no complete matches. Otherwise, the complete matches are returned.
+If PCRE2_PARTIAL_HARD is set, a partial match takes precedence over any
+complete matches. The portion of the string that was matched when the longest
+partial match was found is set as the first matching string.
 </P>
 </P>
 <P>
 <P>
-For example, if the pattern "(?&#60;=123)abc" is partially matched against the
-string "xx123ab", the ovector offsets are 5 and 7 ("ab"). The maximum
-lookbehind count is 3, so all characters before offset 2 can be discarded. The
-value of <b>startoffset</b> for the next match should be 3. When <b>pcre2test</b>
-displays a partial match, it indicates the lookbehind characters with '&#60;'
-characters:
+Because the DFA function always searches for all possible matches, and there is
+no difference between greedy and ungreedy repetition, its behaviour is
+different from that of <b>pcre2_match()</b>. Consider the string "dog" matched
+against this ungreedy pattern:
 <pre>
 <pre>
-    re&#62; "(?&#60;=123)abc"
-  data&#62; xx123ab\=ph
-  Partial match: 123ab
-                 &#60;&#60;&#60;
-</PRE>
+  /dog(sbody)??/
+</pre>
+Whereas the standard function stops as soon as it finds the complete match for
+"dog", the DFA function also finds the partial match for "dogsbody", and so
+returns that when PCRE2_PARTIAL_HARD is set.
 </P>
 </P>
+<br><a name="SEC6" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre2_dfa_match()</a><br>
 <P>
 <P>
-3. Because a partial match must always contain at least one character, what
-might be considered a partial match of an empty string actually gives a "no
-match" result. For example:
-<pre>
-    re&#62; /c(?&#60;=abc)x/
-  data&#62; ab\=ps
-  No match
-</pre>
-If the next segment begins "cx", a match should be found, but this will only
-happen if characters from the previous segment are retained. For this reason, a
-"no match" result should be interpreted as "partial match of an empty string"
-when the pattern contains lookbehinds.
-</P>
-<P>
-4. Matching a subject string that is split into multiple segments may not
-always produce exactly the same result as matching over one single long string,
-especially when PCRE2_PARTIAL_SOFT is used. The section "Partial Matching and
-Word Boundaries" above describes an issue that arises if the pattern ends with
-\b or \B. Another kind of difference may occur when there are multiple
-matching possibilities, because (for PCRE2_PARTIAL_SOFT) a partial match result
-is given only when there are no completed matches. This means that as soon as
-the shortest match has been found, continuation to a new subject segment is no
-longer possible. Consider this <b>pcre2test</b> example:
-<pre>
-    re&#62; /dog(sbody)?/
-  data&#62; dogsb\=ps
-   0: dog
-  data&#62; do\=ps,dfa
-  Partial match: do
-  data&#62; gsb\=ps,dfa,dfa_restart
-   0: g
-  data&#62; dogsbody\=dfa
-   0: dogsbody
-   1: dog
-</pre>
-The first data line passes the string "dogsb" to a standard matching function,
-setting the PCRE2_PARTIAL_SOFT option. Although the string is a partial match
-for "dogsbody", the result is not PCRE2_ERROR_PARTIAL, because the shorter
-string "dog" is a complete match. Similarly, when the subject is presented to
-a DFA matching function in several parts ("do" and "gsb" being the first two)
-the match stops when "dog" has been found, and it is not possible to continue.
-On the other hand, if "dogsbody" is presented as a single string, a DFA
-matching function finds both matches.
-</P>
-<P>
-Because of these problems, it is best to use PCRE2_PARTIAL_HARD when matching
-multi-segment data. The example above then behaves differently:
+When a partial match has been found using the DFA matching function, it is
+possible to continue the match by providing additional subject data and calling
+the function again with the same compiled regular expression, this time setting
+the PCRE2_DFA_RESTART option. You must pass the same working space as before,
+because this is where details of the previous partial match are stored. You can
+set the PCRE2_PARTIAL_SOFT or PCRE2_PARTIAL_HARD options with PCRE2_DFA_RESTART
+to continue partial matching over multiple segments. Here is an example using
+<b>pcre2test</b>:
 <pre>
 <pre>
-    re&#62; /dog(sbody)?/
-  data&#62; dogsb\=ph
-  Partial match: dogsb
-  data&#62; do\=ps,dfa
-  Partial match: do
-  data&#62; gsb\=ph,dfa,dfa_restart
-  Partial match: gsb
+    re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
+  data&#62; 23ja\=dfa,ps
+  Partial match: 23ja
+  data&#62; n05\=dfa,dfa_restart
+   0: n05
 </pre>
 </pre>
-5. Patterns that contain alternatives at the top level which do not all start
-with the same pattern item may not work as expected when PCRE2_DFA_RESTART is
-used. For example, consider this pattern:
+The first call has "23ja" as the subject, and requests partial matching; the
+second call has "n05" as the subject for the continued (restarted) match.
+Notice that when the match is complete, only the last part is shown; PCRE2 does
+not retain the previously partially-matched string. It is up to the calling
+program to do that if it needs to. This means that, for an unanchored pattern,
+if a continued match fails, it is not possible to try again at a new starting
+point. All this facility is capable of doing is continuing with the previous
+match attempt. For example, consider this pattern:
 <pre>
 <pre>
   1234|3789
   1234|3789
 </pre>
 </pre>
@@ -422,30 +377,18 @@ alternative is found at offset 3. There is no partial match for the second
 alternative, because such a match does not start at the same point in the
 alternative, because such a match does not start at the same point in the
 subject string. Attempting to continue with the string "7890" does not yield a
 subject string. Attempting to continue with the string "7890" does not yield a
 match because only those alternatives that match at one point in the subject
 match because only those alternatives that match at one point in the subject
-are remembered. The problem arises because the start of the second alternative
-matches within the first alternative. There is no problem with anchored
-patterns or patterns such as:
-<pre>
-  1234|ABCD
-</pre>
-where no string can be a partial match for both alternatives. This is not a
-problem if a standard matching function is used, because the entire match has
-to be rerun each time:
-<pre>
-    re&#62; /1234|3789/
-  data&#62; ABC123\=ph
-  Partial match: 123
-  data&#62; 1237890
-   0: 3789
-</pre>
-Of course, instead of using PCRE2_DFA_RESTART, the same technique of re-running
-the entire match can also be used with the DFA matching function. Another
-possibility is to work with two buffers. If a partial match at offset <i>n</i>
-in the first buffer is followed by "no match" when PCRE2_DFA_RESTART is used on
-the second buffer, you can then try a new match starting at offset <i>n+1</i> in
-the first buffer.
+are remembered. Depending on the application, this may or may not be what you
+want.
+</P>
+<P>
+If you do want to allow for starting again at the next character, one way of
+doing it is to retain some or all of the segment and try a new complete match,
+as described for <b>pcre2_match()</b> above. Another possibility is to work with
+two buffers. If a partial match at offset <i>n</i> in the first buffer is
+followed by "no match" when PCRE2_DFA_RESTART is used on the second buffer, you
+can then try a new match starting at offset <i>n+1</i> in the first buffer.
 </P>
 </P>
-<br><a name="SEC9" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC7" href="#TOC1">AUTHOR</a><br>
 <P>
 <P>
 Philip Hazel
 Philip Hazel
 <br>
 <br>
@@ -454,11 +397,11 @@ University Computing Service
 Cambridge, England.
 Cambridge, England.
 <br>
 <br>
 </P>
 </P>
-<br><a name="SEC10" href="#TOC1">REVISION</a><br>
+<br><a name="SEC8" href="#TOC1">REVISION</a><br>
 <P>
 <P>
-Last updated: 22 December 2014
+Last updated: 04 September 2019
 <br>
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2019 University of Cambridge.
 <br>
 <br>
 <p>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
 Return to the <a href="index.html">PCRE2 index page</a>.

Diff suppressed because it is too large
+ 329 - 229
regex.mod/pcre/doc/html/pcre2pattern.html


+ 27 - 19
regex.mod/pcre/doc/html/pcre2perform.html

@@ -31,9 +31,9 @@ of them.
 Patterns are compiled by PCRE2 into a reasonably efficient interpretive code,
 Patterns are compiled by PCRE2 into a reasonably efficient interpretive code,
 so that most simple patterns do not use much memory for storing the compiled
 so that most simple patterns do not use much memory for storing the compiled
 version. However, there is one case where the memory usage of a compiled
 version. However, there is one case where the memory usage of a compiled
-pattern can be unexpectedly large. If a parenthesized subpattern has a
-quantifier with a minimum greater than 1 and/or a limited maximum, the whole
-subpattern is repeated in the compiled code. For example, the pattern
+pattern can be unexpectedly large. If a parenthesized group has a quantifier
+with a minimum greater than 1 and/or a limited maximum, the whole group is
+repeated in the compiled code. For example, the pattern
 <pre>
 <pre>
   (abc|def){2,4}
   (abc|def){2,4}
 </pre>
 </pre>
@@ -52,9 +52,9 @@ example, the very simple pattern
 <pre>
 <pre>
   ((ab){1,1000}c){1,3}
   ((ab){1,1000}c){1,3}
 </pre>
 </pre>
-uses over 50K bytes when compiled using the 8-bit library. When PCRE2 is
+uses over 50KiB when compiled using the 8-bit library. When PCRE2 is
 compiled with its default internal pointer size of two bytes, the size limit on
 compiled with its default internal pointer size of two bytes, the size limit on
-a compiled pattern is 64K code units in the 8-bit and 16-bit libraries, and
+a compiled pattern is 65535 code units in the 8-bit and 16-bit libraries, and
 this is reached with the above pattern if the outer repetition is increased
 this is reached with the above pattern if the outer repetition is increased
 from 3 to 4. PCRE2 can be compiled to use larger internal pointers and thus
 from 3 to 4. PCRE2 can be compiled to use larger internal pointers and thus
 handle larger compiled patterns, but it is better to try to rewrite your
 handle larger compiled patterns, but it is better to try to rewrite your
@@ -68,14 +68,14 @@ facility. Re-writing the above pattern as
 <pre>
 <pre>
   ((ab)(?2){0,999}c)(?1){0,2}
   ((ab)(?2){0,999}c)(?1){0,2}
 </pre>
 </pre>
-reduces the memory requirements to around 16K, and indeed it remains under 20K
-even with the outer repetition increased to 100. However, this kind of pattern
-is not always exactly equivalent, because any captures within subroutine calls
-are lost when the subroutine completes. If this is not a problem, this kind of
-rewriting will allow you to process patterns that PCRE2 cannot otherwise
-handle. The matching performance of the two different versions of the pattern
-are roughly the same. (This applies from release 10.30 - things were different
-in earlier releases.)
+reduces the memory requirements to around 16KiB, and indeed it remains under
+20KiB even with the outer repetition increased to 100. However, this kind of
+pattern is not always exactly equivalent, because any captures within
+subroutine calls are lost when the subroutine completes. If this is not a
+problem, this kind of rewriting will allow you to process patterns that PCRE2
+cannot otherwise handle. The matching performance of the two different versions
+of the pattern is roughly the same. (This applies from release 10.30 - things
+were different in earlier releases.)
 </P>
 </P>
 <br><a name="SEC3" href="#TOC1">STACK AND HEAP USAGE AT RUN TIME</a><br>
 <br><a name="SEC3" href="#TOC1">STACK AND HEAP USAGE AT RUN TIME</a><br>
 <P>
 <P>
@@ -83,7 +83,7 @@ From release 10.30, the interpretive (non-JIT) version of <b>pcre2_match()</b>
 uses very little system stack at run time. In earlier releases recursive
 uses very little system stack at run time. In earlier releases recursive
 function calls could use a great deal of stack, and this could cause problems,
 function calls could use a great deal of stack, and this could cause problems,
 but this usage has been eliminated. Backtracking positions are now explicitly
 but this usage has been eliminated. Backtracking positions are now explicitly
-remembered in memory frames controlled by the code. An initial 20K vector of
+remembered in memory frames controlled by the code. An initial 20KiB vector of
 frames is allocated on the system stack (enough for about 100 frames for small
 frames is allocated on the system stack (enough for about 100 frames for small
 patterns), but if this is insufficient, heap memory is used. The amount of heap
 patterns), but if this is insufficient, heap memory is used. The amount of heap
 memory can be limited; if the limit is set to zero, only the initial stack
 memory can be limited; if the limit is set to zero, only the initial stack
@@ -93,9 +93,17 @@ may also reduce the memory requirements.
 <P>
 <P>
 In contrast to <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b> does use recursive
 In contrast to <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b> does use recursive
 function calls, but only for processing atomic groups, lookaround assertions,
 function calls, but only for processing atomic groups, lookaround assertions,
-and recursion within the pattern. Too much nested recursion may cause stack
-issues. The "match depth" parameter can be used to limit the depth of function
-recursion in <b>pcre2_dfa_match()</b>.
+and recursion within the pattern. The original version of the code used to
+allocate quite large internal workspace vectors on the stack, which caused some
+problems for some patterns in environments with small stacks. From release
+10.32 the code for <b>pcre2_dfa_match()</b> has been re-factored to use heap
+memory when necessary for internal workspace when recursing, though recursive
+function calls are still used.
+</P>
+<P>
+The "match depth" parameter can be used to limit the depth of function
+recursion, and the "match heap" parameter to limit heap memory in
+<b>pcre2_dfa_match()</b>.
 </P>
 </P>
 <br><a name="SEC4" href="#TOC1">PROCESSING TIME</a><br>
 <br><a name="SEC4" href="#TOC1">PROCESSING TIME</a><br>
 <P>
 <P>
@@ -244,9 +252,9 @@ Cambridge, England.
 </P>
 </P>
 <br><a name="SEC6" href="#TOC1">REVISION</a><br>
 <br><a name="SEC6" href="#TOC1">REVISION</a><br>
 <P>
 <P>
-Last updated: 08 April 2017
+Last updated: 03 February 2019
 <br>
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2019 University of Cambridge.
 <br>
 <br>
 <p>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
 Return to the <a href="index.html">PCRE2 index page</a>.

+ 92 - 67
regex.mod/pcre/doc/html/pcre2posix.html

@@ -15,51 +15,78 @@ please consult the man page, in case the conversion went wrong.
 <ul>
 <ul>
 <li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
 <li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
 <li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
 <li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
-<li><a name="TOC3" href="#SEC3">COMPILING A PATTERN</a>
-<li><a name="TOC4" href="#SEC4">MATCHING NEWLINE CHARACTERS</a>
-<li><a name="TOC5" href="#SEC5">MATCHING A PATTERN</a>
-<li><a name="TOC6" href="#SEC6">ERROR MESSAGES</a>
-<li><a name="TOC7" href="#SEC7">MEMORY USAGE</a>
-<li><a name="TOC8" href="#SEC8">AUTHOR</a>
-<li><a name="TOC9" href="#SEC9">REVISION</a>
+<li><a name="TOC3" href="#SEC3">USING THE POSIX FUNCTIONS</a>
+<li><a name="TOC4" href="#SEC4">COMPILING A PATTERN</a>
+<li><a name="TOC5" href="#SEC5">MATCHING NEWLINE CHARACTERS</a>
+<li><a name="TOC6" href="#SEC6">MATCHING A PATTERN</a>
+<li><a name="TOC7" href="#SEC7">ERROR MESSAGES</a>
+<li><a name="TOC8" href="#SEC8">MEMORY USAGE</a>
+<li><a name="TOC9" href="#SEC9">AUTHOR</a>
+<li><a name="TOC10" href="#SEC10">REVISION</a>
 </ul>
 </ul>
 <br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
 <br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
 <P>
 <P>
 <b>#include &#60;pcre2posix.h&#62;</b>
 <b>#include &#60;pcre2posix.h&#62;</b>
 </P>
 </P>
 <P>
 <P>
-<b>int regcomp(regex_t *<i>preg</i>, const char *<i>pattern</i>,</b>
+<b>int pcre2_regcomp(regex_t *<i>preg</i>, const char *<i>pattern</i>,</b>
 <b>     int <i>cflags</i>);</b>
 <b>     int <i>cflags</i>);</b>
 <br>
 <br>
 <br>
 <br>
-<b>int regexec(const regex_t *<i>preg</i>, const char *<i>string</i>,</b>
+<b>int pcre2_regexec(const regex_t *<i>preg</i>, const char *<i>string</i>,</b>
 <b>     size_t <i>nmatch</i>, regmatch_t <i>pmatch</i>[], int <i>eflags</i>);</b>
 <b>     size_t <i>nmatch</i>, regmatch_t <i>pmatch</i>[], int <i>eflags</i>);</b>
 <br>
 <br>
 <br>
 <br>
-<b>size_t regerror(int <i>errcode</i>, const regex_t *<i>preg</i>,</b>
+<b>size_t pcre2_regerror(int <i>errcode</i>, const regex_t *<i>preg</i>,</b>
 <b>     char *<i>errbuf</i>, size_t <i>errbuf_size</i>);</b>
 <b>     char *<i>errbuf</i>, size_t <i>errbuf_size</i>);</b>
 <br>
 <br>
 <br>
 <br>
-<b>void regfree(regex_t *<i>preg</i>);</b>
+<b>void pcre2_regfree(regex_t *<i>preg</i>);</b>
 </P>
 </P>
 <br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
 <br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
 <P>
 <P>
 This set of functions provides a POSIX-style API for the PCRE2 regular
 This set of functions provides a POSIX-style API for the PCRE2 regular
-expression 8-bit library. See the
+expression 8-bit library. There are no POSIX-style wrappers for PCRE2's 16-bit
+and 32-bit libraries. See the
 <a href="pcre2api.html"><b>pcre2api</b></a>
 <a href="pcre2api.html"><b>pcre2api</b></a>
 documentation for a description of PCRE2's native API, which contains much
 documentation for a description of PCRE2's native API, which contains much
-additional functionality. There are no POSIX-style wrappers for PCRE2's 16-bit
-and 32-bit libraries.
+additional functionality.
 </P>
 </P>
 <P>
 <P>
-The functions described here are just wrapper functions that ultimately call
-the PCRE2 native API. Their prototypes are defined in the <b>pcre2posix.h</b>
-header file, and on Unix systems the library itself is called
-<b>libpcre2-posix.a</b>, so can be accessed by adding <b>-lpcre2-posix</b> to the
-command for linking an application that uses them. Because the POSIX functions
-call the native ones, it is also necessary to add <b>-lpcre2-8</b>.
+The functions described here are wrapper functions that ultimately call the
+PCRE2 native API. Their prototypes are defined in the <b>pcre2posix.h</b> header
+file, and they all have unique names starting with <b>pcre2_</b>. However, the
+<b>pcre2posix.h</b> header also contains macro definitions that convert the
+standard POSIX names such as <b>regcomp()</b> into <b>pcre2_regcomp()</b> etc. This
+means that a program can use the usual POSIX names without running the risk of
+accidentally linking with POSIX functions from a different library.
 </P>
 </P>
 <P>
 <P>
+On Unix-like systems the PCRE2 POSIX library is called <b>libpcre2-posix</b>, so
+can be accessed by adding <b>-lpcre2-posix</b> to the command for linking an
+application. Because the POSIX functions call the native ones, it is also
+necessary to add <b>-lpcre2-8</b>.
+</P>
+<P>
+Although they were not defined as prototypes in <b>pcre2posix.h</b>, releases
+10.33 to 10.36 of the library contained functions with the POSIX names
+<b>regcomp()</b> etc. These simply passed their arguments to the PCRE2
+functions. These functions were provided for backwards compatibility with
+earlier versions of PCRE2, which had only POSIX names. However, this has proved
+troublesome in situations where a program links with several libraries, some of
+which use PCRE2's POSIX interface while others use the real POSIX functions.
+For this reason, the POSIX names have been removed since release 10.37.
+</P>
+<P>
+Calling the header file <b>pcre2posix.h</b> avoids any conflict with other POSIX
+libraries. It can, of course, be renamed or aliased as <b>regex.h</b>, which is
+the "correct" name, if there is no clash. It provides two structure types,
+<i>regex_t</i> for compiled internal forms, and <i>regmatch_t</i> for returning
+captured substrings. It also defines some constants whose names start with
+"REG_"; these are used for setting options and identifying error codes.
+</P>
+<br><a name="SEC3" href="#TOC1">USING THE POSIX FUNCTIONS</a><br>
+<P>
 Those POSIX option bits that can reasonably be mapped to PCRE2 native options
 Those POSIX option bits that can reasonably be mapped to PCRE2 native options
 have been implemented. In addition, the option REG_EXTENDED is defined with the
 have been implemented. In addition, the option REG_EXTENDED is defined with the
 value zero. This has no effect, but since programs that are written to the
 value zero. This has no effect, but since programs that are written to the
@@ -80,17 +107,13 @@ POSIX definition; it is not fully POSIX-compatible, and in multi-unit encoding
 domains it is probably even less compatible.
 domains it is probably even less compatible.
 </P>
 </P>
 <P>
 <P>
-The header for these functions is supplied as <b>pcre2posix.h</b> to avoid any
-potential clash with other POSIX libraries. It can, of course, be renamed or
-aliased as <b>regex.h</b>, which is the "correct" name. It provides two
-structure types, <i>regex_t</i> for compiled internal forms, and
-<i>regmatch_t</i> for returning captured substrings. It also defines some
-constants whose names start with "REG_"; these are used for setting options and
-identifying error codes.
+The descriptions below use the actual names of the functions, but, as described
+above, the standard POSIX names (without the <b>pcre2_</b> prefix) may also be
+used.
 </P>
 </P>
-<br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br>
+<br><a name="SEC4" href="#TOC1">COMPILING A PATTERN</a><br>
 <P>
 <P>
-The function <b>regcomp()</b> is called to compile a pattern into an
+The function <b>pcre2_regcomp()</b> is called to compile a pattern into an
 internal form. By default, the pattern is a C string terminated by a binary
 internal form. By default, the pattern is a C string terminated by a binary
 zero (but see REG_PEND below). The <i>preg</i> argument is a pointer to a
 zero (but see REG_PEND below). The <i>preg</i> argument is a pointer to a
 <b>regex_t</b> structure that is used as a base for storing information about
 <b>regex_t</b> structure that is used as a base for storing information about
@@ -128,18 +151,18 @@ REG_UTF. Note that REG_NOSPEC is not part of the POSIX standard.
 <pre>
 <pre>
   REG_NOSUB
   REG_NOSUB
 </pre>
 </pre>
-When a pattern that is compiled with this flag is passed to <b>regexec()</b> for
-matching, the <i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no
-captured strings are returned. Versions of the PCRE library prior to 10.22 used
-to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens
-because it disables the use of back references.
+When a pattern that is compiled with this flag is passed to
+<b>pcre2_regexec()</b> for matching, the <i>nmatch</i> and <i>pmatch</i> arguments
+are ignored, and no captured strings are returned. Versions of the PCRE2 library
+prior to 10.22 used to set the PCRE2_NO_AUTO_CAPTURE compile option, but this
+no longer happens because it disables the use of backreferences.
 <pre>
 <pre>
   REG_PEND
   REG_PEND
 </pre>
 </pre>
 If this option is set, the <b>re_endp</b> field in the <i>preg</i> structure
 If this option is set, the <b>re_endp</b> field in the <i>preg</i> structure
 (which has the type const char *) must be set to point to the character beyond
 (which has the type const char *) must be set to point to the character beyond
-the end of the pattern before calling <b>regcomp()</b>. The pattern itself may
-now contain binary zeroes, which are treated as data characters. Without
+the end of the pattern before calling <b>pcre2_regcomp()</b>. The pattern itself
+may now contain binary zeros, which are treated as data characters. Without
 REG_PEND, a binary zero terminates the pattern and the <b>re_endp</b> field is
 REG_PEND, a binary zero terminates the pattern and the <b>re_endp</b> field is
 ignored. This is a GNU extension to the POSIX standard and should be used with
 ignored. This is a GNU extension to the POSIX standard and should be used with
 caution in software intended to be portable to other systems.
 caution in software intended to be portable to other systems.
@@ -174,18 +197,19 @@ newlines are matched by the dot metacharacter (they are not) or by a negative
 class such as [^a] (they are).
 class such as [^a] (they are).
 </P>
 </P>
 <P>
 <P>
-The yield of <b>regcomp()</b> is zero on success, and non-zero otherwise. The
-<i>preg</i> structure is filled in on success, and one other member of the
+The yield of <b>pcre2_regcomp()</b> is zero on success, and non-zero otherwise.
+The <i>preg</i> structure is filled in on success, and one other member of the
 structure (as well as <i>re_endp</i>) is public: <i>re_nsub</i> contains the
 structure (as well as <i>re_endp</i>) is public: <i>re_nsub</i> contains the
 number of capturing subpatterns in the regular expression. Various error codes
 number of capturing subpatterns in the regular expression. Various error codes
 are defined in the header file.
 are defined in the header file.
 </P>
 </P>
 <P>
 <P>
-NOTE: If the yield of <b>regcomp()</b> is non-zero, you must not attempt to
-use the contents of the <i>preg</i> structure. If, for example, you pass it to
-<b>regexec()</b>, the result is undefined and your program is likely to crash.
+NOTE: If the yield of <b>pcre2_regcomp()</b> is non-zero, you must not attempt
+to use the contents of the <i>preg</i> structure. If, for example, you pass it
+to <b>pcre2_regexec()</b>, the result is undefined and your program is likely to
+crash.
 </P>
 </P>
-<br><a name="SEC4" href="#TOC1">MATCHING NEWLINE CHARACTERS</a><br>
+<br><a name="SEC5" href="#TOC1">MATCHING NEWLINE CHARACTERS</a><br>
 <P>
 <P>
 This area is not simple, because POSIX and Perl take different views of things.
 This area is not simple, because POSIX and Perl take different views of things.
 It is not possible to get PCRE2 to obey POSIX semantics, but then PCRE2 was
 It is not possible to get PCRE2 to obey POSIX semantics, but then PCRE2 was
@@ -219,16 +243,16 @@ is no way to stop newline from matching [^a].
 Default POSIX newline handling can be obtained by setting PCRE2_DOTALL and
 Default POSIX newline handling can be obtained by setting PCRE2_DOTALL and
 PCRE2_DOLLAR_ENDONLY when calling <b>pcre2_compile()</b> directly, but there is
 PCRE2_DOLLAR_ENDONLY when calling <b>pcre2_compile()</b> directly, but there is
 no way to make PCRE2 behave exactly as for the REG_NEWLINE action. When using
 no way to make PCRE2 behave exactly as for the REG_NEWLINE action. When using
-the POSIX API, passing REG_NEWLINE to PCRE2's <b>regcomp()</b> function
+the POSIX API, passing REG_NEWLINE to PCRE2's <b>pcre2_regcomp()</b> function
 causes PCRE2_MULTILINE to be passed to <b>pcre2_compile()</b>, and REG_DOTALL
 causes PCRE2_MULTILINE to be passed to <b>pcre2_compile()</b>, and REG_DOTALL
 passes PCRE2_DOTALL. There is no way to pass PCRE2_DOLLAR_ENDONLY.
 passes PCRE2_DOTALL. There is no way to pass PCRE2_DOLLAR_ENDONLY.
 </P>
 </P>
-<br><a name="SEC5" href="#TOC1">MATCHING A PATTERN</a><br>
+<br><a name="SEC6" href="#TOC1">MATCHING A PATTERN</a><br>
 <P>
 <P>
-The function <b>regexec()</b> is called to match a compiled pattern <i>preg</i>
-against a given <i>string</i>, which is by default terminated by a zero byte
-(but see REG_STARTEND below), subject to the options in <i>eflags</i>. These can
-be:
+The function <b>pcre2_regexec()</b> is called to match a compiled pattern
+<i>preg</i> against a given <i>string</i>, which is by default terminated by a
+zero byte (but see REG_STARTEND below), subject to the options in <i>eflags</i>.
+These can be:
 <pre>
 <pre>
   REG_NOTBOL
   REG_NOTBOL
 </pre>
 </pre>
@@ -248,10 +272,10 @@ function.
 <pre>
 <pre>
   REG_STARTEND
   REG_STARTEND
 </pre>
 </pre>
-When this option is set, the subject string is starts at <i>string</i> +
+When this option is set, the subject string starts at <i>string</i> +
 <i>pmatch[0].rm_so</i> and ends at <i>string</i> + <i>pmatch[0].rm_eo</i>, which
 <i>pmatch[0].rm_so</i> and ends at <i>string</i> + <i>pmatch[0].rm_eo</i>, which
 should point to the first character beyond the string. There may be binary
 should point to the first character beyond the string. There may be binary
-zeroes within the subject string, and indeed, using REG_STARTEND is the only
+zeros within the subject string, and indeed, using REG_STARTEND is the only
 way to pass a subject string that contains a binary zero.
 way to pass a subject string that contains a binary zero.
 </P>
 </P>
 <P>
 <P>
@@ -272,7 +296,7 @@ are mutually exclusive; the error REG_INVARG is returned.
 <P>
 <P>
 If the pattern was compiled with the REG_NOSUB flag, no data about any matched
 If the pattern was compiled with the REG_NOSUB flag, no data about any matched
 strings is returned. The <i>nmatch</i> and <i>pmatch</i> arguments of
 strings is returned. The <i>nmatch</i> and <i>pmatch</i> arguments of
-<b>regexec()</b> are ignored (except possibly as input for REG_STARTEND).
+<b>pcre2_regexec()</b> are ignored (except possibly as input for REG_STARTEND).
 </P>
 </P>
 <P>
 <P>
 The value of <i>nmatch</i> may be zero, and the value <i>pmatch</i> may be NULL
 The value of <i>nmatch</i> may be zero, and the value <i>pmatch</i> may be NULL
@@ -294,24 +318,25 @@ array have both structure members set to -1.
 A successful match yields a zero return; various error codes are defined in the
 A successful match yields a zero return; various error codes are defined in the
 header file, of which REG_NOMATCH is the "expected" failure code.
 header file, of which REG_NOMATCH is the "expected" failure code.
 </P>
 </P>
-<br><a name="SEC6" href="#TOC1">ERROR MESSAGES</a><br>
+<br><a name="SEC7" href="#TOC1">ERROR MESSAGES</a><br>
 <P>
 <P>
-The <b>regerror()</b> function maps a non-zero errorcode from either
-<b>regcomp()</b> or <b>regexec()</b> to a printable message. If <i>preg</i> is not
-NULL, the error should have arisen from the use of that structure. A message
-terminated by a binary zero is placed in <i>errbuf</i>. If the buffer is too
-short, only the first <i>errbuf_size</i> - 1 characters of the error message are
-used. The yield of the function is the size of buffer needed to hold the whole
-message, including the terminating zero. This value is greater than
-<i>errbuf_size</i> if the message was truncated.
+The <b>pcre2_regerror()</b> function maps a non-zero errorcode from either
+<b>pcre2_regcomp()</b> or <b>pcre2_regexec()</b> to a printable message. If
+<i>preg</i> is not NULL, the error should have arisen from the use of that
+structure. A message terminated by a binary zero is placed in <i>errbuf</i>. If
+the buffer is too short, only the first <i>errbuf_size</i> - 1 characters of the
+error message are used. The yield of the function is the size of buffer needed
+to hold the whole message, including the terminating zero. This value is
+greater than <i>errbuf_size</i> if the message was truncated.
 </P>
 </P>
-<br><a name="SEC7" href="#TOC1">MEMORY USAGE</a><br>
+<br><a name="SEC8" href="#TOC1">MEMORY USAGE</a><br>
 <P>
 <P>
 Compiling a regular expression causes memory to be allocated and associated
 Compiling a regular expression causes memory to be allocated and associated
-with the <i>preg</i> structure. The function <b>regfree()</b> frees all such
-memory, after which <i>preg</i> may no longer be used as a compiled expression.
+with the <i>preg</i> structure. The function <b>pcre2_regfree()</b> frees all
+such memory, after which <i>preg</i> may no longer be used as a compiled
+expression.
 </P>
 </P>
-<br><a name="SEC8" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC9" href="#TOC1">AUTHOR</a><br>
 <P>
 <P>
 Philip Hazel
 Philip Hazel
 <br>
 <br>
@@ -320,11 +345,11 @@ University Computing Service
 Cambridge, England.
 Cambridge, England.
 <br>
 <br>
 </P>
 </P>
-<br><a name="SEC9" href="#TOC1">REVISION</a><br>
+<br><a name="SEC10" href="#TOC1">REVISION</a><br>
 <P>
 <P>
-Last updated: 15 June 2017
+Last updated: 26 April 2021
 <br>
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2021 University of Cambridge.
 <br>
 <br>
 <p>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
 Return to the <a href="index.html">PCRE2 index page</a>.
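
The buffer rule that the ERROR MESSAGES hunk above states for pcre2_regerror() can be modelled in a few lines. This is only a sketch in Python, not PCRE2 code: the function name `model_regerror` and the sample message are hypothetical, and the handling of a zero-sized buffer is an assumption that the text does not cover.

```python
def model_regerror(message: str, errbuf_size: int) -> tuple:
    """Model the documented buffer rule; this is not the PCRE2 implementation."""
    raw = message.encode("ascii")
    # Size needed for the whole message, including the terminating zero.
    needed = len(raw) + 1
    if errbuf_size <= 0:
        # Assumption: with no buffer space, nothing is stored.
        return b"", needed
    # Only the first errbuf_size - 1 characters are used, then a binary zero.
    stored = raw[: errbuf_size - 1] + b"\x00"
    return stored, needed


# Hypothetical message and an 8-byte buffer that is too short for it.
stored, needed = model_regerror("match failed", 8)
truncated = needed > 8
```

A caller can compare the returned size with the size it passed in: a larger return value means the message was cut, and the return value is the buffer size to retry with.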

+ 19 - 8
regex.mod/pcre/doc/html/pcre2serialize.html

@@ -49,6 +49,15 @@ and PCRE2_SIZE type. For example, patterns compiled on a 32-bit system using
 PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor can they be
 PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor can they be
 reloaded using the 8-bit library.
 reloaded using the 8-bit library.
 </P>
 </P>
+<P>
+Note that "serialization" in PCRE2 does not convert compiled patterns to an
+abstract format like Java or .NET serialization. The serialized output is
+really just a bytecode dump, which is why it can only be reloaded in the same
+environment as the one that created it. Hence the restrictions mentioned above.
+Applications that are not statically linked with a fixed version of PCRE2 must
+be prepared to recompile patterns from their sources, in order to be immune to
+PCRE2 upgrades.
+</P>
 <br><a name="SEC2" href="#TOC1">SECURITY CONCERNS</a><br>
 <br><a name="SEC2" href="#TOC1">SECURITY CONCERNS</a><br>
 <P>
 <P>
 The facility for saving and restoring compiled patterns is intended for use
 The facility for saving and restoring compiled patterns is intended for use
@@ -62,11 +71,11 @@ the byte stream that is passed to it.
 </P>
 </P>
 <br><a name="SEC3" href="#TOC1">SAVING COMPILED PATTERNS</a><br>
 <br><a name="SEC3" href="#TOC1">SAVING COMPILED PATTERNS</a><br>
 <P>
 <P>
-Before compiled patterns can be saved they must be serialized, that is,
-converted to a stream of bytes. A single byte stream may contain any number of
-compiled patterns, but they must all use the same character tables. A single
-copy of the tables is included in the byte stream (its size is 1088 bytes). For
-more details of character tables, see the
+Before compiled patterns can be saved they must be serialized, which in PCRE2
+means converting the pattern to a stream of bytes. A single byte stream may
+contain any number of compiled patterns, but they must all use the same
+character tables. A single copy of the tables is included in the byte stream
+(its size is 1088 bytes). For more details of character tables, see the
 <a href="pcre2api.html#localesupport">section on locale support</a>
 <a href="pcre2api.html#localesupport">section on locale support</a>
 in the
 in the
 <a href="pcre2api.html"><b>pcre2api</b></a>
 <a href="pcre2api.html"><b>pcre2api</b></a>
@@ -120,7 +129,9 @@ non-binary data, be sure that the file is opened for binary output.
 Serializing a set of patterns leaves the original data untouched, so they can
 Serializing a set of patterns leaves the original data untouched, so they can
 still be used for matching. Their memory must eventually be freed in the usual
 still be used for matching. Their memory must eventually be freed in the usual
 way by calling <b>pcre2_code_free()</b>. When you have finished with the byte
 way by calling <b>pcre2_code_free()</b>. When you have finished with the byte
-stream, it too must be freed by calling <b>pcre2_serialize_free()</b>.
+stream, it too must be freed by calling <b>pcre2_serialize_free()</b>. If this
+function is called with a NULL argument, it returns immediately without doing
+anything.
 </P>
 </P>
 <br><a name="SEC4" href="#TOC1">RE-USING PRECOMPILED PATTERNS</a><br>
 <br><a name="SEC4" href="#TOC1">RE-USING PRECOMPILED PATTERNS</a><br>
 <P>
 <P>
@@ -193,9 +204,9 @@ Cambridge, England.
 </P>
 </P>
 <br><a name="SEC6" href="#TOC1">REVISION</a><br>
 <br><a name="SEC6" href="#TOC1">REVISION</a><br>
 <P>
 <P>
-Last updated: 21 March 2017
+Last updated: 27 June 2018
 <br>
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2018 University of Cambridge.
 <br>
 <br>
 <p>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
 Return to the <a href="index.html">PCRE2 index page</a>.
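
The paragraph added above advises applications to be ready to recompile patterns from their sources. One possible shape for that fallback is sketched below in Python, with the standard `re` module standing in for the regex library. The names `load_pattern` and `library_version`, the cache layout, and the version strings are all hypothetical; no PCRE2 function is called.

```python
import re


def load_pattern(cache: dict, source: str, library_version: str):
    """Return (compiled_pattern, recompiled) for the given pattern source.

    A cached entry is reused only when it was built by the same library
    version; otherwise the pattern is recompiled from its source text.
    """
    entry = cache.get(source)
    if entry is not None and entry["version"] == library_version:
        return entry["compiled"], False
    compiled = re.compile(source)
    cache[source] = {"version": library_version, "compiled": compiled}
    return compiled, True


cache = {}
_, first = load_pattern(cache, r"\d+", "10.38")          # nothing cached yet
_, second = load_pattern(cache, r"\d+", "10.38")         # same version: reuse
_, after_upgrade = load_pattern(cache, r"\d+", "10.39")  # version changed
```

The point of the design is that the pattern source is always kept next to whatever was saved, so a version mismatch costs a recompile rather than a failure.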

+ 166 - 69
regex.mod/pcre/doc/html/pcre2syntax.html

@@ -23,7 +23,7 @@ please consult the man page, in case the conversion went wrong.
 <li><a name="TOC8" href="#SEC8">CHARACTER CLASSES</a>
 <li><a name="TOC8" href="#SEC8">CHARACTER CLASSES</a>
 <li><a name="TOC9" href="#SEC9">QUANTIFIERS</a>
 <li><a name="TOC9" href="#SEC9">QUANTIFIERS</a>
 <li><a name="TOC10" href="#SEC10">ANCHORS AND SIMPLE ASSERTIONS</a>
 <li><a name="TOC10" href="#SEC10">ANCHORS AND SIMPLE ASSERTIONS</a>
-<li><a name="TOC11" href="#SEC11">MATCH POINT RESET</a>
+<li><a name="TOC11" href="#SEC11">REPORTED MATCH POINT SETTING</a>
 <li><a name="TOC12" href="#SEC12">ALTERNATION</a>
 <li><a name="TOC12" href="#SEC12">ALTERNATION</a>
 <li><a name="TOC13" href="#SEC13">CAPTURING</a>
 <li><a name="TOC13" href="#SEC13">CAPTURING</a>
 <li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
 <li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
@@ -32,14 +32,16 @@ please consult the man page, in case the conversion went wrong.
 <li><a name="TOC17" href="#SEC17">NEWLINE CONVENTION</a>
 <li><a name="TOC17" href="#SEC17">NEWLINE CONVENTION</a>
 <li><a name="TOC18" href="#SEC18">WHAT \R MATCHES</a>
 <li><a name="TOC18" href="#SEC18">WHAT \R MATCHES</a>
 <li><a name="TOC19" href="#SEC19">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
 <li><a name="TOC19" href="#SEC19">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
-<li><a name="TOC20" href="#SEC20">BACKREFERENCES</a>
-<li><a name="TOC21" href="#SEC21">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
-<li><a name="TOC22" href="#SEC22">CONDITIONAL PATTERNS</a>
-<li><a name="TOC23" href="#SEC23">BACKTRACKING CONTROL</a>
-<li><a name="TOC24" href="#SEC24">CALLOUTS</a>
-<li><a name="TOC25" href="#SEC25">SEE ALSO</a>
-<li><a name="TOC26" href="#SEC26">AUTHOR</a>
-<li><a name="TOC27" href="#SEC27">REVISION</a>
+<li><a name="TOC20" href="#SEC20">NON-ATOMIC LOOKAROUND ASSERTIONS</a>
+<li><a name="TOC21" href="#SEC21">SCRIPT RUNS</a>
+<li><a name="TOC22" href="#SEC22">BACKREFERENCES</a>
+<li><a name="TOC23" href="#SEC23">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
+<li><a name="TOC24" href="#SEC24">CONDITIONAL PATTERNS</a>
+<li><a name="TOC25" href="#SEC25">BACKTRACKING CONTROL</a>
+<li><a name="TOC26" href="#SEC26">CALLOUTS</a>
+<li><a name="TOC27" href="#SEC27">SEE ALSO</a>
+<li><a name="TOC28" href="#SEC28">AUTHOR</a>
+<li><a name="TOC29" href="#SEC29">REVISION</a>
 </ul>
 </ul>
 <br><a name="SEC1" href="#TOC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
 <br><a name="SEC1" href="#TOC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
 <P>
 <P>
@@ -57,7 +59,8 @@ documentation. This document contains a quick-reference summary of the syntax.
 </P>
 </P>
 <br><a name="SEC3" href="#TOC1">ESCAPED CHARACTERS</a><br>
 <br><a name="SEC3" href="#TOC1">ESCAPED CHARACTERS</a><br>
 <P>
 <P>
-This table applies to ASCII and Unicode environments.
+This table applies to ASCII and Unicode environments. An unrecognized escape
+sequence causes an error.
 <pre>
 <pre>
   \a         alarm, that is, the BEL character (hex 07)
   \a         alarm, that is, the BEL character (hex 07)
   \cx        "control-x", where x is any ASCII printing character
   \cx        "control-x", where x is any ASCII printing character
@@ -69,25 +72,34 @@ This table applies to ASCII and Unicode environments.
   \0dd       character with octal code 0dd
   \0dd       character with octal code 0dd
   \ddd       character with octal code ddd, or backreference
   \ddd       character with octal code ddd, or backreference
   \o{ddd..}  character with octal code ddd..
   \o{ddd..}  character with octal code ddd..
-  \U         "U" if PCRE2_ALT_BSUX is set (otherwise is an error)
-  \uhhhh     character with hex code hhhh (if PCRE2_ALT_BSUX is set)
+  \N{U+hh..} character with Unicode code point hh.. (Unicode mode only)
   \xhh       character with hex code hh
   \xhh       character with hex code hh
-  \x{hhh..}  character with hex code hhh..
+  \x{hh..}   character with hex code hh..
 </pre>
 </pre>
+If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the
+following are also recognized:
+<pre>
+  \U         the character "U"
+  \uhhhh     character with hex code hhhh
+  \u{hh..}   character with hex code hh.. but only for EXTRA_ALT_BSUX
+</pre>
+When \x is not followed by {, from zero to two hexadecimal digits are read,
+but in ALT_BSUX mode \x must be followed by two hexadecimal digits to be
+recognized as a hexadecimal escape; otherwise it matches a literal "x".
+Likewise, if \u (in ALT_BSUX mode) is not followed by four hexadecimal digits
+or (in EXTRA_ALT_BSUX mode) a sequence of hex digits in curly brackets, it
+matches a literal "u".
+</P>
+<P>
 Note that \0dd is always an octal code. The treatment of backslash followed by
 Note that \0dd is always an octal code. The treatment of backslash followed by
 a non-zero digit is complicated; for details see the section
 a non-zero digit is complicated; for details see the section
 <a href="pcre2pattern.html#digitsafterbackslash">"Non-printing characters"</a>
 <a href="pcre2pattern.html#digitsafterbackslash">"Non-printing characters"</a>
 in the
 in the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 documentation, where details of escape processing in EBCDIC environments are
 documentation, where details of escape processing in EBCDIC environments are
-also given.
-</P>
-<P>
-When \x is not followed by {, from zero to two hexadecimal digits are read,
-but if PCRE2_ALT_BSUX is set, \x must be followed by two hexadecimal digits to
-be recognized as a hexadecimal escape; otherwise it matches a literal "x".
-Likewise, if \u (in ALT_BSUX mode) is not followed by four hexadecimal digits,
-it matches a literal "u".
+also given. \N{U+hh..} is synonymous with \x{hh..} in PCRE2 but is not
+supported in EBCDIC environments. Note that \N not followed by an opening
+curly bracket has a different meaning (see below).
 </P>
 </P>
 <br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
 <br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
 <P>
 <P>
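
The rules in the hunk above for a backslash-x or backslash-u that is not followed by an opening curly bracket can be written as a small decision procedure. The sketch below is a Python model of the wording only, not PCRE2's parser: the function names and the return convention are hypothetical, and treating zero digits in default mode as code point 0 is an assumption.

```python
HEX = "0123456789abcdefABCDEF"


def read_x_escape(rest: str, alt_bsux: bool):
    """Interpret the text that follows a backslash-x (no curly bracket form).

    Returns (kind, value, consumed), where kind is "hex" or "literal".
    """
    digits = ""
    for ch in rest[:2]:
        if ch not in HEX:
            break
        digits += ch
    if alt_bsux:
        # ALT_BSUX mode: exactly two hex digits, otherwise a literal "x".
        if len(digits) == 2:
            return "hex", int(digits, 16), 2
        return "literal", "x", 0
    # Default mode: from zero to two hex digits are read.
    # Assumption: reading zero digits gives code point 0.
    return "hex", int(digits, 16) if digits else 0, len(digits)


def read_u_escape(rest: str):
    """Interpret the text that follows a backslash-u in ALT_BSUX mode."""
    digits = rest[:4]
    if len(digits) == 4 and all(ch in HEX for ch in digits):
        return "hex", int(digits, 16), 4
    # Fewer than four hex digits: a literal "u".
    return "literal", "u", 0
```

The curly-bracket forms are left out because the hunk describes them only for EXTRA_ALT_BSUX.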
@@ -188,6 +200,7 @@ at release 5.18.
 </P>
 </P>
 <br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
 <br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
 <P>
 <P>
+Adlam,
 Ahom,
 Ahom,
 Anatolian_Hieroglyphs,
 Anatolian_Hieroglyphs,
 Arabic,
 Arabic,
@@ -198,6 +211,7 @@ Bamum,
 Bassa_Vah,
 Bassa_Vah,
 Batak,
 Batak,
 Bengali,
 Bengali,
+Bhaiksuki,
 Bopomofo,
 Bopomofo,
 Brahmi,
 Brahmi,
 Braille,
 Braille,
@@ -209,16 +223,21 @@ Caucasian_Albanian,
 Chakma,
 Chakma,
 Cham,
 Cham,
 Cherokee,
 Cherokee,
+Chorasmian,
 Common,
 Common,
 Coptic,
 Coptic,
 Cuneiform,
 Cuneiform,
 Cypriot,
 Cypriot,
+Cypro_Minoan,
 Cyrillic,
 Cyrillic,
 Deseret,
 Deseret,
 Devanagari,
 Devanagari,
+Dives_Akuru,
+Dogra,
 Duployan,
 Duployan,
 Egyptian_Hieroglyphs,
 Egyptian_Hieroglyphs,
 Elbasan,
 Elbasan,
+Elymaic,
 Ethiopic,
 Ethiopic,
 Georgian,
 Georgian,
 Glagolitic,
 Glagolitic,
@@ -226,9 +245,11 @@ Gothic,
 Grantha,
 Grantha,
 Greek,
 Greek,
 Gujarati,
 Gujarati,
+Gunjala_Gondi,
 Gurmukhi,
 Gurmukhi,
 Han,
 Han,
 Hangul,
 Hangul,
+Hanifi_Rohingya,
 Hanunoo,
 Hanunoo,
 Hatran,
 Hatran,
 Hebrew,
 Hebrew,
@@ -243,6 +264,7 @@ Kannada,
 Katakana,
 Katakana,
 Kayah_Li,
 Kayah_Li,
 Kharoshthi,
 Kharoshthi,
+Khitan_Small_Script,
 Khmer,
 Khmer,
 Khojki,
 Khojki,
 Khudawadi,
 Khudawadi,
@@ -256,9 +278,13 @@ Lisu,
 Lycian,
 Lycian,
 Lydian,
 Lydian,
 Mahajani,
 Mahajani,
+Makasar,
 Malayalam,
 Malayalam,
 Mandaic,
 Mandaic,
 Manichaean,
 Manichaean,
+Marchen,
+Masaram_Gondi,
+Medefaidrin,
 Meetei_Mayek,
 Meetei_Mayek,
 Mende_Kikakui,
 Mende_Kikakui,
 Meroitic_Cursive,
 Meroitic_Cursive,
@@ -270,8 +296,12 @@ Mro,
 Multani,
 Multani,
 Myanmar,
 Myanmar,
 Nabataean,
 Nabataean,
+Nandinagari,
 New_Tai_Lue,
 New_Tai_Lue,
+Newa,
 Nko,
 Nko,
+Nushu,
+Nyakeng_Puachue_Hmong,
 Ogham,
 Ogham,
 Ol_Chiki,
 Ol_Chiki,
 Old_Hungarian,
 Old_Hungarian,
@@ -279,9 +309,12 @@ Old_Italic,
 Old_North_Arabian,
 Old_North_Arabian,
 Old_Permic,
 Old_Permic,
 Old_Persian,
 Old_Persian,
+Old_Sogdian,
 Old_South_Arabian,
 Old_South_Arabian,
 Old_Turkic,
 Old_Turkic,
+Old_Uyghur,
 Oriya,
 Oriya,
+Osage,
 Osmanya,
 Osmanya,
 Pahawh_Hmong,
 Pahawh_Hmong,
 Palmyrene,
 Palmyrene,
@@ -298,7 +331,9 @@ Shavian,
 Siddham,
 Siddham,
 SignWriting,
 SignWriting,
 Sinhala,
 Sinhala,
+Sogdian,
 Sora_Sompeng,
 Sora_Sompeng,
+Soyombo,
 Sundanese,
 Sundanese,
 Syloti_Nagri,
 Syloti_Nagri,
 Syriac,
 Syriac,
@@ -309,16 +344,23 @@ Tai_Tham,
 Tai_Viet,
 Tai_Viet,
 Takri,
 Takri,
 Tamil,
 Tamil,
+Tangsa,
+Tangut,
 Telugu,
 Telugu,
 Thaana,
 Thaana,
 Thai,
 Thai,
 Tibetan,
 Tibetan,
 Tifinagh,
 Tifinagh,
 Tirhuta,
 Tirhuta,
+Toto,
 Ugaritic,
 Ugaritic,
 Vai,
 Vai,
+Vithkuqi,
+Wancho,
 Warang_Citi,
 Warang_Citi,
-Yi.
+Yezidi,
+Yi,
+Zanabazar_Square.
 </P>
 </P>
 <br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
 <br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
 <P>
 <P>
@@ -387,11 +429,14 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
   \G          first matching position in subject
   \G          first matching position in subject
 </PRE>
 </PRE>
 </P>
 </P>
-<br><a name="SEC11" href="#TOC1">MATCH POINT RESET</a><br>
+<br><a name="SEC11" href="#TOC1">REPORTED MATCH POINT SETTING</a><br>
 <P>
 <P>
 <pre>
 <pre>
-  \K          reset start of match
+  \K          set reported start of match
 </pre>
 </pre>
+From release 10.38 \K is not permitted by default in lookaround assertions,
+for compatibility with Perl. However, if the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
+option is set, the previous behaviour is re-enabled. When this option is set,
 \K is honoured in positive assertions, but ignored in negative ones.
 \K is honoured in positive assertions, but ignored in negative ones.
 </P>
 </P>
 <br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
 <br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
@@ -403,19 +448,23 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
 <br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
 <br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
 <P>
 <P>
 <pre>
 <pre>
-  (...)           capturing group
-  (?&#60;name&#62;...)    named capturing group (Perl)
-  (?'name'...)    named capturing group (Perl)
-  (?P&#60;name&#62;...)   named capturing group (Python)
-  (?:...)         non-capturing group
-  (?|...)         non-capturing group; reset group numbers for
-                   capturing groups in each alternative
-</PRE>
+  (...)           capture group
+  (?&#60;name&#62;...)    named capture group (Perl)
+  (?'name'...)    named capture group (Perl)
+  (?P&#60;name&#62;...)   named capture group (Python)
+  (?:...)         non-capture group
+  (?|...)         non-capture group; reset group numbers for
+                   capture groups in each alternative
+</pre>
+In non-UTF modes, names may contain underscores and ASCII letters and digits;
+in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In
+both cases, a name must not start with a digit.
 </P>
 </P>
 <br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
 <br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
 <P>
 <P>
 <pre>
 <pre>
-  (?&#62;...)         atomic, non-capturing group
+  (?&#62;...)         atomic non-capture group
+  (*atomic:...)   atomic non-capture group
 </PRE>
 </PRE>
 </P>
 </P>
 <br><a name="SEC15" href="#TOC1">COMMENT</a><br>
 <br><a name="SEC15" href="#TOC1">COMMENT</a><br>
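
The naming rule added to the CAPTURING section can be checked mechanically. The sketch below uses Python's standard `re` module, which accepts the Python-style `(?P<name>...)` syntax listed in the table but is not PCRE2. The helper `is_valid_group_name` is hypothetical and covers only the non-UTF rule (ASCII letters, digits and underscores, no leading digit).

```python
import re

# Non-UTF rule from the text: ASCII letters, digits and underscores,
# and the name must not start with a digit.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_group_name(name: str) -> bool:
    """Check a candidate capture group name against the non-UTF rule."""
    return NAME_RE.fullmatch(name) is not None


# Named capture groups in the Python-style syntax.
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")
m = date.fullmatch("2021-08")
```

After a successful match, `m.group("year")` and `m.group("month")` give the captured text by name.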
@@ -426,9 +475,11 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
 </P>
 </P>
 <br><a name="SEC16" href="#TOC1">OPTION SETTING</a><br>
 <br><a name="SEC16" href="#TOC1">OPTION SETTING</a><br>
 <P>
 <P>
+Changes of these options within a group are automatically cancelled at the end
+of the group.
 <pre>
 <pre>
   (?i)            caseless
   (?i)            caseless
-  (?J)            allow duplicate names
+  (?J)            allow duplicate named groups
   (?m)            multiline
   (?m)            multiline
   (?n)            no auto capture
   (?n)            no auto capture
   (?s)            single line (dotall)
   (?s)            single line (dotall)
@@ -436,13 +487,21 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
   (?x)            extended: ignore white space except in classes
   (?x)            extended: ignore white space except in classes
   (?xx)           as (?x) but also ignore space and tab in classes
   (?xx)           as (?x) but also ignore space and tab in classes
   (?-...)         unset option(s)
   (?-...)         unset option(s)
+  (?^)            unset imnsx options
 </pre>
 </pre>
+Unsetting x or xx unsets both. Several options may be set at once, and a
+mixture of setting and unsetting such as (?i-x) is allowed, but there may be
+only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
+(?^in). An option setting may appear at the start of a non-capture group, for
+example (?i:...).
+</P>
+<P>
 The following are recognized only at the very start of a pattern or after one
 The following are recognized only at the very start of a pattern or after one
 of the newline or \R options with similar syntax. More than one of them may
 of the newline or \R options with similar syntax. More than one of them may
 appear. For the first three, d is a decimal number.
 appear. For the first three, d is a decimal number.
 <pre>
 <pre>
   (*LIMIT_DEPTH=d) set the backtracking limit to d
   (*LIMIT_DEPTH=d) set the backtracking limit to d
-  (*LIMIT_HEAP=d)  set the heap size limit to d kilobytes
+  (*LIMIT_HEAP=d)  set the heap size limit to d * 1024 bytes
   (*LIMIT_MATCH=d) set the match limit to d
   (*LIMIT_MATCH=d) set the match limit to d
   (*NOTEMPTY)      set PCRE2_NOTEMPTY when matching
   (*NOTEMPTY)      set PCRE2_NOTEMPTY when matching
   (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
   (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
@@ -484,14 +543,48 @@ setting with a similar syntax.
 <br><a name="SEC19" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
 <br><a name="SEC19" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
 <P>
 <P>
 <pre>
 <pre>
-  (?=...)         positive look ahead
-  (?!...)         negative look ahead
-  (?&#60;=...)        positive look behind
-  (?&#60;!...)        negative look behind
+  (?=...)                     )
+  (*pla:...)                  ) positive lookahead
+  (*positive_lookahead:...)   )
+
+  (?!...)                     )
+  (*nla:...)                  ) negative lookahead
+  (*negative_lookahead:...)   )
+
+  (?&#60;=...)                    )
+  (*plb:...)                  ) positive lookbehind
+  (*positive_lookbehind:...)  )
+
+  (?&#60;!...)                    )
+  (*nlb:...)                  ) negative lookbehind
+  (*negative_lookbehind:...)  )
 </pre>
 </pre>
-Each top-level branch of a look behind must be of a fixed length.
+Each top-level branch of a lookbehind must be of a fixed length.
 </P>
 </P>
-<br><a name="SEC20" href="#TOC1">BACKREFERENCES</a><br>
+<br><a name="SEC20" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
+<P>
+These assertions are specific to PCRE2 and are not Perl-compatible.
+<pre>
+  (?*...)                                )
+  (*napla:...)                           ) synonyms
+  (*non_atomic_positive_lookahead:...)   )
+
+  (?&#60;*...)                               )
+  (*naplb:...)                           ) synonyms
+  (*non_atomic_positive_lookbehind:...)  )
+</PRE>
+</P>
+<br><a name="SEC21" href="#TOC1">SCRIPT RUNS</a><br>
+<P>
+<pre>
+  (*script_run:...)           ) script run, can be backtracked into
+  (*sr:...)                   )
+
+  (*atomic_script_run:...)    ) atomic script run
+  (*asr:...)                  )
+</PRE>
+</P>
+<br><a name="SEC22" href="#TOC1">BACKREFERENCES</a><br>
 <P>
 <P>
 <pre>
 <pre>
   \n              reference by number (can be ambiguous)
   \n              reference by number (can be ambiguous)
@@ -508,26 +601,26 @@ Each top-level branch of a look behind must be of a fixed length.
   (?P=name)       reference by name (Python)
   (?P=name)       reference by name (Python)
 </PRE>
 </PRE>
 </P>
 </P>
-<br><a name="SEC21" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
+<br><a name="SEC23" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
 <P>
 <P>
 <pre>
 <pre>
   (?R)            recurse whole pattern
   (?R)            recurse whole pattern
-  (?n)            call subpattern by absolute number
-  (?+n)           call subpattern by relative number
-  (?-n)           call subpattern by relative number
-  (?&name)        call subpattern by name (Perl)
-  (?P&#62;name)       call subpattern by name (Python)
-  \g&#60;name&#62;        call subpattern by name (Oniguruma)
-  \g'name'        call subpattern by name (Oniguruma)
-  \g&#60;n&#62;           call subpattern by absolute number (Oniguruma)
-  \g'n'           call subpattern by absolute number (Oniguruma)
-  \g&#60;+n&#62;          call subpattern by relative number (PCRE2 extension)
-  \g'+n'          call subpattern by relative number (PCRE2 extension)
-  \g&#60;-n&#62;          call subpattern by relative number (PCRE2 extension)
-  \g'-n'          call subpattern by relative number (PCRE2 extension)
+  (?n)            call subroutine by absolute number
+  (?+n)           call subroutine by relative number
+  (?-n)           call subroutine by relative number
+  (?&name)        call subroutine by name (Perl)
+  (?P&#62;name)       call subroutine by name (Python)
+  \g&#60;name&#62;        call subroutine by name (Oniguruma)
+  \g'name'        call subroutine by name (Oniguruma)
+  \g&#60;n&#62;           call subroutine by absolute number (Oniguruma)
+  \g'n'           call subroutine by absolute number (Oniguruma)
+  \g&#60;+n&#62;          call subroutine by relative number (PCRE2 extension)
+  \g'+n'          call subroutine by relative number (PCRE2 extension)
+  \g&#60;-n&#62;          call subroutine by relative number (PCRE2 extension)
+  \g'-n'          call subroutine by relative number (PCRE2 extension)
 </PRE>
 </PRE>
 </P>
 </P>
-<br><a name="SEC22" href="#TOC1">CONDITIONAL PATTERNS</a><br>
+<br><a name="SEC24" href="#TOC1">CONDITIONAL PATTERNS</a><br>
 <P>
 <P>
 <pre>
 <pre>
   (?(condition)yes-pattern)
   (?(condition)yes-pattern)
@@ -542,7 +635,7 @@ Each top-level branch of a look behind must be of a fixed length.
   (?(R)               overall recursion condition
   (?(R)               overall recursion condition
   (?(Rn)              specific numbered group recursion condition
   (?(Rn)              specific numbered group recursion condition
   (?(R&name)          specific named group recursion condition
   (?(R&name)          specific named group recursion condition
-  (?(DEFINE)          define subpattern for reference
+  (?(DEFINE)          define groups for reference
   (?(VERSION[&#62;]=n.m)  test PCRE2 version
   (?(VERSION[&#62;]=n.m)  test PCRE2 version
   (?(assert)          assertion condition
   (?(assert)          assertion condition
 </pre>
 </pre>
@@ -550,9 +643,13 @@ Note the ambiguity of (?(R) and (?(Rn) which might be named reference
 conditions or recursion tests. Such a condition is interpreted as a reference
 conditions or recursion tests. Such a condition is interpreted as a reference
 condition if the relevant named group exists.
 condition if the relevant named group exists.
 </P>
 </P>
-<br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br>
+<br><a name="SEC25" href="#TOC1">BACKTRACKING CONTROL</a><br>
 <P>
 <P>
-The following act immediately they are reached:
+All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
+name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
+if :NAME is present. The others just set a name for passing back to the caller,
+but this is not a name that (*SKIP) can see. The following act immediately they
+are reached:
 <pre>
 <pre>
   (*ACCEPT)       force successful match
   (*ACCEPT)       force successful match
   (*FAIL)         force backtrack; synonym (*F)
   (*FAIL)         force backtrack; synonym (*F)
@@ -565,15 +662,15 @@ pattern is not anchored.
 <pre>
 <pre>
   (*COMMIT)       overall failure, no advance of starting point
   (*COMMIT)       overall failure, no advance of starting point
   (*PRUNE)        advance to next starting character
   (*PRUNE)        advance to next starting character
-  (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
   (*SKIP)         advance to current matching position
   (*SKIP)         advance to current matching position
   (*SKIP:NAME)    advance to position corresponding to an earlier
   (*SKIP:NAME)    advance to position corresponding to an earlier
                   (*MARK:NAME); if not found, the (*SKIP) is ignored
                   (*MARK:NAME); if not found, the (*SKIP) is ignored
   (*THEN)         local failure, backtrack to next alternation
   (*THEN)         local failure, backtrack to next alternation
-  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
-</PRE>
+</pre>
+The effect of one of these verbs in a group called as a subroutine is confined
+to the subroutine call.
 </P>
 </P>
-<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
+<br><a name="SEC26" href="#TOC1">CALLOUTS</a><br>
 <P>
 <P>
 <pre>
 <pre>
   (?C)            callout (assumed number 0)
   (?C)            callout (assumed number 0)
@@ -584,25 +681,25 @@ The allowed string delimiters are ` ' " ^ % # $ (which are the same for the
 start and the end), and the starting delimiter { matched with the ending
 start and the end), and the starting delimiter { matched with the ending
 delimiter }. To encode the ending delimiter within the string, double it.
 delimiter }. To encode the ending delimiter within the string, double it.
 </P>
 </P>
-<br><a name="SEC25" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC27" href="#TOC1">SEE ALSO</a><br>
 <P>
 <P>
 <b>pcre2pattern</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
 <b>pcre2pattern</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
 <b>pcre2matching</b>(3), <b>pcre2</b>(3).
 <b>pcre2matching</b>(3), <b>pcre2</b>(3).
 </P>
 </P>
-<br><a name="SEC26" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC28" href="#TOC1">AUTHOR</a><br>
 <P>
 <P>
 Philip Hazel
 Philip Hazel
 <br>
 <br>
-University Computing Service
+Retired from University Computing Service
 <br>
 <br>
 Cambridge, England.
 Cambridge, England.
 <br>
 <br>
 </P>
 </P>
-<br><a name="SEC27" href="#TOC1">REVISION</a><br>
+<br><a name="SEC29" href="#TOC1">REVISION</a><br>
 <P>
 <P>
-Last updated: 17 June 2017
+Last updated: 30 August 2021
 <br>
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2021 University of Cambridge.
 <br>
 <br>
 <p>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
 Return to the <a href="index.html">PCRE2 index page</a>.
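
The OPTION SETTING hunk above notes that an option setting may appear at the start of a non-capture group, as in (?i:...), and that a setting made within a group ends with that group. The sketch below shows the scoped form with Python's standard `re` module, which happens to accept the `(?i:...)` and `(?-i:...)` spellings. It is a stand-in, not PCRE2, and it does not cover (?^) or the other options in the table.

```python
import re

# Caseless matching applies only inside the group; "def" stays case-sensitive.
scoped = re.compile(r"(?i:abc)def")
results = {
    "ABCdef": scoped.fullmatch("ABCdef") is not None,
    "ABCDEF": scoped.fullmatch("ABCDEF") is not None,
}

# The reverse: caseless for the whole pattern, unset for one group.
unset = re.compile(r"(?i)ab(?-i:c)")
unset_results = {
    "ABc": unset.fullmatch("ABc") is not None,
    "ABC": unset.fullmatch("ABC") is not None,
}
```

In both patterns the text outside the group keeps whatever option state was in force before the group opened.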

+ 266 - 126
regex.mod/pcre/doc/html/pcre2test.html

@@ -59,12 +59,7 @@ patterns, and the subject lines specify PCRE2 function options, control how the
 subject is processed, and what output is produced.
 subject is processed, and what output is produced.
 </P>
 </P>
 <P>
 <P>
-As the original fairly simple PCRE library evolved, it acquired many different
-features, and as a result, the original <b>pcretest</b> program ended up with a
-lot of options in a messy, arcane syntax for testing all the features. The
-move to the new PCRE2 API provided an opportunity to re-implement the test
-program as <b>pcre2test</b>, with a cleaner modifier syntax. Nevertheless, there
-are still many obscure modifiers, some of which are specifically designed for
+There are many obscure modifiers, some of which are specifically designed for
 use in conjunction with the test script and data files that are distributed as
 use in conjunction with the test script and data files that are distributed as
 part of PCRE2. All the modifiers are documented here, some without much
 part of PCRE2. All the modifiers are documented here, some without much
 justification, but many of them are unlikely to be of use except when testing
 justification, but many of them are unlikely to be of use except when testing
@@ -89,10 +84,10 @@ names used in the libraries have a suffix _8, _16, or _32, as appropriate.
 <br><a name="SEC3" href="#TOC1">INPUT ENCODING</a><br>
 <br><a name="SEC3" href="#TOC1">INPUT ENCODING</a><br>
 <P>
 <P>
 Input to <b>pcre2test</b> is processed line by line, either by calling the C
 Input to <b>pcre2test</b> is processed line by line, either by calling the C
-library's <b>fgets()</b> function, or via the <b>libreadline</b> library. In some
-Windows environments character 26 (hex 1A) causes an immediate end of file, and
-no further data is read, so this character should be avoided unless you really
-want that action.
+library's <b>fgets()</b> function, or via the <b>libreadline</b> or <b>libedit</b>
+library. In some Windows environments character 26 (hex 1A) causes an immediate
+end of file, and no further data is read, so this character should be avoided
+unless you really want that action.
 </P>
 </P>
 <P>
 <P>
 The input is processed using C's string functions, so must not
 The input is processed using C's string functions, so must not
@@ -129,7 +124,7 @@ to occur).
 UTF-8 (in its original definition) is not capable of encoding values greater
 UTF-8 (in its original definition) is not capable of encoding values greater
 than 0x7fffffff, but such values can be handled by the 32-bit library. When
 than 0x7fffffff, but such values can be handled by the 32-bit library. When
 testing this library in non-UTF mode with <b>utf8_input</b> set, if any
 testing this library in non-UTF mode with <b>utf8_input</b> set, if any
-character is preceded by the byte 0xff (which is an illegal byte in UTF-8)
+character is preceded by the byte 0xff (which is an invalid byte in UTF-8)
 0x80000000 is added to the character's value. This is the only way of passing
 such code points in a pattern string. For subject strings, using an escape
 sequence is preferable.
@@ -242,10 +237,17 @@ Behave as if each pattern line has the <b>jit</b> modifier; after successful
 compilation, each pattern is passed to the just-in-time compiler, if available.
 </P>
 <P>
+<b>-jitfast</b>
+Behave as if each pattern line has the <b>jitfast</b> modifier; after
+successful compilation, each pattern is passed to the just-in-time compiler, if
+available, and each subject line is passed directly to the JIT matcher via its
+"fast path".
+</P>
+<P>
 <b>-jitverify</b>
 Behave as if each pattern line has the <b>jitverify</b> modifier; after
 successful compilation, each pattern is passed to the just-in-time compiler, if
-available, and the use of JIT is verified.
+available, and the use of JIT for matching is verified.
 </P>
 <P>
 <b>-LM</b>
@@ -254,7 +256,7 @@ standard output, then exit with zero exit code. All other options are ignored.
 If both -C and -LM are present, whichever is first is recognized.
 </P>
 <P>
-\fB-pattern\fB <i>modifier-list</i>
+<b>-pattern</b> <i>modifier-list</i>
 Behave as if each pattern line contains the given modifiers.
 </P>
 <P>
@@ -264,7 +266,7 @@ Do not output the version number of <b>pcre2test</b> at the start of execution.
 <P>
 <b>-S</b> <i>size</i>
 On Unix-like systems, set the size of the run-time stack to <i>size</i>
-megabytes.
+mebibytes (units of 1024*1024 bytes).
 </P>
 <P>
 <b>-subject</b> <i>modifier-list</i>
@@ -315,7 +317,8 @@ number of subject lines to be matched against that pattern. In between sets of
 test data, command lines that begin with # may appear. This file format, with
 some restrictions, can also be processed by the <b>perltest.sh</b> script that
 is distributed with PCRE2 as a means of checking that the behaviour of PCRE2
-and Perl is the same.
+and Perl is the same. For a specification of <b>perltest.sh</b>, see the
+comments near its beginning. See also the #perltest command below.
 </P>
 <P>
 When the input is a terminal, <b>pcre2test</b> prompts for each line of input,
@@ -367,6 +370,12 @@ output.
 This command is used to load a set of precompiled patterns from a file, as
 described in the section entitled "Saving and restoring compiled patterns"
 <a href="#saverestore">below.</a>
+<pre>
+  #loadtables &#60;filename&#62;
+</pre>
+This command is used to load a set of binary character tables that can be
+accessed by the tables=3 qualifier. Such tables can be created by the
+<b>pcre2_dftables</b> program with the -b option.
 <pre>
   #newline_default [&#60;newline-list&#62;]
 </pre>
@@ -406,13 +415,20 @@ patterns. Modifiers on a pattern can change these settings.
 <pre>
   #perltest
 </pre>
-The appearance of this line causes all subsequent modifier settings to be
-checked for compatibility with the <b>perltest.sh</b> script, which is used to
-confirm that Perl gives the same results as PCRE2. Also, apart from comment
-lines, none of the other command lines are permitted, because they and many
-of the modifiers are specific to <b>pcre2test</b>, and should not be used in
-test files that are also processed by <b>perltest.sh</b>. The <b>#perltest</b>
-command helps detect tests that are accidentally put in the wrong file.
+This line is used in test files that can also be processed by <b>perltest.sh</b>
+to confirm that Perl gives the same results as PCRE2. Subsequent tests are
+checked for the use of <b>pcre2test</b> features that are incompatible with the
+<b>perltest.sh</b> script.
+</P>
+<P>
+Patterns must use '/' as their delimiter, and only certain modifiers are
+supported. Comment lines, #pattern commands, and #subject commands that set or
+unset "mark" are recognized and acted on. The #perltest, #forbid_utf, and
+#newline_default commands, which are needed in the relevant pcre2test files,
+are silently ignored. All other command lines are ignored, but give a warning
+message. The <b>#perltest</b> command helps detect tests that are accidentally
+put in the wrong file or use the wrong delimiter. For more details of the
+<b>perltest.sh</b> script see the comments it contains.
 <pre>
   #pop [&#60;modifiers&#62;]
   #popcopy [&#60;modifiers&#62;]
@@ -465,15 +481,17 @@ excluding pattern meta-characters):
 </pre>
 This is interpreted as the pattern's delimiter. A regular expression may be
 continued over several input lines, in which case the newline characters are
-included within it. It is possible to include the delimiter within the pattern
-by escaping it with a backslash, for example
+included within it. It is possible to include the delimiter as a literal within
+the pattern by escaping it with a backslash, for example
 <pre>
   /abc\/def/
 </pre>
 If you do this, the escape and the delimiter form part of the pattern, but
-since the delimiters are all non-alphanumeric, this does not affect its
-interpretation. If the terminating delimiter is immediately followed by a
-backslash, for example,
+since the delimiters are all non-alphanumeric, the inclusion of the backslash
+does not affect the pattern's interpretation. Note, however, that this trick
+does not work within \Q...\E literal bracketing because the backslash will
+itself be interpreted as a literal. If the terminating delimiter is immediately
+followed by a backslash, for example,
 <pre>
   /abc/\
 </pre>
@@ -491,11 +509,11 @@ A pattern can be followed by a modifier list (details below).
 </P>
 <br><a name="SEC9" href="#TOC1">SUBJECT LINE SYNTAX</a><br>
 <P>
-Before each subject line is passed to <b>pcre2_match()</b> or
-<b>pcre2_dfa_match()</b>, leading and trailing white space is removed, and the
-line is scanned for backslash escapes, unless the <b>subject_literal</b>
-modifier was set for the pattern. The following provide a means of encoding
-non-printing characters in a visible way:
+Before each subject line is passed to <b>pcre2_match()</b>,
+<b>pcre2_dfa_match()</b>, or <b>pcre2_jit_match()</b>, leading and trailing white
+space is removed, and the line is scanned for backslash escapes, unless the
+<b>subject_literal</b> modifier was set for the pattern. The following provide a
+means of encoding non-printing characters in a visible way:
 <pre>
   \a         alarm (BEL, \x07)
   \b         backspace (\x08)
@@ -592,6 +610,7 @@ way <b>pcre2_compile()</b> behaves. See
 for a description of the effects of these options.
 <pre>
       allow_empty_class         set PCRE2_ALLOW_EMPTY_CLASS
+      allow_lookaround_bsk      set PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
       allow_surrogate_escapes   set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
       alt_bsux                  set PCRE2_ALT_BSUX
       alt_circumflex            set PCRE2_ALT_CIRCUMFLEX
@@ -604,11 +623,14 @@ for a description of the effects of these options.
   /s  dotall                    set PCRE2_DOTALL
       dupnames                  set PCRE2_DUPNAMES
       endanchored               set PCRE2_ENDANCHORED
+      escaped_cr_is_lf          set PCRE2_EXTRA_ESCAPED_CR_IS_LF
   /x  extended                  set PCRE2_EXTENDED
   /xx extended_more             set PCRE2_EXTENDED_MORE
+      extra_alt_bsux            set PCRE2_EXTRA_ALT_BSUX
       firstline                 set PCRE2_FIRSTLINE
       literal                   set PCRE2_LITERAL
       match_line                set PCRE2_EXTRA_MATCH_LINE
+      match_invalid_utf         set PCRE2_MATCH_INVALID_UTF
       match_unset_backref       set PCRE2_MATCH_UNSET_BACKREF
       match_word                set PCRE2_EXTRA_MATCH_WORD
   /m  multiline                 set PCRE2_MULTILINE
@@ -667,7 +689,7 @@ heavily used in the test files.
       pushcopy                  push a copy onto the stack
       stackguard=&#60;number&#62;       test the stackguard feature
       subject_literal           treat all subject lines as literal
-      tables=[0|1|2]            select internal tables
+      tables=[0|1|2|3]          select internal tables
       use_length                do not zero-terminate the pattern
       utf8_input                treat input as UTF-8
 </pre>
@@ -679,8 +701,8 @@ Newline and \R handling
 <P>
 The <b>bsr</b> modifier specifies what \R in a pattern should match. If it is
 set to "anycrlf", \R matches CR, LF, or CRLF only. If it is set to "unicode",
-\R matches any Unicode newline sequence. The default is specified when PCRE2
-is built, with the default default being Unicode.
+\R matches any Unicode newline sequence. The default can be specified when
+PCRE2 is built; if it is not, the default is set to Unicode.
 </P>
 <P>
 The <b>newline</b> modifier specifies which characters are to be interpreted as
@@ -713,14 +735,14 @@ information is obtained from the <b>pcre2_pattern_info()</b> function. Here are
 some typical examples:
 <pre>
     re&#62; /(?i)(^a|^b)/m,info
-  Capturing subpattern count = 1
+  Capture group count = 1
   Compile options: multiline
   Overall options: caseless multiline
   First code unit at start or follows newline
   Subject length lower bound = 1
 
     re&#62; /(?i)abc/info
-  Capturing subpattern count = 0
+  Capture group count = 0
   Compile options: &#60;none&#62;
   Overall options: caseless
   First code unit = 'a' (caseless)
@@ -734,7 +756,9 @@ options, the line is omitted. "First code unit" is where any match must start;
 if there is more than one they are listed as "starting code units". "Last code
 unit" is the last literal code unit that must be present in any match. This is
 not necessarily the last character. These lines are omitted if no starting or
-ending code units are recorded.
+ending code units are recorded. The subject length line is omitted when
+<b>no_start_optimize</b> is set because the minimum length is not calculated
+when it can never be used.
 </P>
 <P>
 The <b>framesize</b> modifier shows the size, in bytes, of the storage frames
@@ -1013,18 +1037,20 @@ Using alternative character tables
 </b><br>
 <P>
 The value specified for the <b>tables</b> modifier must be one of the digits 0,
-1, or 2. It causes a specific set of built-in character tables to be passed to
-<b>pcre2_compile()</b>. This is used in the PCRE2 tests to check behaviour with
-different character tables. The digit specifies the tables as follows:
+1, 2, or 3. It causes a specific set of built-in character tables to be passed
+to <b>pcre2_compile()</b>. This is used in the PCRE2 tests to check behaviour
+with different character tables. The digit specifies the tables as follows:
 <pre>
   0   do not pass any special character tables
   1   the default ASCII tables, as distributed in
         pcre2_chartables.c.dist
   2   a set of tables defining ISO 8859 characters
+  3   a set of tables loaded by the #loadtables command
 </pre>
-In table 2, some characters whose codes are greater than 128 are identified as
-letters, digits, spaces, etc. Setting alternate character tables and a locale
-are mutually exclusive.
+In tables 2, some characters whose codes are greater than 128 are identified as
+letters, digits, spaces, etc. Tables 3 can be used only after a
+<b>#loadtables</b> command has loaded them from a binary file. Setting alternate
+character tables and a locale are mutually exclusive.
 </P>
 <br><b>
 Setting certain match controls
@@ -1036,20 +1062,27 @@ modifier list, in which case they are applied to every subject line that is
 processed with that pattern. These modifiers do not affect the compilation
 process.
 <pre>
-      aftertext                  show text after match
-      allaftertext               show text after captures
-      allcaptures                show all captures
-      allusedtext                show all consulted text
-      altglobal                  alternative global matching
-  /g  global                     global matching
-      jitstack=&#60;n&#62;               set size of JIT stack
-      mark                       show mark values
-      replace=&#60;string&#62;           specify a replacement string
-      startchar                  show starting character when relevant
-      substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
-      substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
-      substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
-      substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
+      aftertext                   show text after match
+      allaftertext                show text after captures
+      allcaptures                 show all captures
+      allvector                   show the entire ovector
+      allusedtext                 show all consulted text
+      altglobal                   alternative global matching
+  /g  global                      global matching
+      jitstack=&#60;n&#62;                set size of JIT stack
+      mark                        show mark values
+      replace=&#60;string&#62;            specify a replacement string
+      startchar                   show starting character when relevant
+      substitute_callout          use substitution callouts
+      substitute_extended         use PCRE2_SUBSTITUTE_EXTENDED
+      substitute_literal          use PCRE2_SUBSTITUTE_LITERAL
+      substitute_matched          use PCRE2_SUBSTITUTE_MATCHED
+      substitute_overflow_length  use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+      substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
+      substitute_skip=&#60;n&#62;         skip substitution &#60;n&#62;
+      substitute_stop=&#60;n&#62;         skip substitution &#60;n&#62; and following
+      substitute_unknown_unset    use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
+      substitute_unset_empty      use PCRE2_SUBSTITUTE_UNSET_EMPTY
 </pre>
 These modifiers may not appear in a <b>#pattern</b> command. If you want them as
 defaults, set them in a <b>#subject</b> command.
@@ -1178,11 +1211,12 @@ Setting match controls
 The following modifiers affect the matching process or request additional
 information. Some of them may also be specified on a pattern line (see above),
 in which case they apply to every subject line that is matched against that
-pattern.
+pattern, but can be overridden by modifiers on the subject.
 <pre>
       aftertext                  show text after match
       allaftertext               show text after captures
       allcaptures                show all captures
+      allvector                  show the entire ovector
       allusedtext                show all consulted text (non-JIT only)
       altglobal                  alternative global matching
       callout_capture            show captures at callout time
@@ -1199,7 +1233,7 @@ pattern.
       get=&#60;number or name&#62;       extract captured substring
       getall                     extract all captured substrings
   /g  global                     global matching
-      heap_limit=&#60;n&#62;             set a limit on heap memory
+      heap_limit=&#60;n&#62;             set a limit on heap memory (Kbytes)
       jitstack=&#60;n&#62;               set size of JIT stack
       mark                       show mark values
       match_limit=&#60;n&#62;            set a match limit
@@ -1212,8 +1246,14 @@ pattern.
       replace=&#60;string&#62;           specify a replacement string
       startchar                  show startchar when relevant
       startoffset=&#60;n&#62;            same as offset=&#60;n&#62;
+      substitute_callout         use substitution callouts
       substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
+      substitute_literal         use PCRE2_SUBSTITUTE_LITERAL
+      substitute_matched         use PCRE2_SUBSTITUTE_MATCHED
       substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+      substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
+      substitute_skip=&#60;n&#62;        skip substitution number n
+      substitute_stop=&#60;n&#62;        skip substitution number n and greater
       substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
       substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
       zero_terminate             pass the subject as zero-terminated
@@ -1237,22 +1277,27 @@ following line with a plus character following the capture number.
 </P>
 <P>
 The <b>allusedtext</b> modifier requests that all the text that was consulted
-during a successful pattern match by the interpreter should be shown. This
-feature is not supported for JIT matching, and if requested with JIT it is
-ignored (with a warning message). Setting this modifier affects the output if
-there is a lookbehind at the start of a match, or a lookahead at the end, or if
-\K is used in the pattern. Characters that precede or follow the start and end
-of the actual match are indicated in the output by '&#60;' or '&#62;' characters
-underneath them. Here is an example:
+during a successful pattern match by the interpreter should be shown, for both
+full and partial matches. This feature is not supported for JIT matching, and
+if requested with JIT it is ignored (with a warning message). Setting this
+modifier affects the output if there is a lookbehind at the start of a match,
+or, for a complete match, a lookahead at the end, or if \K is used in the
+pattern. Characters that precede or follow the start and end of the actual
+match are indicated in the output by '&#60;' or '&#62;' characters underneath them.
+Here is an example:
 <pre>
     re&#62; /(?&#60;=pqr)abc(?=xyz)/
   data&#62; 123pqrabcxyz456\=allusedtext
    0: pqrabcxyz
       &#60;&#60;&#60;   &#62;&#62;&#62;
+  data&#62; 123pqrabcxy\=ph,allusedtext
+  Partial match: pqrabcxy
+                 &#60;&#60;&#60;
 </pre>
-This shows that the matched string is "abc", with the preceding and following
-strings "pqr" and "xyz" having been consulted during the match (when processing
-the assertions).
+The first, complete match shows that the matched string is "abc", with the
+preceding and following strings "pqr" and "xyz" having been consulted during
+the match (when processing the assertions). The partial match can indicate only
+the preceding string.
 </P>
 <P>
 The <b>startchar</b> modifier requests that the starting character for the match
@@ -1279,10 +1324,28 @@ captured parentheses be output after a match. By default, only those up to the
 highest one actually used in the match are output (corresponding to the return
 code from <b>pcre2_match()</b>). Groups that did not take part in the match
 are output as "&#60;unset&#62;". This modifier is not relevant for DFA matching (which
-does no capturing); it is ignored, with a warning message, if present.
+does no capturing) and does not apply when <b>replace</b> is specified; it is
+ignored, with a warning message, if present.
 </P>
 <br><b>
-Testing callouts
+Showing the entire ovector, for all outcomes
+</b><br>
+<P>
+The <b>allvector</b> modifier requests that the entire ovector be shown,
+whatever the outcome of the match. Compare <b>allcaptures</b>, which shows only
+up to the maximum number of capture groups for the pattern, and then only for a
+successful complete non-DFA match. This modifier, which acts after any match
+result, and also for DFA matching, provides a means of checking that there are
+no unexpected modifications to ovector fields. Before each match attempt, the
+ovector is filled with a special value, and if this is found in both elements
+of a capturing pair, "&#60;unchanged&#62;" is output. After a successful match, this
+applies to all groups after the maximum capture group for the pattern. In other
+cases it applies to the entire ovector. After a partial match, the first two
+elements are the only ones that should be set. After a DFA match, the amount of
+ovector that is used depends on the number of matches that were found.
+</P>
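
The "fill, then check both elements of each pair" idea described for <b>allvector</b> can be illustrated with a small Python sketch. This is not pcre2test's code and does not reproduce its exact output layout: the fill value `FILL`, the stand-in `PCRE2_UNSET`, and the function name `show_ovector` are all assumptions chosen for illustration.

```python
PCRE2_UNSET = -1     # stand-in for the library's "unset" offset value
FILL = -999          # hypothetical fill value; the real one is internal to pcre2test

def show_ovector(ovector, subject):
    """Describe every offset pair in an ovector (a flat list of start/end offsets)."""
    lines = []
    for n in range(len(ovector) // 2):
        start, end = ovector[2 * n], ovector[2 * n + 1]
        if start == FILL and end == FILL:
            text = "<unchanged>"      # neither element was written by the matcher
        elif start == PCRE2_UNSET and end == PCRE2_UNSET:
            text = "<unset>"          # group exists but took no part in the match
        else:
            text = subject[start:end]
        lines.append("%2d: %s" % (n, text))
    return lines

# Simulated result of matching (a)(b)? against "xa" with a five-pair ovector:
# group 2 is unset, and pairs 3 and 4 were never touched.
ovector = [FILL] * 10
ovector[0:6] = [1, 2, 1, 2, PCRE2_UNSET, PCRE2_UNSET]
for line in show_ovector(ovector, "xa"):
    print(line)
```

Any pair beyond the pattern's capture groups that does not come out as "unchanged" would indicate an unexpected write to the ovector, which is the kind of problem the modifier is meant to expose.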
+<br><b>
+Testing pattern callouts
 </b><br>
 <P>
 A callout function is supplied when <b>pcre2test</b> calls the library matching
@@ -1290,6 +1353,9 @@ functions, unless <b>callout_none</b> is specified. Its behaviour can be
 controlled by various modifiers listed above whose names begin with
 <b>callout_</b>. Details are given in the section entitled "Callouts"
 <a href="#callouts">below.</a>
+Testing callouts from <b>pcre2_substitute()</b> is described separately in
+"Testing the substitution function"
+<a href="#substitution">below.</a>
 </P>
 <br><b>
 Finding all matches in a string
@@ -1321,8 +1387,8 @@ Testing substring extraction functions
 <P>
 The <b>copy</b> and <b>get</b> modifiers can be used to test the
 <b>pcre2_substring_copy_xxx()</b> and <b>pcre2_substring_get_xxx()</b> functions.
-They can be given more than once, and each can specify a group name or number,
-for example:
+They can be given more than once, and each can specify a capture group name or
+number, for example:
 <pre>
    abcd\=copy=1,copy=3,get=G1
 </pre>
@@ -1341,15 +1407,21 @@ instead of a colon. This is in addition to the normal full list. The string
 length (that is, the return from the extraction function) is given in
 parentheses after each substring, followed by the name when the extraction was
 by name.
-</P>
+<a name="substitution"></a></P>
 <br><b>
 Testing the substitution function
 </b><br>
 <P>
 If the <b>replace</b> modifier is set, the <b>pcre2_substitute()</b> function is
-called instead of one of the matching functions. Note that replacement strings
-cannot contain commas, because a comma signifies the end of a modifier. This is
-not thought to be an issue in a test program.
+called instead of one of the matching functions (or after one call of
+<b>pcre2_match()</b> in the case of PCRE2_SUBSTITUTE_MATCHED). Note that
+replacement strings cannot contain commas, because a comma signifies the end of
+a modifier. This is not thought to be an issue in a test program.
+</P>
+<P>
+Specifying a completely empty replacement string disables this modifier.
+However, it is possible to specify an empty replacement by providing a buffer
+length, as described below, for an otherwise empty replacement.
 </P>
 </P>
 <P>
 Unlike subject strings, <b>pcre2test</b> does not process replacement strings
@@ -1365,11 +1437,16 @@ for <b>pcre2_substitute()</b>:
 <pre>
   global                      PCRE2_SUBSTITUTE_GLOBAL
   substitute_extended         PCRE2_SUBSTITUTE_EXTENDED
+  substitute_literal          PCRE2_SUBSTITUTE_LITERAL
+  substitute_matched          PCRE2_SUBSTITUTE_MATCHED
   substitute_overflow_length  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+  substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
   substitute_unknown_unset    PCRE2_SUBSTITUTE_UNKNOWN_UNSET
   substitute_unset_empty      PCRE2_SUBSTITUTE_UNSET_EMPTY
-
-</PRE>
+</pre>
+See the
+<a href="pcre2api.html"><b>pcre2api</b></a>
+documentation for details of these options.
 </P>
 <P>
 After a successful substitution, the modified string is output, preceded by the
@@ -1399,10 +1476,10 @@ The default action of <b>pcre2_substitute()</b> is to return
 PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if the
 PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the
 <b>substitute_overflow_length</b> modifier), <b>pcre2_substitute()</b> continues
-to go through the motions of matching and substituting, in order to compute the
-size of buffer that is required. When this happens, <b>pcre2test</b> shows the
-required buffer length (which includes space for the trailing zero) as part of
-the error message. For example:
+to go through the motions of matching and substituting (but not doing any
+callouts), in order to compute the size of buffer that is required. When this
+happens, <b>pcre2test</b> shows the required buffer length (which includes space
+for the trailing zero) as part of the error message. For example:
 <pre>
   /abc/substitute_overflow_length
       123abc123\=replace=[9]XYZ
@@ -1413,16 +1490,59 @@ matching provokes an error return ("bad option value") from
 <b>pcre2_substitute()</b>.
 </P>
 <br><b>
+Testing substitute callouts
+</b><br>
+<P>
+If the <b>substitute_callout</b> modifier is set, a substitution callout
+function is set up. The <b>null_context</b> modifier must not be set, because
+the address of the callout function is passed in a match context. When the
+callout function is called (after each substitution), details of the input
+and output strings are output. For example:
+<pre>
+  /abc/g,replace=&#60;$0&#62;,substitute_callout
+      abcdefabcpqr
+   1(1) Old 0 3 "abc" New 0 5 "&#60;abc&#62;"
+   2(1) Old 6 9 "abc" New 8 13 "&#60;abc&#62;"
+   2: &#60;abc&#62;def&#60;abc&#62;pqr
+</pre>
+The first number on each callout line is the count of matches. The
+parenthesized number is the number of pairs that are set in the ovector (that
+is, one more than the number of capturing groups that were set). These are
+followed by the offsets of the old substring, its contents, and the same for the
+replacement.
+</P>
+<P>
+By default, the substitution callout function returns zero, which accepts the
+replacement and causes matching to continue if /g was used. Two further
+modifiers can be used to test other return values. If <b>substitute_skip</b> is
+set to a value greater than zero the callout function returns +1 for the match
+of that number, and similarly <b>substitute_stop</b> returns -1. These cause the
+replacement to be rejected, and -1 causes no further matching to take place. If
+either of them is set, <b>substitute_callout</b> is assumed. For example:
+<pre>
+  /abc/g,replace=&#60;$0&#62;,substitute_skip=1
+      abcdefabcpqr
+   1(1) Old 0 3 "abc" New 0 5 "&#60;abc&#62; SKIPPED"
+   2(1) Old 6 9 "abc" New 6 11 "&#60;abc&#62;"
+   2: abcdef&#60;abc&#62;pqr
+      abcdefabcpqr\=substitute_stop=1
+   1(1) Old 0 3 "abc" New 0 5 "&#60;abc&#62; STOPPED"
+   1: abcdefabcpqr
+</pre>
+If both are set for the same number, stop takes precedence. Only a single skip
+or stop is supported, which is sufficient for testing that the feature works.
+</P>
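The offsets in the callout lines above follow from simple bookkeeping: "Old" offsets index the subject, while "New" offsets index the output built so far, so they shift whenever an earlier replacement changed the length. The following is a minimal Python sketch of that arithmetic, not of pcre2test itself. It uses Python's re module, the function name and the skip parameter are hypothetical, and the parenthesized ovector count is omitted.

```python
import re

def substitute_with_callouts(pattern, replacement, subject, skip=0):
    """Simulate a global substitution and report old/new offsets per match.

    skip is a hypothetical stand-in for substitute_skip: the match with
    that number keeps its original text instead of the replacement.
    """
    pieces, lines = [], []
    out_len = 0   # length of the output built so far
    last = 0      # end of the previous match in the subject
    for count, m in enumerate(re.finditer(pattern, subject), start=1):
        gap = subject[last:m.start()]      # unmatched text is copied as is
        pieces.append(gap)
        out_len += len(gap)
        new = m.expand(replacement)
        line = (f'{count} Old {m.start()} {m.end()} "{m.group(0)}" '
                f'New {out_len} {out_len + len(new)} "{new}"')
        skipped = (count == skip)
        lines.append(line + (" SKIPPED" if skipped else ""))
        kept = m.group(0) if skipped else new
        pieces.append(kept)
        out_len += len(kept)
        last = m.end()
    pieces.append(subject[last:])
    return lines, "".join(pieces)

lines, result = substitute_with_callouts("abc", r"<\g<0>>", "abcdefabcpqr")
print("\n".join(lines))
print(result)
```

With no skip, the second match is reported at New 8 13 because the first replacement made the output two characters longer. With skip=1 the first match is left alone, so the second is reported at New 6 11, which agrees with the pcre2test output shown above.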
+<br><b>
 Setting the JIT stack size
 </b><br>
 <P>
 The <b>jitstack</b> modifier provides a way of setting the maximum stack size
 that is used by the just-in-time optimization code. It is ignored if JIT
-optimization is not being used. The value is a number of kilobytes. Setting
-zero reverts to the default of 32K. Providing a stack that is larger than the
-default is necessary only for very complicated patterns. If <b>jitstack</b> is
-set non-zero on a subject line it overrides any value that was set on the
-pattern.
+optimization is not being used. The value is a number of kibibytes (units of
+1024 bytes). Setting zero reverts to the default of 32KiB. Providing a stack
+that is larger than the default is necessary only for very complicated
+patterns. If <b>jitstack</b> is set non-zero on a subject line it overrides any
+value that was set on the pattern.
 </P>
 <br><b>
 Setting heap, match, and depth limits
@@ -1438,20 +1558,17 @@ Finding minimum limits
 <P>
 If the <b>find_limits</b> modifier is present on a subject line, <b>pcre2test</b>
 calls the relevant matching function several times, setting different values in
-the match context via <b>pcre2_set_heap_limit(), \fBpcre2_set_match_limit()</b>,
-or <b>pcre2_set_depth_limit()</b> until it finds the minimum values for each
-parameter that allows the match to complete without error.
+the match context via <b>pcre2_set_heap_limit()</b>,
+<b>pcre2_set_match_limit()</b>, or <b>pcre2_set_depth_limit()</b> until it finds
+the minimum values for each parameter that allows the match to complete without
+error. If JIT is being used, only the match limit is relevant.
 </P>
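One way to find such a minimum with few calls is to double a trial limit until the match succeeds and then bisect between the last failing and first succeeding values. The Python sketch below illustrates that idea only. It is an assumption, not a description of how pcre2test actually searches, and the succeeds callback is a hypothetical stand-in for "run the match with this limit and report whether it completed without a limit error".

```python
def find_minimum_limit(succeeds, upper=1 << 31):
    """Return the smallest limit for which succeeds(limit) is true.

    Assumes succeeds is monotone: once a limit is large enough, every
    larger limit is also large enough.
    """
    if succeeds(0):
        return 0                      # e.g. a match that needs no heap at all
    lo, hi = 0, 1                     # lo is known to fail
    while not succeeds(hi):
        if hi >= upper:
            raise ValueError("no limit up to 'upper' lets the match complete")
        lo, hi = hi, min(hi * 2, upper)
    # Invariant: succeeds(lo) is false and succeeds(hi) is true.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if succeeds(mid):
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical match that needs a limit of at least 37 to complete.
print(find_minimum_limit(lambda limit: limit >= 37))
```

The search takes a number of trials that grows with the logarithm of the answer, which matters when the minimum match limit is large.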
 <P>
-If JIT is being used, only the match limit is relevant. If DFA matching is
-being used, only the depth limit is relevant.
-</P>
-<P>
-The <i>match_limit</i> number is a measure of the amount of backtracking
-that takes place, and learning the minimum value can be instructive. For most
-simple matches, the number is quite small, but for patterns with very large
-numbers of matching possibilities, it can become large very quickly with
-increasing length of subject string.
+When using this modifier, the pattern should not contain any limit settings
+such as (*LIMIT_MATCH=...) within it. If such a setting is present and is
+lower than the minimum matching value, the minimum value cannot be found
+because <b>pcre2_set_match_limit()</b> etc. are only able to reduce the value of
+an in-pattern limit; they cannot increase it.
 </P>
 <P>
 For non-DFA matching, the minimum <i>depth_limit</i> number is a measure of how
@@ -1460,6 +1577,22 @@ searched). In the case of DFA matching, <i>depth_limit</i> controls the depth of
 recursive calls of the internal function that is used for handling pattern
 recursion, lookaround assertions, and atomic groups.
 </P>
+<P>
+For non-DFA matching, the <i>match_limit</i> number is a measure of the amount
+of backtracking that takes place, and learning the minimum value can be
+instructive. For most simple matches, the number is quite small, but for
+patterns with very large numbers of matching possibilities, it can become large
+very quickly with increasing length of subject string. In the case of DFA
+matching, <i>match_limit</i> controls the total number of calls, both recursive
+and non-recursive, to the internal matching function, thus controlling the
+overall amount of computing resource that is used.
+</P>
+<P>
+For both kinds of matching, the <i>heap_limit</i> number, which is in kibibytes
+(units of 1024 bytes), limits the amount of heap memory used for matching. A
+value of zero disables the use of any heap memory; many simple pattern matches
+can be done without using the heap, so zero is not an unreasonable setting.
+</P>
 <br><b>
 Showing MARK names
 </b><br>
@@ -1476,13 +1609,14 @@ Showing memory usage
 <P>
 The <b>memory</b> modifier causes <b>pcre2test</b> to log the sizes of all heap
 memory allocation and freeing calls that occur during a call to
-<b>pcre2_match()</b>. These occur only when a match requires a bigger vector
-than the default for remembering backtracking points. In many cases there will
-be no heap memory used and therefore no additional output. No heap memory is
-allocated during matching with <b>pcre2_dfa_match</b> or with JIT, so in those
-cases the <b>memory</b> modifier never has any effect. For this modifier to
-work, the <b>null_context</b> modifier must not be set on both the pattern and
-the subject, though it can be set on one or the other.
+<b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>. These occur only when a match
+requires a bigger vector than the default for remembering backtracking points
+(<b>pcre2_match()</b>) or for internal workspace (<b>pcre2_dfa_match()</b>). In
+many cases there will be no heap memory used and therefore no additional
+output. No heap memory is allocated during matching with JIT, so in that case
+the <b>memory</b> modifier never has any effect. For this modifier to work, the
+<b>null_context</b> modifier must not be set on both the pattern and the
+subject, though it can be set on one or the other.
 </P>
 <br><b>
 Setting a starting offset
@@ -1538,11 +1672,11 @@ Passing a NULL context
 </b><br>
 <P>
 Normally, <b>pcre2test</b> passes a context block to <b>pcre2_match()</b>,
-<b>pcre2_dfa_match()</b> or <b>pcre2_jit_match()</b>. If the <b>null_context</b>
-modifier is set, however, NULL is passed. This is for testing that the matching
-functions behave correctly in this case (they use default values). This
-modifier cannot be used with the <b>find_limits</b> modifier or when testing the
-substitution function.
+<b>pcre2_dfa_match()</b>, <b>pcre2_jit_match()</b> or <b>pcre2_substitute()</b>.
+If the <b>null_context</b> modifier is set, however, NULL is passed. This is for
+testing that the matching and substitution functions behave correctly in this
+case (they use default values). This modifier cannot be used with the
+<b>find_limits</b> or <b>substitute_callout</b> modifiers.
 </P>
 <br><a name="SEC12" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
 <P>
@@ -1689,7 +1823,7 @@ restart the match with additional subject data by means of the
 <b>dfa_restart</b> modifier. For example:
 <pre>
     re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
-  data&#62; 23ja\=P,dfa
+  data&#62; 23ja\=ps,dfa
   Partial match: 23ja
   data&#62; n05\=dfa,dfa_restart
    0: n05
@@ -1913,15 +2047,21 @@ documentation. In this section we describe the features of <b>pcre2test</b> that
 can be used to test these functions.
 </P>
 <P>
-When a pattern with <b>push</b> modifier is successfully compiled, it is pushed
-onto a stack of compiled patterns, and <b>pcre2test</b> expects the next line to
-contain a new pattern (or command) instead of a subject line. By contrast,
-the <b>pushcopy</b> modifier causes a copy of the compiled pattern to be
-stacked, leaving the original available for immediate matching. By using
-<b>push</b> and/or <b>pushcopy</b>, a number of patterns can be compiled and
-retained. These modifiers are incompatible with <b>posix</b>, and control
-modifiers that act at match time are ignored (with a message) for the stacked
-patterns. The <b>jitverify</b> modifier applies only at compile time.
+Note that "serialization" in PCRE2 does not convert compiled patterns to an
+abstract format like Java or .NET. It just makes a reloadable byte code stream.
+Hence the restrictions on reloading mentioned above.
+</P>
+<P>
+In <b>pcre2test</b>, when a pattern with <b>push</b> modifier is successfully
+compiled, it is pushed onto a stack of compiled patterns, and <b>pcre2test</b>
+expects the next line to contain a new pattern (or command) instead of a
+subject line. By contrast, the <b>pushcopy</b> modifier causes a copy of the
+compiled pattern to be stacked, leaving the original available for immediate
+matching. By using <b>push</b> and/or <b>pushcopy</b>, a number of patterns can
+be compiled and retained. These modifiers are incompatible with <b>posix</b>,
+and control modifiers that act at match time are ignored (with a message) for
+the stacked patterns. The <b>jitverify</b> modifier applies only at compile
+time.
 </P>
 <P>
 The command
@@ -1975,16 +2115,16 @@ on the stack.
 <P>
 Philip Hazel
 <br>
-University Computing Service
+Retired from University Computing Service
 <br>
 Cambridge, England.
 <br>
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 21 December 2017
+Last updated: 30 August 2021
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2021 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.

+ 261 - 69
regex.mod/pcre/doc/html/pcre2unicode.html

@@ -16,31 +16,44 @@ please consult the man page, in case the conversion went wrong.
 UNICODE AND UTF SUPPORT
 </b><br>
 <P>
-When PCRE2 is built with Unicode support (which is the default), it has
-knowledge of Unicode character properties and can process text strings in
-UTF-8, UTF-16, or UTF-32 format (depending on the code unit width). However, by
-default, PCRE2 assumes that one code unit is one character. To process a
-pattern as a UTF string, where a character may require more than one code unit,
-you must call
+PCRE2 is normally built with Unicode support, though if you do not need it, you
+can build it without, in which case the library will be smaller. With Unicode
+support, PCRE2 has knowledge of Unicode character properties and can process
+strings of text in UTF-8, UTF-16, and UTF-32 format (depending on the code unit
+width), but this is not the default. Unless specifically requested, PCRE2
+treats each code unit in a string as one character.
+</P>
+<P>
+There are two ways of telling PCRE2 to switch to UTF mode, where characters may
+consist of more than one code unit and the range of values is constrained. The
+program can call
 <a href="pcre2_compile.html"><b>pcre2_compile()</b></a>
-with the PCRE2_UTF option flag, or the pattern must start with the sequence
-(*UTF). When either of these is the case, both the pattern and any subject
-strings that are matched against it are treated as UTF strings instead of
-strings of individual one-code-unit characters.
+with the PCRE2_UTF option, or the pattern may start with the sequence (*UTF).
+However, the latter facility can be locked out by the PCRE2_NEVER_UTF option.
+That is, the programmer can prevent the supplier of the pattern from switching
+to UTF mode.
+</P>
+<P>
+Note that the PCRE2_MATCH_INVALID_UTF option (see
+<a href="#matchinvalid">below</a>)
+forces PCRE2_UTF to be set.
 </P>
 <P>
-If you do not need Unicode support you can build PCRE2 without it, in which
-case the library will be smaller.
+In UTF mode, both the pattern and any subject strings that are matched against
+it are treated as UTF strings instead of strings of individual one-code-unit
+characters. There are also some other changes to the way characters are
+handled, as documented below.
 </P>
 <br><b>
 UNICODE PROPERTY SUPPORT
 </b><br>
 <P>
 When PCRE2 is built with Unicode support, the escape sequences \p{..},
-\P{..}, and \X can be used. The Unicode properties that can be tested are
-limited to the general category properties such as Lu for an upper case letter
-or Nd for a decimal number, the Unicode script names such as Arabic or Han, and
-the derived properties Any and L&. Full lists are given in the
+\P{..}, and \X can be used. This is not dependent on the PCRE2_UTF setting.
+The Unicode properties that can be tested are limited to the general category
+properties such as Lu for an upper case letter or Nd for a decimal number, the
+Unicode script names such as Arabic or Han, and the derived properties Any and
+L&. Full lists are given in the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 and
 <a href="pcre2syntax.html"><b>pcre2syntax</b></a>
@@ -53,25 +66,36 @@ compatibility with Perl 5.6. PCRE2 does not support this.
 WIDE CHARACTERS AND UTF MODES
 </b><br>
 <P>
-Codepoints less than 256 can be specified in patterns by either braced or
+Code points less than 256 can be specified in patterns by either braced or
 unbraced hexadecimal escape sequences (for example, \x{b3} or \xb3). Larger
 values have to use braced sequences. Unbraced octal code points up to \777 are
 also recognized; larger ones can be coded using \o{...}.
 </P>
 <P>
-In UTF modes, repeat quantifiers apply to complete UTF characters, not to
+The escape sequence \N{U+&#60;hex digits&#62;} is recognized as another way of
+specifying a Unicode character by code point in a UTF mode. It is not allowed
+in non-UTF mode.
+</P>
+<P>
+In UTF mode, repeat quantifiers apply to complete UTF characters, not to
 individual code units.
 </P>
 <P>
-In UTF modes, the dot metacharacter matches one UTF character instead of a
+In UTF mode, the dot metacharacter matches one UTF character instead of a
 single code unit.
 </P>
 <P>
-The escape sequence \C can be used to match a single code unit in a UTF mode,
+In UTF mode, capture group names are not restricted to ASCII, and may contain
+any Unicode letters and decimal digits, as well as underscore.
+</P>
+<P>
+The escape sequence \C can be used to match a single code unit in UTF mode,
 but its use can lead to some strange effects because it breaks up multi-unit
 characters (see the description of \C in the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
-documentation).
+documentation). For this reason, there is a build-time option that disables
+support for \C completely. There is also a less draconian compile-time option
+for locking out the use of \C when a pattern is compiled.
 </P>
 <P>
 The use of \C is not supported by the alternative matching function
@@ -80,7 +104,7 @@ may consist of more than one code unit. The use of \C in these modes provokes
 a match-time error. Also, the JIT optimization does not support \C in these
 modes. If JIT optimization is requested for a UTF-8 or UTF-16 pattern that
 contains \C, it will not succeed, and so when <b>pcre2_match()</b> is called,
-the matching will be carried out by the normal interpretive function.
+the matching will be carried out by the interpretive function.
 </P>
 <P>
 The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly test
@@ -110,25 +134,159 @@ However, the special horizontal and vertical white space matching escapes (\h,
 not PCRE2_UCP is set.
 </P>
 <br><b>
-CASE-EQUIVALENCE IN UTF MODES
+UNICODE CASE-EQUIVALENCE
+</b><br>
+<P>
+If either PCRE2_UTF or PCRE2_UCP is set, upper/lower case processing makes use
+of Unicode properties except for characters whose code points are less than 128
+and that have at most two case-equivalent values. For these, a direct table
+lookup is used for speed. A few Unicode characters such as Greek sigma have
+more than two code points that are case-equivalent, and these are treated
+specially. Setting PCRE2_UCP without PCRE2_UTF allows Unicode-style case
+processing for non-UTF character encodings such as UCS-2.
+<a name="scriptruns"></a></P>
+<br><b>
+SCRIPT RUNS
+</b><br>
+<P>
+The pattern constructs (*script_run:...) and (*atomic_script_run:...), with
+synonyms (*sr:...) and (*asr:...), verify that the string matched within the
+parentheses is a script run. In concept, a script run is a sequence of
+characters that are all from the same Unicode script. However, because some
+scripts are commonly used together, and because some diacritical and other
+marks are used with multiple scripts, it is not that simple.
+</P>
+<P>
+Every Unicode character has a Script property, mostly with a value
+corresponding to the name of a script, such as Latin, Greek, or Cyrillic. There
+are also three special values:
+</P>
+<P>
+"Unknown" is used for code points that have not been assigned, and also for the
+surrogate code points. In the PCRE2 32-bit library, characters whose code
+points are greater than the Unicode maximum (U+10FFFF), which are accessible
+only in non-UTF mode, are assigned the Unknown script.
+</P>
+<P>
+"Common" is used for characters that are used with many scripts. These include
+punctuation, emoji, mathematical, musical, and currency symbols, and the ASCII
+digits 0 to 9.
+</P>
+<P>
+"Inherited" is used for characters such as diacritical marks that modify a
+previous character. These are considered to take on the script of the character
+that they modify.
+</P>
+<P>
+Some Inherited characters are used with many scripts, but many of them are only
+normally used with a small number of scripts. For example, U+102E0 (Coptic
+Epact thousands mark) is used only with Arabic and Coptic. In order to make it
+possible to check this, a Unicode property called Script Extension exists. Its
+value is a list of scripts that apply to the character. For the majority of
+characters, the list contains just one script, the same one as the Script
+property. However, for characters such as U+102E0 more than one Script is
+listed. There are also some Common characters that have a single, non-Common
+script in their Script Extension list.
+</P>
+<P>
+The next section describes the basic rules for deciding whether a given string
+of characters is a script run. Note, however, that there are some special cases
+involving the Chinese Han script, and an additional constraint for decimal
+digits. These are covered in subsequent sections.
+</P>
+<br><b>
+Basic script run rules
+</b><br>
+<P>
+A string that is less than two characters long is a script run. This is the
+only case in which an Unknown character can be part of a script run. Longer
+strings are checked using only the Script Extensions property, not the basic
+Script property.
+</P>
+<P>
+If a character's Script Extension property is the single value "Inherited", it
+is always accepted as part of a script run. This is also true for the property
+"Common", subject to the checking of decimal digits described below. All the
+remaining characters in a script run must have at least one script in common in
+their Script Extension lists. In set-theoretic terminology, the intersection of
+all the sets of scripts must not be empty.
+</P>
+<P>
+A simple example is an Internet name such as "google.com". The letters are all
+in the Latin script, and the dot is Common, so this string is a script run.
+However, the Cyrillic letter "o" looks exactly the same as the Latin "o"; a
+string that looks the same, but with Cyrillic "o"s is not a script run.
+</P>
+<P>
+More interesting examples involve characters with more than one script in their
+Script Extension. Consider the following characters:
+<pre>
+  U+060C  Arabic comma
+  U+06D4  Arabic full stop
+</pre>
+The first has the Script Extension list Arabic, Hanifi Rohingya, Syriac, and
+Thaana; the second has just Arabic and Hanifi Rohingya. Both of them could
+appear in script runs of either Arabic or Hanifi Rohingya. The first could also
+appear in Syriac or Thaana script runs, but the second could not.
+</P>
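The basic rule above amounts to a set intersection. The Python sketch below illustrates it under stated assumptions: the SCRIPT_EXTENSIONS table is a tiny hand-written excerpt (the entries for U+060C and U+06D4 are taken from the text above, the others are illustrative), characters missing from the table are treated as Unknown, and the Han and decimal digit special cases are not handled. It is not how PCRE2 implements the check.

```python
COMMON, INHERITED = "Common", "Inherited"

# Hypothetical excerpt of Script Extension data, for illustration only.
SCRIPT_EXTENSIONS = {
    "\u0628": {"Arabic"},                                         # Arabic letter beh
    "\u0710": {"Syriac"},                                         # Syriac letter alaph
    "\u060C": {"Arabic", "Hanifi Rohingya", "Syriac", "Thaana"},  # Arabic comma
    "\u06D4": {"Arabic", "Hanifi Rohingya"},                      # Arabic full stop
    ".": {COMMON},
}

def is_script_run(text, table=SCRIPT_EXTENSIONS):
    """Apply the basic script run rule to text."""
    if len(text) < 2:
        return True                   # short strings are always script runs
    shared = None                     # scripts common to all characters so far
    for ch in text:
        scripts = table.get(ch)
        if scripts is None:
            return False              # treated as Unknown in a longer string
        if scripts == {COMMON} or scripts == {INHERITED}:
            continue                  # always accepted
        shared = set(scripts) if shared is None else shared & scripts
        if not shared:
            return False              # the intersection became empty
    return True

print(is_script_run("\u0628\u060C\u06D4"))  # Arabic letter, comma, full stop
print(is_script_run("\u0710\u060C"))        # Syriac letter with Arabic comma
print(is_script_run("\u0710\u06D4"))        # Syriac letter with Arabic full stop
```

The last call reports that the string is not a script run, because the Arabic full stop does not list Syriac in its Script Extension property.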
+<br><b>
+The Chinese Han script
 </b><br>
 <P>
-Case-insensitive matching in a UTF mode makes use of Unicode properties except
-for characters whose code points are less than 128 and that have at most two
-case-equivalent values. For these, a direct table lookup is used for speed. A
-few Unicode characters such as Greek sigma have more than two codepoints that
-are case-equivalent, and these are treated as such.
+The Chinese Han script is commonly used in conjunction with other scripts for
+writing certain languages. Japanese uses the Hiragana and Katakana scripts
+together with Han; Korean uses Hangul and Han; Taiwanese Mandarin uses Bopomofo
+and Han. These three combinations are treated as special cases when checking
+script runs and are, in effect, "virtual scripts". Thus, a script run may
+contain a mixture of Hiragana, Katakana, and Han, or a mixture of Hangul and
+Han, or a mixture of Bopomofo and Han, but not, for example, a mixture of
+Hangul and Bopomofo and Han. PCRE2 (like Perl) follows Unicode's Technical
+Standard 39 ("Unicode Security Mechanisms", http://unicode.org/reports/tr39/)
+in allowing such mixtures.
+</P>
+<br><b>
+Decimal digits
+</b><br>
+<P>
+Unicode contains many sets of 10 decimal digits in different scripts, and some
+scripts (including the Common script) contain more than one set. Some of these
+decimal digits are visually indistinguishable from the common ASCII
+digits. In addition to the script checking described above, if a script run
+contains any decimal digits, they must all come from the same set of 10
+adjacent characters.
 </P>
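The decimal digit constraint can be checked by mapping each digit to the code point of the zero in its set. The Python sketch below does this with the standard unicodedata module. It is an illustration of the rule rather than PCRE2's implementation, it relies on each set of ten digits occupying adjacent code points as the text states, and the function name is hypothetical.

```python
import unicodedata

def digits_from_one_set(text):
    """Return True if all decimal digits in text belong to one set of ten."""
    zero_points = set()
    for ch in text:
        if unicodedata.category(ch) == "Nd":
            # Code point of the digit zero in this character's set.
            zero_points.add(ord(ch) - unicodedata.decimal(ch))
    return len(zero_points) <= 1

print(digits_from_one_set("2021"))           # ASCII digits only
print(digits_from_one_set("20\u0662\u0661")) # ASCII mixed with Arabic-Indic digits
```

The second string mixes ASCII digits with Arabic-Indic digits, so it fails the check even though each digit on its own is acceptable.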
 <br><b>
 VALIDITY OF UTF STRINGS
 </b><br>
 <P>
 When the PCRE2_UTF option is set, the strings passed as patterns and subjects
-are (by default) checked for validity on entry to the relevant functions.
-If an invalid UTF string is passed, an negative error code is returned. The
-code unit offset to the offending character can be extracted from the match
-data block by calling <b>pcre2_get_startchar()</b>, which is used for this
-purpose after a UTF error.
+are (by default) checked for validity on entry to the relevant functions. If an
+invalid UTF string is passed, a negative error code is returned. The code unit
+offset to the offending character can be extracted from the match data block by
+calling <b>pcre2_get_startchar()</b>, which is used for this purpose after a UTF
+error.
+</P>
+<P>
+In some situations, you may already know that your strings are valid, and
+therefore want to skip these checks in order to improve performance, for
+example in the case of a long subject string that is being scanned repeatedly.
+If you set the PCRE2_NO_UTF_CHECK option at compile time or at match time,
+PCRE2 assumes that the pattern or subject it is given (respectively) contains
+only valid UTF code unit sequences.
+</P>
+<P>
+If you pass an invalid UTF string when PCRE2_NO_UTF_CHECK is set, the result
+is undefined and your program may crash or loop indefinitely or give incorrect
+results. There is, however, one mode of matching that can handle invalid UTF
+subject strings. This is enabled by passing PCRE2_MATCH_INVALID_UTF to
+<b>pcre2_compile()</b> and is discussed in the next section. The rest of
+this section covers the case when PCRE2_MATCH_INVALID_UTF is not set.
+</P>
+<P>
+Passing PCRE2_NO_UTF_CHECK to <b>pcre2_compile()</b> just disables the UTF check
+for the pattern; it does not also apply to subject strings. If you want to
+disable the check for a subject string you must pass this same option to
+<b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>.
 </P>
 <P>
 UTF-16 and UTF-32 strings can indicate their endianness by a special code known
@@ -136,13 +294,14 @@ as a byte-order mark (BOM). The PCRE2 functions do not handle this, expecting
 strings to be in host byte order.
 </P>
 <P>
-A UTF string is checked before any other processing takes place. In the case of
-<b>pcre2_match()</b> and <b>pcre2_dfa_match()</b> calls with a non-zero starting
-offset, the check is applied only to that part of the subject that could be
-inspected during matching, and there is a check that the starting offset points
-to the first code unit of a character or to the end of the subject. If there
-are no lookbehind assertions in the pattern, the check starts at the starting
-offset. Otherwise, it starts at the length of the longest lookbehind before the
+Unless PCRE2_NO_UTF_CHECK is set, a UTF string is checked before any other
+processing takes place. In the case of <b>pcre2_match()</b> and
+<b>pcre2_dfa_match()</b> calls with a non-zero starting offset, the check is
+applied only to that part of the subject that could be inspected during
+matching, and there is a check that the starting offset points to the first
+code unit of a character or to the end of the subject. If there are no
+lookbehind assertions in the pattern, the check starts at the starting offset.
+Otherwise, it starts at the length of the longest lookbehind before the
 starting offset, or at the start of the subject if there are not that many
 characters before the starting offset. Note that the sequences \b and \B are
 one-character lookbehinds.
@@ -162,31 +321,12 @@ surrogate thing is a fudge for UTF-16 which unfortunately messes up UTF-8 and
 UTF-32.)
 </P>
 <P>
-In some situations, you may already know that your strings are valid, and
-therefore want to skip these checks in order to improve performance, for
-example in the case of a long subject string that is being scanned repeatedly.
-If you set the PCRE2_NO_UTF_CHECK option at compile time or at match time,
-PCRE2 assumes that the pattern or subject it is given (respectively) contains
-only valid UTF code unit sequences.
-</P>
-<P>
-Passing PCRE2_NO_UTF_CHECK to <b>pcre2_compile()</b> just disables the check for
-the pattern; it does not also apply to subject strings. If you want to disable
-the check for a subject string you must pass this option to <b>pcre2_match()</b>
-or <b>pcre2_dfa_match()</b>.
-</P>
-<P>
-If you pass an invalid UTF string when PCRE2_NO_UTF_CHECK is set, the result
-is undefined and your program may crash or loop indefinitely.
-</P>
-<P>
-Note that setting PCRE2_NO_UTF_CHECK at compile time does not disable the error
-that is given if an escape sequence for an invalid Unicode code point is
-encountered in the pattern. If you want to allow escape sequences such as
-\x{d800} (a surrogate code point) you can set the
-PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES extra option. However, this is possible
-only in UTF-8 and UTF-32 modes, because these values are not representable in
-UTF-16.
+Setting PCRE2_NO_UTF_CHECK at compile time does not disable the error that is
+given if an escape sequence for an invalid Unicode code point is encountered in
+the pattern. If you want to allow escape sequences such as \x{d800} (a
+surrogate code point) you can set the PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES extra
+option. However, this is possible only in UTF-8 and UTF-32 modes, because these
+values are not representable in UTF-16.
 <a name="utf8strings"></a></P>
 <br><b>
 Errors in UTF-8 strings
@@ -224,7 +364,7 @@ these code points are excluded by RFC 3629.
 <pre>
   PCRE2_ERROR_UTF8_ERR13
 </pre>
-A 4-byte character has a value greater than 0x10fff; these code points are
+A 4-byte character has a value greater than 0x10ffff; these code points are
 excluded by RFC 3629.
 <pre>
   PCRE2_ERROR_UTF8_ERR14
@@ -277,7 +417,59 @@ The following negative error codes are given for invalid UTF-32 strings:
   PCRE2_ERROR_UTF32_ERR1  Surrogate character (0xd800 to 0xdfff)
   PCRE2_ERROR_UTF32_ERR2  Code point is greater than 0x10ffff
 
-</PRE>
+<a name="matchinvalid"></a></PRE>
+</P>
+<br><b>
+MATCHING IN INVALID UTF STRINGS
+</b><br>
+<P>
+You can run pattern matches on subject strings that may contain invalid UTF
+sequences if you call <b>pcre2_compile()</b> with the PCRE2_MATCH_INVALID_UTF
+option. This is supported by <b>pcre2_match()</b>, including JIT matching, but
+not by <b>pcre2_dfa_match()</b>. When PCRE2_MATCH_INVALID_UTF is set, it forces
+PCRE2_UTF to be set as well. Note, however, that the pattern itself must be a
+valid UTF string.
+</P>
+<P>
+Setting PCRE2_MATCH_INVALID_UTF does not affect what <b>pcre2_compile()</b>
+generates, but if <b>pcre2_jit_compile()</b> is subsequently called, it does
+generate different code. If JIT is not used, the option affects the behaviour
+of the interpretive code in <b>pcre2_match()</b>. When PCRE2_MATCH_INVALID_UTF
+is set at compile time, PCRE2_NO_UTF_CHECK is ignored at match time.
+</P>
+<P>
+In this mode, an invalid code unit sequence in the subject never matches any
+pattern item. It does not match dot, it does not match \p{Any}, it does not
+even match negative items such as [^X]. A lookbehind assertion fails if it
+encounters an invalid sequence while moving the current point backwards. In
+other words, an invalid UTF code unit sequence acts as a barrier which no match
+can cross.
+</P>
+<P>
+You can also think of this as the subject being split up into fragments of
+valid UTF, delimited internally by invalid code unit sequences. The pattern is
+matched fragment by fragment. The result of a successful match, however, is
+given as code unit offsets in the entire subject string in the usual way. There
+are a few points to consider:
+</P>
+<P>
+The internal boundaries are not interpreted as the beginnings or ends of lines
+and so do not match circumflex or dollar characters in the pattern.
+</P>
+<P>
+If <b>pcre2_match()</b> is called with an offset that points to an invalid
+UTF-sequence, that sequence is skipped, and the match starts at the next valid
+UTF character, or the end of the subject.
+</P>
+<P>
+At internal fragment boundaries, \b and \B behave in the same way as at the
+beginning and end of the subject. For example, a sequence such as \bWORD\b
+would match an instance of WORD that is surrounded by invalid UTF code units.
+</P>
+<P>
+Using PCRE2_MATCH_INVALID_UTF, an application can run matches on arbitrary
+data, knowing that any matched strings that are returned are valid UTF. This
+can be useful when searching for UTF text in executable or other binary files.
 </P>
 <br><b>
 AUTHOR
@@ -294,9 +486,9 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 17 May 2017
+Last updated: 23 February 2020
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2020 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
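
Review note on the "MATCHING IN INVALID UTF STRINGS" text added above: the "fragments of valid UTF" model can be sketched in plain C without calling PCRE2 at all. The block below is only an illustration of that mental model under the RFC 3629 rules (no overlong forms, no surrogates, nothing above 0x10ffff). The names `utf8_char_len` and `next_valid_fragment` are hypothetical helpers invented for this note, and PCRE2's own handling of where an invalid sequence starts and ends may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of the valid UTF-8 character at s (n bytes available),
   or 0 if the bytes there do not form one. Applies the RFC 3629 rules:
   no overlong forms, no surrogates, nothing above 0x10ffff. */
static size_t utf8_char_len(const unsigned char *s, size_t n)
{
    size_t len, i;
    uint32_t cp, min;

    if (n == 0) return 0;
    if (s[0] < 0x80) return 1;
    if ((s[0] & 0xe0) == 0xc0)      { len = 2; cp = s[0] & 0x1f; min = 0x80; }
    else if ((s[0] & 0xf0) == 0xe0) { len = 3; cp = s[0] & 0x0f; min = 0x800; }
    else if ((s[0] & 0xf8) == 0xf0) { len = 4; cp = s[0] & 0x07; min = 0x10000; }
    else return 0;                  /* stray continuation byte, or 0xf8..0xff */

    if (n < len) return 0;          /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xc0) != 0x80) return 0;
        cp = (cp << 6) | (s[i] & 0x3f);
    }
    if (cp < min) return 0;                      /* overlong encoding */
    if (cp >= 0xd800 && cp <= 0xdfff) return 0;  /* surrogate */
    if (cp > 0x10ffff) return 0;                 /* beyond Unicode */
    return len;
}

/* Find the next run of valid UTF-8 at or after *pos. Returns 1 and stores the
   half-open range [*start, *end) as offsets into the whole subject, or
   returns 0 when no valid text remains. *pos is advanced past the run. */
static int next_valid_fragment(const unsigned char *s, size_t n, size_t *pos,
                               size_t *start, size_t *end)
{
    size_t i = *pos, len;

    /* Skip bytes that cannot start a valid character (the "barrier"). */
    while (i < n && utf8_char_len(s + i, n - i) == 0) i++;
    *pos = i;
    if (i == n) return 0;

    *start = i;
    while (i < n && (len = utf8_char_len(s + i, n - i)) != 0) i += len;
    *end = *pos = i;
    return 1;
}
```

Calling `next_valid_fragment` repeatedly with the same `pos` variable walks the subject one fragment at a time. The offsets it reports are relative to the whole subject, which mirrors how the documentation says match results are reported in this mode.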

+ 7 - 1
regex.mod/pcre/doc/index.html.src

@@ -141,11 +141,14 @@ in the library.
     <td>&nbsp;&nbsp;Free a general context</td></tr>
 
 <tr><td><a href="pcre2_get_error_message.html">pcre2_get_error_message</a></td>
-    <td>&nbsp;&nbsp;Free study data</td></tr>
+    <td>&nbsp;&nbsp;Get textual error message for error number</td></tr>
 
 <tr><td><a href="pcre2_get_mark.html">pcre2_get_mark</a></td>
     <td>&nbsp;&nbsp;Get a (*MARK) name</td></tr>
 
+<tr><td><a href="pcre2_get_match_data_size.html">pcre2_get_match_data_size</a></td>
+    <td>&nbsp;&nbsp;Get the size of a match data block</td></tr>
+
 <tr><td><a href="pcre2_get_ovector_count.html">pcre2_get_ovector_count</a></td>
     <td>&nbsp;&nbsp;Get the ovector count</td></tr>
 
@@ -176,6 +179,9 @@ in the library.
 <tr><td><a href="pcre2_maketables.html">pcre2_maketables</a></td>
     <td>&nbsp;&nbsp;Build character tables in current locale</td></tr>
 
+<tr><td><a href="pcre2_maketables_free.html">pcre2_maketables_free</a></td>
+    <td>&nbsp;&nbsp;Free character tables</td></tr>
+
 <tr><td><a href="pcre2_match.html">pcre2_match</a></td>
     <td>&nbsp;&nbsp;Match a compiled pattern to a subject string
     (Perl compatible)</td></tr>

+ 15 - 15
regex.mod/pcre/doc/pcre2-config.txt

@@ -16,8 +16,8 @@ DESCRIPTION
 
        pcre2-config returns the configuration of the installed PCRE2 libraries
        and the options required to compile a program to use them. Some of  the
-       options  apply  only  to  the  8-bit,  or  16-bit, or 32-bit libraries,
-       respectively, and are not available for libraries that  have  not  been
+       options  apply  only  to the 8-bit, or 16-bit, or 32-bit libraries, re-
+       spectively, and are not available for  libraries  that  have  not  been
        built. If an unavailable option is encountered, the "usage" information
        is output.
 
@@ -36,30 +36,30 @@ OPTIONS
        --version Writes the version number of the installed PCRE2 libraries to
                  the standard output.
 
-       --libs8   Writes  to  the  standard  output  the  command  line options
-                 required to link with the 8-bit PCRE2 library  (-lpcre2-8  on
+       --libs8   Writes  to  the  standard output the command line options re-
+                 quired to link with the 8-bit  PCRE2  library  (-lpcre2-8  on
                  many systems).
 
-       --libs16  Writes  to  the  standard  output  the  command  line options
-                 required to link with the 16-bit PCRE2 library (-lpcre2-16 on
+       --libs16  Writes  to  the  standard output the command line options re-
+                 quired to link with the 16-bit PCRE2 library  (-lpcre2-16  on
                  many systems).
 
-       --libs32  Writes  to  the  standard  output  the  command  line options
-                 required to link with the 32-bit PCRE2 library (-lpcre2-32 on
+       --libs32  Writes  to  the  standard output the command line options re-
+                 quired to link with the 32-bit PCRE2 library  (-lpcre2-32  on
                  many systems).
 
        --libs-posix
-                 Writes  to  the  standard  output  the  command  line options
-                 required to link  with  PCRE2's  POSIX  API  wrapper  library
+                 Writes  to  the  standard output the command line options re-
+                 quired  to  link  with  PCRE2's  POSIX  API  wrapper  library
                  (-lpcre2-posix -lpcre2-8 on many systems).
 
-       --cflags  Writes  to  the  standard  output  the  command  line options
-                 required to compile files that use PCRE2  (this  may  include
-                 some -I options, but is blank on many systems).
+       --cflags  Writes  to  the  standard output the command line options re-
+                 quired to compile files that use PCRE2 (this may include some
+                 -I options, but is blank on many systems).
 
        --cflags-posix
-                 Writes  to  the  standard  output  the  command  line options
-                 required to compile files that use PCRE2's POSIX API  wrapper
+                 Writes  to  the  standard output the command line options re-
+                 quired to compile files that use PCRE2's  POSIX  API  wrapper
                  library  (this  may  include some -I options, but is blank on
                  many systems).
 

+ 28 - 14
regex.mod/pcre/doc/pcre2.3

@@ -1,4 +1,4 @@
-.TH PCRE2 3 "01 April 2017" "PCRE2 10.30"
+.TH PCRE2 3 "27 August 2021" "PCRE2 10.38"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH INTRODUCTION
@@ -6,15 +6,27 @@ PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
 PCRE2 is the name used for a revised API for the PCRE library, which is a set
 of functions, written in C, that implement regular expression pattern matching
-using the same syntax and semantics as Perl, with just a few differences. Some
-features that appeared in Python and the original PCRE before they appeared in
-Perl are also available using the Python syntax. There is also some support for
-one or two .NET and Oniguruma syntax items, and there are options for
-requesting some minor changes that give better ECMAScript (aka JavaScript)
-compatibility.
+using the same syntax and semantics as Perl, with just a few differences. After
+nearly two decades, the limitations of the original API were making development
+increasingly difficult. The new API is more extensible, and it was simplified
+by abolishing the separate "study" optimizing function; in PCRE2, patterns are
+automatically optimized where possible. Since forking from PCRE1, the code has
+been extensively refactored and new features introduced. The old library is now
+obsolete and is no longer maintained.
+.P
+As well as Perl-style regular expression patterns, some features that appeared
+in Python and the original PCRE before they appeared in Perl are available
+using the Python syntax. There is also some support for one or two .NET and
+Oniguruma syntax items, and there are options for requesting some minor changes
+that give better ECMAScript (aka JavaScript) compatibility.
+.P
+The source code for PCRE2 can be compiled to support strings of 8-bit, 16-bit,
+or 32-bit code units, which means that up to three separate libraries may be
+installed, one for each code unit size. The size of code unit is not related to
+the bit size of the underlying hardware. In a 64-bit environment that also
+supports 32-bit applications, versions of PCRE2 that are compiled in both
+64-bit and 32-bit modes may be needed.
 .P
-The source code for PCRE2 can be compiled to support 8-bit, 16-bit, or 32-bit
-code units, which means that up to three separate libraries may be installed.
 The original work to extend PCRE to 16-bit and 32-bit code units was done by
 Zoltan Herczeg and Christian Persch, respectively. In all three cases, strings
 can be interpreted either as one character per code unit, or as UTF-encoded
@@ -150,8 +162,9 @@ listing), and the short pages for individual functions, are concatenated in
   pcre2-config       show PCRE2 installation configuration information
   pcre2api           details of PCRE2's native C API
   pcre2build         building PCRE2
-  pcre2callout       details of the callout feature
+  pcre2callout       details of the pattern callout feature
   pcre2compat        discussion of Perl compatibility
+  pcre2convert       details of pattern conversion functions
   pcre2demo          a demonstration C program that uses PCRE2
   pcre2grep          description of the \fBpcre2grep\fP command (8-bit only)
   pcre2jit           discussion of just-in-time optimization support
@@ -164,6 +177,7 @@ listing), and the short pages for individual functions, are concatenated in
   pcre2perform       discussion of performance issues
   pcre2posix         the POSIX-compatible C API for the 8-bit library
   pcre2sample        discussion of the pcre2demo program
+  pcre2serialize     details of pattern serialization
   pcre2syntax        quick syntax reference
   pcre2test          description of the \fBpcre2test\fP command
   pcre2unicode       discussion of Unicode and UTF support
@@ -177,18 +191,18 @@ function, listing its arguments and results.
 .sp
 .nf
 Philip Hazel
-University Computing Service
+Retired from University Computing Service
 Cambridge, England.
 .fi
 .P
 Putting an actual email address here is a spam magnet. If you want to email me,
-use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
+use my two names separated by a dot at gmail.com.
 .
 .
 .SH REVISION
 .rs
 .sp
 .nf
-Last updated: 01 April 2017
-Copyright (c) 1997-2017 University of Cambridge.
+Last updated: 27 August 2021
+Copyright (c) 1997-2021 University of Cambridge.
 .fi
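
Review note on the code unit wording in pcre2.3: the updated introduction says the code unit size (8, 16 or 32 bits) is unrelated to the hardware word size, and other pages in this commit state that lengths and offsets are counted in code units, not characters. A small standalone sketch of what that means for a single character follows. `code_units_for` is a hypothetical helper written for this note, not a PCRE2 function.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of code units needed to encode one Unicode code point for a given
   code unit width (8, 16 or 32 bits). Returns 0 for an unknown width or for
   a value that is not a valid Unicode scalar value. */
static size_t code_units_for(uint32_t cp, int width)
{
    if (cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) return 0;
    switch (width) {
    case 8:  /* UTF-8: one to four bytes */
        return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    case 16: /* UTF-16: a surrogate pair above the Basic Multilingual Plane */
        return cp < 0x10000 ? 1 : 2;
    case 32: /* UTF-32: always one unit */
        return 1;
    default:
        return 0;
    }
}
```

The same character therefore occupies a different number of units in each library build. That is why an offset computed for the 8-bit library cannot be reused with the 16-bit or 32-bit one.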

File diff suppressed because it is too large
+ 366 - 270
regex.mod/pcre/doc/pcre2.txt


+ 3 - 2
regex.mod/pcre/doc/pcre2_code_free.3

@@ -1,4 +1,4 @@
-.TH PCRE2_CODE_FREE 3 "23 March 2017" "PCRE2 10.30"
+.TH PCRE2_CODE_FREE 3 "28 June 2018" "PCRE2 10.32"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -13,7 +13,8 @@ PCRE2 - Perl-compatible regular expressions (revised API)
 .SH DESCRIPTION
 .rs
 .sp
-This function frees the memory used for a compiled pattern, including any
+If \fIcode\fP is NULL, this function does nothing. Otherwise, \fIcode\fP must
+point to a compiled pattern. This function frees its memory, including any
 memory used by the JIT compiler. If the compiled pattern was created by a call
 to \fBpcre2_code_copy_with_tables()\fP, the memory for the character tables is
 also freed.
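
Review note on the free functions touched by this update: several pages now state that a NULL argument is a no-op, and the context pages describe freeing through the memory function stored in the context. The sketch below shows that convention in isolation. Everything in it (`demo_context`, `demo_context_create`, `demo_context_free`, the counting allocator) is a hypothetical stand-in written for this note and does not reflect PCRE2's real data layout or implementation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a context object; not PCRE2's real layout. */
typedef struct demo_context {
    void *(*alloc)(size_t size, void *user_data);
    void (*release)(void *block, void *user_data);
    void *user_data;
} demo_context;

static void *default_alloc(size_t size, void *user_data)
{
    (void)user_data;
    return malloc(size);
}

static void default_release(void *block, void *user_data)
{
    (void)user_data;
    free(block);
}

/* Example custom functions that count live blocks through user_data. */
static void *counting_alloc(size_t size, void *user_data)
{
    void *p = malloc(size);
    if (p != NULL) ++*(int *)user_data;
    return p;
}

static void counting_release(void *block, void *user_data)
{
    --*(int *)user_data;
    free(block);
}

/* Create a context; NULL function pointers select malloc() and free(). */
static demo_context *demo_context_create(void *(*alloc)(size_t, void *),
                                         void (*release)(void *, void *),
                                         void *user_data)
{
    demo_context *ctx;

    if (alloc == NULL) alloc = default_alloc;
    if (release == NULL) release = default_release;
    ctx = alloc(sizeof *ctx, user_data);
    if (ctx == NULL) return NULL;
    ctx->alloc = alloc;
    ctx->release = release;
    ctx->user_data = user_data;
    return ctx;
}

/* Free a context with its own release function; NULL is a no-op. */
static void demo_context_free(demo_context *ctx)
{
    if (ctx == NULL) return;
    ctx->release(ctx, ctx->user_data);
}
```

With a NULL-tolerant free, a cleanup path can call the free function unconditionally on every pointer it owns, whether or not the earlier allocation succeeded.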

+ 10 - 3
regex.mod/pcre/doc/pcre2_compile.3

@@ -1,4 +1,4 @@
-.TH PCRE2_COMPILE 3 "16 June 2017" "PCRE2 10.30"
+.TH PCRE2_COMPILE 3 "23 May 2019" "PCRE2 10.34"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -53,7 +53,8 @@ The option bits are:
   PCRE2_EXTENDED           Ignore white space and # comments
   PCRE2_FIRSTLINE          Force matching to be before newline
   PCRE2_LITERAL            Pattern characters are all literal
-  PCRE2_MATCH_UNSET_BACKREF  Match unset back references
+  PCRE2_MATCH_INVALID_UTF  Enable support for matching invalid UTF
+  PCRE2_MATCH_UNSET_BACKREF  Match unset backreferences
   PCRE2_MULTILINE          ^ and $ match newlines within data
   PCRE2_NEVER_BACKSLASH_C  Lock out the use of \eC in patterns
   PCRE2_NEVER_UCP          Lock out PCRE2_UCP, e.g. via (*UCP)
@@ -73,7 +74,13 @@ The option bits are:
 PCRE2 must be built with Unicode support (the default) in order to use
 PCRE2_UTF, PCRE2_UCP and related options.
 .P
-The yield of the function is a pointer to a private data structure that
+Additional options may be set in the compile context via the
+.\" HREF
+\fBpcre2_set_compile_extra_options\fP
+.\"
+function.
+.P
+The yield of this function is a pointer to a private data structure that
 contains the compiled pattern, or NULL if an error was detected.
 .P
 There is a complete description of the PCRE2 native API, with more detail on

+ 3 - 2
regex.mod/pcre/doc/pcre2_compile_context_free.3

@@ -1,4 +1,4 @@
-.TH PCRE2_COMPILE_CONTEXT_FREE 3 "22 October 2014" "PCRE2 10.00"
+.TH PCRE2_COMPILE_CONTEXT_FREE 3 "29 June 2018" "PCRE2 10.32"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -15,7 +15,8 @@ PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
 This function frees the memory occupied by a compile context, using the memory
 freeing function from the general context with which it was created, or
-\fBfree()\fP if that was not set.
+\fBfree()\fP if that was not set. If the argument is NULL, the function returns
+immediately without doing anything.
 .P
 There is a complete description of the PCRE2 native API in the
 .\" HREF

+ 3 - 2
regex.mod/pcre/doc/pcre2_convert_context_free.3

@@ -1,4 +1,4 @@
-.TH PCRE2_CONVERT_CONTEXT_FREE 3 "10 July 2017" "PCRE2 10.30"
+.TH PCRE2_CONVERT_CONTEXT_FREE 3 "28 June 2018" "PCRE2 10.32"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -16,7 +16,8 @@ PCRE2 - Perl-compatible regular expressions (revised API)
 This function is part of an experimental set of pattern conversion functions.
 It frees the memory occupied by a convert context, using the memory
 freeing function from the general context with which it was created, or
-\fBfree()\fP if that was not set.
+\fBfree()\fP if that was not set. If the argument is NULL, the function returns
+immediately without doing anything.
 .P
 The pattern conversion functions are described in the
 .\" HREF

+ 3 - 2
regex.mod/pcre/doc/pcre2_converted_pattern_free.3

@@ -1,4 +1,4 @@
-.TH PCRE2_CONVERTED_PATTERN_FREE 3 "11 July 2017" "PCRE2 10.30"
+.TH PCRE2_CONVERTED_PATTERN_FREE 3 "28 June 2018" "PCRE2 10.32"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -16,7 +16,8 @@ PCRE2 - Perl-compatible regular expressions (revised API)
 This function is part of an experimental set of pattern conversion functions.
 It frees the memory occupied by a converted pattern that was obtained by
 calling \fBpcre2_pattern_convert()\fP with arguments that caused it to place
-the converted pattern into newly obtained heap memory.
+the converted pattern into newly obtained heap memory. If the argument is NULL,
+the function returns immediately without doing anything.
 .P
 The pattern conversion functions are described in the
 .\" HREF

+ 12 - 5
regex.mod/pcre/doc/pcre2_dfa_match.3

@@ -1,4 +1,4 @@
-.TH PCRE2_DFA_MATCH 3 "30 May 2017" "PCRE2 10.30"
+.TH PCRE2_DFA_MATCH 3 "28 August 2021" "PCRE2 10.38"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -33,12 +33,19 @@ just once (except when processing lookaround assertions). This function is
   \fIworkspace\fP    Points to a vector of ints used as working space
   \fIwscount\fP      Number of elements in the vector
 .sp
-For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
-up a callout function or specify the match and/or the recursion depth limits.
-The \fIlength\fP and \fIstartoffset\fP values are code units, not characters.
-The options are:
+The size of output vector needed to contain all the results depends on the
+number of simultaneous matches, not on the number of parentheses in the
+pattern. Using \fBpcre2_match_data_create_from_pattern()\fP to create the match
+data block is therefore not advisable when using this function.
+.P
+A match context is needed only if you want to set up a callout function or
+specify the heap limit or the match or the recursion depth limits. The
+\fIlength\fP and \fIstartoffset\fP values are code units, not characters. The
+options are:
 .sp
   PCRE2_ANCHORED          Match only at the first position
+  PCRE2_COPY_MATCHED_SUBJECT
+                          On success, make a private subject copy
   PCRE2_ENDANCHORED       Pattern can match only at end of subject
   PCRE2_NOTBOL            Subject is not the beginning of a line
   PCRE2_NOTEOL            Subject is not the end of a line

+ 3 - 2
regex.mod/pcre/doc/pcre2_general_context_free.3

@@ -1,4 +1,4 @@
-.TH PCRE2_GENERAL_CONTEXT_FREE 3 "22 October 2014" "PCRE2 10.00"
+.TH PCRE2_GENERAL_CONTEXT_FREE 3 "28 June 2018" "PCRE2 10.32"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -14,7 +14,8 @@ PCRE2 - Perl-compatible regular expressions (revised API)
 .rs
 .sp
 This function frees the memory occupied by a general context, using the memory
-freeing function within the context, if set.
+freeing function within the context, if set.  If the argument is NULL, the
+function returns immediately without doing anything.
 .P
 There is a complete description of the PCRE2 native API in the
 .\" HREF

+ 27 - 0
regex.mod/pcre/doc/pcre2_get_match_data_size.3

@@ -0,0 +1,27 @@
+.TH PCRE2_GET_MATCH_DATA_SIZE 3 "16 July 2019" "PCRE2 10.34"
+.SH NAME
+PCRE2 - Perl-compatible regular expressions (revised API)
+.SH SYNOPSIS
+.rs
+.sp
+.B #include <pcre2.h>
+.PP
+.nf
+.B PCRE2_SIZE pcre2_get_match_data_size(pcre2_match_data *\fImatch_data\fP);
+.fi
+.
+.SH DESCRIPTION
+.rs
+.sp
+This function returns the size, in bytes, of the match data block that is its
+argument.
+.P
+There is a complete description of the PCRE2 native API in the
+.\" HREF
+\fBpcre2api\fP
+.\"
+page and a description of the POSIX API in the
+.\" HREF
+\fBpcre2posix\fP
+.\"
+page.

+ 8 - 2
regex.mod/pcre/doc/pcre2_jit_compile.3

@@ -1,4 +1,4 @@
-.TH PCRE2_JIT_COMPILE 3 "21 October 2014" "PCRE2 10.00"
+.TH PCRE2_JIT_COMPILE 3 "29 July 2019" "PCRE2 10.34"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -30,9 +30,15 @@ bits:
   PCRE2_JIT_PARTIAL_SOFT  compile code for soft partial matching
   PCRE2_JIT_PARTIAL_HARD  compile code for hard partial matching
 .sp
+There is also an obsolete option called PCRE2_JIT_INVALID_UTF, which has been
+superseded by the \fBpcre2_compile()\fP option PCRE2_MATCH_INVALID_UTF. The old
+option is deprecated and may be removed in the future.
+.P
 The yield of the function is 0 for success, or a negative error code otherwise.
 In particular, PCRE2_ERROR_JIT_BADOPTION is returned if JIT is not supported or
-if an unknown bit is set in \fIoptions\fP.
+if an unknown bit is set in \fIoptions\fP. The function can also return
+PCRE2_ERROR_NOMEMORY if JIT is unable to allocate executable memory for the
+compiler, even if it was because of a system security restriction.
 .P
 There is a complete description of the PCRE2 native API in the
 .\" HREF

+ 1 - 1
regex.mod/pcre/doc/pcre2_jit_free_unused_memory.3

@@ -17,7 +17,7 @@ This function frees unused JIT executable memory. The argument is a general
 context, for custom memory management, or NULL for standard memory management.
 JIT memory allocation retains some memory in order to improve future JIT
 compilation speed. In low memory conditions,
-\fBpcre2_jit_free_unused_memory()\fB can be used to cause this memory to be
+\fBpcre2_jit_free_unused_memory()\fP can be used to cause this memory to be
 freed.
 .P
 There is a complete description of the PCRE2 native API in the

+ 4 - 2
regex.mod/pcre/doc/pcre2_jit_match.3

@@ -1,4 +1,4 @@
-.TH PCRE2_JIT_MATCH 3 "03 November 2014" "PCRE2 10.0"
+.TH PCRE2_JIT_MATCH 3 "11 February 2020" "PCRE2 10.35"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -22,8 +22,10 @@ algorithm that is similar to Perl's. It is a "fast path" interface to JIT, and
 it bypasses some of the sanity checks that \fBpcre2_match()\fP applies.
 Its arguments are exactly the same as for
 .\" HREF
-\fBpcre2_match()\fP.
+\fBpcre2_match()\fP,
 .\"
+except that the subject string must be specified with a length;
+PCRE2_ZERO_TERMINATED is not supported.
 .P
 The supported options are PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
 PCRE2_NOTEMPTY_ATSTART, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Unsupported

Some files were not shown because too many files changed in this diff