README file for PCRE2 (Perl-compatible regular expression library)
------------------------------------------------------------------

PCRE2 is a re-working of the original PCRE1 library to provide an entirely new
API. Since its initial release in 2015, there has been further development of
the code and it now differs from PCRE1 in more than just the API. There are new
features, and the internals have been improved. The original PCRE1 library is
now obsolete and no longer maintained. The latest release of PCRE2 is available
in .tar.gz, .tar.bz2, or .zip form from this GitHub repository:

https://github.com/PCRE2Project/pcre2/releases

There is a mailing list for discussion about the development of PCRE2 at
[email protected]. You can subscribe by sending an email to
[email protected].

You can access the archives and also subscribe or manage your subscription
here:

https://groups.google.com/g/pcre2-dev

Please read the NEWS file if you are upgrading from a previous release. The
contents of this README file are:

  The PCRE2 APIs
  Documentation for PCRE2
  Contributions by users of PCRE2
  Building PCRE2 on non-Unix-like systems
  Building PCRE2 without using autotools
  Building PCRE2 using autotools
  Retrieving configuration information
  Shared libraries
  Cross-compiling using autotools
  Making new tarballs
  Testing PCRE2
  Character tables
  File manifest


The PCRE2 APIs
--------------

PCRE2 is written in C, and it has its own API. There are three sets of
functions, one for the 8-bit library, which processes strings of bytes, one for
the 16-bit library, which processes strings of 16-bit values, and one for the
32-bit library, which processes strings of 32-bit values. Unlike PCRE1, there
are no C++ wrappers.

The distribution does contain a set of C wrapper functions for the 8-bit
library that are based on the POSIX regular expression API (see the pcre2posix
man page). These are built into a library called libpcre2-posix. Note that this
just provides a POSIX calling interface to PCRE2; the regular expressions
themselves still follow Perl syntax and semantics. The POSIX API is restricted,
and does not give full access to all of PCRE2's facilities.

The header file for the POSIX-style functions is called pcre2posix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
with existing files of that name by distributing it that way. To use PCRE2 with
an existing program that uses the POSIX API, pcre2posix.h will have to be
renamed or pointed at by a link (or the program modified, of course). See the
pcre2posix documentation for more details.

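One way to arrange the link mentioned above is sketched below. This is a
minimal sketch, assuming a POSIX shell; the function name link_posix_header
and both directory arguments are hypothetical and are not part of PCRE2. It
creates a symbolic link called regex.h in a private include directory, so any
system regex.h is left untouched.

```shell
# Hypothetical helper (not part of PCRE2): expose pcre2posix.h under the
# name regex.h in a private include directory.
# $1 = directory containing pcre2posix.h (use an absolute path so that
#      the symbolic link resolves from anywhere)
# $2 = private include directory in which regex.h is created
link_posix_header() {
  if [ ! -f "$1/pcre2posix.h" ]; then
    echo "pcre2posix.h not found in $1" >&2
    return 1
  fi
  mkdir -p "$2" || return 1
  ln -s "$1/pcre2posix.h" "$2/regex.h"
}
```

The existing program would then be compiled with the private directory on its
include path and linked with libpcre2-posix; the exact compiler options depend
on the installation.
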

Documentation for PCRE2
-----------------------

If you install PCRE2 in the normal way on a Unix-like system, you will end up
with a set of man pages whose names all start with "pcre2". The one that is
just called "pcre2" lists all the others. In addition to these man pages, the
PCRE2 documentation is supplied in two other forms:

  1. There are files called doc/pcre2.txt, doc/pcre2grep.txt, and
     doc/pcre2test.txt in the source distribution. The first of these is a
     concatenation of the text forms of all the section 3 man pages except the
     listing of pcre2demo.c and those that summarize individual functions. The
     other two are the text forms of the section 1 man pages for the pcre2grep
     and pcre2test commands. These text forms are provided for ease of scanning
     with text editors or similar tools. They are installed in
     <prefix>/share/doc/pcre2, where <prefix> is the installation prefix
     (defaulting to /usr/local).

  2. A set of files containing all the documentation in HTML form, hyperlinked
     in various ways, and rooted in a file called index.html, is distributed in
     doc/html and installed in <prefix>/share/doc/pcre2/html.


Building PCRE2 on non-Unix-like systems
---------------------------------------

For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if
your system supports the use of "configure" and "make" you may be able to build
PCRE2 using autotools in the same way as for many Unix-like systems.

PCRE2 can also be configured using CMake, which can be run in various ways
(command line, GUI, etc). This creates Makefiles, solution files, etc. The file
NON-AUTOTOOLS-BUILD has information about CMake.

PCRE2 has been compiled on many different operating systems. It should be
straightforward to build PCRE2 on any system that has a Standard C compiler and
library, because it uses only Standard C functions.


Building PCRE2 without using autotools
--------------------------------------

The use of autotools (in particular, libtool) is problematic in some
environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
file for ways of building PCRE2 without using autotools.


Building PCRE2 using autotools
------------------------------

The following instructions assume the use of the widely used "configure; make;
make install" (autotools) process.

If you have downloaded and unpacked a PCRE2 release tarball, run the
"configure" command from the PCRE2 directory, with your current directory set
to the directory where you want the files to be created. This command is a
standard GNU "autoconf" configuration script, for which generic instructions
are supplied in the file INSTALL.

The files in the GitHub repository do not contain "configure". If you have
downloaded the PCRE2 source files from GitHub, before you can run "configure"
you must run the shell script called autogen.sh. This runs a number of
autotools to create a "configure" script (you must of course have the autotools
commands installed in order to do this).

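The difference between a release tarball and a GitHub checkout can be handled
in one place. The following is a minimal sketch, assuming a POSIX shell; the
function name ensure_configure is hypothetical and is not part of PCRE2. It
runs autogen.sh only when no "configure" script is present in the source
directory.

```shell
# Hypothetical helper (not part of PCRE2): make sure "configure" exists.
# $1 is the PCRE2 source directory; it defaults to the current directory.
ensure_configure() {
  srcdir=${1:-.}
  if [ -x "$srcdir/configure" ]; then
    # Release tarballs already contain a "configure" script.
    echo "configure already present in $srcdir"
  else
    # GitHub checkouts need autogen.sh, which requires the autotools
    # commands to be installed.
    (cd "$srcdir" && ./autogen.sh) || return 1
  fi
}
```

For example, "ensure_configure /source/pcre2/pcre2-xxx" could be run before
calling "configure" from that directory.
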
Most commonly, people build PCRE2 within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient. However,
the usual methods of changing standard defaults are available. For example:

CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local

This command specifies that the C compiler should be run with the flags '-O2
-Wall' instead of the default, and that "make install" should install PCRE2
under /opt/local instead of the default /usr/local.

If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE2 source
into /source/pcre2/pcre2-xxx, but you want to build it in
/build/pcre2/pcre2-xxx:

cd /build/pcre2/pcre2-xxx
/source/pcre2/pcre2-xxx/configure

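The two commands above can be wrapped so that the build directory is created
if necessary and the caller's current directory is left unchanged. This is a
minimal sketch, assuming a POSIX shell; the function name
configure_out_of_tree is hypothetical and is not part of PCRE2.

```shell
# Hypothetical helper (not part of PCRE2): run "configure" from a separate
# build directory.
# $1 = source directory holding "configure" (use an absolute path)
# $2 = build directory, created if it does not exist
# Any further arguments are passed to "configure" unchanged.
configure_out_of_tree() {
  src=$1
  build=$2
  shift 2
  mkdir -p "$build" || return 1
  # The subshell keeps the caller's current directory unchanged.
  (cd "$build" && "$src/configure" "$@")
}
```

For example:

  configure_out_of_tree /source/pcre2/pcre2-xxx /build/pcre2/pcre2-xxx \
    --prefix=/opt/local
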
PCRE2 is written in C and is normally compiled as a C library. However, it is
possible to build it as a C++ library, though the provided building apparatus
does not have any features to support this.

There are some optional features that can be included or omitted from the PCRE2
library. They are also documented in the pcre2build man page.

. By default, both shared and static libraries are built. You can change this
  by adding one of these options to the "configure" command:

  --disable-shared
  --disable-static

  Setting --disable-shared ensures that PCRE2 libraries are built as static
  libraries. The binaries that are then created as part of the build process
  (for example, pcre2test and pcre2grep) are linked statically with one or more
  PCRE2 libraries, but may also be dynamically linked with other libraries such
  as libc. If you want these binaries to be fully statically linked, you can
  set LDFLAGS like this:

  LDFLAGS=--static ./configure --disable-shared

  Note the two hyphens in --static. Of course, this works only if static
  versions of all the relevant libraries are available for linking. See also
  "Shared libraries" below.

. By default, only the 8-bit library is built. If you add --enable-pcre2-16 to
  the "configure" command, the 16-bit library is also built. If you add
  --enable-pcre2-32 to the "configure" command, the 32-bit library is also
  built. If you want only the 16-bit or 32-bit library, use --disable-pcre2-8
  to disable building the 8-bit library.

. If you want to include support for just-in-time (JIT) compiling, which can
  give large performance improvements on certain platforms, add --enable-jit to
  the "configure" command. This support is available only for certain hardware
  architectures. If you try to enable it on an unsupported architecture, there
  will be a compile time error. If in doubt, use --enable-jit=auto, which
  enables JIT only if the current hardware is supported.

. If you are enabling JIT under an SELinux environment you may also want to add
  --enable-jit-sealloc, which enables the use of an executable memory allocator
  that is compatible with SELinux. Warning: this allocator is experimental!
  It does not support the fork() operation and may crash when no disk space is
  available. This option has no effect if JIT is disabled.

. If you do not want to make use of the default support for UTF-8 Unicode
  character strings in the 8-bit library, UTF-16 Unicode character strings in
  the 16-bit library, or UTF-32 Unicode character strings in the 32-bit
  library, you can add --disable-unicode to the "configure" command. This
  reduces the size of the libraries. It is not possible to configure one
  library with Unicode support, and another without, in the same configuration.
  It is also not possible to use --enable-ebcdic (see below) with Unicode
  support, so if this option is set, you must also use --disable-unicode.

  When Unicode support is available, the use of a UTF encoding still has to be
  enabled by setting the PCRE2_UTF option at run time or starting a pattern
  with (*UTF). When PCRE2 is compiled with Unicode support, its input can only
  be either ASCII or UTF-8/16/32, even when running on EBCDIC platforms.

  As well as supporting UTF strings, Unicode support includes support for the
  \P, \p, and \X sequences that recognize Unicode character properties.
  However, only a subset of Unicode properties are supported; see the
  pcre2pattern man page for details. Escape sequences such as \d and \w in
  patterns do not by default make use of Unicode properties, but can be made to
  do so by setting the PCRE2_UCP option or starting a pattern with (*UCP).

. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
  of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
  character as indicating the end of a line. Whatever you specify at build time
  is the default; the caller of PCRE2 can change the selection at run time. The
  default newline indicator is a single LF character (the Unix standard). You
  can specify the default newline indicator by adding --enable-newline-is-cr,
  --enable-newline-is-lf, --enable-newline-is-crlf,
  --enable-newline-is-anycrlf, --enable-newline-is-any, or
  --enable-newline-is-nul to the "configure" command, respectively.

. By default, the sequence \R in a pattern matches any Unicode line ending
  sequence. This is independent of the option specifying what PCRE2 considers
  to be the end of a line (see above). However, the caller of PCRE2 can
  restrict \R to match only CR, LF, or CRLF. You can make this the default by
  adding --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").

. In a pattern, the escape sequence \C matches a single code unit, even in a
  UTF mode. This can be dangerous because it breaks up multi-code-unit
  characters. You can build PCRE2 with the use of \C permanently locked out by
  adding --enable-never-backslash-C (note the upper case C) to the "configure"
  command. When \C is allowed by the library, individual applications can lock
  it out by calling pcre2_compile() with the PCRE2_NEVER_BACKSLASH_C option.

. PCRE2 has a counter that limits the depth of nesting of parentheses in a
  pattern. This limits the amount of system stack that a pattern uses when it
  is compiled. The default is 250, but you can change it by setting, for
  example,

  --with-parens-nest-limit=500

. PCRE2 has a counter that can be set to limit the amount of computing resource
  it uses when matching a pattern. If the limit is exceeded during a match, the
  match fails. The default is ten million. You can change the default by
  setting, for example,

  --with-match-limit=500000

  on the "configure" command. This is just the default; individual calls to
  pcre2_match() or pcre2_dfa_match() can supply their own value. There is more
  discussion in the pcre2api man page (search for pcre2_set_match_limit).

. There is a separate counter that limits the depth of nested backtracking
  (pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
  matching process, which indirectly limits the amount of heap memory that is
  used, and in the case of pcre2_dfa_match() the amount of stack as well. This
  counter also has a default of ten million, which is essentially "unlimited".
  You can change the default by setting, for example,

  --with-match-limit-depth=5000

  There is more discussion in the pcre2api man page (search for
  pcre2_set_depth_limit).

. You can also set an explicit limit on the amount of heap memory used by
  the pcre2_match() and pcre2_dfa_match() interpreters:

  --with-heap-limit=500

  The units are kibibytes (units of 1024 bytes). This limit does not apply when
  the JIT optimization (which has its own memory control features) is used.
  There is more discussion on the pcre2api man page (search for
  pcre2_set_heap_limit).

. In the 8-bit library, the default maximum compiled pattern size is around
  64 kibibytes. You can increase this by adding --with-link-size=3 to the
  "configure" command. PCRE2 then uses three bytes instead of two for offsets
  to different parts of the compiled pattern. In the 16-bit library,
  --with-link-size=3 is the same as --with-link-size=4, which (in both
  libraries) uses four-byte offsets. Increasing the internal link size reduces
  performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
  link size setting is ignored, as 4-byte offsets are always used.

. Lookbehind assertions in which one or more branches can match a variable
  number of characters are supported only if there is a maximum matching length
  for each top-level branch. There is a limit to this maximum that defaults to
  255 characters. You can alter this default by a setting such as

  --with-max-varlookbehind=100

  The limit can be changed at runtime by calling pcre2_set_max_varlookbehind().
  Lookbehind assertions in which every branch matches a fixed number of
  characters (not necessarily all the same) are not constrained by this limit.

. For speed, PCRE2 uses four tables for manipulating and identifying characters
  whose code point values are less than 256. By default, it uses a set of
  tables for ASCII encoding that is part of the distribution. If you specify

  --enable-rebuild-chartables

  a program called pcre2_dftables is compiled and run in the default C locale
  when you run "make". It builds a source file called pcre2_chartables.c. If
  you do not specify this option, pcre2_chartables.c is created as a copy of
  pcre2_chartables.c.dist. See "Character tables" below for further
  information.

. It is possible to compile PCRE2 for use on systems that use EBCDIC as their
  character code (as opposed to ASCII/Unicode) by specifying

  --enable-ebcdic --disable-unicode

  This automatically implies --enable-rebuild-chartables (see above). However,
  when PCRE2 is built this way, it always operates in EBCDIC. It cannot support
  both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
  which specifies that the code value for the EBCDIC NL character is 0x25
  instead of the default 0x15.

. If you specify --enable-debug, additional debugging code is included in the
  build. This option is intended for use by the PCRE2 maintainers.

. In environments where valgrind is installed, if you specify

  --enable-valgrind

  PCRE2 will use valgrind annotations to mark certain memory regions as
  unaddressable. This allows it to detect invalid memory accesses, and is
  mostly useful for debugging PCRE2 itself.

. In environments where the gcc compiler is used and lcov is installed, if you
  specify

  --enable-coverage

  the build process implements a code coverage report for the test suite. The
  report is generated by running "make coverage". If ccache is installed on
  your system, it must be disabled when building PCRE2 for coverage reporting.
  You can do this by setting the environment variable CCACHE_DISABLE=1 before
  running "make" to build PCRE2. There is more information about coverage
  reporting in the "pcre2build" documentation.

. When JIT support is enabled, pcre2grep automatically makes use of it, unless
  you add --disable-pcre2grep-jit to the "configure" command.

. There is support for calling external programs during matching in the
  pcre2grep command, using PCRE2's callout facility with string arguments. This
  support can be disabled by adding --disable-pcre2grep-callout to the
  "configure" command. There are two kinds of callout: one that generates
  output from inbuilt code, and another that calls an external program. The
  latter has special support for Windows and VMS; otherwise it assumes the
  existence of the fork() function. This facility can be disabled by adding
  --disable-pcre2grep-callout-fork to the "configure" command.

. The pcre2grep program currently supports only 8-bit data files, and so
  requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
  libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
  specifying one or both of

  --enable-pcre2grep-libz
  --enable-pcre2grep-libbz2

  Of course, the relevant libraries must be installed on your system.

. The default starting size (in bytes) of the internal buffer used by pcre2grep
  can be set by, for example:

  --with-pcre2grep-bufsize=51200

  The value must be a plain integer. The default is 20480. The amount of memory
  used by pcre2grep is actually three times this number, to allow for "before"
  and "after" lines. If very long lines are encountered, the buffer is
  automatically enlarged, up to a fixed maximum size.

. The default maximum size of pcre2grep's internal buffer can be set by, for
  example:

  --with-pcre2grep-max-bufsize=2097152

  The default is either 1048576 or the value of --with-pcre2grep-bufsize,
  whichever is the larger.

. It is possible to compile pcre2test so that it links with the libreadline
  or libedit libraries, by specifying, respectively,

  --enable-pcre2test-libreadline or --enable-pcre2test-libedit

  If this is done, when pcre2test's input is from a terminal, it reads it using
  the readline() function. This provides line-editing and history facilities.
  Note that libreadline is GPL-licenced, so if you distribute a binary of
  pcre2test linked in this way, there may be licensing issues. These can be
  avoided by linking with libedit (which has a BSD licence) instead.

  Enabling libreadline causes the -lreadline option to be added to the
  pcre2test build. In many operating environments with a system-installed
  readline library this is sufficient. However, in some environments (e.g. if
  an unmodified distribution version of readline is in use), it may be
  necessary to specify something like LIBS="-lncurses" as well. This is
  because, to quote the readline INSTALL, "Readline uses the termcap functions,
  but does not link with the termcap or curses library itself, allowing
  applications which link with readline the option to choose an appropriate
  library." If you get error messages about missing functions tgetstr, tgetent,
  tputs, tgetflag, or tgoto, this is the problem, and linking with the ncurses
  library should fix it.

. The C99 standard defines formatting modifiers z and t for size_t and
  ptrdiff_t values, respectively. By default, PCRE2 uses these modifiers in
  environments other than Microsoft Visual Studio versions earlier than 2013
  when __STDC_VERSION__ is defined and has a value greater than or equal to
  199901L (indicating C99). However, there is at least one environment that
  claims to be C99 but does not support these modifiers. If
  --disable-percent-zt is specified, no use is made of the z or t modifiers.
  Instead of %td or %zu, %lu is used, with a cast for size_t values.

. There is a special option called --enable-fuzz-support for use by people who
  want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
  library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
  be built, but not installed. This contains a single function called
  LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
  length of the string. When called, this function tries to compile the string
  as a pattern, and if that succeeds, to match it. This is done both with no
  options and with some random option bits that are generated from the string.

  Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to
  be created. This is normally run under valgrind or used when PCRE2 is
  compiled with address sanitizing enabled. It calls the fuzzing function and
  outputs information about what it is doing. The input strings are specified
  by arguments: if an argument starts with "=" the rest of it is a literal
  input string. Otherwise, it is assumed to be a file name, and the contents
  of the file are the test string.

. Releases before 10.30 could be compiled with --disable-stack-for-recursion,
  which caused pcre2_match() to use individual blocks on the heap for
  backtracking instead of recursive function calls (which use the stack). This
  is now obsolete because pcre2_match() was refactored always to use the heap
  (in a much more efficient way than before). This option is retained for
  backwards compatibility, but has no effect other than to output a warning.

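Several of the options in the list above take sizes in different units. The
following minimal sketch, assuming a POSIX shell, restates the arithmetic for
the example values used above. The function names are hypothetical and are not
part of PCRE2; only the units and defaults stated in this README are used.

```shell
# Hypothetical helpers (not part of PCRE2) that restate the size
# arithmetic described for the "configure" options.

# --with-heap-limit is given in kibibytes (units of 1024 bytes).
heap_limit_bytes() {
  echo $(( $1 * 1024 ))
}

# pcre2grep uses three times its buffer size, to allow for "before"
# and "after" lines.
pcre2grep_memory_bytes() {
  echo $(( $1 * 3 ))
}

# The default maximum buffer size is 1048576 or the starting buffer
# size, whichever is the larger.
pcre2grep_default_max_bufsize() {
  if [ "$1" -gt 1048576 ]; then
    echo "$1"
  else
    echo 1048576
  fi
}

heap_limit_bytes 500                  # prints 512000
pcre2grep_memory_bytes 51200          # prints 153600
pcre2grep_default_max_bufsize 51200   # prints 1048576
```

So --with-heap-limit=500 allows 512000 bytes of heap, and
--with-pcre2grep-bufsize=51200 means pcre2grep starts with 153600 bytes of
buffer memory.
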
The "configure" script builds the following files for the basic C library:

. Makefile             the makefile that builds the library
. src/config.h         build-time configuration options for the library
. src/pcre2.h          the public PCRE2 header file
. pcre2-config         script that shows the building settings such as CFLAGS
                         that were set for "configure"
. libpcre2-8.pc        )
. libpcre2-16.pc       ) data for the pkg-config command
. libpcre2-32.pc       )
. libpcre2-posix.pc    )
. libtool              script that builds shared and/or static libraries

Versions of config.h and pcre2.h are distributed in the src directory of PCRE2
tarballs under the names config.h.generic and pcre2.h.generic. These are
provided for those who have to build PCRE2 without using "configure" or CMake.
If you use "configure" or CMake, the .generic versions are not used.

The "configure" script also creates config.status, which is an executable
script that can be run to recreate the configuration, and config.log, which
contains compiler output from tests that "configure" runs.

Once "configure" has run, you can run "make". This builds whichever of the
libraries libpcre2-8, libpcre2-16 and libpcre2-32 are configured, and a test
program called pcre2test. If you enabled JIT support with --enable-jit, another
test program called pcre2_jit_test is built as well. If the 8-bit library is
built, libpcre2-posix, pcre2posix_test, and the pcre2grep command are also
built. Running "make" with the -j option may speed up compilation on
multiprocessor systems.

The command "make check" runs all the appropriate tests. Details of the PCRE2
tests are given below in a separate section of this document. The -j option of
"make" can also be used when running the tests.

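The build and test steps above can be combined with the -j option as in the
following minimal sketch, assuming a POSIX shell and that "configure" has
already been run in the current directory. The function name build_and_check
is hypothetical, and using "getconf _NPROCESSORS_ONLN" to count processors is
an assumption: it is widely available but not universal, so the sketch falls
back to a single job.

```shell
# Hypothetical helper (not part of PCRE2): build, then run the tests,
# using one "make" job per online processor.
build_and_check() {
  njobs=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || njobs=1
  # Fall back to one job if the value is empty or not a plain number.
  case $njobs in
    ''|*[!0-9]*) njobs=1 ;;
  esac
  make -j "$njobs" && make -j "$njobs" check
}
```

The tests are run only if the build succeeds, because of the "&&".
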
You can use "make install" to install PCRE2 into live directories on your
system. The following are installed (file names are all relative to the
<prefix> that is set when "configure" is run):

  Commands (bin):
    pcre2test
    pcre2grep (if 8-bit support is enabled)
    pcre2-config

  Libraries (lib):
    libpcre2-8      (if 8-bit support is enabled)
    libpcre2-16     (if 16-bit support is enabled)
    libpcre2-32     (if 32-bit support is enabled)
    libpcre2-posix  (if 8-bit support is enabled)

  Configuration information (lib/pkgconfig):
    libpcre2-8.pc
    libpcre2-16.pc
    libpcre2-32.pc
    libpcre2-posix.pc

  Header files (include):
    pcre2.h
    pcre2posix.h

  Man pages (share/man/man{1,3}):
    pcre2grep.1
    pcre2test.1
    pcre2-config.1
    pcre2.3
    pcre2*.3 (lots more pages, all starting "pcre2")

  HTML documentation (share/doc/pcre2/html):
    index.html
    *.html (lots more pages, hyperlinked from index.html)

  Text file documentation (share/doc/pcre2):
    AUTHORS
    COPYING
    ChangeLog
    LICENCE
    NEWS
    README
    pcre2.txt         (a concatenation of the man(3) pages)
    pcre2test.txt     the pcre2test man page
    pcre2grep.txt     the pcre2grep man page
    pcre2-config.txt  the pcre2-config man page

If you want to remove PCRE2 from your system, you can run "make uninstall".
This removes all the files that "make install" installed. However, it does not
remove any directories, because these are often shared with other programs.


Retrieving configuration information
------------------------------------

Running "make install" installs the command pcre2-config, which can be used to
recall information about the PCRE2 configuration and installation. For example:

  pcre2-config --version

prints the version number, and

  pcre2-config --libs8

outputs information about where the 8-bit library is installed. This command
can be included in makefiles for programs that use PCRE2, saving the programmer
from having to remember too many details. Run pcre2-config with no arguments to
obtain a list of possible arguments.

The pkg-config command is another system for saving and retrieving information
about installed libraries. Instead of separate commands for each library, a
single command is used. For example:

  pkg-config --libs libpcre2-16

The data is held in *.pc files that are installed in a directory called
<prefix>/lib/pkgconfig.

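A build script can try both mechanisms in turn. The following is a minimal
sketch, assuming a POSIX shell; the function name pcre2_link_flags is
hypothetical and is not part of PCRE2. It asks pkg-config first and falls back
to pcre2-config, using the --libs8, --libs16, and --libs32 arguments of the
latter.

```shell
# Hypothetical helper (not part of PCRE2): print the linker flags for one
# PCRE2 library. $1 is the code unit width: 8, 16, or 32.
pcre2_link_flags() {
  case $1 in
    8|16|32) ;;
    *) echo "width must be 8, 16, or 32" >&2; return 2 ;;
  esac
  if command -v pkg-config >/dev/null 2>&1 &&
     pkg-config --exists "libpcre2-$1"; then
    # Data comes from the installed libpcre2-<width>.pc file.
    pkg-config --libs "libpcre2-$1"
  elif command -v pcre2-config >/dev/null 2>&1; then
    pcre2-config "--libs$1"
  else
    echo "PCRE2 configuration data not found" >&2
    return 1
  fi
}
```

The output can then be placed on a link command line, for example by writing
$(pcre2_link_flags 8) in a shell script. Compiler flags are a separate query
and are not covered by this sketch.
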

Shared libraries
----------------

The default distribution builds PCRE2 as shared libraries and static libraries,
as long as the operating system supports shared libraries. Shared library
support relies on the "libtool" script which is built as part of the
"configure" process.

The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcre2test and pcre2grep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcre2grep and pcre2test are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the build directory still
use the uninstalled libraries.

To build PCRE2 using static libraries only you must use --disable-shared when
configuring it. For example:

./configure --prefix=/usr/gnu --disable-shared

Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries. Note, however, that when you build only static
libraries, binary programs such as pcre2test and pcre2grep may still be
dynamically linked with other libraries (for example, libc) unless you set
LDFLAGS to --static when running "configure".


Cross-compiling using autotools
-------------------------------

You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE2 for some other host. However, you should NOT
specify --enable-rebuild-chartables, because if you do, the pcre2_dftables.c
source file is compiled and run on the local host, in order to generate the
inbuilt character tables (the pcre2_chartables.c file). This will probably not
work, because pcre2_dftables.c needs to be compiled with the local compiler,
not the cross compiler.

When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
created by making a copy of pcre2_chartables.c.dist, which is a default set of
tables that assumes ASCII code. Cross-compiling with the default tables should
not be a problem.

If you need to modify the character tables when cross-compiling, you should
move pcre2_chartables.c.dist out of the way, then compile pcre2_dftables.c by
hand and run it on the local host to make a new version of
pcre2_chartables.c.dist. See the pcre2build section "Creating character tables
at build time" for more details.

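The following is a sketch of a cross-compiling configuration with the default
tables, under stated assumptions: the toolchain prefix arm-linux-gnueabihf and
the CFLAGS value are placeholders, the function name cross_configure is made up
for this example, and --host is the general autoconf option for naming the
target system rather than anything specific to PCRE2.

```shell
# Hypothetical toolchain prefix; substitute the one for your target.
TARGET=arm-linux-gnueabihf

cross_configure() {
    # --enable-rebuild-chartables is deliberately absent, so the ASCII
    # tables in pcre2_chartables.c.dist are copied and used unchanged.
    ./configure --host="$TARGET" CC="$TARGET-gcc" CFLAGS="-O2"
}

# Run only from the top of a PCRE2 source tree.
if [ -x ./configure ]; then
    cross_configure
else
    echo "no ./configure here; change to the PCRE2 source directory" >&2
fi
```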

Making new tarballs
-------------------

The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
zip formats. The command "make distcheck" does the same, but then does a trial
build of the new distribution to ensure that it works.

If you have modified any of the man page sources in the doc directory, you
should first run the PrepareRelease script before making a distribution. This
script creates the .txt and HTML forms of the documentation from the man pages.


Testing PCRE2
-------------

To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
There is another script called RunGrepTest that tests the pcre2grep command.
When the 8-bit library is built, a test program for the POSIX wrapper, called
pcre2posix_test, is compiled, and when JIT support is enabled, a test program
called pcre2_jit_test is built. The scripts and the program tests are all run
when you run "make check". For other environments, see the instructions in
NON-AUTOTOOLS-BUILD.

The RunTest script runs the pcre2test test program (which is documented in its
own man page) on each of the relevant testinput files in the testdata
directory, and compares the output with the contents of the corresponding
testoutput files. RunTest uses a file called testtry to hold the main output
from pcre2test. Other files whose names begin with "test" are used as working
files in some tests.

Some tests are relevant only when certain build-time options were selected. For
example, the tests for UTF-8/16/32 features are run only when Unicode support
is available. RunTest outputs a comment when it skips a test.

Many (but not all) of the tests that are not skipped are run twice if JIT
support is available. On the second run, JIT compilation is forced. This
testing can be suppressed by putting "-nojit" on the RunTest command line.

The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
libraries that are enabled. If you want to run just one set of tests, call
RunTest with either the -8, -16 or -32 option.

If valgrind is installed, you can run the tests under it by putting "-valgrind"
on the RunTest command line. To run pcre2test on just one or more specific test
files, give their numbers as arguments to RunTest, for example:

  RunTest 2 7 11

You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the
end), or a number preceded by ~ to exclude a test. For example:

  RunTest 3-15 ~10

This runs tests 3 to 15, excluding test 10, and just ~13 runs all the tests
except test 13. Whatever order the arguments are in, the tests are always run
in numerical order.

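The selection rules just described can be modelled with a short shell function.
This is only an illustrative sketch of the documented behaviour, not the code
that RunTest uses; the function name select_tests is hypothetical, and its
first argument stands in for the highest test number.

```shell
# select_tests LAST ARG...
# Prints, in ascending order, the test numbers from 0 to LAST that the
# arguments select. Each ARG is a number, a range (3-6 or 3-), or ~N.
select_tests() {
    last=$1
    shift
    selected=""
    n=0
    while [ "$n" -le "$last" ]; do
        positive=no     # set when any non-exclusion argument is present
        wanted=no
        excluded=no
        for arg in "$@"; do
            case $arg in
                "~"*)
                    if [ "${arg#?}" -eq "$n" ]; then excluded=yes; fi
                    ;;
                *-*)
                    positive=yes
                    first=${arg%%-*}
                    final=${arg#*-}
                    # An open range such as 3- runs to the last test.
                    if [ -z "$final" ]; then final=$last; fi
                    if [ "$n" -ge "$first" ] && [ "$n" -le "$final" ]; then
                        wanted=yes
                    fi
                    ;;
                *)
                    positive=yes
                    if [ "$arg" -eq "$n" ]; then wanted=yes; fi
                    ;;
            esac
        done
        # With only exclusions (or no arguments), start from every test.
        if [ "$positive" = no ]; then wanted=yes; fi
        if [ "$wanted" = yes ] && [ "$excluded" = no ]; then
            selected="${selected:+$selected }$n"
        fi
        n=$((n + 1))
    done
    echo "$selected"
}

select_tests 26 3-15 "~10"    # prints 3 4 5 6 7 8 9 11 12 13 14 15
select_tests 26 11 7 2        # prints 2 7 11
```

Because the loop walks the test numbers in ascending order, the order of the
arguments has no effect on the order of the output, which mirrors the
numerical-order rule above.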

You can also call RunTest with the single argument "list" to cause it to output
a list of tests.

The test sequence starts with "test 0", which is a special test that has no
input file, and whose output is not checked. This is because it will be
different on different hardware and with different configurations. The test
exists in order to exercise some of pcre2test's code that would not otherwise
be run.

Tests 1 and 2 can always be run, as they expect only plain text strings (not
UTF) and make no use of Unicode properties. The first test file can be fed
directly into the perltest.sh script to check that Perl gives the same results.
The only difference you should see is in the first few lines, where the Perl
version is given instead of the PCRE2 version. The second set of tests checks
auxiliary functions, error detection, and run-time flags that are specific to
PCRE2. It also uses the debugging flags to check some of the internals of
pcre2_compile().

If you build PCRE2 with a locale setting that is not the standard C locale, the
character tables may be different (see next paragraph). In some cases, this may
cause failures in the second set of tests. For example, in a locale where the
isprint() function yields TRUE for characters in the range 128-255, the use of
[:isascii:] inside a character class defines a different set of characters, and
this shows up in this test as a difference in the compiled code, which is being
listed for checking. For example, where the comparison test output contains
[\x00-\x7f] the test might contain [\x00-\xff], and similarly in some other
cases. This is not a bug in PCRE2.

Test 3 checks pcre2_maketables(), the facility for building a set of character
tables for a specific locale and using them instead of the default tables. The
script uses the "locale" command to check for the availability of the "fr_FR",
"french", or "fr" locale, and uses the first one that it finds. If the "locale"
command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
the list of available locales, the third test cannot be run, and a comment is
output to say why. If running this test produces an error like this:

  ** Failed to set locale "fr_FR"

it means that the given locale is not available on your system, despite being
listed by "locale". This does not mean that PCRE2 is broken. There are three
alternative output files for the third test, because three different versions
of the French locale have been encountered. The test passes if its output
matches any one of them.

Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
with the perltest.sh script, and test 5 checking PCRE2-specific things.

Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
non-UTF mode and UTF-mode with Unicode property support, respectively.

Test 8 checks some internal offsets and code size features, but it is run only
when Unicode support is enabled. The output is different in 8-bit, 16-bit, and
32-bit modes and for different link sizes, so there are different output files
for each mode and link size.

Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
16-bit and 32-bit modes. These are tests that generate different output in
8-bit mode. Each pair covers general cases and Unicode support, respectively.

Test 13 checks the handling of non-UTF characters greater than 255 by
pcre2_dfa_match() in 16-bit and 32-bit modes.

Test 14 contains some special UTF and UCP tests that give different output for
different code unit widths.

Test 15 contains a number of tests that must not be run with JIT. They check,
among other non-JIT things, the match-limiting features of the interpretive
matcher.

Test 16 is run only when JIT support is not available. It checks that an
attempt to use JIT has the expected behaviour.

Test 17 is run only when JIT support is available. It checks JIT complete and
partial modes, match-limiting under JIT, and other JIT-specific features.

Tests 18 and 19 are run only in 8-bit mode. They check the POSIX interface to
the 8-bit library, without and with Unicode support, respectively.

Test 20 checks the serialization functions by writing a set of compiled
patterns to a file, and then reloading and checking them.

Tests 21 and 22 test \C support when the use of \C is not locked out, without
and with UTF support, respectively. Test 23 tests \C when it is locked out.

Tests 24 and 25 test the experimental pattern conversion functions, without and
with UTF support, respectively.

Test 26 checks Unicode property support using tests that are generated
automatically from the Unicode data tables.


Character tables
----------------

For speed, PCRE2 uses four tables for manipulating and identifying characters
whose code point values are less than 256. By default, a set of tables that is
built into the library is used. The pcre2_maketables() function can be called
by an application to create a new set of tables in the current locale. These
are passed to PCRE2 by calling pcre2_set_character_tables() to put a pointer
into a compile context.

The source file called pcre2_chartables.c contains the default set of tables.
By default, this is created as a copy of pcre2_chartables.c.dist, which
contains tables for ASCII coding. However, if --enable-rebuild-chartables is
specified for ./configure, a new version of pcre2_chartables.c is built by the
program pcre2_dftables (compiled from pcre2_dftables.c), which uses the ANSI C
character handling functions such as isalnum(), isalpha(), isupper(),
islower(), etc. to build the table sources. This means that the default C
locale that is set for your system will control the contents of these default
tables. You can change the default tables by editing pcre2_chartables.c and
then re-building PCRE2. If you do this, you should take care to ensure that the
file does not get automatically re-generated. The best way to do this is to
move pcre2_chartables.c.dist out of the way and replace it with your customized
tables.

When the pcre2_dftables program is run as a result of specifying
--enable-rebuild-chartables, it uses the default C locale that is set on your
system. It does not pay attention to the LC_xxx environment variables. In other
words, it uses the system's default locale rather than whatever the compiling
user happens to have set. If you really do want to build a source set of
character tables in a locale that is specified by the LC_xxx variables, you can
run the pcre2_dftables program by hand with the -L option. For example:

  ./pcre2_dftables -L pcre2_chartables.c.special

The second argument names the file where the source code for the tables is
written. The first two 256-byte tables provide lower casing and case flipping
functions, respectively. The next table consists of a number of 32-byte bit
maps which identify certain character classes such as digits, "word"
characters, white space, etc. These are used when building 32-byte bit maps
that represent character classes for code points less than 256. The final
256-byte table has bits indicating various character types, as follows:

    1   white space character
    2   letter
    4   lower case letter
    8   decimal digit
   16   alphanumeric or '_'

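To show how one entry of the final table combines these bits, here is a small
sketch that decodes a value into the types listed above. The function name
describe_ctype and the short labels it prints are made up for this example.
The sample values 22 and 24 are simply what the list implies for a lower case
letter and a decimal digit; they are not read from a real table.

```shell
# Sketch: decode one character-type byte into the types listed above.
describe_ctype() {
    flags=$1
    names=""
    if [ $((flags & 1)) -ne 0 ]; then names="$names space"; fi
    if [ $((flags & 2)) -ne 0 ]; then names="$names letter"; fi
    if [ $((flags & 4)) -ne 0 ]; then names="$names lower"; fi
    if [ $((flags & 8)) -ne 0 ]; then names="$names digit"; fi
    if [ $((flags & 16)) -ne 0 ]; then names="$names word"; fi
    # Drop the leading space left by the first appended name.
    echo "${names# }"
}

describe_ctype 22    # 2 + 4 + 16: prints "letter lower word"
describe_ctype 24    # 8 + 16:     prints "digit word"
```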

You can also specify -b (with or without -L) when running pcre2_dftables. This
causes the tables to be written in binary instead of as source code. A set of
binary tables can be loaded into memory by an application and passed to
pcre2_compile() in the same way as tables created dynamically by calling
pcre2_maketables(). The tables are just a string of bytes, independent of
hardware characteristics such as endianness. This means they can be bundled
with an application that runs in different environments, to ensure consistent
behaviour.

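A sketch of running pcre2_dftables by hand in a chosen locale follows. It
assumes that the program has already been built in the current directory and
that the fr_FR locale exists on the system. The function name make_tables and
the output file name pcre2_chartables.bin are hypothetical; LC_ALL is one of
the LC_xxx variables that the -L option takes into account.

```shell
# make_tables LOCALE OUTFILE [EXTRA-OPTION...]
# Runs pcre2_dftables with -L so that the named locale controls the tables.
make_tables() {
    locale_name=$1
    out_file=$2
    shift 2
    if [ ! -x ./pcre2_dftables ]; then
        echo "pcre2_dftables not found; build it first" >&2
        return 1
    fi
    # Any remaining arguments (for example -b) are passed through.
    LC_ALL="$locale_name" ./pcre2_dftables "$@" -L "$out_file"
}

# Source-code tables, as in the example above.
make_tables fr_FR pcre2_chartables.c.special || echo "tables not built" >&2

# Binary tables for bundling with an application.
make_tables fr_FR pcre2_chartables.bin -b || echo "tables not built" >&2
```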

See also the pcre2build section "Creating character tables at build time".


File manifest
-------------

The distribution should contain the files listed below.

(A) Source files for the PCRE2 library functions and their headers are found in
    the src directory:

  src/pcre2_dftables.c     auxiliary program for building pcre2_chartables.c
                           when --enable-rebuild-chartables is specified

  src/pcre2_chartables.c.dist  a default set of character tables that assume
                           ASCII coding; unless --enable-rebuild-chartables is
                           specified, used by copying to pcre2_chartables.c

  src/pcre2posix.c         )
  src/pcre2_auto_possess.c )
  src/pcre2_chkdint.c      )
  src/pcre2_compile.c      )
  src/pcre2_config.c       )
  src/pcre2_context.c      )
  src/pcre2_convert.c      )
  src/pcre2_dfa_match.c    )
  src/pcre2_error.c        )
  src/pcre2_extuni.c       )
  src/pcre2_find_bracket.c )
  src/pcre2_jit_compile.c  )
  src/pcre2_jit_match.c    ) sources for the functions in the library,
  src/pcre2_jit_misc.c     )   and some internal functions that they use
  src/pcre2_maketables.c   )
  src/pcre2_match.c        )
  src/pcre2_match_data.c   )
  src/pcre2_newline.c      )
  src/pcre2_ord2utf.c      )
  src/pcre2_pattern_info.c )
  src/pcre2_script_run.c   )
  src/pcre2_serialize.c    )
  src/pcre2_string_utils.c )
  src/pcre2_study.c        )
  src/pcre2_substitute.c   )
  src/pcre2_substring.c    )
  src/pcre2_tables.c       )
  src/pcre2_ucd.c          )
  src/pcre2_ucptables.c    )
  src/pcre2_valid_utf.c    )
  src/pcre2_xclass.c       )

  src/pcre2_printint.c     debugging function that is used by pcre2test
  src/pcre2_fuzzsupport.c  function for (optional) fuzzing support

  src/config.h.in          template for config.h, when built by "configure"
  src/pcre2.h.in           template for pcre2.h when built by "configure"
  src/pcre2posix.h         header for the external POSIX wrapper API
  src/pcre2_internal.h     header for internal use
  src/pcre2_intmodedep.h   a mode-specific internal header
  src/pcre2_jit_neon_inc.h header used by JIT
  src/pcre2_jit_simd_inc.h header used by JIT
  src/pcre2_ucp.h          header for Unicode property handling

  sljit/*                  source files for the JIT compiler

(B) Source files for programs that use PCRE2:

  src/pcre2demo.c          simple demonstration of coding calls to PCRE2
  src/pcre2grep.c          source of a grep utility that uses PCRE2
  src/pcre2test.c          comprehensive test program
  src/pcre2_jit_test.c     JIT test program
  src/pcre2posix_test.c    POSIX wrapper API test program

(C) Auxiliary files:

  132html               script to turn "man" pages into HTML
  AUTHORS               information about the author of PCRE2
  ChangeLog             log of changes to the code
  CleanTxt              script to clean nroff output for txt man pages
  Detrail               script to remove trailing spaces
  HACKING               some notes about the internals of PCRE2
  INSTALL               generic installation instructions
  LICENCE               conditions for the use of PCRE2
  COPYING               the same, using GNU's standard name
  Makefile.in           ) template for Unix Makefile, which is built by
                        )   "configure"
  Makefile.am           ) the automake input that was used to create
                        )   Makefile.in
  NEWS                  important changes in this release
  NON-AUTOTOOLS-BUILD   notes on building PCRE2 without using autotools
  PrepareRelease        script to make preparations for "make dist"
  README                this file
  RunTest               a Unix shell script for running tests
  RunGrepTest           a Unix shell script for pcre2grep tests
  aclocal.m4            m4 macros (generated by "aclocal")
  config.guess          ) files used by libtool,
  config.sub            )   used only when building a shared library
  configure             a configuring shell script (built by autoconf)
  configure.ac          ) the autoconf input that was used to build
                        )   "configure" and config.h
  depcomp               ) script to find program dependencies, generated by
                        )   automake
  doc/*.3               man page sources for PCRE2
  doc/*.1               man page sources for pcre2grep and pcre2test
  doc/index.html.src    the base HTML page
  doc/html/*            HTML documentation
  doc/pcre2.txt         plain text version of the man pages
  doc/pcre2test.txt     plain text documentation of test program
  install-sh            a shell script for installing files
  libpcre2-8.pc.in      template for libpcre2-8.pc for pkg-config
  libpcre2-16.pc.in     template for libpcre2-16.pc for pkg-config
  libpcre2-32.pc.in     template for libpcre2-32.pc for pkg-config
  libpcre2-posix.pc.in  template for libpcre2-posix.pc for pkg-config
  ltmain.sh             file used to build a libtool script
  missing               ) common stub for a few missing GNU programs while
                        )   installing, generated by automake
  mkinstalldirs         script for making install directories
  perltest.sh           script for running a Perl test program
  pcre2-config.in       source of script which retains PCRE2 information
  testdata/testinput*   test data for main library tests
  testdata/testoutput*  expected test results
  testdata/grep*        input and output for pcre2grep tests
  testdata/*            other supporting test files

(D) Auxiliary files for cmake support

  cmake/COPYING-CMAKE-SCRIPTS
  cmake/FindPackageHandleStandardArgs.cmake
  cmake/FindEditline.cmake
  cmake/FindReadline.cmake
  CMakeLists.txt
  config-cmake.h.in

(E) Auxiliary files for building PCRE2 "by hand"

  src/pcre2.h.generic      ) a version of the public PCRE2 header file
                           )   for use in non-"configure" environments
  src/config.h.generic     ) a version of config.h for use in non-"configure"
                           )   environments

Philip Hazel
Email local part: Philip.Hazel
Email domain: gmail.com
Last updated: 24 November 2023