|
@@ -17,17 +17,18 @@ please consult the man page, in case the conversion went wrong.
|
|
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
|
|
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
|
|
<li><a name="TOC3" href="#SEC3">SUPPORT FOR COMPRESSED FILES</a>
|
|
<li><a name="TOC3" href="#SEC3">SUPPORT FOR COMPRESSED FILES</a>
|
|
<li><a name="TOC4" href="#SEC4">BINARY FILES</a>
|
|
<li><a name="TOC4" href="#SEC4">BINARY FILES</a>
|
|
-<li><a name="TOC5" href="#SEC5">OPTIONS</a>
|
|
|
|
-<li><a name="TOC6" href="#SEC6">ENVIRONMENT VARIABLES</a>
|
|
|
|
-<li><a name="TOC7" href="#SEC7">NEWLINES</a>
|
|
|
|
-<li><a name="TOC8" href="#SEC8">OPTIONS COMPATIBILITY</a>
|
|
|
|
-<li><a name="TOC9" href="#SEC9">OPTIONS WITH DATA</a>
|
|
|
|
-<li><a name="TOC10" href="#SEC10">USING PCRE2'S CALLOUT FACILITY</a>
|
|
|
|
-<li><a name="TOC11" href="#SEC11">MATCHING ERRORS</a>
|
|
|
|
-<li><a name="TOC12" href="#SEC12">DIAGNOSTICS</a>
|
|
|
|
-<li><a name="TOC13" href="#SEC13">SEE ALSO</a>
|
|
|
|
-<li><a name="TOC14" href="#SEC14">AUTHOR</a>
|
|
|
|
-<li><a name="TOC15" href="#SEC15">REVISION</a>
|
|
|
|
|
|
+<li><a name="TOC5" href="#SEC5">BINARY ZEROS IN PATTERNS</a>
|
|
|
|
+<li><a name="TOC6" href="#SEC6">OPTIONS</a>
|
|
|
|
+<li><a name="TOC7" href="#SEC7">ENVIRONMENT VARIABLES</a>
|
|
|
|
+<li><a name="TOC8" href="#SEC8">NEWLINES</a>
|
|
|
|
+<li><a name="TOC9" href="#SEC9">OPTIONS COMPATIBILITY</a>
|
|
|
|
+<li><a name="TOC10" href="#SEC10">OPTIONS WITH DATA</a>
|
|
|
|
+<li><a name="TOC11" href="#SEC11">USING PCRE2'S CALLOUT FACILITY</a>
|
|
|
|
+<li><a name="TOC12" href="#SEC12">MATCHING ERRORS</a>
|
|
|
|
+<li><a name="TOC13" href="#SEC13">DIAGNOSTICS</a>
|
|
|
|
+<li><a name="TOC14" href="#SEC14">SEE ALSO</a>
|
|
|
|
+<li><a name="TOC15" href="#SEC15">AUTHOR</a>
|
|
|
|
+<li><a name="TOC16" href="#SEC16">REVISION</a>
|
|
</ul>
|
|
</ul>
|
|
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
|
|
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
|
|
<P>
|
|
<P>
|
|
@@ -85,9 +86,10 @@ controlled by parameters that can be set by the <b>--buffer-size</b> and
|
|
that is obtained at the start of processing. If an input file contains very
|
|
that is obtained at the start of processing. If an input file contains very
|
|
long lines, a larger buffer may be needed; this is handled by automatically
|
|
long lines, a larger buffer may be needed; this is handled by automatically
|
|
extending the buffer, up to the limit specified by <b>--max-buffer-size</b>. The
|
|
extending the buffer, up to the limit specified by <b>--max-buffer-size</b>. The
|
|
-default values for these parameters are specified when <b>pcre2grep</b> is
|
|
|
|
-built, with the default defaults being 20K and 1M respectively. An error occurs
|
|
|
|
-if a line is too long and the buffer can no longer be expanded.
|
|
|
|
|
|
+default values for these parameters can be set when <b>pcre2grep</b> is
|
|
|
|
+built; if nothing is specified, the defaults are set to 20KiB and 1MiB
|
|
|
|
+respectively. An error occurs if a line is too long and the buffer can no
|
|
|
|
+longer be expanded.
|
|
</P>
|
|
</P>
|
|
<P>
|
|
<P>
|
|
The block of memory that is actually used is three times the "buffer size", to
|
|
The block of memory that is actually used is three times the "buffer size", to
|
|
@@ -95,7 +97,7 @@ allow for buffering "before" and "after" lines. If the buffer size is too
|
|
small, fewer than requested "before" and "after" lines may be output.
|
|
small, fewer than requested "before" and "after" lines may be output.
|
|
</P>
|
|
</P>
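The buffer arithmetic described above can be checked with a short sketch. It is only an illustration in Python, not code from pcre2grep: the helper name `block_bytes` is hypothetical, and the sketch assumes the "three times the buffer size" rule applies equally to the starting size and to the maximum size.

```python
KIB = 1024


def block_bytes(buffer_size):
    """Memory block used for a given "buffer size": three times that size,
    which leaves room for the "before" and "after" context lines."""
    return 3 * buffer_size


default_start = 20 * KIB    # default --buffer-size (20KiB)
default_max = 1024 * KIB    # default --max-buffer-size (1MiB)

print(block_bytes(default_start))  # 61440
print(block_bytes(default_max))    # 3145728
```

With the built-in defaults, the block therefore starts at 60KiB and can grow to 3MiB before a too-long line causes an error.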
|
|
<P>
|
|
<P>
|
|
-Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
|
|
|
|
|
|
+Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the greater.
|
|
BUFSIZ is defined in <b>&lt;stdio.h&gt;</b>. When there is more than one pattern
|
|
BUFSIZ is defined in <b>&lt;stdio.h&gt;</b>. When there is more than one pattern
|
|
(specified by the use of <b>-e</b> and/or <b>-f</b>), each pattern is applied to
|
|
(specified by the use of <b>-e</b> and/or <b>-f</b>), each pattern is applied to
|
|
each line in the order in which they are defined, except that all the <b>-e</b>
|
|
each line in the order in which they are defined, except that all the <b>-e</b>
|
|
@@ -109,8 +111,8 @@ matching substrings, or if <b>--only-matching</b>, <b>--file-offsets</b>, or
|
|
(either shown literally, or as an offset), scanning resumes immediately
|
|
(either shown literally, or as an offset), scanning resumes immediately
|
|
following the match, so that further matches on the same line can be found. If
|
|
following the match, so that further matches on the same line can be found. If
|
|
there are multiple patterns, they are all tried on the remainder of the line,
|
|
there are multiple patterns, they are all tried on the remainder of the line,
|
|
-but patterns that follow the one that matched are not tried on the earlier part
|
|
|
|
-of the line.
|
|
|
|
|
|
+but patterns that follow the one that matched are not tried on the earlier
|
|
|
|
+matched part of the line.
|
|
</P>
|
|
</P>
|
|
<P>
|
|
<P>
|
|
This behaviour means that the order in which multiple patterns are specified
|
|
This behaviour means that the order in which multiple patterns are specified
|
|
@@ -144,13 +146,18 @@ ignored.
|
|
<br><a name="SEC4" href="#TOC1">BINARY FILES</a><br>
|
|
<br><a name="SEC4" href="#TOC1">BINARY FILES</a><br>
|
|
<P>
|
|
<P>
|
|
By default, a file that contains a binary zero byte within the first 1024 bytes
|
|
By default, a file that contains a binary zero byte within the first 1024 bytes
|
|
-is identified as a binary file, and is processed specially. (GNU grep
|
|
|
|
-identifies binary files in this manner.) However, if the newline type is
|
|
|
|
-specified as "nul", that is, the line terminator is a binary zero, the test for
|
|
|
|
-a binary file is not applied. See the <b>--binary-files</b> option for a means
|
|
|
|
-of changing the way binary files are handled.
|
|
|
|
|
|
+is identified as a binary file, and is processed specially. However, if the
|
|
|
|
+newline type is specified as NUL, that is, the line terminator is a binary
|
|
|
|
+zero, the test for a binary file is not applied. See the <b>--binary-files</b>
|
|
|
|
+option for a means of changing the way binary files are handled.
|
|
</P>
|
|
</P>
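The binary-file test described in this section amounts to a check of the first 1024 bytes. The following Python sketch restates that rule under stated assumptions: the function name `looks_binary` and its `newline_is_nul` parameter are hypothetical, and this is not how pcre2grep itself is implemented (it is written in C).

```python
def looks_binary(data, newline_is_nul=False):
    """Return True if a binary zero occurs within the first 1024 bytes.

    The test is skipped when the line terminator itself is a binary zero,
    because every line would otherwise make the file look binary.
    """
    if newline_is_nul:
        return False
    return b"\x00" in data[:1024]
```

A zero byte at offset 1024 or later does not trigger the test in this sketch, which matches the "within the first 1024 bytes" wording above.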
|
|
-<br><a name="SEC5" href="#TOC1">OPTIONS</a><br>
|
|
|
|
|
|
+<br><a name="SEC5" href="#TOC1">BINARY ZEROS IN PATTERNS</a><br>
|
|
|
|
+<P>
|
|
|
|
+Patterns passed from the command line are strings that are terminated by a
|
|
|
|
+binary zero, so cannot contain internal zeros. However, patterns that are read
|
|
|
|
+from a file via the <b>-f</b> option may contain binary zeros.
|
|
|
|
+</P>
|
|
|
|
+<br><a name="SEC6" href="#TOC1">OPTIONS</a><br>
|
|
<P>
|
|
<P>
|
|
The order in which some of the options appear can affect the output. For
|
|
The order in which some of the options appear can affect the output. For
|
|
example, both the <b>-H</b> and <b>-l</b> options affect the printing of file
|
|
example, both the <b>-H</b> and <b>-l</b> options affect the printing of file
|
|
@@ -181,6 +188,12 @@ Treat binary files as text. This is equivalent to
|
|
<b>--binary-files</b>=<i>text</i>.
|
|
<b>--binary-files</b>=<i>text</i>.
|
|
</P>
|
|
</P>
|
|
<P>
|
|
<P>
|
|
|
|
+<b>--allow-lookaround-bsk</b>
|
|
|
|
+PCRE2 now forbids the use of \K in lookarounds by default, in line with Perl.
|
|
|
|
+This option causes <b>pcre2grep</b> to set the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
|
|
|
|
+option, which enables this somewhat dangerous usage.
|
|
|
|
+</P>
|
|
|
|
+<P>
|
|
<b>-B</b> <i>number</i>, <b>--before-context=</b><i>number</i>
|
|
<b>-B</b> <i>number</i>, <b>--before-context=</b><i>number</i>
|
|
Output up to <i>number</i> lines of context before each matching line. Fewer
|
|
Output up to <i>number</i> lines of context before each matching line. Fewer
|
|
lines are output if the previous match or the start of the file is within
|
|
lines are output if the previous match or the start of the file is within
|
|
@@ -355,12 +368,15 @@ files; it does not apply to patterns specified by any of the <b>--include</b> or
|
|
<P>
|
|
<P>
|
|
<b>-f</b> <i>filename</i>, <b>--file=</b><i>filename</i>
|
|
<b>-f</b> <i>filename</i>, <b>--file=</b><i>filename</i>
|
|
Read patterns from the file, one per line, and match them against each line of
|
|
Read patterns from the file, one per line, and match them against each line of
|
|
-input. What constitutes a newline when reading the file is the operating
|
|
|
|
-system's default. The <b>--newline</b> option has no effect on this option.
|
|
|
|
-Trailing white space is removed from each line, and blank lines are ignored. An
|
|
|
|
-empty file contains no patterns and therefore matches nothing. See also the
|
|
|
|
-comments about multiple patterns versus a single pattern with alternatives in
|
|
|
|
-the description of <b>-e</b> above.
|
|
|
|
|
|
+input. As is the case with patterns on the command line, no delimiters should
|
|
|
|
+be used. What constitutes a newline when reading the file is the operating
|
|
|
|
+system's default interpretation of \n. The <b>--newline</b> option has no
|
|
|
|
+effect on this option. Trailing white space is removed from each line, and
|
|
|
|
+blank lines are ignored. An empty file contains no patterns and therefore
|
|
|
|
+matches nothing. Patterns read from a file in this way may contain binary
|
|
|
|
+zeros, which are treated as ordinary data characters. See also the comments
|
|
|
|
+about multiple patterns versus a single pattern with alternatives in the
|
|
|
|
+description of <b>-e</b> above.
|
|
<br>
|
|
<br>
|
|
<br>
|
|
<br>
|
|
If this option is given more than once, all the specified files are read. A
|
|
If this option is given more than once, all the specified files are read. A
|
|
@@ -373,14 +389,15 @@ command line; all arguments are treated as the names of paths to be searched.
|
|
<P>
|
|
<P>
|
|
<b>--file-list</b>=<i>filename</i>
|
|
<b>--file-list</b>=<i>filename</i>
|
|
Read a list of files and/or directories that are to be scanned from the given
|
|
Read a list of files and/or directories that are to be scanned from the given
|
|
-file, one per line. Trailing white space is removed from each line, and blank
|
|
|
|
-lines are ignored. These paths are processed before any that are listed on the
|
|
|
|
-command line. The file name can be given as "-" to refer to the standard input.
|
|
|
|
-If <b>--file</b> and <b>--file-list</b> are both specified as "-", patterns are
|
|
|
|
-read first. This is useful only when the standard input is a terminal, from
|
|
|
|
-which further lines (the list of files) can be read after an end-of-file
|
|
|
|
-indication. If this option is given more than once, all the specified files are
|
|
|
|
-read.
|
|
|
|
|
|
+file, one per line. What constitutes a newline when reading the file is the
|
|
|
|
+operating system's default. Trailing white space is removed from each line, and
|
|
|
|
+blank lines are ignored. These paths are processed before any that are listed
|
|
|
|
+on the command line. The file name can be given as "-" to refer to the standard
|
|
|
|
+input. If <b>--file</b> and <b>--file-list</b> are both specified as "-",
|
|
|
|
+patterns are read first. This is useful only when the standard input is a
|
|
|
|
+terminal, from which further lines (the list of files) can be read after an
|
|
|
|
+end-of-file indication. If this option is given more than once, all the
|
|
|
|
+specified files are read.
|
|
</P>
|
|
</P>
|
|
<P>
|
|
<P>
|
|
<b>--file-offsets</b>
|
|
<b>--file-offsets</b>
|
|
@@ -431,8 +448,8 @@ Ignore upper/lower case distinctions during comparisons.
|
|
<P>
|
|
<P>
|
|
<b>--include</b>=<i>pattern</i>
|
|
<b>--include</b>=<i>pattern</i>
|
|
If any <b>--include</b> patterns are specified, the only files that are
|
|
If any <b>--include</b> patterns are specified, the only files that are
|
|
-processed are those that match one of the patterns (and do not match an
|
|
|
|
-<b>--exclude</b> pattern). This option does not affect directories, but it
|
|
|
|
|
|
+processed are those whose names match one of the patterns and do not match an
|
|
|
|
+<b>--exclude</b> pattern. This option does not affect directories, but it
|
|
applies to all files, whether listed on the command line, obtained from
|
|
applies to all files, whether listed on the command line, obtained from
|
|
<b>--file-list</b>, or by scanning a directory. The pattern is a PCRE2 regular
|
|
<b>--file-list</b>, or by scanning a directory. The pattern is a PCRE2 regular
|
|
expression, and is matched against the final component of the file name, not
|
|
expression, and is matched against the final component of the file name, not
|
|
@@ -451,8 +468,8 @@ may be given any number of times; all the files are read.
|
|
<P>
|
|
<P>
|
|
<b>--include-dir</b>=<i>pattern</i>
|
|
<b>--include-dir</b>=<i>pattern</i>
|
|
If any <b>--include-dir</b> patterns are specified, the only directories that
|
|
If any <b>--include-dir</b> patterns are specified, the only directories that
|
|
-are processed are those that match one of the patterns (and do not match an
|
|
|
|
-<b>--exclude-dir</b> pattern). This applies to all directories, whether listed
|
|
|
|
|
|
+are processed are those whose names match one of the patterns and do not match
|
|
|
|
+an <b>--exclude-dir</b> pattern. This applies to all directories, whether listed
|
|
on the command line, obtained from <b>--file-list</b>, or by scanning a parent
|
|
on the command line, obtained from <b>--file-list</b>, or by scanning a parent
|
|
directory. The pattern is a PCRE2 regular expression, and is matched against
|
|
directory. The pattern is a PCRE2 regular expression, and is matched against
|
|
the final component of the directory name, not the entire path. The <b>-F</b>,
|
|
the final component of the directory name, not the entire path. The <b>-F</b>,
|
|
@@ -475,8 +492,9 @@ a separate line. Searching normally stops as soon as a matching line is found
|
|
in a file. However, if the <b>-c</b> (count) option is also used, matching
|
|
in a file. However, if the <b>-c</b> (count) option is also used, matching
|
|
continues in order to obtain the correct count, and those files that have at
|
|
continues in order to obtain the correct count, and those files that have at
|
|
least one match are listed along with their counts. Using this option with
|
|
least one match are listed along with their counts. Using this option with
|
|
-<b>-c</b> is a way of suppressing the listing of files with no matches. This
|
|
|
|
-opeion overrides any previous <b>-H</b>, <b>-h</b>, or <b>-L</b> options.
|
|
|
|
|
|
+<b>-c</b> is a way of suppressing the listing of files with no matches that
|
|
|
|
+occurs with <b>-c</b> on its own. This option overrides any previous <b>-H</b>,
|
|
|
|
+<b>-h</b>, or <b>-L</b> options.
|
|
</P>
|
|
</P>
|
|
<P>
|
|
<P>
|
|
<b>--label</b>=<i>name</i>
|
|
<b>--label</b>=<i>name</i>
|
|
@@ -489,13 +507,13 @@ short form for this option.
|
|
When this option is given, non-compressed input is read and processed line by
|
|
When this option is given, non-compressed input is read and processed line by
|
|
line, and the output is flushed after each write. By default, input is read in
|
|
line, and the output is flushed after each write. By default, input is read in
|
|
large chunks, unless <b>pcre2grep</b> can determine that it is reading from a
|
|
large chunks, unless <b>pcre2grep</b> can determine that it is reading from a
|
|
-terminal (which is currently possible only in Unix-like environments). Output
|
|
|
|
-to terminal is normally automatically flushed by the operating system. This
|
|
|
|
-option can be useful when the input or output is attached to a pipe and you do
|
|
|
|
-not want <b>pcre2grep</b> to buffer up large amounts of data. However, its use
|
|
|
|
-will affect performance, and the <b>-M</b> (multiline) option ceases to work.
|
|
|
|
-When input is from a compressed .gz or .bz2 file, <b>--line-buffered</b> is
|
|
|
|
-ignored.
|
|
|
|
|
|
+terminal, which is currently possible only in Unix-like environments or
|
|
|
|
+Windows. Output to terminal is normally automatically flushed by the operating
|
|
|
|
+system. This option can be useful when the input or output is attached to a
|
|
|
|
+pipe and you do not want <b>pcre2grep</b> to buffer up large amounts of data.
|
|
|
|
+However, its use will affect performance, and the <b>-M</b> (multiline) option
|
|
|
|
+ceases to work. When input is from a compressed .gz or .bz2 file,
|
|
|
|
+<b>--line-buffered</b> is ignored.
|
|
</P>
|
|
</P>
|
|
<P>
|
|
<P>
|
|
<b>--line-offsets</b>
|
|
<b>--line-offsets</b>
|
|
@@ -516,6 +534,49 @@ locale is specified, the PCRE2 library's default (usually the "C" locale) is
|
|
used. There is no short form for this option.
|
|
used. There is no short form for this option.
|
|
</P>
|
|
</P>
|
|
<P>
|
|
<P>
|
|
|
|
+<b>-M</b>, <b>--multiline</b>
|
|
|
|
+Allow patterns to match more than one line. When this option is set, the PCRE2
|
|
|
|
+library is called in "multiline" mode. This allows a matched string to extend
|
|
|
|
+past the end of a line and continue on one or more subsequent lines. Patterns
|
|
|
|
+used with <b>-M</b> may usefully contain literal newline characters and internal
|
|
|
|
+occurrences of ^ and $ characters. The output for a successful match may
|
|
|
|
+consist of more than one line. The first line is the line in which the match
|
|
|
|
+started, and the last line is the line in which the match ended. If the matched
|
|
|
|
+string ends with a newline sequence, the output ends at the end of that line.
|
|
|
|
+If <b>-v</b> is set, none of the lines in a multi-line match are output. Once a
|
|
|
|
+match has been handled, scanning restarts at the beginning of the line after
|
|
|
|
+the one in which the match ended.
|
|
|
|
+<br>
|
|
|
|
+<br>
|
|
|
|
+The newline sequence that separates multiple lines must be matched as part of
|
|
|
|
+the pattern. For example, to find the phrase "regular expression" in a file
|
|
|
|
+where "regular" might be at the end of a line and "expression" at the start of
|
|
|
|
+the next line, you could use this command:
|
|
|
|
+<pre>
|
|
|
|
+ pcre2grep -M 'regular\s+expression' &lt;file&gt;
|
|
|
|
+</pre>
|
|
|
|
+The \s escape sequence matches any white space character, including newlines,
|
|
|
|
+and is followed by + so as to match trailing white space on the first line as
|
|
|
|
+well as possibly handling a two-character newline sequence.
|
|
|
|
+<br>
|
|
|
|
+<br>
|
|
|
|
+There is a limit to the number of lines that can be matched, imposed by the way
|
|
|
|
+that <b>pcre2grep</b> buffers the input file as it scans it. With a sufficiently
|
|
|
|
+large processing buffer, this should not be a problem, but the <b>-M</b> option
|
|
|
|
+does not work when input is read line by line (see <b>--line-buffered</b>.)
|
|
|
|
+</P>
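The role of \s+ in the example pattern can be tried out with a small sketch. It uses Python's re module as a stand-in, which is an assumption made for illustration: Python's re is not PCRE2, but in both engines \s matches spaces, carriage returns, and linefeeds. The sample strings are invented.

```python
import re

pattern = re.compile(r"regular\s+expression")

samples = {
    "same line": "a regular expression here\n",
    "LF break": "this is a regular\nexpression\n",
    "CRLF break with trailing space": "this is a regular \r\nexpression\r\n",
    "no white space": "regularexpression\n",
}

# True where the phrase is found, whether or not it spans a line break.
results = {name: bool(pattern.search(text)) for name, text in samples.items()}

for name, found in results.items():
    print(name, found)
```

The third sample shows why + is used rather than a single \s: a trailing space followed by a two-character CRLF sequence needs three white space characters to be matched between the two words.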
|
|
|
|
+<P>
|
|
|
|
+<b>-m</b> <i>number</i>, <b>--max-count</b>=<i>number</i>
|
|
|
|
+Stop processing after finding <i>number</i> matching lines, or non-matching
|
|
|
|
+lines if <b>-v</b> is also set. Any trailing context lines are output after the
|
|
|
|
+final match. In multiline mode, each multiline match counts as just one line
|
|
|
|
+for this purpose. If this limit is reached when reading the standard input from
|
|
|
|
+a regular file, the file is left positioned just after the last matching line.
|
|
|
|
+If <b>-c</b> is also set, the count that is output is never greater than
|
|
|
|
+<i>number</i>. This option has no effect if used with <b>-L</b>, <b>-l</b>, or
|
|
|
|
+<b>-q</b>, or when just checking for a match in a binary file.
|
|
|
|
+</P>
|
|
|
|
+<P>
|
|
<b>--match-limit</b>=<i>number</i>
|
|
<b>--match-limit</b>=<i>number</i>
|
|
Processing some regular expression patterns may take a very long time to search
|
|
Processing some regular expression patterns may take a very long time to search
|
|
for all possible matching strings. Others may require a very large amount of
|
|
for all possible matching strings. Others may require a very large amount of
|
|
@@ -530,11 +591,11 @@ counter that is incremented each time around its main processing loop. If the
|
|
value set by <b>--match-limit</b> is reached, an error occurs.
|
|
value set by <b>--match-limit</b> is reached, an error occurs.
|
|
<br>
|
|
<br>
|
|
<br>
|
|
<br>
|
|
-The <b>--heap-limit</b> option specifies, as a number of kilobytes, the amount
|
|
|
|
-of heap memory that may be used for matching. Heap memory is needed only if
|
|
|
|
-matching the pattern requires a significant number of nested backtracking
|
|
|
|
-points to be remembered. This parameter can be set to zero to forbid the use of
|
|
|
|
-heap memory altogether.
|
|
|
|
|
|
+The <b>--heap-limit</b> option specifies, as a number of kibibytes (units of
|
|
|
|
+1024 bytes), the amount of heap memory that may be used for matching. Heap
|
|
|
|
+memory is needed only if matching the pattern requires a significant number of
|
|
|
|
+nested backtracking points to be remembered. This parameter can be set to zero
|
|
|
|
+to forbid the use of heap memory altogether.
|
|
<br>
|
|
<br>
|
|
<br>
|
|
<br>
|
|
The <b>--depth-limit</b> option limits the depth of nested backtracking points,
|
|
The <b>--depth-limit</b> option limits the depth of nested backtracking points,
|
|
@@ -545,69 +606,44 @@ limit acts varies from pattern to pattern. This limit is of use only if it is
|
|
set smaller than <b>--match-limit</b>.
|
|
set smaller than <b>--match-limit</b>.
|
|
<br>
|
|
<br>
|
|
<br>
|
|
<br>
|
|
-There are no short forms for these options. The default settings are specified
|
|
|
|
-when the PCRE2 library is compiled, with the default defaults being very large
|
|
|
|
-and so effectively unlimited.
|
|
|
|
|
|
+There are no short forms for these options. The default limits can be set
|
|
|
|
+when the PCRE2 library is compiled; if they are not specified, the defaults
|
|
|
|
+are very large and so effectively unlimited.
|
|
</P>
|
|
</P>
|
|
<P>
|
|
<P>
|
|
-\fB--max-buffer-size=<i>number</i>
|
|
|
|
|
|
+<b>--max-buffer-size</b>=<i>number</i>
|
|
This limits the expansion of the processing buffer, whose initial size can be
|
|
This limits the expansion of the processing buffer, whose initial size can be
|
|
set by <b>--buffer-size</b>. The maximum buffer size is silently forced to be no
|
|
set by <b>--buffer-size</b>. The maximum buffer size is silently forced to be no
|
|
smaller than the starting buffer size.
|
|
smaller than the starting buffer size.
|
|
</P>
|
|
</P>
|
|
<P>
|
|
<P>
|
|
-<b>-M</b>, <b>--multiline</b>
|
|
|
|
-Allow patterns to match more than one line. When this option is set, the PCRE2
|
|
|
|
-library is called in "multiline" mode. This allows a matched string to extend
|
|
|
|
-past the end of a line and continue on one or more subsequent lines. Patterns
|
|
|
|
-used with <b>-M</b> may usefully contain literal newline characters and internal
|
|
|
|
-occurrences of ^ and $ characters. The output for a successful match may
|
|
|
|
-consist of more than one line. The first line is the line in which the match
|
|
|
|
-started, and the last line is the line in which the match ended. If the matched
|
|
|
|
-string ends with a newline sequence, the output ends at the end of that line.
|
|
|
|
-If <b>-v</b> is set, none of the lines in a multi-line match are output. Once a
|
|
|
|
-match has been handled, scanning restarts at the beginning of the line after
|
|
|
|
-the one in which the match ended.
|
|
|
|
-<br>
|
|
|
|
-<br>
|
|
|
|
-The newline sequence that separates multiple lines must be matched as part of
|
|
|
|
-the pattern. For example, to find the phrase "regular expression" in a file
|
|
|
|
-where "regular" might be at the end of a line and "expression" at the start of
|
|
|
|
-the next line, you could use this command:
|
|
|
|
|
|
+<b>-N</b> <i>newline-type</i>, <b>--newline</b>=<i>newline-type</i>
|
|
|
|
+Six different conventions for indicating the ends of lines in scanned files are
|
|
|
|
+supported. For example:
|
|
<pre>
|
|
<pre>
|
|
- pcre2grep -M 'regular\s+expression' &lt;file&gt;
|
|
|
|
|
|
+ pcre2grep -N CRLF 'some pattern' &lt;file&gt;
|
|
</pre>
|
|
</pre>
|
|
-The \s escape sequence matches any white space character, including newlines,
|
|
|
|
-and is followed by + so as to match trailing white space on the first line as
|
|
|
|
-well as possibly handling a two-character newline sequence.
|
|
|
|
-<br>
|
|
|
|
-<br>
|
|
|
|
-There is a limit to the number of lines that can be matched, imposed by the way
|
|
|
|
-that <b>pcre2grep</b> buffers the input file as it scans it. With a sufficiently
|
|
|
|
-large processing buffer, this should not be a problem, but the <b>-M</b> option
|
|
|
|
-does not work when input is read line by line (see \fP--line-buffered\fP.)
|
|
|
|
-</P>
|
|
|
|
-<P>
|
|
|
|
-<b>-N</b> <i>newline-type</i>, <b>--newline</b>=<i>newline-type</i>
|
|
|
|
-The PCRE2 library supports five different conventions for indicating
|
|
|
|
-the ends of lines. They are the single-character sequences CR (carriage return)
|
|
|
|
-and LF (linefeed), the two-character sequence CRLF, an "anycrlf" convention,
|
|
|
|
-which recognizes any of the preceding three types, and an "any" convention, in
|
|
|
|
-which any Unicode line ending sequence is assumed to end a line. The Unicode
|
|
|
|
-sequences are the three just mentioned, plus VT (vertical tab, U+000B), FF
|
|
|
|
-(form feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and
|
|
|
|
-PS (paragraph separator, U+2029).
|
|
|
|
|
|
+The newline type may be specified in upper, lower, or mixed case. If the
|
|
|
|
+newline type is NUL, lines are separated by binary zero characters. The other
|
|
|
|
+types are the single-character sequences CR (carriage return) and LF
|
|
|
|
+(linefeed), the two-character sequence CRLF, an "anycrlf" type, which
|
|
|
|
+recognizes any of the preceding three types, and an "any" type, for which any
|
|
|
|
+Unicode line ending sequence is assumed to end a line. The Unicode sequences
|
|
|
|
+are the three just mentioned, plus VT (vertical tab, U+000B), FF (form feed,
|
|
|
|
+U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
|
|
|
|
+(paragraph separator, U+2029).
|
|
<br>
|
|
<br>
|
|
<br>
|
|
<br>
|
|
When the PCRE2 library is built, a default line-ending sequence is specified.
|
|
When the PCRE2 library is built, a default line-ending sequence is specified.
|
|
This is normally the standard sequence for the operating system. Unless
|
|
This is normally the standard sequence for the operating system. Unless
|
|
otherwise specified by this option, <b>pcre2grep</b> uses the library's default.
|
|
otherwise specified by this option, <b>pcre2grep</b> uses the library's default.
|
|
-The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This
|
|
|
|
-makes it possible to use <b>pcre2grep</b> to scan files that have come from
|
|
|
|
-other environments without having to modify their line endings. If the data
|
|
|
|
-that is being scanned does not agree with the convention set by this option,
|
|
|
|
-<b>pcre2grep</b> may behave in strange ways. Note that this option does not
|
|
|
|
-apply to files specified by the <b>-f</b>, <b>--exclude-from</b>, or
|
|
|
|
|
|
+<br>
|
|
|
|
+<br>
|
|
|
|
+This option makes it possible to use <b>pcre2grep</b> to scan files that have
|
|
|
|
+come from other environments without having to modify their line endings. If
|
|
|
|
+the data that is being scanned does not agree with the convention set by this
|
|
|
|
+option, <b>pcre2grep</b> may behave in strange ways. Note that this option does
|
|
|
|
+not apply to files specified by the <b>-f</b>, <b>--exclude-from</b>, or
|
|
<b>--include-from</b> options, which are expected to use the operating system's
|
|
<b>--include-from</b> options, which are expected to use the operating system's
|
|
standard newline sequence.
|
|
standard newline sequence.
|
|
</P>
|
|
</P>
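The six newline types listed for <b>--newline</b> can be modelled as line-splitting rules. The sketch below is an illustration in Python under stated assumptions, not pcre2grep's implementation: the table `NEWLINE_PATTERNS` and the function `split_lines` are hypothetical names, and input is treated as already-decoded text rather than bytes.

```python
import re

# One regular expression per newline type. For the multi-sequence types,
# CRLF is tried first so that it counts as a single line ending.
NEWLINE_PATTERNS = {
    "NUL": "\x00",
    "CR": "\r",
    "LF": "\n",
    "CRLF": "\r\n",
    "ANYCRLF": "\r\n|\r|\n",
    "ANY": "\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]",
}


def split_lines(text, newline_type):
    """Split text into lines using the given newline type (any letter case)."""
    pattern = NEWLINE_PATTERNS[newline_type.upper()]
    parts = re.split(pattern, text)
    if parts and parts[-1] == "":
        parts.pop()  # a final terminator does not start another line
    return parts


print(split_lines("a\r\nb\r\n", "crlf"))  # ['a', 'b']
print(split_lines("a\r\nb\r\n", "LF"))    # ['a\r', 'b\r']
```

The second call shows the kind of mismatch the text warns about: CRLF data scanned with the LF convention leaves a stray carriage return at the end of every line.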
|
|
@@ -629,29 +665,41 @@ It should never be needed in normal use.
|
|
</P>
|
|
</P>
|
|
<P>
|
|
<P>
|
|
<b>-O</b> <i>text</i>, <b>--output</b>=<i>text</i>
|
|
<b>-O</b> <i>text</i>, <b>--output</b>=<i>text</i>
|
|
-When there is a match, instead of outputting the whole line that matched,
|
|
|
|
-output just the given text. This option is mutually exclusive with
|
|
|
|
-<b>--only-matching</b>, <b>--file-offsets</b>, and <b>--line-offsets</b>. Escape
|
|
|
|
-sequences starting with a dollar character may be used to insert the contents
|
|
|
|
-of the matched part of the line and/or captured substrings into the text.
|
|
|
|
|
|
+When there is a match, instead of outputting the line that matched, output just
|
|
|
|
+the text specified in this option, followed by an operating-system standard
|
|
|
|
+newline. In this mode, no context is shown. That is, the <b>-A</b>, <b>-B</b>,
|
|
|
|
+and <b>-C</b> options are ignored. The <b>--newline</b> option has no effect on
|
|
|
|
+this option, which is mutually exclusive with <b>--only-matching</b>,
|
|
|
|
+<b>--file-offsets</b>, and <b>--line-offsets</b>. However, like
|
|
|
|
+<b>--only-matching</b>, if there is more than one match in a line, each of them
|
|
|
|
+causes a line of output.
|
|
|
|
+<br>
|
|
|
|
+<br>
|
|
|
|
+Escape sequences starting with a dollar character may be used to insert the
|
|
|
|
+contents of the matched part of the line and/or captured substrings into the
|
|
|
|
+text.
|
|
<br>
|
|
<br>
|
|
<br>
|
|
<br>
|
|
-$&lt;digits&gt; or ${&lt;digits&gt;} is replaced by the captured
|
|
|
|
-substring of the given decimal number; zero substitutes the whole match. If
|
|
|
|
-the number is greater than the number of capturing substrings, or if the
|
|
|
|
-capture is unset, the replacement is empty.
|
|
|
|
|
|
+$&lt;digits&gt; or ${&lt;digits&gt;} is replaced by the captured substring of the given
|
|
|
|
+decimal number; zero substitutes the whole match. If the number is greater than
|
|
|
|
+the number of capturing substrings, or if the capture is unset, the replacement
|
|
|
|
+is empty.
|
|
<br>
|
|
<br>
|
|
<br>
|
|
<br>
|
|
$a is replaced by bell; $b by backspace; $e by escape; $f by form feed; $n by
|
|
$a is replaced by bell; $b by backspace; $e by escape; $f by form feed; $n by
|
|
newline; $r by carriage return; $t by tab; $v by vertical tab.
|
|
newline; $r by carriage return; $t by tab; $v by vertical tab.
|
|
<br>
|
|
<br>
|
|
<br>
|
|
<br>
|
|
-$o&lt;digits&gt; is replaced by the character represented by the given octal
|
|
|
|
-number; up to three digits are processed.
|
|
|
|
|
|
+$o&lt;digits&gt; or $o{&lt;digits&gt;} is replaced by the character whose code point is the
|
|
|
|
+given octal number. In the first form, up to three octal digits are processed.
|
|
|
|
+When more digits are needed in Unicode mode to specify a wide character, the
|
|
|
|
+second form must be used.
|
|
<br>
|
|
<br>
|
|
<br>
|
|
<br>
|
|
-$x&lt;digits&gt; is replaced by the character represented by the given hexadecimal
|
|
|
|
-number; up to two digits are processed.
|
|
|
|
|
|
+$x&lt;digits&gt; or $x{&lt;digits&gt;} is replaced by the character represented by the
|
|
|
|
+given hexadecimal number. In the first form, up to two hexadecimal digits are
|
|
|
|
+processed. When more digits are needed in Unicode mode to specify a wide
|
|
|
|
+character, the second form must be used.
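The dollar escapes described for <b>--output</b> can be summarised as a small expansion routine. The Python sketch below is an approximation written for illustration only: `expand`, `TOKEN`, and `SIMPLE` are hypothetical names, Python's re stands in for PCRE2, and two details are assumptions rather than documented behaviour (all consecutive decimal digits after $ are read as the capture number, and $o or $x with no valid digits yields the letter itself).

```python
import re

# Single-letter escapes: bell, backspace, escape, form feed, newline,
# carriage return, tab, vertical tab.
SIMPLE = {"a": "\a", "b": "\b", "e": "\x1b", "f": "\f",
          "n": "\n", "r": "\r", "t": "\t", "v": "\v"}

TOKEN = re.compile(
    r"\$(?:"
    r"\{(?P<gb>[0-9]+)\}|(?P<g>[0-9]+)"                   # ${digits} or $digits
    r"|o\{(?P<ob>[0-7]+)\}|o(?P<o>[0-7]{1,3})"            # $o{...} or $o + up to 3
    r"|x\{(?P<xb>[0-9A-Fa-f]+)\}|x(?P<x>[0-9A-Fa-f]{1,2})"  # $x{...} or $x + up to 2
    r"|(?P<c>.)"                                          # any other character
    r")",
    re.DOTALL,
)


def expand(template, match):
    """Expand dollar escapes in template using the given match object."""
    def repl(tok):
        num = tok.group("gb") or tok.group("g")
        if num is not None:
            n = int(num)
            if n > match.re.groups:
                return ""                  # no such capture group: empty
            return match.group(n) or ""    # unset capture: empty
        octal = tok.group("ob") or tok.group("o")
        if octal is not None:
            return chr(int(octal, 8))
        hexa = tok.group("xb") or tok.group("x")
        if hexa is not None:
            return chr(int(hexa, 16))
        c = tok.group("c")
        return SIMPLE.get(c, c)            # other characters stand for themselves
    return TOKEN.sub(repl, template)


m = re.search(r"(\w+)@(\w+)", "mail bob@example now")
print(expand("$2:$1 (${0})", m))  # example:bob (bob@example)
```

The braced forms are what allow more than three octal or two hexadecimal digits, so a wide character such as U+263A has to be written as $x{263a} in this scheme.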
|
|
<br>
|
|
<br>
|
|
<br>
|
|
<br>
|
|
Any other character is substituted by itself. In particular, $$ is replaced by
|
|
Any other character is substituted by itself. In particular, $$ is replaced by
|
|
@@ -673,20 +721,32 @@ otherwise empty line. This option is mutually exclusive with <b>--output</b>,
|
|
<P>
|
|
<P>
|
|
<b>-o</b><i>number</i>, <b>--only-matching</b>=<i>number</i>
|
|
<b>-o</b><i>number</i>, <b>--only-matching</b>=<i>number</i>
|
|
Show only the part of the line that matched the capturing parentheses of the
|
|
Show only the part of the line that matched the capturing parentheses of the
|
|
-given number. Up to 32 capturing parentheses are supported, and -o0 is
|
|
|
|
-equivalent to <b>-o</b> without a number. Because these options can be given
|
|
|
|
-without an argument (see above), if an argument is present, it must be given in
|
|
|
|
-the same shell item, for example, -o3 or --only-matching=2. The comments given
|
|
|
|
-for the non-argument case above also apply to this option. If the specified
|
|
|
|
-capturing parentheses do not exist in the pattern, or were not set in the
|
|
|
|
-match, nothing is output unless the file name or line number are being output.
|
|
|
|
|
|
+given number. Up to 50 capturing parentheses are supported by default. This
|
|
|
|
+limit can be changed via the <b>--om-capture</b> option. A pattern may contain
|
|
|
|
+any number of capturing parentheses, but only those whose number is within the
|
|
|
|
+limit can be accessed by <b>-o</b>. An error occurs if the number specified by
|
|
|
|
+<b>-o</b> is greater than the limit.
|
|
|
|
+<br>
|
|
|
|
+<br>
|
|
|
|
+-o0 is the same as <b>-o</b> without a number. Because these options can be
|
|
|
|
+given without an argument (see above), if an argument is present, it must be
|
|
|
|
+given in the same shell item, for example, -o3 or --only-matching=2. The
|
|
|
|
+comments given for the non-argument case above also apply to this option. If
|
|
|
|
+the specified capturing parentheses do not exist in the pattern, or were not
|
|
|
|
+set in the match, nothing is output unless the file name or line number are
|
|
|
|
+being output.
|
|
<br>
|
|
<br>
|
|
<br>
|
|
<br>
|
|
If this option is given multiple times, multiple substrings are output for each
|
|
If this option is given multiple times, multiple substrings are output for each
|
|
match, in the order the options are given, and all on one line. For example,
|
|
match, in the order the options are given, and all on one line. For example,
|
|
-o3 -o1 -o3 causes the substrings matched by capturing parentheses 3 and 1 and
|
|
-o3 -o1 -o3 causes the substrings matched by capturing parentheses 3 and 1 and
|
|
then 3 again to be output. By default, there is no separator (but see the next
|
|
then 3 again to be output. By default, there is no separator (but see the next
|
|
-option).
|
|
|
|
|
|
+but one option).
|
|
|
|
+</P>
|
|
|
|
+<P>
|
|
|
|
+<b>--om-capture</b>=<i>number</i>
|
|
|
|
+Set the number of capturing parentheses that can be accessed by <b>-o</b>. The
|
|
|
|
+default is 50.
|
|
</P>
|
|
</P>
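The selection rules for repeated -o options can be sketched in Python. This is only an illustration under stated assumptions: Python's re module stands in for PCRE2, the function name only_matching and its parameters are hypothetical, only the first match in a line is handled, and skipping empty substrings when the separator is inserted is an assumption of the sketch rather than documented behaviour.

```python
import re

DEFAULT_OM_CAPTURE = 50  # default limit stated in the text above


def only_matching(pattern, line, numbers, separator="", limit=DEFAULT_OM_CAPTURE):
    """Return the text selected by a list of -o numbers for the first match.

    Hypothetical helper: returns None when the line does not match.
    """
    for n in numbers:
        if n > limit:
            # The text says a number above the limit is an error.
            raise ValueError(f"capture number {n} exceeds the limit of {limit}")
    m = re.search(pattern, line)
    if m is None:
        return None
    parts = []
    for n in numbers:
        # A group that is absent from the pattern, or unset in the match,
        # contributes nothing to the output.
        text = m.group(n) if n <= m.re.groups else None
        if text:
            parts.append(text)
    return separator.join(parts)


if __name__ == "__main__":
    # Mirrors "-o3 -o1 -o3" from the text: groups 3, 1, then 3 again.
    print(only_matching(r"(a)(b)(c)", "abc", [3, 1, 3]))
```

Passing separator="," plays the role of a separator option; number 0 selects the whole match, as -o0 does.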
|
|
<P>
|
|
<P>
|
|
<b>--om-separator</b>=<i>text</i>
|
|
<b>--om-separator</b>=<i>text</i>
|
|
@@ -708,7 +768,8 @@ option to "recurse".
|
|
</P>
|
|
</P>
|
|
<P>
|
|
<P>
|
|
<b>--recursion-limit</b>=<i>number</i>
|
|
<b>--recursion-limit</b>=<i>number</i>
|
|
-See <b>--match-limit</b> above.
|
|
|
|
|
|
+This is an obsolete synonym for <b>--depth-limit</b>. See <b>--match-limit</b>
|
|
|
|
+above for details.
|
|
</P>
|
|
</P>
|
|
<P>
|
|
<P>
|
|
<b>-s</b>, <b>--no-messages</b>
|
|
<b>-s</b>, <b>--no-messages</b>
|
|
@@ -729,11 +790,23 @@ ignored when used with <b>-L</b> (list files without matches), because the grand
|
|
total would always be zero.
|
|
total would always be zero.
|
|
</P>
|
|
</P>
|
|
<P>
|
|
<P>
|
|
-<b>-u</b>, <b>--utf-8</b>
|
|
|
|
|
|
+<b>-u</b>, <b>--utf</b>
|
|
Operate in UTF-8 mode. This option is available only if PCRE2 has been compiled
|
|
Operate in UTF-8 mode. This option is available only if PCRE2 has been compiled
|
|
with UTF-8 support. All patterns (including those for any <b>--exclude</b> and
|
|
with UTF-8 support. All patterns (including those for any <b>--exclude</b> and
|
|
-<b>--include</b> options) and all subject lines that are scanned must be valid
|
|
|
|
-strings of UTF-8 characters.
|
|
|
|
|
|
+<b>--include</b> options) and all lines that are scanned must be valid strings
|
|
|
|
+of UTF-8 characters. If an invalid UTF-8 string is encountered, an error
|
|
|
|
+occurs.
|
|
|
|
+</P>
|
|
|
|
+<P>
|
|
|
|
+<b>-U</b>, <b>--utf-allow-invalid</b>
|
|
|
|
+As <b>--utf</b>, but in addition subject lines may contain invalid UTF-8 code
|
|
|
|
+unit sequences. These can never form part of any pattern match. Patterns
|
|
|
|
+themselves, however, must still be valid UTF-8 strings. This facility allows
|
|
|
|
+valid UTF-8 strings to be sought within arbitrary byte sequences in executable
|
|
|
|
+or other binary files. For more details about matching in non-valid UTF-8
|
|
|
|
+strings, see the
|
|
|
|
+<a href="pcre2unicode.html"><b>pcre2unicode</b>(3)</a>
|
|
|
|
+documentation.
|
|
</P>
|
|
</P>
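The difference between the two UTF modes can be imitated in Python. This is a rough sketch, not how pcre2grep is implemented: strict decoding stands in for the error raised on invalid input, and splitting the data at invalid bytes stands in for the rule that invalid sequences can never form part of a match. The function names are hypothetical, and anchors or lookarounds next to an invalid byte may behave differently in PCRE2.

```python
import re


def decode_strict(data: bytes) -> str:
    """Analogue of strict UTF mode: invalid UTF-8 raises UnicodeDecodeError."""
    return data.decode("utf-8")


def valid_segments(data: bytes):
    """Split arbitrary bytes into the maximal runs that are valid UTF-8."""
    # surrogateescape maps each undecodable byte to a lone surrogate in the
    # range U+DC80..U+DCFF, which valid UTF-8 input can never produce.
    text = data.decode("utf-8", errors="surrogateescape")
    return [seg for seg in re.split(r"[\udc80-\udcff]+", text) if seg]


def search_allow_invalid(pattern: str, data: bytes):
    """Find matches only inside valid runs, so invalid bytes never take part."""
    found = []
    for segment in valid_segments(data):
        found.extend(re.findall(pattern, segment))
    return found


if __name__ == "__main__":
    blob = b"\xffcaf\xc3\xa9\xfe bar"
    print(valid_segments(blob))
    print(search_allow_invalid(r"\w+", blob))
```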
|
|
<P>
|
|
<P>
|
|
<b>-V</b>, <b>--version</b>
|
|
<b>-V</b>, <b>--version</b>
|
|
@@ -744,7 +817,9 @@ ignored.
|
|
<P>
|
|
<P>
|
|
<b>-v</b>, <b>--invert-match</b>
|
|
<b>-v</b>, <b>--invert-match</b>
|
|
Invert the sense of the match, so that lines which do <i>not</i> match any of
|
|
Invert the sense of the match, so that lines which do <i>not</i> match any of
|
|
-the patterns are the ones that are found.
|
|
|
|
|
|
+the patterns are the ones that are found. When this option is set, options such
|
|
|
|
+as <b>--only-matching</b> and <b>--output</b>, which specify parts of a match
|
|
|
|
+that are to be output, are ignored.
|
|
</P>
|
|
</P>
|
|
<P>
|
|
<P>
|
|
<b>-w</b>, <b>--word-regex</b>, <b>--word-regexp</b>
|
|
<b>-w</b>, <b>--word-regex</b>, <b>--word-regexp</b>
|
|
@@ -764,27 +839,39 @@ pattern and ")$" at the end. This option applies only to the patterns that are
|
|
matched against the contents of files; it does not apply to patterns specified
|
|
matched against the contents of files; it does not apply to patterns specified
|
|
by any of the <b>--include</b> or <b>--exclude</b> options.
|
|
by any of the <b>--include</b> or <b>--exclude</b> options.
|
|
</P>
|
|
</P>
|
|
-<br><a name="SEC6" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
|
|
|
|
|
|
+<br><a name="SEC7" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
|
|
<P>
|
|
<P>
|
|
The environment variables <b>LC_ALL</b> and <b>LC_CTYPE</b> are examined, in that
|
|
The environment variables <b>LC_ALL</b> and <b>LC_CTYPE</b> are examined, in that
|
|
order, for a locale. The first one that is set is used. This can be overridden
|
|
order, for a locale. The first one that is set is used. This can be overridden
|
|
by the <b>--locale</b> option. If no locale is set, the PCRE2 library's default
|
|
by the <b>--locale</b> option. If no locale is set, the PCRE2 library's default
|
|
(usually the "C" locale) is used.
|
|
(usually the "C" locale) is used.
|
|
</P>
|
|
</P>
|
|
-<br><a name="SEC7" href="#TOC1">NEWLINES</a><br>
|
|
|
|
|
|
+<br><a name="SEC8" href="#TOC1">NEWLINES</a><br>
|
|
<P>
|
|
<P>
|
|
The <b>-N</b> (<b>--newline</b>) option allows <b>pcre2grep</b> to scan files with
|
|
The <b>-N</b> (<b>--newline</b>) option allows <b>pcre2grep</b> to scan files with
|
|
-different newline conventions from the default. Any parts of the input files
|
|
|
|
-that are written to the standard output are copied identically, with whatever
|
|
|
|
-newline sequences they have in the input. However, the setting of this option
|
|
|
|
-does not affect the interpretation of files specified by the <b>-f</b>,
|
|
|
|
-<b>--exclude-from</b>, or <b>--include-from</b> options, which are assumed to use
|
|
|
|
-the operating system's standard newline sequence, nor does it affect the way in
|
|
|
|
-which <b>pcre2grep</b> writes informational messages to the standard error and
|
|
|
|
-output streams. For these it uses the string "\n" to indicate newlines,
|
|
|
|
-relying on the C I/O library to convert this to an appropriate sequence.
|
|
|
|
-</P>
|
|
|
|
-<br><a name="SEC8" href="#TOC1">OPTIONS COMPATIBILITY</a><br>
|
|
|
|
|
|
+newline conventions that differ from the default. This option affects only the
|
|
|
|
+way scanned files are processed. It does not affect the interpretation of files
|
|
|
|
+specified by the <b>-f</b>, <b>--file-list</b>, <b>--exclude-from</b>, or
|
|
|
|
+<b>--include-from</b> options.
|
|
|
|
+</P>
|
|
|
|
+<P>
|
|
|
|
+Any parts of the scanned input files that are written to the standard output
|
|
|
|
+are copied with whatever newline sequences they have in the input. However, if
|
|
|
|
+the final line of a file is output, and it does not end with a newline
|
|
|
|
+sequence, a newline sequence is added. If the newline setting is CR, LF, CRLF
|
|
|
|
+or NUL, that line ending is output; for the other settings (ANYCRLF or ANY) a
|
|
|
|
+single NL is used.
|
|
|
|
+</P>
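The rule for completing an unterminated final line amounts to a small lookup, sketched here in Python. The names FIXED_ENDINGS and final_line_terminator are hypothetical, and reading "NL" as a line feed character is an assumption of the sketch.

```python
# Settings that name one specific line ending, as listed in the text.
FIXED_ENDINGS = {"CR": "\r", "LF": "\n", "CRLF": "\r\n", "NUL": "\0"}

# Settings that accept several endings; a single NL is written for these.
FLEXIBLE_SETTINGS = {"ANYCRLF", "ANY"}


def final_line_terminator(newline_setting: str) -> str:
    """Return the sequence appended to a final line that lacks a newline."""
    name = newline_setting.upper()
    if name in FIXED_ENDINGS:
        return FIXED_ENDINGS[name]
    if name in FLEXIBLE_SETTINGS:
        return "\n"  # assumption: "NL" in the text means line feed
    raise ValueError(f"unknown newline setting: {newline_setting}")


if __name__ == "__main__":
    for setting in ("CR", "LF", "CRLF", "NUL", "ANYCRLF", "ANY"):
        print(setting, repr(final_line_terminator(setting)))
```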
|
|
|
|
+<P>
|
|
|
|
+The newline setting does not affect the way in which <b>pcre2grep</b> writes
|
|
|
|
+newlines in informational messages to the standard output and error streams.
|
|
|
|
+Under Windows, the standard output is set to be binary, so that "\r\n" at the
|
|
|
|
+ends of output lines that are copied from the input is not converted to
|
|
|
|
+"\r\r\n" by the C I/O library. This means that any messages written to the
|
|
|
|
+standard output must end with "\r\n". For all other operating systems, and
|
|
|
|
+for all messages to the standard error stream, "\n" is used.
|
|
|
|
+</P>
|
|
|
|
+<br><a name="SEC9" href="#TOC1">OPTIONS COMPATIBILITY</a><br>
|
|
<P>
|
|
<P>
|
|
Many of the short and long forms of <b>pcre2grep</b>'s options are the same
|
|
Many of the short and long forms of <b>pcre2grep</b>'s options are the same
|
|
as in the GNU <b>grep</b> program. Any long option of the form
|
|
as in the GNU <b>grep</b> program. Any long option of the form
|
|
@@ -793,9 +880,9 @@ as in the GNU <b>grep</b> program. Any long option of the form
|
|
<b>--file-offsets</b>, <b>--heap-limit</b>, <b>--include-dir</b>,
|
|
<b>--file-offsets</b>, <b>--heap-limit</b>, <b>--include-dir</b>,
|
|
<b>--line-offsets</b>, <b>--locale</b>, <b>--match-limit</b>, <b>-M</b>,
|
|
<b>--line-offsets</b>, <b>--locale</b>, <b>--match-limit</b>, <b>-M</b>,
|
|
<b>--multiline</b>, <b>-N</b>, <b>--newline</b>, <b>--om-separator</b>,
|
|
<b>--multiline</b>, <b>-N</b>, <b>--newline</b>, <b>--om-separator</b>,
|
|
-<b>--output</b>, <b>-u</b>, and <b>--utf-8</b> options are specific to
|
|
|
|
-<b>pcre2grep</b>, as is the use of the <b>--only-matching</b> option with a
|
|
|
|
-capturing parentheses number.
|
|
|
|
|
|
+<b>--output</b>, <b>-u</b>, <b>--utf</b>, <b>-U</b>, and <b>--utf-allow-invalid</b>
|
|
|
|
+options are specific to <b>pcre2grep</b>, as is the use of the
|
|
|
|
+<b>--only-matching</b> option with a capturing parentheses number.
|
|
</P>
|
|
</P>
|
|
<P>
|
|
<P>
|
|
Although most of the common options work the same way, a few are different in
|
|
Although most of the common options work the same way, a few are different in
|
|
@@ -804,7 +891,7 @@ for GNU <b>grep</b>, but a regular expression for <b>pcre2grep</b>. If both the
|
|
<b>-c</b> and <b>-l</b> options are given, GNU grep lists only file names,
|
|
<b>-c</b> and <b>-l</b> options are given, GNU grep lists only file names,
|
|
without counts, but <b>pcre2grep</b> gives the counts as well.
|
|
without counts, but <b>pcre2grep</b> gives the counts as well.
|
|
</P>
|
|
</P>
|
|
-<br><a name="SEC9" href="#TOC1">OPTIONS WITH DATA</a><br>
|
|
|
|
|
|
+<br><a name="SEC10" href="#TOC1">OPTIONS WITH DATA</a><br>
|
|
<P>
|
|
<P>
|
|
There are four different ways in which an option with data can be specified.
|
|
There are four different ways in which an option with data can be specified.
|
|
If a short form option is used, the data may follow immediately, or (with one
|
|
If a short form option is used, the data may follow immediately, or (with one
|
|
@@ -836,14 +923,16 @@ The exceptions to the above are the <b>--colour</b> (or <b>--color</b>) and
|
|
options does have data, it must be given in the first form, using an equals
|
|
options does have data, it must be given in the first form, using an equals
|
|
character. Otherwise <b>pcre2grep</b> will assume that it has no data.
|
|
character. Otherwise <b>pcre2grep</b> will assume that it has no data.
|
|
</P>
|
|
</P>
|
|
-<br><a name="SEC10" href="#TOC1">USING PCRE2'S CALLOUT FACILITY</a><br>
|
|
|
|
|
|
+<br><a name="SEC11" href="#TOC1">USING PCRE2'S CALLOUT FACILITY</a><br>
|
|
<P>
|
|
<P>
|
|
<b>pcre2grep</b> has, by default, support for calling external programs or
|
|
<b>pcre2grep</b> has, by default, support for calling external programs or
|
|
scripts or echoing specific strings during matching by making use of PCRE2's
|
|
scripts or echoing specific strings during matching by making use of PCRE2's
|
|
-callout facility. However, this support can be disabled when <b>pcre2grep</b> is
|
|
|
|
-built. You can find out whether your binary has support for callouts by running
|
|
|
|
-it with the <b>--help</b> option. If the support is not enabled, all callouts in
|
|
|
|
-patterns are ignored by <b>pcre2grep</b>.
|
|
|
|
|
|
+callout facility. However, this support can be completely or partially disabled
|
|
|
|
+when <b>pcre2grep</b> is built. You can find out whether your binary has support
|
|
|
|
+for callouts by running it with the <b>--help</b> option. If callout support is
|
|
|
|
+completely disabled, all callouts in patterns are ignored by <b>pcre2grep</b>.
|
|
|
|
+If the facility is partially disabled, calling external programs is not
|
|
|
|
+supported, and callouts that request it are ignored.
|
|
</P>
|
|
</P>
|
|
<P>
|
|
<P>
|
|
A callout in a PCRE2 pattern is of the form (?C<arg>) where the argument is
|
|
A callout in a PCRE2 pattern is of the form (?C<arg>) where the argument is
|
|
@@ -853,9 +942,39 @@ documentation for details). Numbered callouts are ignored by <b>pcre2grep</b>;
|
|
only callouts with string arguments are useful.
|
|
only callouts with string arguments are useful.
|
|
</P>
|
|
</P>
|
|
<br><b>
|
|
<br><b>
|
|
|
|
+Echoing a specific string
|
|
|
|
+</b><br>
|
|
|
|
+<P>
|
|
|
|
+Starting the callout string with a pipe character invokes an echoing facility
|
|
|
|
+that avoids calling an external program or script. This facility is always
|
|
|
|
+available, provided that callouts were not completely disabled when
|
|
|
|
+<b>pcre2grep</b> was built. The rest of the callout string is processed as a
|
|
|
|
+zero-terminated string, which means it should not contain any internal binary
|
|
|
|
+zeros. It is written to the output, having first been passed through the same
|
|
|
|
+escape processing as text from the <b>--output</b> (<b>-O</b>) option (see
|
|
|
|
+above). However, $0 cannot be used to insert a matched substring because the
|
|
|
|
+match is still in progress. Instead, the single character '0' is inserted. Any
|
|
|
|
+syntax errors in the string (for example, a dollar not followed by another
|
|
|
|
+character) cause the callout to be ignored. No terminator is added to the
|
|
|
|
+output string, so if you want a newline, you must include it explicitly using
|
|
|
|
+the escape $n. For example:
|
|
|
|
+<pre>
|
|
|
|
+ pcre2grep '(.)(..(.))(?C"|[$1] [$2] [$3]$n")' <some file>
|
|
|
|
+</pre>
|
|
|
|
+Matching continues normally after the string is output. If you want to see only
|
|
|
|
+the callout output but not any output from an actual match, you should end the
|
|
|
|
+pattern with (*FAIL).
|
|
|
|
+</P>
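The dollar escapes applied to a callout string can be approximated by the following Python sketch. It is an illustration of the rules as this page describes them, not pcre2grep's code: the function name expand_callout_string and the captures list are hypothetical, returning None stands for "the callout is ignored", and the handling of cases the text does not spell out (an empty digit string, an unclosed brace, a code point beyond the Unicode range) is an assumption.

```python
SIMPLE = {"a": "\a", "b": "\b", "e": "\x1b", "f": "\f",
          "n": "\n", "r": "\r", "t": "\t", "v": "\v"}
DECIMAL = "0123456789"
OCTAL = "01234567"
HEX = "0123456789abcdefABCDEF"


def _digits(template, i, allowed, max_plain):
    """Read '{digits}' or a run of plain digits at index i.

    Returns (digits, next_index), or None on a syntax error (assumed cases).
    """
    if i < len(template) and template[i] == "{":
        end = template.find("}", i)
        if end < 0:
            return None
        digits, nxt = template[i + 1:end], end + 1
    else:
        j = i
        while (j < len(template) and template[j] in allowed
               and (max_plain is None or j - i < max_plain)):
            j += 1
        digits, nxt = template[i:j], j
    if not digits or any(d not in allowed for d in digits):
        return None
    return digits, nxt


def expand_callout_string(template, captures):
    """captures[n] holds group n (n >= 1) or None if unset; index 0 is unused."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        i += 1
        if ch != "$":
            out.append(ch)
            continue
        if i == len(template):
            return None  # a dollar not followed by another character
        ch = template[i]
        if ch in DECIMAL or ch == "{":
            parsed = _digits(template, i, DECIMAL, None)
            if parsed is None:
                return None
            digits, i = parsed
            number = int(digits)
            if number == 0:
                out.append("0")  # the match is still in progress
            elif number < len(captures) and captures[number] is not None:
                out.append(captures[number])
            continue  # out-of-range or unset captures insert nothing
        i += 1
        if ch in SIMPLE:
            out.append(SIMPLE[ch])
        elif ch in ("o", "x"):
            allowed, base, width = (OCTAL, 8, 3) if ch == "o" else (HEX, 16, 2)
            parsed = _digits(template, i, allowed, width)
            if parsed is None:
                return None
            digits, i = parsed
            code = int(digits, base)
            if code > 0x10FFFF:
                return None
            out.append(chr(code))
        else:
            out.append(ch)  # any other character stands for itself ($$, $|)
    return "".join(out)


if __name__ == "__main__":
    # Groups as they would be set by (.)(..(.)) matching "abcd".
    print(expand_callout_string("[$1] [$2] [$3]$n", [None, "a", "bcd", "d"]), end="")
```

The plain forms stop after two hexadecimal or three octal digits, so a wider code point needs the braced form, as in $x{20AC}.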
|
|
|
|
+<br><b>
|
|
Calling external programs or scripts
|
|
Calling external programs or scripts
|
|
</b><br>
|
|
</b><br>
|
|
<P>
|
|
<P>
|
|
|
|
+This facility can be independently disabled when <b>pcre2grep</b> is built. It
|
|
|
|
+is supported for Windows, where a call to <b>_spawnvp()</b> is used, for VMS,
|
|
|
|
+where <b>lib$spawn()</b> is used, and for any Unix-like environment where
|
|
|
|
+<b>fork()</b> and <b>execv()</b> are available.
|
|
|
|
+</P>
|
|
|
|
+<P>
|
|
If the callout string does not start with a pipe (vertical bar) character, it
|
|
If the callout string does not start with a pipe (vertical bar) character, it
|
|
is parsed into a list of substrings separated by pipe characters. The first
|
|
is parsed into a list of substrings separated by pipe characters. The first
|
|
substring must be an executable name, with the following substrings specifying
|
|
substring must be an executable name, with the following substrings specifying
|
|
@@ -864,14 +983,11 @@ arguments:
|
|
executable_name|arg1|arg2|...
|
|
executable_name|arg1|arg2|...
|
|
</pre>
|
|
</pre>
|
|
Any substring (including the executable name) may contain escape sequences
|
|
Any substring (including the executable name) may contain escape sequences
|
|
-started by a dollar character: $<digits> or ${<digits>} is replaced by the
|
|
|
|
-captured substring of the given decimal number, which must be greater than
|
|
|
|
-zero. If the number is greater than the number of capturing substrings, or if
|
|
|
|
-the capture is unset, the replacement is empty.
|
|
|
|
-</P>
|
|
|
|
-<P>
|
|
|
|
-Any other character is substituted by itself. In particular, $$ is replaced by
|
|
|
|
-a single dollar and $| is replaced by a pipe character. Here is an example:
|
|
|
|
|
|
+started by a dollar character. These are the same as for the <b>--output</b>
|
|
|
|
+(<b>-O</b>) option documented above, except that $0 cannot insert the matched
|
|
|
|
+string because the match is still in progress. Instead, the character '0'
|
|
|
|
+is inserted. If you need a literal dollar or pipe character in any
|
|
|
|
+substring, use $$ or $| respectively. Here is an example:
|
|
<pre>
|
|
<pre>
|
|
echo -e "abcde\n12345" | pcre2grep \
|
|
echo -e "abcde\n12345" | pcre2grep \
|
|
'(?x)(.)(..(.))
|
|
'(?x)(.)(..(.))
|
|
@@ -884,29 +1000,16 @@ a single dollar and $| is replaced by a pipe character. Here is an example:
|
|
Arg1: [1] [234] [4] Arg2: |1| ()
|
|
Arg1: [1] [234] [4] Arg2: |1| ()
|
|
12345
|
|
12345
|
|
</pre>
|
|
</pre>
|
|
-The parameters for the <b>execv()</b> system call that is used to run the
|
|
|
|
-program or script are zero-terminated strings. This means that binary zero
|
|
|
|
-characters in the callout argument will cause premature termination of their
|
|
|
|
-substrings, and therefore should not be present. Any syntax errors in the
|
|
|
|
-string (for example, a dollar not followed by another character) cause the
|
|
|
|
-callout to be ignored. If running the program fails for any reason (including
|
|
|
|
-the non-existence of the executable), a local matching failure occurs and the
|
|
|
|
-matcher backtracks in the normal way.
|
|
|
|
-</P>
|
|
|
|
-<br><b>
|
|
|
|
-Echoing a specific string
|
|
|
|
-</b><br>
|
|
|
|
-<P>
|
|
|
|
-If the callout string starts with a pipe (vertical bar) character, the rest of
|
|
|
|
-the string is written to the output, having been passed through the same escape
|
|
|
|
-processing as text from the --output option. This provides a simple echoing
|
|
|
|
-facility that avoids calling an external program or script. No terminator is
|
|
|
|
-added to the string, so if you want a newline, you must include it explicitly.
|
|
|
|
-Matching continues normally after the string is output. If you want to see only
|
|
|
|
-the callout output but not any output from an actual match, you should end the
|
|
|
|
-relevant pattern with (*FAIL).
|
|
|
|
|
|
+The parameters for the system call that is used to run the program or script
|
|
|
|
+are zero-terminated strings. This means that binary zero characters in the
|
|
|
|
+callout argument will cause premature termination of their substrings, and
|
|
|
|
+therefore should not be present. Any syntax errors in the string (for example,
|
|
|
|
+a dollar not followed by another character) cause the callout to be ignored.
|
|
|
|
+If running the program fails for any reason (including the non-existence of the
|
|
|
|
+executable), a local matching failure occurs and the matcher backtracks in the
|
|
|
|
+normal way.
|
|
</P>
|
|
</P>
|
|
-<br><a name="SEC11" href="#TOC1">MATCHING ERRORS</a><br>
|
|
|
|
|
|
+<br><a name="SEC12" href="#TOC1">MATCHING ERRORS</a><br>
|
|
<P>
|
|
<P>
|
|
It is possible to supply a regular expression that takes a very long time to
|
|
It is possible to supply a regular expression that takes a very long time to
|
|
fail to match certain lines. Such patterns normally involve nested indefinite
|
|
fail to match certain lines. Such patterns normally involve nested indefinite
|
|
@@ -922,7 +1025,7 @@ overall resource limit. There are also other limits that affect the amount of
|
|
memory used during matching; see the discussion of <b>--heap-limit</b> and
|
|
memory used during matching; see the discussion of <b>--heap-limit</b> and
|
|
<b>--depth-limit</b> above.
|
|
<b>--depth-limit</b> above.
|
|
</P>
|
|
</P>
|
|
-<br><a name="SEC12" href="#TOC1">DIAGNOSTICS</a><br>
|
|
|
|
|
|
+<br><a name="SEC13" href="#TOC1">DIAGNOSTICS</a><br>
|
|
<P>
|
|
<P>
|
|
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
|
|
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
|
|
for syntax errors, overlong lines, non-existent or inaccessible files (even if
|
|
for syntax errors, overlong lines, non-existent or inaccessible files (even if
|
|
@@ -934,24 +1037,25 @@ affect the return code.
|
|
When run under VMS, the return code is placed in the symbol PCRE2GREP_RC
|
|
When run under VMS, the return code is placed in the symbol PCRE2GREP_RC
|
|
because VMS does not distinguish between exit(0) and exit(1).
|
|
because VMS does not distinguish between exit(0) and exit(1).
|
|
</P>
|
|
</P>
|
|
-<br><a name="SEC13" href="#TOC1">SEE ALSO</a><br>
|
|
|
|
|
|
+<br><a name="SEC14" href="#TOC1">SEE ALSO</a><br>
|
|
<P>
|
|
<P>
|
|
-<b>pcre2pattern</b>(3), <b>pcre2syntax</b>(3), <b>pcre2callout</b>(3).
|
|
|
|
|
|
+<b>pcre2pattern</b>(3), <b>pcre2syntax</b>(3), <b>pcre2callout</b>(3),
|
|
|
|
+<b>pcre2unicode</b>(3).
|
|
</P>
|
|
</P>
|
|
-<br><a name="SEC14" href="#TOC1">AUTHOR</a><br>
|
|
|
|
|
|
+<br><a name="SEC15" href="#TOC1">AUTHOR</a><br>
|
|
<P>
|
|
<P>
|
|
Philip Hazel
|
|
Philip Hazel
|
|
<br>
|
|
<br>
|
|
-University Computing Service
|
|
|
|
|
|
+Retired from University Computing Service
|
|
<br>
|
|
<br>
|
|
Cambridge, England.
|
|
Cambridge, England.
|
|
<br>
|
|
<br>
|
|
</P>
|
|
</P>
|
|
-<br><a name="SEC15" href="#TOC1">REVISION</a><br>
|
|
|
|
|
|
+<br><a name="SEC16" href="#TOC1">REVISION</a><br>
|
|
<P>
|
|
<P>
|
|
-Last updated: 13 November 2017
|
|
|
|
|
|
+Last updated: 31 August 2021
|
|
<br>
|
|
<br>
|
|
-Copyright © 1997-2017 University of Cambridge.
|
|
|
|
|
|
+Copyright © 1997-2021 University of Cambridge.
|
|
<br>
|
|
<br>
|
|
<p>
|
|
<p>
|
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
|
Return to the <a href="index.html">PCRE2 index page</a>.
|