- PCRE2GREP(1) General Commands Manual PCRE2GREP(1)
- NAME
- pcre2grep - a grep with Perl-compatible regular expressions.
- SYNOPSIS
- pcre2grep [options] [long options] [pattern] [path1 path2 ...]
- DESCRIPTION
- pcre2grep searches files for character patterns, in the same way as
- other grep commands do, but it uses the PCRE2 regular expression li-
- brary to support patterns that are compatible with the regular expres-
- sions of Perl 5. See pcre2syntax(3) for a quick-reference summary of
- pattern syntax, or pcre2pattern(3) for a full description of the syntax
- and semantics of the regular expressions that PCRE2 supports.
- Patterns, whether supplied on the command line or in a separate file,
- are given without delimiters. For example:
- pcre2grep Thursday /etc/motd
- If you attempt to use delimiters (for example, by surrounding a pattern
- with slashes, as is common in Perl scripts), they are interpreted as
- part of the pattern. Quotes can of course be used to delimit patterns
- on the command line because they are interpreted by the shell, and in-
- deed quotes are required if a pattern contains white space or shell
- metacharacters.
- The first argument that follows any option settings is treated as the
- single pattern to be matched when neither -e nor -f is present. Con-
- versely, when one or both of these options are used to specify pat-
- terns, all arguments are treated as path names. At least one of -e, -f,
- or an argument pattern must be provided.
- If no files are specified, pcre2grep reads the standard input. The
- standard input can also be referenced by a name consisting of a single
- hyphen. For example:
- pcre2grep some-pattern file1 - file3
- Input files are searched line by line. By default, each line that
- matches a pattern is copied to the standard output, and if there is
- more than one file, the file name is output at the start of each line,
- followed by a colon. However, there are options that can change how
- pcre2grep behaves. In particular, the -M option makes it possible to
- search for strings that span line boundaries. What defines a line
- boundary is controlled by the -N (--newline) option.
- The amount of memory used for buffering files that are being scanned is
- controlled by parameters that can be set by the --buffer-size and
- --max-buffer-size options. The first of these sets the size of buffer
- that is obtained at the start of processing. If an input file contains
- very long lines, a larger buffer may be needed; this is handled by au-
- tomatically extending the buffer, up to the limit specified by --max-
- buffer-size. The default values for these parameters can be set when
- pcre2grep is built; if nothing is specified, the defaults are set to
- 20KiB and 1MiB respectively. An error occurs if a line is too long and
- the buffer can no longer be expanded.
- The block of memory that is actually used is three times the "buffer
- size", to allow for buffering "before" and "after" lines. If the buffer
- size is too small, fewer than requested "before" and "after" lines may
- be output.
- Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the
- greater. BUFSIZ is defined in <stdio.h>. When there is more than one
- pattern (specified by the use of -e and/or -f), each pattern is applied
- to each line in the order in which they are defined, except that all
- the -e patterns are tried before the -f patterns.
- By default, as soon as one pattern matches a line, no further patterns
- are considered. However, if --colour (or --color) is used to colour the
- matching substrings, or if --only-matching, --file-offsets, or --line-
- offsets is used to output only the part of the line that matched (ei-
- ther shown literally, or as an offset), scanning resumes immediately
- following the match, so that further matches on the same line can be
- found. If there are multiple patterns, they are all tried on the re-
- mainder of the line, but patterns that follow the one that matched are
- not tried on the earlier matched part of the line.
- This behaviour means that the order in which multiple patterns are
- specified can affect the output when one of the above options is used.
- This is no longer the same behaviour as GNU grep, which now manages to
- display earlier matches for later patterns (as long as there is no
- overlap).
- Patterns that can match an empty string are accepted, but empty string
- matches are never recognized. An example is the pattern "(su-
- per)?(man)?", in which all components are optional. This pattern finds
- all occurrences of both "super" and "man"; the output differs from
- matching with "super|man" when only the matching substrings are being
- shown.
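The difference between the two patterns can be sketched in Python. This is an illustration only: it uses Python's `re` module rather than PCRE2 (the two agree for patterns this simple), and the helper name `nonempty_matches` is hypothetical. Empty matches are discarded, mirroring the rule that empty string matches are never recognized.

```python
import re


def nonempty_matches(pattern: str, line: str) -> list[str]:
    """Return every non-empty match of pattern in line, left to right."""
    return [m.group(0) for m in re.finditer(pattern, line) if m.group(0)]


line = "superman saves a man"

# All components optional: "super" and "man" are taken together when adjacent.
optional = nonempty_matches(r"(super)?(man)?", line)

# Plain alternation: "super" and "man" are always reported separately.
alternation = nonempty_matches(r"super|man", line)

print(optional)
print(alternation)
```

The first list holds "superman" as a single item, while the second splits it into "super" and "man", which is the difference that becomes visible when only matching substrings are shown.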
- If the LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
- the value to set a locale when calling the PCRE2 library. The --locale
- option can be used to override this.
- SUPPORT FOR COMPRESSED FILES
- It is possible to compile pcre2grep so that it uses libz or libbz2 to
- read compressed files whose names end in .gz or .bz2, respectively. You
- can find out whether your pcre2grep binary has support for one or both
- of these file types by running it with the --help option. If the appro-
- priate support is not present, all files are treated as plain text. The
- standard input is always so treated. When input is from a compressed
- .gz or .bz2 file, the --line-buffered option is ignored.
- BINARY FILES
- By default, a file that contains a binary zero byte within the first
- 1024 bytes is identified as a binary file, and is processed specially.
- However, if the newline type is specified as NUL, that is, the line
- terminator is a binary zero, the test for a binary file is not applied.
- See the --binary-files option for a means of changing the way binary
- files are handled.
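The default test described above can be sketched as follows. This is a minimal model, not the pcre2grep source; the function name `looks_binary` and its parameters are assumptions made for illustration.

```python
def looks_binary(data: bytes, newline_is_nul: bool = False) -> bool:
    """Model the default binary-file test on the start of a file.

    data is the file content (or at least its first 1024 bytes).
    newline_is_nul models the case where the newline type is NUL.
    """
    if newline_is_nul:
        # A zero byte is the line terminator, so the test is not applied.
        return False
    # Only the first 1024 bytes are inspected.
    return b"\x00" in data[:1024]


print(looks_binary(b"plain text\n"))
print(looks_binary(b"ELF\x00\x01\x02"))
```

A zero byte that first appears after the 1024th byte does not trigger the test in this model, which follows the wording of the paragraph above.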
- BINARY ZEROS IN PATTERNS
- Patterns passed from the command line are strings that are terminated
- by a binary zero, so cannot contain internal zeros. However, patterns
- that are read from a file via the -f option may contain binary zeros.
- OPTIONS
- The order in which some of the options appear can affect the output.
- For example, both the -H and -l options affect the printing of file
- names. Whichever comes later in the command line will be the one that
- takes effect. Similarly, except where noted below, if an option is
- given twice, the later setting is used. Numerical values for options
- may be followed by K or M, to signify multiplication by 1024 or
- 1024*1024 respectively.
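As a worked example of the K and M suffixes, the following sketch converts an option value to a plain integer. The function name `parse_number` is hypothetical, and the sketch accepts only the upper-case suffixes named above; how the real program treats other suffixes is not covered here.

```python
def parse_number(text: str) -> int:
    """Convert an option value such as '20K' or '1M' to an integer."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    suffix = text[-1:]
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)


print(parse_number("20K"))
print(parse_number("1M"))
```

With these rules, "20K" and "1M" correspond to the default buffer sizes of 20KiB and 1MiB mentioned in the description.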
- -- This terminates the list of options. It is useful if the next
- item on the command line starts with a hyphen but is not an
- option. This allows for the processing of patterns and file
- names that start with hyphens.
- -A number, --after-context=number
- Output up to number lines of context after each matching
- line. Fewer lines are output if the next match or the end of
- the file is reached, or if the processing buffer size has
- been set too small. If file names and/or line numbers are be-
- ing output, a hyphen separator is used instead of a colon for
- the context lines. A line containing "--" is output between
- each group of lines, unless they are in fact contiguous in
- the input file. The value of number is expected to be rela-
- tively small. When -c is used, -A is ignored.
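The separator rules for context lines can be illustrated with a small model. It assumes that line numbers are being output (as with -n), uses Python's `re` in place of PCRE2, and ignores buffer-size effects; the function name `grep_with_after_context` is hypothetical.

```python
import re


def grep_with_after_context(lines: list[str], pattern: str, after: int) -> list[str]:
    """Return numbered output lines with up to `after` lines of trailing context."""
    out = []
    last_printed = -1  # index of the last input line that was emitted
    remaining = 0      # context lines still owed after the latest match
    for i, line in enumerate(lines):
        if re.search(pattern, line):
            # A "--" line separates groups that are not contiguous in the input.
            if out and i > last_printed + 1:
                out.append("--")
            out.append(f"{i + 1}:{line}")   # colon for matching lines
            last_printed = i
            remaining = after
        elif remaining > 0:
            out.append(f"{i + 1}-{line}")   # hyphen for context lines
            last_printed = i
            remaining -= 1
    return out


sample = ["a", "hit", "b", "c", "d", "hit", "e"]
for row in grep_with_after_context(sample, "hit", 1):
    print(row)
```

In this model a new match cuts the previous context short, and no "--" line appears when one group runs directly into the next.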
- -a, --text
- Treat binary files as text. This is equivalent to --binary-
- files=text.
- --allow-lookaround-bsk
- PCRE2 now forbids the use of \K in lookarounds by default, in
- line with Perl. This option causes pcre2grep to set the
- PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK option, which enables this
- somewhat dangerous usage.
- -B number, --before-context=number
- Output up to number lines of context before each matching
- line. Fewer lines are output if the previous match or the
- start of the file is within number lines, or if the process-
- ing buffer size has been set too small. If file names and/or
- line numbers are being output, a hyphen separator is used in-
- stead of a colon for the context lines. A line containing
- "--" is output between each group of lines, unless they are
- in fact contiguous in the input file. The value of number is
- expected to be relatively small. When -c is used, -B is ig-
- nored.
- --binary-files=word
- Specify how binary files are to be processed. If the word is
- "binary" (the default), pattern matching is performed on bi-
- nary files, but the only output is "Binary file <name>
- matches" when a match succeeds. If the word is "text", which
- is equivalent to the -a or --text option, binary files are
- processed in the same way as any other file. In this case,
- when a match succeeds, the output may be binary garbage,
- which can have nasty effects if sent to a terminal. If the
- word is "without-match", which is equivalent to the -I op-
- tion, binary files are not processed at all; they are assumed
- not to be of interest and are skipped without causing any
- output or affecting the return code.
- --buffer-size=number
- Set the parameter that controls how much memory is obtained
- at the start of processing for buffering files that are being
- scanned. See also --max-buffer-size below.
- -C number, --context=number
- Output number lines of context both before and after each
- matching line. This is equivalent to setting both -A and -B
- to the same value.
- -c, --count
- Do not output lines from the files that are being scanned;
- instead output the number of lines that would have been
- shown, either because they matched, or, if -v is set, because
- they failed to match. By default, this count is exactly the
- same as the number of lines that would have been output, but
- if the -M (multiline) option is used (without -v), there may
- be more suppressed lines than the count (that is, the number
- of matches).
- If no lines are selected, the number zero is output. If sev-
- eral files are being scanned, a count is output for each
- of them and the -t option can be used to cause a total to be
- output at the end. However, if the --files-with-matches op-
- tion is also used, only those files whose counts are greater
- than zero are listed. When -c is used, the -A, -B, and -C op-
- tions are ignored.
- --colour, --color
- If this option is given without any data, it is equivalent to
- "--colour=auto". If data is required, it must be given in
- the same shell item, separated by an equals sign.
- --colour=value, --color=value
- This option specifies under what circumstances the parts of a
- line that matched a pattern should be coloured in the output.
- By default, the output is not coloured. The value (which is
- optional, see above) may be "never", "always", or "auto". In
- the latter case, colouring happens only if the standard out-
- put is connected to a terminal. More resources are used when
- colouring is enabled, because pcre2grep has to search for all
- possible matches in a line, not just one, in order to colour
- them all.
- The colour that is used can be specified by setting one of
- the environment variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR,
- PCREGREP_COLOUR, or PCREGREP_COLOR, which are checked in that
- order. If none of these are set, pcre2grep looks for
- GREP_COLORS or GREP_COLOR (in that order). The value of the
- variable should be a string of two numbers, separated by a
- semicolon, except in the case of GREP_COLORS, which must
- start with "ms=" or "mt=" followed by two semicolon-separated
- colours, terminated by the end of the string or by a colon.
- If GREP_COLORS does not start with "ms=" or "mt=" it is ig-
- nored, and GREP_COLOR is checked.
- If the string obtained from one of the above variables con-
- tains any characters other than semicolon or digits, the set-
- ting is ignored and the default colour is used. The string is
- copied directly into the control string for setting colour on
- a terminal, so it is your responsibility to ensure that the
- values make sense. If no relevant environment variable is
- set, the default is "1;31", which gives red.
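The lookup order for the colour variables can be sketched as below. This is one reading of the rules above, not the program's actual code: the function name `choose_colour` is hypothetical, the environment is passed in as a dictionary, and treating an empty value as unset is an assumption.

```python
import re

# Checked in this order before the GREP_* variables.
SIMPLE_VARS = ["PCRE2GREP_COLOUR", "PCRE2GREP_COLOR",
               "PCREGREP_COLOUR", "PCREGREP_COLOR"]
DEFAULT_COLOUR = "1;31"  # red


def choose_colour(env: dict) -> str:
    """Pick the colour string from an environment mapping."""
    value = None
    for name in SIMPLE_VARS:
        if name in env:
            value = env[name]
            break
    if value is None:
        grep_colors = env.get("GREP_COLORS")
        if grep_colors is not None and grep_colors.startswith(("ms=", "mt=")):
            # The colours end at the first colon or at the end of the string.
            value = grep_colors[3:].split(":", 1)[0]
        else:
            # GREP_COLORS is unset or unusable, so GREP_COLOR is checked.
            value = env.get("GREP_COLOR")
    # Anything other than digits and semicolons falls back to the default.
    # Assumption: an empty value is treated like an unset one.
    if not value or re.search(r"[^0-9;]", value):
        return DEFAULT_COLOUR
    return value


print(choose_colour({"GREP_COLORS": "ms=01;32:mc=01;31"}))
```

Passing `dict(os.environ)` instead of a literal dictionary would apply the same logic to the real environment.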
- -D action, --devices=action
- If an input path is not a regular file or a directory, "ac-
- tion" specifies how it is to be processed. Valid values are
- "read" (the default) or "skip" (silently skip the path).
- -d action, --directories=action
- If an input path is a directory, "action" specifies how it is
- to be processed. Valid values are "read" (the default in
- non-Windows environments, for compatibility with GNU grep),
- "recurse" (equivalent to the -r option), or "skip" (silently
- skip the path, the default in Windows environments). In the
- "read" case, directories are read as if they were ordinary
- files. In some operating systems the effect of reading a di-
- rectory like this is an immediate end-of-file; in others it
- may provoke an error.
- --depth-limit=number
- See --match-limit below.
- -e pattern, --regex=pattern, --regexp=pattern
- Specify a pattern to be matched. This option can be used mul-
- tiple times in order to specify several patterns. It can also
- be used as a way of specifying a single pattern that starts
- with a hyphen. When -e is used, no argument pattern is taken
- from the command line; all arguments are treated as file
- names. There is no limit to the number of patterns. They are
- applied to each line in the order in which they are defined
- until one matches.
- If -f is used with -e, the command line patterns are matched
- first, followed by the patterns from the file(s), independent
- of the order in which these options are specified. Note that
- multiple use of -e is not the same as a single pattern with
- alternatives. For example, X|Y finds the first character in a
- line that is X or Y, whereas if the two patterns are given
- separately, with X first, pcre2grep finds X if it is present,
- even if it follows Y in the line. It finds Y only if there is
- no X in the line. This matters only if you are using -o or
- --colo(u)r to show the part(s) of the line that matched.
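The X and Y example can be reproduced with a short sketch. It models only the "first pattern that matches wins" rule, using Python's `re` rather than PCRE2; the function name `first_reported_match` is hypothetical.

```python
import re


def first_reported_match(patterns: list[str], line: str):
    """Try each pattern in order and return the text matched by the first that succeeds."""
    for pattern in patterns:
        m = re.search(pattern, line)
        if m:
            return m.group(0)
    return None


line = "aYbXc"

# One pattern with alternatives: the leftmost X or Y in the line is found.
print(first_reported_match([r"X|Y"], line))

# Two separate patterns, X first: X is found even though it follows Y.
print(first_reported_match([r"X", r"Y"], line))
```

With separate patterns, Y is reported only for a line such as "aYb" that contains no X.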
- --exclude=pattern
- Files (but not directories) whose names match the pattern are
- skipped without being processed. This applies to all files,
- whether listed on the command line, obtained from --file-
- list, or by scanning a directory. The pattern is a PCRE2 reg-
- ular expression, and is matched against the final component
- of the file name, not the entire path. The -F, -w, and -x op-
- tions do not apply to this pattern. The option may be given
- any number of times in order to specify multiple patterns. If
- a file name matches both an --include and an --exclude pat-
- tern, it is excluded. There is no short form for this option.
- --exclude-from=filename
- Treat each non-empty line of the file as the data for an
- --exclude option. What constitutes a newline when reading the
- file is the operating system's default. The --newline option
- has no effect on this option. This option may be given more
- than once in order to specify a number of files to read.
- --exclude-dir=pattern
- Directories whose names match the pattern are skipped without
- being processed, whatever the setting of the --recursive op-
- tion. This applies to all directories, whether listed on the
- command line, obtained from --file-list, or by scanning a
- parent directory. The pattern is a PCRE2 regular expression,
- and is matched against the final component of the directory
- name, not the entire path. The -F, -w, and -x options do not
- apply to this pattern. The option may be given any number of
- times in order to specify more than one pattern. If a direc-
- tory matches both --include-dir and --exclude-dir, it is ex-
- cluded. There is no short form for this option.
- -F, --fixed-strings
- Interpret each data-matching pattern as a list of fixed
- strings, separated by newlines, instead of as a regular ex-
- pression. What constitutes a newline for this purpose is con-
- trolled by the --newline option. The -w (match as a word) and
- -x (match whole line) options can be used with -F. They ap-
- ply to each of the fixed strings. A line is selected if any
- of the fixed strings are found in it (subject to -w or -x, if
- present). This option applies only to the patterns that are
- matched against the contents of files; it does not apply to
- patterns specified by any of the --include or --exclude op-
- tions.
- -f filename, --file=filename
- Read patterns from the file, one per line, and match them
- against each line of input. As is the case with patterns on
- the command line, no delimiters should be used. What consti-
- tutes a newline when reading the file is the operating sys-
- tem's default interpretation of \n. The --newline option has
- no effect on this option. Trailing white space is removed
- from each line, and blank lines are ignored. An empty file
- contains no patterns and therefore matches nothing. Patterns
- read from a file in this way may contain binary zeros, which
- are treated as ordinary data characters. See also the com-
- ments about multiple patterns versus a single pattern with
- alternatives in the description of -e above.
- If this option is given more than once, all the specified
- files are read. A data line is output if any of the patterns
- match it. A file name can be given as "-" to refer to the
- standard input. When -f is used, patterns specified on the
- command line using -e may also be present; they are tested
- before the file's patterns. However, no other pattern is
- taken from the command line; all arguments are treated as the
- names of paths to be searched.
- --file-list=filename
- Read a list of files and/or directories that are to be
- scanned from the given file, one per line. What constitutes a
- newline when reading the file is the operating system's de-
- fault. Trailing white space is removed from each line, and
- blank lines are ignored. These paths are processed before any
- that are listed on the command line. The file name can be
- given as "-" to refer to the standard input. If --file and
- --file-list are both specified as "-", patterns are read
- first. This is useful only when the standard input is a ter-
- minal, from which further lines (the list of files) can be
- read after an end-of-file indication. If this option is given
- more than once, all the specified files are read.
- --file-offsets
- Instead of showing lines or parts of lines that match, show
- each match as an offset from the start of the file and a
- length, separated by a comma. In this mode, no context is
- shown. That is, the -A, -B, and -C options are ignored. If
- there is more than one match in a line, each of them is shown
- separately. This option is mutually exclusive with --output,
- --line-offsets, and --only-matching.
- -H, --with-filename
- Force the inclusion of the file name at the start of output
- lines when searching a single file. By default, the file name
- is not shown in this case. For matching lines, the file name
- is followed by a colon; for context lines, a hyphen separator
- is used. If a line number is also being output, it follows
- the file name. When the -M option causes a pattern to match
- more than one line, only the first is preceded by the file
- name. This option overrides any previous -h, -l, or -L op-
- tions.
- -h, --no-filename
- Suppress the output of file names when searching multiple files.
- By default, file names are shown when multiple files are
- searched. For matching lines, the file name is followed by a
- colon; for context lines, a hyphen separator is used. If a
- line number is also being output, it follows the file name.
- This option overrides any previous -H, -L, or -l options.
- --heap-limit=number
- See --match-limit below.
- --help Output a help message, giving brief details of the command
- options and file type support, and then exit. Anything else
- on the command line is ignored.
- -I Ignore binary files. This is equivalent to --binary-
- files=without-match.
- -i, --ignore-case
- Ignore upper/lower case distinctions during comparisons.
- --include=pattern
- If any --include patterns are specified, the only files that
- are processed are those whose names match one of the patterns
- and do not match an --exclude pattern. This option does not
- affect directories, but it applies to all files, whether
- listed on the command line, obtained from --file-list, or by
- scanning a directory. The pattern is a PCRE2 regular expres-
- sion, and is matched against the final component of the file
- name, not the entire path. The -F, -w, and -x options do not
- apply to this pattern. The option may be given any number of
- times. If a file name matches both an --include and an --ex-
- clude pattern, it is excluded. There is no short form for
- this option.
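The interaction of --include and --exclude can be modelled as follows. This is a sketch under stated assumptions: the function name `file_is_selected` is hypothetical, Python's `re` stands in for PCRE2, and the patterns are applied as unanchored searches against the final path component.

```python
import os
import re


def file_is_selected(path: str, includes: list[str], excludes: list[str]) -> bool:
    """Decide whether a file would be processed under the given patterns."""
    name = os.path.basename(path)  # only the final component is tested
    if any(re.search(p, name) for p in excludes):
        return False               # an exclude match always wins
    if includes:
        return any(re.search(p, name) for p in includes)
    return True                    # no --include patterns: nothing is filtered out


print(file_is_selected("src/main.c", [r"\.c$"], []))
print(file_is_selected("src/test_main.c", [r"\.c$"], [r"^test_"]))
```

Because only the final component is tested, a pattern such as `^test_` does not exclude `test_dir/main.c` in this model.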
- --include-from=filename
- Treat each non-empty line of the file as the data for an
- --include option. What constitutes a newline for this purpose
- is the operating system's default. The --newline option has
- no effect on this option. This option may be given any number
- of times; all the files are read.
- --include-dir=pattern
- If any --include-dir patterns are specified, the only direc-
- tories that are processed are those whose names match one of
- the patterns and do not match an --exclude-dir pattern. This
- applies to all directories, whether listed on the command
- line, obtained from --file-list, or by scanning a parent di-
- rectory. The pattern is a PCRE2 regular expression, and is
- matched against the final component of the directory name,
- not the entire path. The -F, -w, and -x options do not apply
- to this pattern. The option may be given any number of times.
- If a directory matches both --include-dir and --exclude-dir,
- it is excluded. There is no short form for this option.
- -L, --files-without-match
- Instead of outputting lines from the files, just output the
- names of the files that do not contain any lines that would
- have been output. Each file name is output once, on a sepa-
- rate line. This option overrides any previous -H, -h, or -l
- options.
- -l, --files-with-matches
- Instead of outputting lines from the files, just output the
- names of the files containing lines that would have been out-
- put. Each file name is output once, on a separate line.
- Searching normally stops as soon as a matching line is found
- in a file. However, if the -c (count) option is also used,
- matching continues in order to obtain the correct count, and
- those files that have at least one match are listed along
- with their counts. Using this option with -c is a way of sup-
- pressing the listing of files with no matches that occurs
- with -c on its own. This option overrides any previous -H,
- -h, or -L options.
- --label=name
- This option supplies a name to be used for the standard input
- when file names are being output. If not supplied, "(standard
- input)" is used. There is no short form for this option.
- --line-buffered
- When this option is given, non-compressed input is read and
- processed line by line, and the output is flushed after each
- write. By default, input is read in large chunks, unless
- pcre2grep can determine that it is reading from a terminal,
- which is currently possible only in Unix-like environments or
- Windows. Output to a terminal is normally automatically flushed
- by the operating system. This option can be useful when the
- input or output is attached to a pipe and you do not want
- pcre2grep to buffer up large amounts of data. However, its
- use will affect performance, and the -M (multiline) option
- ceases to work. When input is from a compressed .gz or .bz2
- file, --line-buffered is ignored.
- --line-offsets
- Instead of showing lines or parts of lines that match, show
- each match as a line number, the offset from the start of the
- line, and a length. The line number is terminated by a colon
- (as usual; see the -n option), and the offset and length are
- separated by a comma. In this mode, no context is shown.
- That is, the -A, -B, and -C options are ignored. If there is
- more than one match in a line, each of them is shown sepa-
- rately. This option is mutually exclusive with --output,
- --file-offsets, and --only-matching.
- --locale=locale-name
- This option specifies a locale to be used for pattern match-
- ing. It overrides the value in the LC_ALL or LC_CTYPE envi-
- ronment variables. If no locale is specified, the PCRE2 li-
- brary's default (usually the "C" locale) is used. There is no
- short form for this option.
- -M, --multiline
- Allow patterns to match more than one line. When this option
- is set, the PCRE2 library is called in "multiline" mode. This
- allows a matched string to extend past the end of a line and
- continue on one or more subsequent lines. Patterns used with
- -M may usefully contain literal newline characters and inter-
- nal occurrences of ^ and $ characters. The output for a suc-
- cessful match may consist of more than one line. The first
- line is the line in which the match started, and the last
- line is the line in which the match ended. If the matched
- string ends with a newline sequence, the output ends at the
- end of that line. If -v is set, none of the lines in a
- multi-line match are output. Once a match has been handled,
- scanning restarts at the beginning of the line after the one
- in which the match ended.
- The newline sequence that separates multiple lines must be
- matched as part of the pattern. For example, to find the
- phrase "regular expression" in a file where "regular" might
- be at the end of a line and "expression" at the start of the
- next line, you could use this command:
- pcre2grep -M 'regular\s+expression' <file>
- The \s escape sequence matches any white space character, in-
- cluding newlines, and is followed by + so as to match trail-
- ing white space on the first line as well as possibly han-
- dling a two-character newline sequence.
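The effect of this pattern on text that wraps across a line boundary can be sketched in Python. The sketch assumes LF line endings and uses Python's `re` instead of PCRE2; the function name `matched_lines` is hypothetical. It returns the lines from the one in which the match starts to the one in which it ends.

```python
import re


def matched_lines(pattern: str, text: str) -> list[str]:
    """Return the input lines covered by the first match of pattern."""
    m = re.search(pattern, text, re.MULTILINE)
    if m is None:
        return []
    lines = text.split("\n")
    first = text.count("\n", 0, m.start())
    end = m.end()
    # If the match ends with a newline, output stops at the end of that line.
    if end > m.start() and text[end - 1] == "\n":
        end -= 1
    last = text.count("\n", 0, end)
    return lines[first:last + 1]


text = "It uses a regular  \nexpression here.\nNothing else.\n"
for line in matched_lines(r"regular\s+expression", text):
    print(line)
```

Here `\s+` consumes the trailing spaces on the first line together with the newline, so both lines are reported.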
- There is a limit to the number of lines that can be matched,
- imposed by the way that pcre2grep buffers the input file as
- it scans it. With a sufficiently large processing buffer,
- this should not be a problem, but the -M option does not work
- when input is read line by line (see --line-buffered).
- -m number, --max-count=number
- Stop processing after finding number matching lines, or non-
- matching lines if -v is also set. Any trailing context lines
- are output after the final match. In multiline mode, each
- multiline match counts as just one line for this purpose. If
- this limit is reached when reading the standard input from a
- regular file, the file is left positioned just after the last
- matching line. If -c is also set, the count that is output
- is never greater than number. This option has no effect if
- used with -L, -l, or -q, or when just checking for a match in
- a binary file.
- --match-limit=number
- Processing some regular expression patterns may take a very
- long time to search for all possible matching strings. Others
- may require a very large amount of memory. There are three
- options that set resource limits for matching.
- The --match-limit option provides a means of limiting comput-
- ing resource usage when processing patterns that are not go-
- ing to match, but which have a very large number of possibil-
- ities in their search trees. The classic example is a pattern
- that uses nested unlimited repeats. Internally, PCRE2 has a
- counter that is incremented each time around its main pro-
- cessing loop. If the value set by --match-limit is reached,
- an error occurs.
- The --heap-limit option specifies, as a number of kibibytes
- (units of 1024 bytes), the amount of heap memory that may be
- used for matching. Heap memory is needed only if matching the
- pattern requires a significant number of nested backtracking
- points to be remembered. This parameter can be set to zero to
- forbid the use of heap memory altogether.
- The --depth-limit option limits the depth of nested back-
- tracking points, which indirectly limits the amount of memory
- that is used. The amount of memory needed for each backtrack-
- ing point depends on the number of capturing parentheses in
- the pattern, so the amount of memory that is used before this
- limit acts varies from pattern to pattern. This limit is of
- use only if it is set smaller than --match-limit.
- There are no short forms for these options. The default lim-
- its can be set when the PCRE2 library is compiled; if they
- are not specified, the defaults are very large and so effec-
- tively unlimited.
- --max-buffer-size=number
- This limits the expansion of the processing buffer, whose
- initial size can be set by --buffer-size. The maximum buffer
- size is silently forced to be no smaller than the starting
- buffer size.
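The relationship between the two buffer options can be shown with a little arithmetic. The function name `buffer_plan` is hypothetical, the defaults are the 20KiB and 1MiB values given in the description, and applying the factor of three to the maximum size as well as the initial size is an assumption.

```python
def buffer_plan(buffer_size: int = 20 * 1024, max_buffer_size: int = 1024 * 1024) -> dict:
    """Return the effective limits and the memory block sizes they imply."""
    # The maximum is silently forced to be no smaller than the starting size.
    max_buffer_size = max(max_buffer_size, buffer_size)
    return {
        "buffer_size": buffer_size,
        "max_buffer_size": max_buffer_size,
        # The block actually used is three times the "buffer size".
        "initial_block": 3 * buffer_size,
        "largest_block": 3 * max_buffer_size,
    }


print(buffer_plan())
print(buffer_plan(buffer_size=4 * 1024 * 1024))
```

In the second call the requested starting size exceeds the default maximum, so the maximum is raised to match it.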
- -N newline-type, --newline=newline-type
- Six different conventions for indicating the ends of lines in
- scanned files are supported. For example:
- pcre2grep -N CRLF 'some pattern' <file>
- The newline type may be specified in upper, lower, or mixed
- case. If the newline type is NUL, lines are separated by bi-
- nary zero characters. The other types are the single-charac-
- ter sequences CR (carriage return) and LF (linefeed), the
- two-character sequence CRLF, an "anycrlf" type, which recog-
- nizes any of the preceding three types, and an "any" type,
- for which any Unicode line ending sequence is assumed to end
- a line. The Unicode sequences are the three just mentioned,
- plus VT (vertical tab, U+000B), FF (form feed, U+000C), NEL
- (next line, U+0085), LS (line separator, U+2028), and PS
- (paragraph separator, U+2029).
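The newline types listed above can be modelled as line-splitting rules. The following is a minimal Python sketch of those definitions only; the names NEWLINE_PATTERNS and split_lines are hypothetical, and this is not how pcre2grep itself is implemented.

```python
import re

# Illustrative line-ending patterns, one per newline type named above.
NEWLINE_PATTERNS = {
    "CR": r"\r",
    "LF": r"\n",
    "CRLF": r"\r\n",
    "NUL": r"\x00",
    "ANYCRLF": r"\r\n|\r|\n",
    # CRLF is tried first so that it counts as one line ending, not two.
    "ANY": r"\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]",
}

def split_lines(text, newline_type):
    """Split text into lines under the given newline convention."""
    # The type may be given in upper, lower, or mixed case.
    pattern = NEWLINE_PATTERNS[newline_type.upper()]
    lines = re.split(pattern, text)
    # A trailing line ending does not start a further, empty line.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```

Note that scanning CRLF data with the LF rule leaves a carriage return at the end of each line, which is one way a mismatched convention can produce surprising results.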
- When the PCRE2 library is built, a default line-ending se-
- quence is specified. This is normally the standard sequence
- for the operating system. Unless otherwise specified by this
- option, pcre2grep uses the library's default.
- This option makes it possible to use pcre2grep to scan files
- that have come from other environments without having to mod-
- ify their line endings. If the data that is being scanned
- does not agree with the convention set by this option,
- pcre2grep may behave in strange ways. Note that this option
- does not apply to files specified by the -f, --exclude-from,
- or --include-from options, which are expected to use the op-
- erating system's standard newline sequence.
- -n, --line-number
- Precede each output line by its line number in the file, fol-
- lowed by a colon for matching lines or a hyphen for context
- lines. If the file name is also being output, it precedes the
- line number. When the -M option causes a pattern to match
- more than one line, only the first is preceded by its line
- number. This option is forced if --line-offsets is used.
- --no-jit
- If the PCRE2 library is built with support for just-in-time
- compiling (which speeds up matching), pcre2grep automatically
- makes use of this, unless it was explicitly disabled at build
- time. This option can be used to disable the use of JIT at
- run time. It is provided for testing and working round prob-
- lems. It should never be needed in normal use.
- -O text, --output=text
- When there is a match, instead of outputting the line that
- matched, output just the text specified in this option, fol-
- lowed by an operating-system standard newline. In this mode,
- no context is shown. That is, the -A, -B, and -C options are
- ignored. The --newline option has no effect on this option,
- which is mutually exclusive with --only-matching, --file-off-
- sets, and --line-offsets. However, like --only-matching, if
- there is more than one match in a line, each of them causes a
- line of output.
- Escape sequences starting with a dollar character may be used
- to insert the contents of the matched part of the line and/or
- captured substrings into the text.
- $<digits> or ${<digits>} is replaced by the captured sub-
- string of the given decimal number; zero substitutes the
- whole match. If the number is greater than the number of cap-
- turing substrings, or if the capture is unset, the replace-
- ment is empty.
- $a is replaced by bell; $b by backspace; $e by escape; $f by
- form feed; $n by newline; $r by carriage return; $t by tab;
- $v by vertical tab.
- $o<digits> or $o{<digits>} is replaced by the character whose
- code point is the given octal number. In the first form, up
- to three octal digits are processed. When more digits are
- needed in Unicode mode to specify a wide character, the sec-
- ond form must be used.
- $x<digits> or $x{<digits>} is replaced by the character rep-
- resented by the given hexadecimal number. In the first form,
- up to two hexadecimal digits are processed. When more digits
- are needed in Unicode mode to specify a wide character, the
- second form must be used.
- Any other character is substituted by itself. In particular,
- $$ is replaced by a single dollar.
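The escape rules above can be sketched as a small expansion function. This is a simplified Python model, not pcre2grep's code: it uses Python's re module rather than PCRE2, the name expand_output is hypothetical, it assumes that $<digits> consumes every digit that follows, and it does not model syntax errors such as a trailing lone dollar.

```python
import re

# Single-letter escapes listed in the text above.
SIMPLE = {"a": "\a", "b": "\b", "e": "\x1b", "f": "\f",
          "n": "\n", "r": "\r", "t": "\t", "v": "\v"}

# One alternative per escape form described above.
ESCAPE = re.compile(
    r"\$(?:"
    r"\{(?P<bnum>[0-9]+)\}"            # ${<digits>}
    r"|(?P<num>[0-9]+)"                # $<digits>
    r"|o\{(?P<boct>[0-7]+)\}"          # $o{<digits>}
    r"|o(?P<oct>[0-7]{1,3})"           # $o<digits>, at most three digits
    r"|x\{(?P<bhex>[0-9A-Fa-f]+)\}"    # $x{<digits>}
    r"|x(?P<hex>[0-9A-Fa-f]{1,2})"     # $x<digits>, at most two digits
    r"|(?P<other>.)"                   # $a, $n, ..., $$ and anything else
    r")",
    re.DOTALL,
)

def expand_output(template, match):
    """Expand $-escapes in template using a completed re.Match object."""
    def repl(m):
        number = m.group("bnum") or m.group("num")
        if number is not None:
            n = int(number)
            # Out-of-range or unset captures expand to nothing.
            if n > match.re.groups:
                return ""
            return match.group(n) or ""
        octal = m.group("boct") or m.group("oct")
        if octal is not None:
            return chr(int(octal, 8))
        hexa = m.group("bhex") or m.group("hex")
        if hexa is not None:
            return chr(int(hexa, 16))
        other = m.group("other")
        # Any other character stands for itself, so $$ gives one dollar.
        return SIMPLE.get(other, other)
    return ESCAPE.sub(repl, template)
```

For example, with the pattern (\w+)=(\w+) matched against "key=value", the template "$2:$1" expands to "value:key" in this model.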
- -o, --only-matching
- Show only the part of the line that matched a pattern instead
- of the whole line. In this mode, no context is shown. That
- is, the -A, -B, and -C options are ignored. If there is more
- than one match in a line, each of them is shown separately,
- on a separate line of output. If -o is combined with -v (in-
- vert the sense of the match to find non-matching lines), no
- output is generated, but the return code is set appropri-
- ately. If the matched portion of the line is empty, nothing
- is output unless the file name or line number is being
- printed, in which case they are shown on an otherwise empty
- line. This option is mutually exclusive with --output,
- --file-offsets and --line-offsets.
- -onumber, --only-matching=number
- Show only the part of the line that matched the capturing
- parentheses of the given number. Up to 50 capturing parenthe-
- ses are supported by default. This limit can be changed via
- the --om-capture option. A pattern may contain any number of
- capturing parentheses, but only those whose number is within
- the limit can be accessed by -o. An error occurs if the num-
- ber specified by -o is greater than the limit.
- -o0 is the same as -o without a number. Because these options
- can be given without an argument (see above), if an argument
- is present, it must be given in the same shell item, for ex-
- ample, -o3 or --only-matching=2. The comments given for the
- non-argument case above also apply to this option. If the
- specified capturing parentheses do not exist in the pattern,
- or were not set in the match, nothing is output unless the
- file name or line number is being output.
- If this option is given multiple times, multiple substrings
- are output for each match, in the order the options are
- given, and all on one line. For example, -o3 -o1 -o3 causes
- the substrings matched by capturing parentheses 3 and 1 and
- then 3 again to be output. By default, there is no separator
- (but see the --om-separator option below).
- --om-capture=number
- Set the number of capturing parentheses that can be accessed
- by -o. The default is 50.
- --om-separator=text
- Specify a separating string for multiple occurrences of -o.
- The default is an empty string. Separating strings are never
- coloured.
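The effect of repeating -o with different numbers, together with --om-separator, can be sketched as follows. This is a rough Python model using the re module; the name only_matching is hypothetical. The text above does not say how unset or empty captures interact with the separator, so dropping them here is an assumption.

```python
import re

def only_matching(line, pattern, groups, separator=""):
    """Return one output string per match, built from the requested groups."""
    out = []
    for m in re.finditer(pattern, line):
        parts = []
        for g in groups:
            text = m.group(g) if g <= m.re.groups else None
            # Assumption: unset or empty captures contribute nothing.
            if text:
                parts.append(text)
        if parts:
            out.append(separator.join(parts))
    return out
```

Here groups=[3, 1, 3] corresponds to -o3 -o1 -o3, and separator corresponds to the --om-separator text.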
- -q, --quiet
- Work quietly, that is, display nothing except error messages.
- The exit status indicates whether or not any matches were
- found.
- -r, --recursive
- If any given path is a directory, recursively scan the files
- it contains, taking note of any --include and --exclude set-
- tings. By default, a directory is read as a normal file; in
- some operating systems this gives an immediate end-of-file.
- This option is a shorthand for setting the -d option to "re-
- curse".
- --recursion-limit=number
- This is an obsolete synonym for --depth-limit. See --match-
- limit above for details.
- -s, --no-messages
- Suppress error messages about non-existent or unreadable
- files. Such files are quietly skipped. However, the return
- code is still 2, even if matches were found in other files.
- -t, --total-count
- This option is useful when scanning more than one file. If
- used on its own, -t suppresses all output except for a grand
- total number of matching lines (or non-matching lines if -v
- is used) in all the files. If -t is used with -c, a grand to-
- tal is output except when the previous output is just one
- line. In other words, it is not output when just one file's
- count is listed. If file names are being output, the grand
- total is preceded by "TOTAL:". Otherwise, it appears as just
- another number. The -t option is ignored when used with -L
- (list files without matches), because the grand total would
- always be zero.
- -u, --utf
- Operate in UTF-8 mode. This option is available only if PCRE2
- has been compiled with UTF-8 support. All patterns (including
- those for any --exclude and --include options) and all lines
- that are scanned must be valid strings of UTF-8 characters.
- If an invalid UTF-8 string is encountered, an error occurs.
- -U, --utf-allow-invalid
- As --utf, but in addition subject lines may contain invalid
- UTF-8 code unit sequences. These can never form part of any
- pattern match. Patterns themselves, however, must still be
- valid UTF-8 strings. This facility allows valid UTF-8 strings
- to be sought within arbitrary byte sequences in executable or
- other binary files. For more details about matching in non-
- valid UTF-8 strings, see the pcre2unicode(3) documentation.
- -V, --version
- Write the version numbers of pcre2grep and the PCRE2 library
- to the standard output and then exit. Anything else on the
- command line is ignored.
- -v, --invert-match
- Invert the sense of the match, so that lines which do not
- match any of the patterns are the ones that are found. When
- this option is set, options such as --only-matching and
- --output, which specify parts of a match that are to be out-
- put, are ignored.
- -w, --word-regex, --word-regexp
- Force the patterns only to match "words". That is, there must
- be a word boundary at the start and end of each matched
- string. This is equivalent to having "\b(?:" at the start of
- each pattern, and ")\b" at the end. This option applies only
- to the patterns that are matched against the contents of
- files; it does not apply to patterns specified by any of the
- --include or --exclude options.
- -x, --line-regex, --line-regexp
- Force the patterns to start matching only at the beginnings
- of lines, and in addition, require them to match entire
- lines. In multiline mode the match may be more than one line.
- This is equivalent to having "^(?:" at the start of each pat-
- tern and ")$" at the end. This option applies only to the
- patterns that are matched against the contents of files; it
- does not apply to patterns specified by any of the --include
- or --exclude options.
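The wrapping described for -w and -x can be illustrated with a short Python sketch. It uses Python's re module as a stand-in for PCRE2, and the helper names are hypothetical. It shows why the non-capturing group matters when a pattern contains a top-level alternation.

```python
import re

def word_regex(pattern):
    """Wrap a pattern as described for -w."""
    return r"\b(?:" + pattern + r")\b"

def line_regex(pattern):
    """Wrap a pattern as described for -x."""
    return "^(?:" + pattern + ")$"

# Without the group, "^cat|dog$" means "starts with cat" OR "ends with dog",
# so it would accept "hotdog"; the grouped form requires the whole line.
naive = "^cat|dog$"
grouped = line_regex("cat|dog")
```

With these definitions, grouped matches "dog" but not "hotdog", whereas naive matches both.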
- ENVIRONMENT VARIABLES
- The environment variables LC_ALL and LC_CTYPE are examined, in that or-
- der, for a locale. The first one that is set is used. This can be over-
- ridden by the --locale option. If no locale is set, the PCRE2 library's
- default (usually the "C" locale) is used.
- NEWLINES
- The -N (--newline) option allows pcre2grep to scan files with newline
- conventions that differ from the default. This option affects only the
- way scanned files are processed. It does not affect the interpretation
- of files specified by the -f, --file-list, --exclude-from, or --in-
- clude-from options.
- Any parts of the scanned input files that are written to the standard
- output are copied with whatever newline sequences they have in the in-
- put. However, if the final line of a file is output, and it does not
- end with a newline sequence, a newline sequence is added. If the new-
- line setting is CR, LF, CRLF or NUL, that line ending is output; for
- the other settings (ANYCRLF or ANY) a single linefeed (LF) is used.
- The newline setting does not affect the way in which pcre2grep writes
- newlines in informational messages to the standard output and error
- streams. Under Windows, the standard output is set to be binary, so
- that "\r\n" at the ends of output lines that are copied from the input
- is not converted to "\r\r\n" by the C I/O library. This means that any
- messages written to the standard output must end with "\r\n". For all
- other operating systems, and for all messages to the standard error
- stream, "\n" is used.
- OPTIONS COMPATIBILITY
- Many of the short and long forms of pcre2grep's options are the same as
- in the GNU grep program. Any long option of the form --xxx-regexp (GNU
- terminology) is also available as --xxx-regex (PCRE2 terminology). How-
- ever, the --depth-limit, --file-list, --file-offsets, --heap-limit,
- --include-dir, --line-offsets, --locale, --match-limit, -M, --multi-
- line, -N, --newline, --om-separator, --output, -u, --utf, -U, and
- --utf-allow-invalid options are specific to pcre2grep, as is the use of
- the --only-matching option with a capturing parentheses number.
- Although most of the common options work the same way, a few are dif-
- ferent in pcre2grep. For example, the --include option's argument is a
- glob for GNU grep, but a regular expression for pcre2grep. If both the
- -c and -l options are given, GNU grep lists only file names, without
- counts, but pcre2grep gives the counts as well.
- OPTIONS WITH DATA
- There are four different ways in which an option with data can be spec-
- ified. If a short form option is used, the data may follow immedi-
- ately, or (with one exception) in the next command line item. For exam-
- ple:
- -f/some/file
- -f /some/file
- The exception is the -o option, which may appear with or without data.
- Because of this, if data is present, it must follow immediately in the
- same item, for example -o3.
- If a long form option is used, the data may appear in the same command
- line item, separated by an equals character, or (with two exceptions)
- it may appear in the next command line item. For example:
- --file=/some/file
- --file /some/file
- Note, however, that if you want to supply a file name beginning with ~
- as data in a shell command, and have the shell expand ~ to a home di-
- rectory, you must separate the file name from the option, because the
- shell does not treat ~ specially unless it is at the start of an item.
- The exceptions to the above are the --colour (or --color) and --only-
- matching options, for which the data is optional. If one of these op-
- tions does have data, it must be given in the first form, using an
- equals character. Otherwise pcre2grep will assume that it has no data.
- USING PCRE2'S CALLOUT FACILITY
- pcre2grep has, by default, support for calling external programs or
- scripts or echoing specific strings during matching by making use of
- PCRE2's callout facility. However, this support can be completely or
- partially disabled when pcre2grep is built. You can find out whether
- your binary has support for callouts by running it with the --help op-
- tion. If callout support is completely disabled, all callouts in pat-
- terns are ignored by pcre2grep. If the facility is partially disabled,
- calling external programs is not supported, and callouts that request
- it are ignored.
- A callout in a PCRE2 pattern is of the form (?C<arg>) where the argu-
- ment is either a number or a quoted string (see the pcre2callout docu-
- mentation for details). Numbered callouts are ignored by pcre2grep;
- only callouts with string arguments are useful.
- Echoing a specific string
- Starting the callout string with a pipe character invokes an echoing
- facility that avoids calling an external program or script. This facil-
- ity is always available, provided that callouts were not completely
- disabled when pcre2grep was built. The rest of the callout string is
- processed as a zero-terminated string, which means it should not con-
- tain any internal binary zeros. It is written to the output, having
- first been passed through the same escape processing as text from the
- --output (-O) option (see above). However, $0 cannot be used to insert
- a matched substring because the match is still in progress. Instead,
- the single character '0' is inserted. Any syntax error in the string
- (for example, a dollar not followed by another character) causes the
- callout to be ignored. No terminator is added to the output string, so
- if you want a newline, you must include it explicitly using the escape
- $n. For example:
- pcre2grep '(.)(..(.))(?C"|[$1] [$2] [$3]$n")' <some file>
- Matching continues normally after the string is output. If you want to
- see only the callout output but not any output from an actual match,
- you should end the pattern with (*FAIL).
- Calling external programs or scripts
- This facility can be independently disabled when pcre2grep is built. It
- is supported for Windows, where a call to _spawnvp() is used, for VMS,
- where lib$spawn() is used, and for any Unix-like environment where
- fork() and execv() are available.
- If the callout string does not start with a pipe (vertical bar) charac-
- ter, it is parsed into a list of substrings separated by pipe charac-
- ters. The first substring must be an executable name, with the follow-
- ing substrings specifying arguments:
- executable_name|arg1|arg2|...
- Any substring (including the executable name) may contain escape se-
- quences started by a dollar character. These are the same as for the
- --output (-O) option documented above, except that $0 cannot insert the
- matched string because the match is still in progress. Instead, the
- character '0' is inserted. If you need a literal dollar or pipe charac-
- ter in any substring, use $$ or $| respectively. Here is an example:
- echo -e "abcde\n12345" | pcre2grep \
- '(?x)(.)(..(.))
- (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -
- Output:
- Arg1: [a] [bcd] [d] Arg2: |a| ()
- abcde
- Arg1: [1] [234] [4] Arg2: |1| ()
- 12345
- The parameters for the system call that is used to run the program or
- script are zero-terminated strings. This means that binary zero charac-
- ters in the callout argument will cause premature termination of their
- substrings, and therefore should not be present. Any syntax error in
- the string (for example, a dollar not followed by another character)
- causes the callout to be ignored. If running the program fails for any
- reason (including the non-existence of the executable), a local match-
- ing failure occurs and the matcher backtracks in the normal way.
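The splitting of a callout string into an executable name and arguments can be sketched in Python. This is a simplified model with a hypothetical function name; it only separates the substrings and leaves the dollar escapes in place for later expansion.

```python
def split_callout(callout):
    """Split an external-program callout string on unescaped pipe characters."""
    parts, current, i = [], [], 0
    while i < len(callout):
        ch = callout[i]
        if ch == "$" and i + 1 < len(callout):
            # Keep the escape pair intact ($|, $$, $1, ...) so that an
            # escaped pipe is not mistaken for a separator.
            current.append(callout[i:i + 2])
            i += 2
        elif ch == "|":
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    # The first substring is the executable name; the rest are arguments.
    return parts[0], parts[1:]
```

Applied to the callout string from the example above, this yields /bin/echo as the executable and two arguments, the second of which still contains its $| escapes.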
- MATCHING ERRORS
- It is possible to supply a regular expression that takes a very long
- time to fail to match certain lines. Such patterns normally involve
- nested indefinite repeats, for example: (a+)*\d when matched against a
- line of a's with no final digit. The PCRE2 matching function has a re-
- source limit that causes it to abort in these circumstances. If this
- happens, pcre2grep outputs an error message and the line that caused
- the problem to the standard error stream. If there are more than 20
- such errors, pcre2grep gives up.
- The --match-limit option of pcre2grep can be used to set the overall
- resource limit. There are also other limits that affect the amount of
- memory used during matching; see the discussion of --heap-limit and
- --depth-limit above.
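To show why a pattern such as (a+)*\d can take so long to fail, and how a step counter bounds the work, here is a toy Python backtracking matcher for that one pattern. It is a sketch under stated assumptions, not PCRE2's algorithm: the function name, the exception class, and the way steps are counted are all hypothetical.

```python
class MatchLimitExceeded(Exception):
    """Raised when the step counter passes the configured limit."""

def match_nested_repeat(subject, match_limit):
    """Toy matcher for (a+)* followed by a digit, anchored at the start."""
    steps = 0

    def after_group(pos):
        nonlocal steps
        steps += 1
        if steps > match_limit:
            raise MatchLimitExceeded(f"limit of {match_limit} steps exceeded")
        # Find how far a run of 'a' extends from pos.
        end = pos
        while end < len(subject) and subject[end] == "a":
            end += 1
        # Greedy: let (a+) take the longest run first, then back off.
        for stop in range(end, pos, -1):
            if after_group(stop):
                return True
        # No further (a+) iteration: the next character must be a digit.
        return pos < len(subject) and subject[pos].isdigit()

    matched = after_group(0)
    return matched, steps
```

In this model a subject of n a's with no final digit costs 2**n steps before failing, so even a short line exhausts a modest limit, while a line that does end in a digit matches almost immediately.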
- DIAGNOSTICS
- Exit status is 0 if any matches were found, 1 if no matches were found,
- and 2 for syntax errors, overlong lines, non-existent or inaccessible
- files (even if matches were found in other files) or too many matching
- errors. Using the -s option to suppress error messages about inaccessi-
- ble files does not affect the return code.
- When run under VMS, the return code is placed in the symbol
- PCRE2GREP_RC because VMS does not distinguish between exit(0) and
- exit(1).
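A caller can act on the exit status as in the following Python sketch. It assumes a pcre2grep binary is available on PATH; the helper names and the wording of the descriptions are hypothetical.

```python
import subprocess

# Meanings of the exit status as described above.
EXIT_MEANINGS = {
    0: "matches found",
    1: "no matches found",
    2: "error (matches may still have been found in other files)",
}

def describe_status(code):
    """Translate a pcre2grep exit status into a short description."""
    return EXIT_MEANINGS.get(code, "unexpected status")

def run_pcre2grep(pattern, path):
    """Run pcre2grep on one file; assumes the binary is on PATH."""
    result = subprocess.run(
        ["pcre2grep", "-e", pattern, path],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout
```

Because status 2 is returned even when some files matched, a caller that wants partial results should still read the standard output in that case.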
- SEE ALSO
- pcre2pattern(3), pcre2syntax(3), pcre2callout(3), pcre2unicode(3).
- AUTHOR
- Philip Hazel
- Retired from University Computing Service
- Cambridge, England.
- REVISION
- Last updated: 31 August 2021
- Copyright (c) 1997-2021 University of Cambridge.