- PCRE2GREP(1) General Commands Manual PCRE2GREP(1)
- NAME
- pcre2grep - a grep with Perl-compatible regular expressions.
- SYNOPSIS
- pcre2grep [options] [long options] [pattern] [path1 path2 ...]
- DESCRIPTION
- pcre2grep searches files for character patterns, in the same way as
- other grep commands do, but it uses the PCRE2 regular expression li-
- brary to support patterns that are compatible with the regular expres-
- sions of Perl 5. See pcre2syntax(3) for a quick-reference summary of
- pattern syntax, or pcre2pattern(3) for a full description of the syntax
- and semantics of the regular expressions that PCRE2 supports.
- Patterns, whether supplied on the command line or in a separate file,
- are given without delimiters. For example:
- pcre2grep Thursday /etc/motd
- If you attempt to use delimiters (for example, by surrounding a pattern
- with slashes, as is common in Perl scripts), they are interpreted as
- part of the pattern. Quotes can of course be used to delimit patterns
- on the command line because they are interpreted by the shell, and in-
- deed quotes are required if a pattern contains white space or shell
- metacharacters.
- The first argument that follows any option settings is treated as the
- single pattern to be matched when neither -e nor -f is present. Con-
- versely, when one or both of these options are used to specify pat-
- terns, all arguments are treated as path names. At least one of -e, -f,
- or an argument pattern must be provided.
- If no files are specified, pcre2grep reads the standard input. The
- standard input can also be referenced by a name consisting of a single
- hyphen. For example:
- pcre2grep some-pattern file1 - file3
- By default, input files are searched line by line, so pattern asser-
- tions about the beginning and end of a subject string (^, $, \A, \Z,
- and \z) match at the beginning and end of each line. When a line
- matches a pattern, it is copied to the standard output, and if there is
- more than one file, the file name is output at the start of each line,
- followed by a colon. However, there are options that can change how
- pcre2grep behaves. For example, the -M option makes it possible to
- search for strings that span line boundaries. What defines a line
- boundary is controlled by the -N (--newline) option. The -h and -H op-
- tions control whether or not file names are shown, and the -Z option
- changes the file name terminator to a zero byte.
- The amount of memory used for buffering files that are being scanned is
- controlled by parameters that can be set by the --buffer-size and
- --max-buffer-size options. The first of these sets the size of buffer
- that is obtained at the start of processing. If an input file contains
- very long lines, a larger buffer may be needed; this is handled by au-
- tomatically extending the buffer, up to the limit specified by --max-
- buffer-size. The default values for these parameters can be set when
- pcre2grep is built; if nothing is specified, the defaults are set to
- 20KiB and 1MiB respectively. An error occurs if a line is too long and
- the buffer can no longer be expanded.
- The block of memory that is actually used is three times the "buffer
- size", to allow for buffering "before" and "after" lines. If the buffer
- size is too small, fewer than requested "before" and "after" lines may
- be output.
- When matching with a multiline pattern, the size of the buffer must be
- at least half of the maximum match expected or the pattern might fail
- to match.
- Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the
- greater. BUFSIZ is defined in <stdio.h>. When there is more than one
- pattern (specified by the use of -e and/or -f), each pattern is applied
- to each line in the order in which they are defined, except that all
- the -e patterns are tried before the -f patterns.
- By default, as soon as one pattern matches a line, no further patterns
- are considered. However, if --colour (or --color) is used to colour the
- matching substrings, or if --only-matching, --file-offsets, --line-off-
- sets, or --output is used to output only the part of the line that
- matched (either shown literally, or as an offset), the behaviour is
- different. In this situation, all the patterns are applied to the line.
- If there is more than one match, the one that begins nearest to the
- start of the subject is processed; if there is more than one match at
- that position, the one with the longest matching substring is
- processed; if the matching substrings are equal, the first match found
- is processed.
- Scanning with all the patterns resumes immediately following the match,
- so that later matches on the same line can be found. Note, however,
- that an overlapping match that starts in the middle of another match
- will not be processed.
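The selection rule described above (earliest start, then longest match, then first pattern found) can be modelled in a few lines. This is only a sketch in Python using the standard re module, not the pcre2grep implementation. The function name select_matches is hypothetical, and the patterns are assumed never to match an empty string.

```python
import re

def select_matches(patterns, line):
    """Return (start, end, pattern_index) tuples in processing order.

    Model of the documented rule: among all patterns, take the match that
    starts earliest; on a tie take the longest; on a further tie take the
    pattern that was defined first. Scanning then resumes after the match.
    """
    compiled = [re.compile(p) for p in patterns]
    results = []
    pos = 0
    while pos <= len(line):
        best = None
        for index, regex in enumerate(compiled):
            m = regex.search(line, pos)
            if m is None or m.end() == m.start():
                continue  # no match, or an empty match (never recognized)
            # Smaller key wins: earlier start, then longer match, then lower index.
            key = (m.start(), -(m.end() - m.start()), index)
            if best is None or key < best[0]:
                best = (key, m, index)
        if best is None:
            break
        _, m, index = best
        results.append((m.start(), m.end(), index))
        pos = m.end()  # overlapping matches inside this span are skipped
    return results
```

In this model, select_matches(["cat", "concat", "on"], "concatenate cat") processes "concat" first, even though it is the second pattern, and never reports the "on" that lies inside it.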
- The above behaviour was changed at release 10.41 to be more compatible
- with GNU grep. In earlier releases, pcre2grep did not recognize matches
- from later patterns that were earlier in the subject.
- Patterns that can match an empty string are accepted, but empty string
- matches are never recognized. An example is the pattern "(su-
- per)?(man)?", in which all components are optional. This pattern finds
- all occurrences of both "super" and "man"; the output differs from
- matching with "super|man" when only the matching substrings are being
- shown.
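A rough way to see the difference is to list the non-empty matches each pattern yields. The sketch below uses Python's re module as a stand-in for PCRE2, so it is an approximation under that assumption. The helper name nonempty_matches is hypothetical.

```python
import re

def nonempty_matches(pattern, line):
    """List the matched substrings of pattern in line, dropping empty matches."""
    return [m.group(0) for m in re.finditer(pattern, line) if m.group(0)]

line = "superman and a man"
both_optional = nonempty_matches(r"(super)?(man)?", line)
alternation = nonempty_matches(r"super|man", line)
```

Here both_optional treats "superman" as a single match, while alternation reports "super" and "man" as separate matches.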
- If the LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
- the value to set a locale when calling the PCRE2 library. The --locale
- option can be used to override this.
- SUPPORT FOR COMPRESSED FILES
- Compile-time options for pcre2grep can set it up to use libz or libbz2
- for reading compressed files whose names end in .gz or .bz2, respec-
- tively. You can find out whether your pcre2grep binary has support for
- one or both of these file types by running it with the --help option.
- If the appropriate support is not present, all files are treated as
- plain text. The standard input is always so treated. If a file with a
- .gz or .bz2 extension is not in fact compressed, it is read as a plain
- text file. When input is from a compressed .gz or .bz2 file, the
- --line-buffered option is ignored.
- BINARY FILES
- By default, a file that contains a binary zero byte within the first
- 1024 bytes is identified as a binary file, and is processed specially.
- However, if the newline type is specified as NUL, that is, the line
- terminator is a binary zero, the test for a binary file is not applied.
- See the --binary-files option for a means of changing the way binary
- files are handled.
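The default test described above amounts to a check on the first 1024 bytes. A minimal Python sketch follows. The function name looks_binary and its newline_is_nul parameter are hypothetical and only illustrate the rule.

```python
def looks_binary(data: bytes, newline_is_nul: bool = False) -> bool:
    """Model of the default binary-file test.

    A file counts as binary if a zero byte occurs within its first 1024
    bytes, unless the newline type is NUL, in which case the test is
    not applied at all.
    """
    if newline_is_nul:
        return False
    return b"\x00" in data[:1024]
```

A zero byte that first appears after offset 1023 does not trigger the test in this model.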
- BINARY ZEROS IN PATTERNS
- Patterns passed from the command line are strings that are terminated
- by a binary zero, so cannot contain internal zeros. However, patterns
- that are read from a file via the -f option may contain binary zeros.
- OPTIONS
- The order in which some of the options appear can affect the output.
- For example, both the -H and -l options affect the printing of file
- names. Whichever comes later in the command line will be the one that
- takes effect. Similarly, except where noted below, if an option is
- given twice, the later setting is used. Numerical values for options
- may be followed by K or M, to signify multiplication by 1024 or
- 1024*1024 respectively.
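The suffix arithmetic can be illustrated with a short Python sketch. The function name parse_number is hypothetical, and only the uppercase K and M suffixes named above are handled.

```python
def parse_number(text: str) -> int:
    """Convert an option value such as '20K' or '1M' to an integer."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    suffix = text[-1:]
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)
```

For example, parse_number("20K") gives 20480 and parse_number("1M") gives 1048576, which correspond to the default buffer sizes of 20KiB and 1MiB.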
- -- This terminates the list of options. It is useful if the next
- item on the command line starts with a hyphen but is not an
- option. This allows for the processing of patterns and file
- names that start with hyphens.
- -A number, --after-context=number
- Output up to number lines of context after each matching
- line. Fewer lines are output if the next match or the end of
- the file is reached, or if the processing buffer size has
- been set too small. If file names and/or line numbers are be-
- ing output, a hyphen separator is used instead of a colon for
- the context lines (the -Z option can be used to change the
- file name terminator to a zero byte). A line containing "--"
- is output between each group of lines, unless they are in
- fact contiguous in the input file. The value of number is ex-
- pected to be relatively small. When -c is used, -A is ig-
- nored.
- -a, --text
- Treat binary files as text. This is equivalent to --binary-
- files=text.
- --allow-lookaround-bsk
- PCRE2 now forbids the use of \K in lookarounds by default, in
- line with Perl. This option causes pcre2grep to set the
- PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK option, which enables this
- somewhat dangerous usage.
- -B number, --before-context=number
- Output up to number lines of context before each matching
- line. Fewer lines are output if the previous match or the
- start of the file is within number lines, or if the process-
- ing buffer size has been set too small. If file names and/or
- line numbers are being output, a hyphen separator is used in-
- stead of a colon for the context lines (the -Z option can be
- used to change the file name terminator to a zero byte). A
- line containing "--" is output between each group of lines,
- unless they are in fact contiguous in the input file. The
- value of number is expected to be relatively small. When -c
- is used, -B is ignored.
- --binary-files=word
- Specify how binary files are to be processed. If the word is
- "binary" (the default), pattern matching is performed on bi-
- nary files, but the only output is "Binary file <name>
- matches" when a match succeeds. If the word is "text", which
- is equivalent to the -a or --text option, binary files are
- processed in the same way as any other file. In this case,
- when a match succeeds, the output may be binary garbage,
- which can have nasty effects if sent to a terminal. If the
- word is "without-match", which is equivalent to the -I op-
- tion, binary files are not processed at all; they are assumed
- not to be of interest and are skipped without causing any
- output or affecting the return code.
- --buffer-size=number
- Set the parameter that controls how much memory is obtained
- at the start of processing for buffering files that are being
- scanned. See also --max-buffer-size below.
- -C number, --context=number
- Output number lines of context both before and after each
- matching line. This is equivalent to setting both -A and -B
- to the same value.
- -c, --count
- Do not output lines from the files that are being scanned;
- instead output the number of lines that would have been
- shown, either because they matched, or, if -v is set, because
- they failed to match. By default, this count is exactly the
- same as the number of lines that would have been output, but
- if the -M (multiline) option is used (without -v), there may
- be more suppressed lines than the count (that is, the number
- of matches).
- If no lines are selected, the number zero is output. If sev-
- eral files are being scanned, a count is output for each of
- them and the -t option can be used to cause a total to be
- output at the end. However, if the --files-with-matches op-
- tion is also used, only those files whose counts are greater
- than zero are listed. When -c is used, the -A, -B, and -C op-
- tions are ignored.
- --colour, --color
- If this option is given without any data, it is equivalent to
- "--colour=auto". If data is required, it must be given in
- the same shell item, separated by an equals sign.
- --colour=value, --color=value
- This option specifies under what circumstances the parts of a
- line that matched a pattern should be coloured in the output.
- It is ignored if --file-offsets, --line-offsets, or --output
- is set. By default, output is not coloured. The value for the
- --colour option (which is optional, see above) may be
- "never", "always", or "auto". In the latter case, colouring
- happens only if the standard output is connected to a termi-
- nal. More resources are used when colouring is enabled, be-
- cause pcre2grep has to search for all possible matches in a
- line, not just one, in order to colour them all.
- The colour that is used can be specified by setting one of
- the environment variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR,
- PCREGREP_COLOUR, or PCREGREP_COLOR, which are checked in that
- order. If none of these are set, pcre2grep looks for
- GREP_COLORS or GREP_COLOR (in that order). The value of the
- variable should be a string of two numbers, separated by a
- semicolon, except in the case of GREP_COLORS, which must
- start with "ms=" or "mt=" followed by two semicolon-separated
- colours, terminated by the end of the string or by a colon.
- If GREP_COLORS does not start with "ms=" or "mt=" it is ig-
- nored, and GREP_COLOR is checked.
- If the string obtained from one of the above variables con-
- tains any characters other than semicolon or digits, the set-
- ting is ignored and the default colour is used. The string is
- copied directly into the control string for setting colour on
- a terminal, so it is your responsibility to ensure that the
- values make sense. If no relevant environment variable is
- set, the default is "1;31", which gives red.
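The lookup order and validation described above can be sketched as follows. This is a Python model and not the actual implementation. The function name pick_colour is hypothetical, the environment is passed in as a plain dictionary, and an empty value is treated as unset, which is an assumption.

```python
DEFAULT_COLOUR = "1;31"
DIRECT_VARS = (
    "PCRE2GREP_COLOUR",
    "PCRE2GREP_COLOR",
    "PCREGREP_COLOUR",
    "PCREGREP_COLOR",
)

def pick_colour(env: dict) -> str:
    """Return the colour string chosen from env, or the default."""
    raw = None
    # The four direct variables are checked first, in this order.
    for name in DIRECT_VARS:
        if env.get(name):
            raw = env[name]
            break
    if raw is None:
        grep_colors = env.get("GREP_COLORS")
        if grep_colors and grep_colors[:3] in ("ms=", "mt="):
            # The value ends at the first colon or at the end of the string.
            raw = grep_colors[3:].split(":", 1)[0]
        else:
            # GREP_COLORS is unset or ignored, so GREP_COLOR is checked.
            raw = env.get("GREP_COLOR") or None
    if raw is None:
        return DEFAULT_COLOUR
    # Anything other than digits and semicolons invalidates the setting.
    if any(ch not in "0123456789;" for ch in raw):
        return DEFAULT_COLOUR
    return raw
```

With this model, a GREP_COLORS value of "ms=01;34:fn=35" yields "01;34", and a value such as "red" in any of the variables falls back to the default.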
- -D action, --devices=action
- If an input path is not a regular file or a directory, "ac-
- tion" specifies how it is to be processed. Valid values are
- "read" (the default) or "skip" (silently skip the path).
- -d action, --directories=action
- If an input path is a directory, "action" specifies how it is
- to be processed. Valid values are "read" (the default in
- non-Windows environments, for compatibility with GNU grep),
- "recurse" (equivalent to the -r option), or "skip" (silently
- skip the path, the default in Windows environments). In the
- "read" case, directories are read as if they were ordinary
- files. In some operating systems the effect of reading a di-
- rectory like this is an immediate end-of-file; in others it
- may provoke an error.
- --depth-limit=number
- See --match-limit below.
- -E, --case-restrict
- When case distinctions are being ignored in Unicode mode, two
- ASCII letters (K and S) will by default match Unicode charac-
- ters U+212A (Kelvin sign) and U+017F (long S) respectively,
- as well as their lower case ASCII counterparts. When this op-
- tion is set, case equivalences are restricted such that no
- ASCII character matches a non-ASCII character, and vice
- versa.
- -e pattern, --regex=pattern, --regexp=pattern
- Specify a pattern to be matched. This option can be used mul-
- tiple times in order to specify several patterns. It can also
- be used as a way of specifying a single pattern that starts
- with a hyphen. When -e is used, no argument pattern is taken
- from the command line; all arguments are treated as file
- names. There is no limit to the number of patterns. They are
- applied to each line in the order in which they are defined.
- If -f is used with -e, the command line patterns are matched
- first, followed by the patterns from the file(s), independent
- of the order in which these options are specified.
- --exclude=pattern
- Files (but not directories) whose names match the pattern are
- skipped without being processed. This applies to all files,
- whether listed on the command line, obtained from --file-
- list, or by scanning a directory. The pattern is a PCRE2 reg-
- ular expression, and is matched against the final component
- of the file name, not the entire path. The -F, -w, and -x op-
- tions do not apply to this pattern. The option may be given
- any number of times in order to specify multiple patterns. If
- a file name matches both an --include and an --exclude pat-
- tern, it is excluded. There is no short form for this option.
- --exclude-from=filename
- Treat each non-empty line of the file as the data for an
- --exclude option. What constitutes a newline when reading the
- file is the operating system's default. The --newline option
- has no effect on this option. This option may be given more
- than once in order to specify a number of files to read.
- --exclude-dir=pattern
- Directories whose names match the pattern are skipped without
- being processed, whatever the setting of the --recursive op-
- tion. This applies to all directories, whether listed on the
- command line, obtained from --file-list, or by scanning a
- parent directory. The pattern is a PCRE2 regular expression,
- and is matched against the final component of the directory
- name, not the entire path. The -F, -w, and -x options do not
- apply to this pattern. The option may be given any number of
- times in order to specify more than one pattern. If a direc-
- tory matches both --include-dir and --exclude-dir, it is ex-
- cluded. There is no short form for this option.
- -F, --fixed-strings
- Interpret each data-matching pattern as a list of fixed
- strings, separated by newlines, instead of as a regular ex-
- pression. What constitutes a newline for this purpose is con-
- trolled by the --newline option. The -w (match as a word) and
- -x (match whole line) options can be used with -F. They ap-
- ply to each of the fixed strings. A line is selected if any
- of the fixed strings are found in it (subject to -w or -x, if
- present). This option applies only to the patterns that are
- matched against the contents of files; it does not apply to
- patterns specified by any of the --include or --exclude op-
- tions.
- -f filename, --file=filename
- Read patterns from the file, one per line. As is the case
- with patterns on the command line, no delimiters should be
- used. What constitutes a newline when reading the file is the
- operating system's default interpretation of \n. The --new-
- line option has no effect on this option. Trailing white
- space is removed from each line, and blank lines are ignored.
- An empty file contains no patterns and therefore matches
- nothing. Patterns read from a file in this way may contain
- binary zeros, which are treated as ordinary data characters.
- If this option is given more than once, all the specified
- files are read. A data line is output if any of the patterns
- match it. A file name can be given as "-" to refer to the
- standard input. When -f is used, patterns specified on the
- command line using -e may also be present; they are matched
- before the file's patterns. However, no pattern is taken from
- the command line; all arguments are treated as the names of
- paths to be searched.
- --file-list=filename
- Read a list of files and/or directories that are to be
- scanned from the given file, one per line. What constitutes a
- newline when reading the file is the operating system's de-
- fault. Trailing white space is removed from each line, and
- blank lines are ignored. These paths are processed before any
- that are listed on the command line. The file name can be
- given as "-" to refer to the standard input. If --file and
- --file-list are both specified as "-", patterns are read
- first. This is useful only when the standard input is a ter-
- minal, from which further lines (the list of files) can be
- read after an end-of-file indication. If this option is given
- more than once, all the specified files are read.
- --file-offsets
- Instead of showing lines or parts of lines that match, show
- each match as an offset from the start of the file and a
- length, separated by a comma. In this mode, --colour has no
- effect, and no context is shown. That is, the -A, -B, and -C
- options are ignored. If there is more than one match in a
- line, each of them is shown separately. This option is mutu-
- ally exclusive with --output, --line-offsets, and --only-
- matching.
- --group-separator=text
- Output this text string instead of two hyphens between groups
- of lines when -A, -B, or -C is in use. See also --no-group-
- separator.
- -H, --with-filename
- Force the inclusion of the file name at the start of output
- lines when searching a single file. The file name is not nor-
- mally shown in this case. By default, for matching lines,
- the file name is followed by a colon; for context lines, a
- hyphen separator is used. The -Z option can be used to change
- the terminator to a zero byte. If a line number is also being
- output, it follows the file name. When the -M option causes a
- pattern to match more than one line, only the first is pre-
- ceded by the file name. This option overrides any previous
- -h, -l, or -L options.
- -h, --no-filename
- Suppress the output of file names when searching multiple files.
- File names are normally shown when multiple files are
- searched. By default, for matching lines, the file name is
- followed by a colon; for context lines, a hyphen separator is
- used. The -Z option can be used to change the terminator to a
- zero byte. If a line number is also being output, it follows
- the file name. This option overrides any previous -H, -L, or
- -l options.
- --heap-limit=number
- See --match-limit below.
- --help Output a help message, giving brief details of the command
- options and file type support, and then exit. Anything else
- on the command line is ignored.
- -I Ignore binary files. This is equivalent to --binary-
- files=without-match.
- -i, --ignore-case
- Ignore upper/lower case distinctions when pattern matching.
- This applies when matching path names for inclusion or exclu-
- sion as well as when matching lines in files.
- --include=pattern
- If any --include patterns are specified, the only files that
- are processed are those whose names match one of the patterns
- and do not match an --exclude pattern. This option does not
- affect directories, but it applies to all files, whether
- listed on the command line, obtained from --file-list, or by
- scanning a directory. The pattern is a PCRE2 regular expres-
- sion, and is matched against the final component of the file
- name, not the entire path. The -F, -w, and -x options do not
- apply to this pattern. The option may be given any number of
- times. If a file name matches both an --include and an --ex-
- clude pattern, it is excluded. There is no short form for
- this option.
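The interaction between --include and --exclude for files can be modelled with a short Python sketch. Python's re module stands in for PCRE2 here, the patterns are applied unanchored (an assumption), and the function name file_is_selected is hypothetical.

```python
import posixpath
import re

def file_is_selected(path, includes=(), excludes=()):
    """Decide whether a file would be processed under the documented rules.

    Patterns are matched against the final component of the path only.
    A name that matches any exclude pattern is skipped, even if it also
    matches an include pattern. If include patterns exist, the name must
    match at least one of them.
    """
    name = posixpath.basename(path)
    if any(re.search(p, name) for p in excludes):
        return False
    if includes:
        return any(re.search(p, name) for p in includes)
    return True
```

Because only the final component is tested, file_is_selected("test_dir/main.c", excludes=[r"^test_"]) is true in this model: the directory name plays no part in the decision.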
- --include-from=filename
- Treat each non-empty line of the file as the data for an
- --include option. What constitutes a newline for this purpose
- is the operating system's default. The --newline option has
- no effect on this option. This option may be given any number
- of times; all the files are read.
- --include-dir=pattern
- If any --include-dir patterns are specified, the only direc-
- tories that are processed are those whose names match one of
- the patterns and do not match an --exclude-dir pattern. This
- applies to all directories, whether listed on the command
- line, obtained from --file-list, or by scanning a parent di-
- rectory. The pattern is a PCRE2 regular expression, and is
- matched against the final component of the directory name,
- not the entire path. The -F, -w, and -x options do not apply
- to this pattern. The option may be given any number of times.
- If a directory matches both --include-dir and --exclude-dir,
- it is excluded. There is no short form for this option.
- -L, --files-without-match
- Instead of outputting lines from the files, just output the
- names of the files that do not contain any lines that would
- have been output. Each file name is output once, on a sepa-
- rate line by default, but if the -Z option is set, they are
- separated by zero bytes instead of newlines. This option
- overrides any previous -H, -h, or -l options.
- -l, --files-with-matches
- Instead of outputting lines from the files, just output the
- names of the files containing lines that would have been out-
- put. Each file name is output once, on a separate line, but
- if the -Z option is set, they are separated by zero bytes in-
- stead of newlines. Searching normally stops as soon as a
- matching line is found in a file. However, if the -c (count)
- option is also used, matching continues in order to obtain
- the correct count, and those files that have at least one
- match are listed along with their counts. Using this option
- with -c is a way of suppressing the listing of files with no
- matches that occurs with -c on its own. This option overrides
- any previous -H, -h, or -L options.
- --label=name
- This option supplies a name to be used for the standard input
- when file names are being output. If not supplied, "(standard
- input)" is used. There is no short form for this option.
- --line-buffered
- When this option is given, non-compressed input is read and
- processed line by line, and the output is flushed after each
- write. By default, input is read in large chunks, unless
- pcre2grep can determine that it is reading from a terminal,
- which is currently possible only in Unix-like environments or
- Windows. Output to a terminal is normally automatically flushed
- by the operating system. This option can be useful when the
- input or output is attached to a pipe and you do not want
- pcre2grep to buffer up large amounts of data. However, its
- use will affect performance, and the -M (multiline) option
- ceases to work. When input is from a compressed .gz or .bz2
- file, --line-buffered is ignored.
- --line-offsets
- Instead of showing lines or parts of lines that match, show
- each match as a line number, the offset from the start of the
- line, and a length. The line number is terminated by a colon
- (as usual; see the -n option), and the offset and length are
- separated by a comma. In this mode, --colour has no effect,
- and no context is shown. That is, the -A, -B, and -C options
- are ignored. If there is more than one match in a line, each
- of them is shown separately. This option is mutually exclu-
- sive with --output, --file-offsets, and --only-matching.
- --locale=locale-name
- This option specifies a locale to be used for pattern match-
- ing. It overrides the value in the LC_ALL or LC_CTYPE envi-
- ronment variables. If no locale is specified, the PCRE2 li-
- brary's default (usually the "C" locale) is used. There is no
- short form for this option.
- -M, --multiline
- Allow patterns to match more than one line. When this option
- is set, the PCRE2 library is called in "multiline" mode, and
- a match is allowed to continue past the end of the initial
- line and onto one or more subsequent lines.
- Patterns used with -M may usefully contain literal newline
- characters and internal occurrences of ^ and $ characters,
- because in multiline mode these can match at internal new-
- lines. Because pcre2grep is scanning multiple lines, the \Z
- and \z assertions match only at the end of the last line in
- the file. The \A assertion matches at the start of the first
- line of a match. This can be any line in the file; it is not
- anchored to the first line.
- The output for a successful match may consist of more than
- one line. The first line is the line in which the match
- started, and the last line is the line in which the match
- ended. If the matched string ends with a newline sequence,
- the output ends at the end of that line. If -v is set, none
- of the lines in a multi-line match are output. Once a match
- has been handled, scanning restarts at the beginning of the
- line after the one in which the match ended.
- The newline sequence that separates multiple lines must be
- matched as part of the pattern. For example, to find the
- phrase "regular expression" in a file where "regular" might
- be at the end of a line and "expression" at the start of the
- next line, you could use this command:
- pcre2grep -M 'regular\s+expression' <file>
- The \s escape sequence matches any white space character, in-
- cluding newlines, and is followed by + so as to match trail-
- ing white space on the first line as well as possibly han-
- dling a two-character newline sequence.
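The effect of \s+ across a line boundary, and the rule about which lines are output, can be tried out with Python's re module. This is a sketch under stated assumptions: lines end with a single linefeed, Python's regex engine stands in for PCRE2, and the function name matched_lines is hypothetical.

```python
import re

def matched_lines(pattern, text):
    """Return the lines spanned by the first match of pattern in text.

    The span runs from the start of the line in which the match begins
    to the end of the line in which it ends.
    """
    m = re.search(pattern, text)
    if m is None:
        return []
    start = text.rfind("\n", 0, m.start()) + 1
    if m.end() > m.start() and text[m.end() - 1] == "\n":
        # The match itself ends with a newline: output stops at that line.
        end = m.end() - 1
    else:
        end = text.find("\n", m.end())
        if end == -1:
            end = len(text)
    return text[start:end].split("\n")

text = "uses a regular  \nexpression here\nlast line\n"
lines = matched_lines(r"regular\s+expression", text)
```

Here lines holds the two lines that contain the match, including the trailing white space on the first one, and the third line is not part of the result.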
- There is a limit to the number of lines that can be matched,
- imposed by the way that pcre2grep buffers the input file as
- it scans it. With a sufficiently large processing buffer,
- this should not be a problem.
- The -M option does not work when input is read line by line
- (see --line-buffered).
- -m number, --max-count=number
- Stop processing after finding number matching lines, or non-
- matching lines if -v is also set. Any trailing context lines
- are output after the final match. In multiline mode, each
- multiline match counts as just one line for this purpose. If
- this limit is reached when reading the standard input from a
- regular file, the file is left positioned just after the last
- matching line. If -c is also set, the count that is output
- is never greater than number. This option has no effect if
- used with -L, -l, or -q, or when just checking for a match in
- a binary file.
- --match-limit=number
- Processing some regular expression patterns may take a very
- long time to search for all possible matching strings. Others
- may require a very large amount of memory. There are three
- options that set resource limits for matching.
- The --match-limit option provides a means of limiting comput-
- ing resource usage when processing patterns that are not go-
- ing to match, but which have a very large number of possibil-
- ities in their search trees. The classic example is a pattern
- that uses nested unlimited repeats. Internally, PCRE2 has a
- counter that is incremented each time around its main pro-
- cessing loop. If the value set by --match-limit is reached,
- an error occurs.
- The --heap-limit option specifies, as a number of kibibytes
- (units of 1024 bytes), the maximum amount of heap memory that
- may be used for matching.
- The --depth-limit option limits the depth of nested back-
- tracking points, which indirectly limits the amount of memory
- that is used. The amount of memory needed for each backtrack-
- ing point depends on the number of capturing parentheses in
- the pattern, so the amount of memory that is used before this
- limit acts varies from pattern to pattern. This limit is of
- use only if it is set smaller than --match-limit.
- There are no short forms for these options. The default lim-
- its can be set when the PCRE2 library is compiled; if they
- are not specified, the defaults are very large and so effec-
- tively unlimited.
- --max-buffer-size=number
- This limits the expansion of the processing buffer, whose
- initial size can be set by --buffer-size. The maximum buffer
- size is silently forced to be no smaller than the starting
- buffer size.
- -N newline-type, --newline=newline-type
- Six different conventions for indicating the ends of lines in
- scanned files are supported. For example:
- pcre2grep -N CRLF 'some pattern' <file>
- The newline type may be specified in upper, lower, or mixed
- case. If the newline type is NUL, lines are separated by bi-
- nary zero characters. The other types are the single-charac-
- ter sequences CR (carriage return) and LF (linefeed), the
- two-character sequence CRLF, an "anycrlf" type, which recog-
- nizes any of the preceding three types, and an "any" type,
- for which any Unicode line ending sequence is assumed to end
- a line. The Unicode sequences are the three just mentioned,
- plus VT (vertical tab, U+000B), FF (form feed, U+000C), NEL
- (next line, U+0085), LS (line separator, U+2028), and PS
- (paragraph separator, U+2029).
- When the PCRE2 library is built, a default line-ending se-
- quence is specified. This is normally the standard sequence
- for the operating system. Unless otherwise specified by this
- option, pcre2grep uses the library's default.
- This option makes it possible to use pcre2grep to scan files
- that have come from other environments without having to mod-
- ify their line endings. If the data that is being scanned
- does not agree with the convention set by this option,
- pcre2grep may behave in strange ways. Note that this option
- does not apply to files specified by the -f, --exclude-from,
- or --include-from options, which are expected to use the op-
- erating system's standard newline sequence.
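As a small illustration of the -N option, the following sketch counts matches in input that uses CRLF line endings. The sample data is made up for this example, and the variable name crlf_count is only illustrative. Without -N CRLF, a build whose default newline is LF would be expected to treat the carriage return as part of each line, so the anchored pattern would probably not match.

```shell
# Sample input with Windows-style CRLF line endings (invented for this sketch).
# -N CRLF tells pcre2grep that each line ends with the two-character sequence,
# so "$" can match immediately after "two".
crlf_count=$(printf 'one\r\ntwo\r\n' | pcre2grep -c -N CRLF '^two$')

# -c prints the number of matching lines instead of the lines themselves.
echo "$crlf_count"
```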
- -n, --line-number
- Precede each output line by its line number in the file, fol-
- lowed by a colon for matching lines or a hyphen for context
- lines. If the file name is also being output, it precedes the
- line number. When the -M option causes a pattern to match
- more than one line, only the first is preceded by its line
- number. This option is forced if --line-offsets is used.
- --no-group-separator
- Do not output a separator between groups of lines when -A,
- -B, or -C is in use. The default is to output a line contain-
- ing two hyphens. See also --group-separator.
- --no-jit If the PCRE2 library is built with support for just-in-time
- compiling (which speeds up matching), pcre2grep automatically
- makes use of this, unless it was explicitly disabled at build
- time. This option can be used to disable the use of JIT at
- run time. It is provided for testing and working around prob-
- lems. It should never be needed in normal use.
- -O text, --output=text
- When there is a match, instead of outputting the line that
- matched, output just the text specified in this option, fol-
- lowed by an operating-system standard newline. In this mode,
- --colour has no effect, and no context is shown. That is,
- the -A, -B, and -C options are ignored. The --newline option
- has no effect on this option, which is mutually exclusive
- with --only-matching, --file-offsets, and --line-offsets.
- However, like --only-matching, if there is more than one
- match in a line, each of them causes a line of output.
- Escape sequences starting with a dollar character may be used
- to insert the contents of the matched part of the line and/or
- captured substrings into the text.
- $<digits> or ${<digits>} is replaced by the captured sub-
- string of the given decimal number; zero substitutes the
- whole match. If the number is greater than the number of cap-
- turing substrings, or if the capture is unset, the replace-
- ment is empty.
- $a is replaced by bell; $b by backspace; $e by escape; $f by
- form feed; $n by newline; $r by carriage return; $t by tab;
- $v by vertical tab.
- $o<digits> or $o{<digits>} is replaced by the character whose
- code point is the given octal number. In the first form, up
- to three octal digits are processed. When more digits are
- needed in Unicode mode to specify a wide character, the sec-
- ond form must be used.
- $x<digits> or $x{<digits>} is replaced by the character rep-
- resented by the given hexadecimal number. In the first form,
- up to two hexadecimal digits are processed. When more digits
- are needed in Unicode mode to specify a wide character, the
- second form must be used.
- Any other character is substituted by itself. In particular,
- $$ is replaced by a single dollar.
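The dollar escapes described above can be tried with a short command. This is a minimal sketch on made-up input; the variable name swapped is only illustrative. The option text is in single quotes so that the shell does not expand $1 and $2 itself.

```shell
# Swap the two captured substrings of a key=value line.
# $1 and $2 are replaced by pcre2grep, not by the shell.
swapped=$(printf 'key=value\n' | pcre2grep -O '$2:$1' '(\w+)=(\w+)')

echo "$swapped"
```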
- -o, --only-matching
- Show only the part of the line that matched a pattern instead
- of the whole line. In this mode, no context is shown. That
- is, the -A, -B, and -C options are ignored. If there is more
- than one match in a line, each of them is shown separately,
- on a separate line of output. If -o is combined with -v (in-
- vert the sense of the match to find non-matching lines), no
- output is generated, but the return code is set appropri-
- ately. If the matched portion of the line is empty, nothing
- is output unless the file name or line number is being
- printed, in which case they are shown on an otherwise empty
- line. This option is mutually exclusive with --output,
- --file-offsets and --line-offsets.
- -onumber, --only-matching=number
- Show only the part of the line that matched the capturing
- parentheses of the given number. Up to 50 capturing parenthe-
- ses are supported by default. This limit can be changed via
- the --om-capture option. A pattern may contain any number of
- capturing parentheses, but only those whose number is within
- the limit can be accessed by -o. An error occurs if the num-
- ber specified by -o is greater than the limit.
- -o0 is the same as -o without a number. Because these options
- can be given without an argument (see above), if an argument
- is present, it must be given in the same shell item, for ex-
- ample, -o3 or --only-matching=2. The comments given for the
- non-argument case above also apply to this option. If the
- specified capturing parentheses do not exist in the pattern,
- or were not set in the match, nothing is output unless the
- file name or line number is being output.
- If this option is given multiple times, multiple substrings
- are output for each match, in the order the options are
- given, and all on one line. For example, -o3 -o1 -o3 causes
- the substrings matched by capturing parentheses 3 and 1 and
- then 3 again to be output. By default, there is no separator
- (but see --om-separator below).
- --om-capture=number
- Set the number of capturing parentheses that can be accessed
- by -o. The default is 50.
- --om-separator=text
- Specify a separating string for multiple occurrences of -o.
- The default is an empty string. Separating strings are never
- coloured.
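The numbered form of -o and the --om-separator option might be combined as in the following sketch. The input line and the variable name pair are assumptions made for illustration only.

```shell
# Output capture groups 1 and 2 for each match, in the order the -o options
# are given, with a comma between them. Each number must be attached
# directly to -o (for example -o1), as described above.
pair=$(printf 'key=value\n' | pcre2grep -o1 -o2 --om-separator=',' '(\w+)=(\w+)')

echo "$pair"
```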
- -P, --no-ucp
- Starting from release 10.43, when UTF/Unicode mode is speci-
- fied with -u or -U, the PCRE2_UCP option is used by default.
- This means that the POSIX classes in patterns match more than
- just ASCII characters. For example, [:digit:] matches any
- Unicode decimal digit. The --no-ucp option suppresses
- PCRE2_UCP, thus restricting the POSIX classes to ASCII char-
- acters, as was the case in earlier releases. Note that there
- are now more fine-grained option settings within patterns
- that affect individual classes. For example, when in UCP
- mode, the sequence (?aP) restricts [:word:] to ASCII letters,
- while allowing \w to match Unicode letters and digits.
- -q, --quiet
- Work quietly, that is, display nothing except error messages.
- The exit status indicates whether or not any matches were
- found.
- -r, --recursive
- If any given path is a directory, recursively scan the files
- it contains, taking note of any --include and --exclude set-
- tings. By default, a directory is read as a normal file; in
- some operating systems this gives an immediate end-of-file.
- This option is a shorthand for setting the -d option to "re-
- curse".
- --recursion-limit=number
- This is an obsolete synonym for --depth-limit. See --match-
- limit above for details.
- -s, --no-messages
- Suppress error messages about non-existent or unreadable
- files. Such files are quietly skipped. However, the return
- code is still 2, even if matches were found in other files.
- -t, --total-count
- This option is useful when scanning more than one file. If
- used on its own, -t suppresses all output except for a grand
- total number of matching lines (or non-matching lines if -v
- is used) in all the files. If -t is used with -c, a grand to-
- tal is output except when the previous output is just one
- line. In other words, it is not output when just one file's
- count is listed. If file names are being output, the grand
- total is preceded by "TOTAL:". Otherwise, it appears as just
- another number. The -t option is ignored when used with -L
- (list files without matches), because the grand total would
- always be zero.
- -u, --utf Operate in UTF/Unicode mode. This option is available only if
- PCRE2 has been compiled with UTF-8 support. All patterns (in-
- cluding those for any --exclude and --include options) and
- all lines that are scanned must be valid strings of UTF-8
- characters. If an invalid UTF-8 string is encountered, an er-
- ror occurs.
- -U, --utf-allow-invalid
- As --utf, but in addition subject lines may contain invalid
- UTF-8 code unit sequences. These can never form part of any
- pattern match. Patterns themselves, however, must still be
- valid UTF-8 strings. This facility allows valid UTF-8 strings
- to be sought within arbitrary byte sequences in executable or
- other binary files. For more details about matching in non-
- valid UTF-8 strings, see the pcre2unicode(3) documentation.
- -V, --version
- Write the version numbers of pcre2grep and the PCRE2 library
- to the standard output and then exit. Anything else on the
- command line is ignored.
- -v, --invert-match
- Invert the sense of the match, so that lines which do not
- match any of the patterns are the ones that are found. When
- this option is set, options such as --only-matching and
- --output, which specify parts of a match that are to be out-
- put, are ignored.
- -w, --word-regex, --word-regexp
- Force the patterns only to match "words". That is, there must
- be a word boundary at the start and end of each matched
- string. This is equivalent to having "\b(?:" at the start of
- each pattern, and ")\b" at the end. This option applies only
- to the patterns that are matched against the contents of
- files; it does not apply to patterns specified by any of the
- --include or --exclude options.
- -x, --line-regex, --line-regexp
- Force the patterns to start matching only at the beginnings
- of lines, and in addition, require them to match entire
- lines. In multiline mode the match may be more than one line.
- This is equivalent to having "^(?:" at the start of each pat-
- tern and ")$" at the end. This option applies only to the
- patterns that are matched against the contents of files; it
- does not apply to patterns specified by any of the --include
- or --exclude options.
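The difference between -w and -x can be seen on a few sample lines. This is a sketch with invented input; the variable names sample, word_hits, and line_hits are illustrative only.

```shell
# Three sample lines (made up for this sketch).
sample='cat
concat
cat nap'

# -w: "cat" must have a word boundary on both sides, so "concat" is skipped.
word_hits=$(printf '%s\n' "$sample" | pcre2grep -w 'cat')

# -x: the pattern must match the entire line, so only the first line is found.
line_hits=$(printf '%s\n' "$sample" | pcre2grep -x 'cat')

printf '%s\n' "-w:" "$word_hits" "-x:" "$line_hits"
```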
- -Z, --null
- Terminate file names in the regular output with a zero byte
- (the NUL character) instead of what would normally appear.
- This is useful when file names contain unusual characters
- such as colons, hyphens, or even newlines. The option does
- not apply to file names in error messages.
- ENVIRONMENT VARIABLES
- The environment variables LC_ALL and LC_CTYPE are examined, in that or-
- der, for a locale. The first one that is set is used. This can be over-
- ridden by the --locale option. If no locale is set, the PCRE2 library's
- default (usually the "C" locale) is used.
- NEWLINES
- The -N (--newline) option allows pcre2grep to scan files with newline
- conventions that differ from the default. This option affects only the
- way scanned files are processed. It does not affect the interpretation
- of files specified by the -f, --file-list, --exclude-from, or --in-
- clude-from options.
- Any parts of the scanned input files that are written to the standard
- output are copied with whatever newline sequences they have in the in-
- put. However, if the final line of a file is output, and it does not
- end with a newline sequence, a newline sequence is added. If the new-
- line setting is CR, LF, CRLF or NUL, that line ending is output; for
- the other settings (ANYCRLF or ANY) a single LF (newline) is used.
- The newline setting does not affect the way in which pcre2grep writes
- newlines in informational messages to the standard output and error
- streams. Under Windows, the standard output is set to be binary, so
- that "\r\n" at the ends of output lines that are copied from the input
- is not converted to "\r\r\n" by the C I/O library. This means that any
- messages written to the standard output must end with "\r\n". For all
- other operating systems, and for all messages to the standard error
- stream, "\n" is used.
- OPTIONS COMPATIBILITY WITH GNU GREP
- Many of the short and long forms of pcre2grep's options are the same as
- in the GNU grep program. Any long option of the form --xxx-regexp (GNU
- terminology) is also available as --xxx-regex (PCRE2 terminology).
- However, the --case-restrict, --depth-limit, -E, --file-list, --file-
- offsets, --heap-limit, --include-dir, --line-offsets, --locale,
- --match-limit, -M, --multiline, -N, --newline, --no-ucp, --om-separa-
- tor, --output, -P, -u, --utf, -U, and --utf-allow-invalid options are
- specific to pcre2grep, as is the use of the --only-matching option with
- a capturing parentheses number.
- Although most of the common options work the same way, a few are dif-
- ferent in pcre2grep. For example, the --include option's argument is a
- glob for GNU grep, but in pcre2grep it is a regular expression to which
- the -i option applies. If both the -c and -l options are given, GNU
- grep lists only file names, without counts, but pcre2grep gives the
- counts as well.
- OPTIONS WITH DATA
- There are four different ways in which an option with data can be spec-
- ified. If a short form option is used, the data may follow immedi-
- ately, or (with one exception) in the next command line item. For exam-
- ple:
- -f/some/file
- -f /some/file
- The exception is the -o option, which may appear with or without data.
- Because of this, if data is present, it must follow immediately in the
- same item, for example -o3.
- If a long form option is used, the data may appear in the same command
- line item, separated by an equals character, or (with two exceptions)
- it may appear in the next command line item. For example:
- --file=/some/file
- --file /some/file
- Note, however, that if you want to supply a file name beginning with ~
- as data in a shell command, and have the shell expand ~ to a home di-
- rectory, you must separate the file name from the option, because the
- shell does not treat ~ specially unless it is at the start of an item.
- The exceptions to the above are the --colour (or --color) and --only-
- matching options, for which the data is optional. If one of these op-
- tions does have data, it must be given in the first form, using an
- equals character. Otherwise pcre2grep will assume that it has no data.
- USING PCRE2'S CALLOUT FACILITY
- pcre2grep has, by default, support for calling external programs or
- scripts or echoing specific strings during matching by making use of
- PCRE2's callout facility. However, this support can be completely or
- partially disabled when pcre2grep is built. You can find out whether
- your binary has support for callouts by running it with the --help op-
- tion. If callout support is completely disabled, all callouts in pat-
- terns are ignored by pcre2grep. If the facility is partially disabled,
- calling external programs is not supported, and callouts that request
- it are ignored.
- A callout in a PCRE2 pattern is of the form (?C<arg>) where the argu-
- ment is either a number or a quoted string (see the pcre2callout docu-
- mentation for details). Numbered callouts are ignored by pcre2grep;
- only callouts with string arguments are useful.
- Echoing a specific string
- Starting the callout string with a pipe character invokes an echoing
- facility that avoids calling an external program or script. This facil-
- ity is always available, provided that callouts were not completely
- disabled when pcre2grep was built. The rest of the callout string is
- processed as a zero-terminated string, which means it should not con-
- tain any internal binary zeros. It is written to the output, having
- first been passed through the same escape processing as text from the
- --output (-O) option (see above). However, $0 cannot be used to insert
- a matched substring because the match is still in progress. Instead,
- the single character '0' is inserted. Any syntax error in the string
- (for example, a dollar not followed by another character) causes the
- callout to be ignored. No terminator is added to the output string, so
- if you want a newline, you must include it explicitly using the escape
- $n. For example:
- pcre2grep '(.)(..(.))(?C"|[$1] [$2] [$3]$n")' <some file>
- Matching continues normally after the string is output. If you want to
- see only the callout output but not any output from an actual match,
- you should end the pattern with (*FAIL).
- Calling external programs or scripts
- This facility can be independently disabled when pcre2grep is built. It
- is supported for Windows, where a call to _spawnvp() is used, for VMS,
- where lib$spawn() is used, and for any Unix-like environment where
- fork() and execv() are available.
- If the callout string does not start with a pipe (vertical bar) charac-
- ter, it is parsed into a list of substrings separated by pipe charac-
- ters. The first substring must be an executable name, with the follow-
- ing substrings specifying arguments:
- executable_name|arg1|arg2|...
- Any substring (including the executable name) may contain escape se-
- quences started by a dollar character. These are the same as for the
- --output (-O) option documented above, except that $0 cannot insert the
- matched string because the match is still in progress. Instead, the
- character '0' is inserted. If you need a literal dollar or pipe charac-
- ter in any substring, use $$ or $| respectively. Here is an example:
- echo -e "abcde\n12345" | pcre2grep \
- '(?x)(.)(..(.))
- (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -
- Output:
- Arg1: [a] [bcd] [d] Arg2: |a| ()
- abcde
- Arg1: [1] [234] [4] Arg2: |1| ()
- 12345
- The parameters for the system call that is used to run the program or
- script are zero-terminated strings. This means that binary zero charac-
- ters in the callout argument will cause premature termination of their
- substrings, and therefore should not be present. Any syntax error in
- the string (for example, a dollar not followed by another character)
- causes the callout to be ignored. If running the program fails for any
- reason (including the non-existence of the executable), a local match-
- ing failure occurs and the matcher backtracks in the normal way.
- MATCHING ERRORS
- It is possible to supply a regular expression that takes a very long
- time to fail to match certain lines. Such patterns normally involve
- nested indefinite repeats, for example: (a+)*\d when matched against a
- line of a's with no final digit. The PCRE2 matching function has a re-
- source limit that causes it to abort in these circumstances. If this
- happens, pcre2grep outputs an error message and the line that caused
- the problem to the standard error stream. If there are more than 20
- such errors, pcre2grep gives up.
- The --match-limit option of pcre2grep can be used to set the overall
- resource limit. There are also other limits that affect the amount of
- memory used during matching; see the discussion of --heap-limit and
- --depth-limit above.
- DIAGNOSTICS
- Exit status is 0 if any matches were found, 1 if no matches were found,
- and 2 for syntax errors, overlong lines, non-existent or inaccessible
- files (even if matches were found in other files) or too many matching
- errors. Using the -s option to suppress error messages about inaccessi-
- ble files does not affect the return code.
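A script that calls pcre2grep might branch on these exit statuses as in the following sketch. The function name describe_status is a hypothetical helper, not part of pcre2grep, and the input line is made up.

```shell
# Map the documented exit statuses to short messages.
# describe_status is a hypothetical helper name used only in this sketch.
describe_status() {
  case "$1" in
    0) echo "match found" ;;
    1) echo "no match" ;;
    2) echo "error" ;;
    *) echo "unexpected status $1" ;;
  esac
}

# -q suppresses normal output, so only the exit status is of interest.
printf 'hello world\n' | pcre2grep -q 'world'
describe_status "$?"
```

Because status 2 is returned for inaccessible files even when other files matched, a caller that needs to tell "matched with errors" from "failed" would have to inspect the output as well as the status.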
- When run under VMS, the return code is placed in the symbol
- PCRE2GREP_RC because VMS does not distinguish between exit(0) and
- exit(1).
- SEE ALSO
- pcre2pattern(3), pcre2syntax(3), pcre2callout(3), pcre2unicode(3).
- AUTHOR
- Philip Hazel
- Retired from University Computing Service
- Cambridge, England.
- REVISION
- Last updated: 22 December 2023
- Copyright (c) 1997-2023 University of Cambridge.
- PCRE2 10.43 22 December 2023 PCRE2GREP(1)