- Regex Module
- Iñaki Baz Castillo
- <[email protected]>
- Edited by
- Iñaki Baz Castillo
- <[email protected]>
- Copyright © 2009 Iñaki Baz Castillo
- __________________________________________________________________
- Table of Contents
- 1. Admin Guide
- 1. Overview
- 2. Dependencies
- 2.1. Kamailio Modules
- 2.2. External Libraries or Applications
- 3. Parameters
- 3.1. file (string)
- 3.2. max_groups (int)
- 3.3. group_max_size (int)
- 3.4. pcre_caseless (int)
- 3.5. pcre_multiline (int)
- 3.6. pcre_dotall (int)
- 3.7. pcre_extended (int)
- 4. Functions
- 4.1. pcre_match (string, pcre_regex)
- 4.2. pcre_match_group (string [, group])
- 5. MI Commands
- 5.1. regex_reload
- 6. Installation and Running
- 6.1. File format
- List of Examples
- 1.1. Set file parameter
- 1.2. Set max_groups parameter
- 1.3. Set group_max_size parameter
- 1.4. Set pcre_caseless parameter
- 1.5. Set pcre_multiline parameter
- 1.6. Set pcre_dotall parameter
- 1.7. Set pcre_extended parameter
- 1.8. pcre_match usage (forcing case insensitive)
- 1.9. pcre_match usage (using "end of line" symbol)
- 1.10. pcre_match_group usage
- 1.11. pcre_match_group usage (using a pseudo-variable as group)
- 1.12. regex file
- 1.13. Using with pua_usrloc
- 1.14. Incorrect groups file
- Chapter 1. Admin Guide
- 1. Overview
- This module offers matching operations using regular expressions based
- on the powerful PCRE library.
- A text file containing regular expressions categorized in groups is
- compiled when the module is loaded, and the resulting PCRE objects are
- stored in an array. A function to match a string or pseudo-variable
- against any of these groups is provided. The text file can be modified
- and reloaded at any time via an MI command. The module also offers a
- function to perform a PCRE matching operation against a regular
- expression provided as a function parameter.
- For a detailed list of PCRE features, read the man page of the library.
- 2. Dependencies
- 2.1. Kamailio Modules
- The following modules must be loaded before this module:
- * No dependencies on other Kamailio modules.
- 2.2. External Libraries or Applications
- The following libraries or applications must be installed before
- running Kamailio with this module loaded:
- * libpcre - the libraries of PCRE.
- 3. Parameters
- 3.1. file (string)
- Text file containing the regular expression groups. It must be set in
- order to enable the group matching function.
- Default value is "NULL".
- Example 1.1. Set file parameter
- ...
- modparam("regex", "file", "/etc/kamailio/regex_groups")
- ...
- 3.2. max_groups (int)
- Max number of regular expression groups in the text file.
- Default value is "20".
- Example 1.2. Set max_groups parameter
- ...
- modparam("regex", "max_groups", 40)
- ...
- 3.3. group_max_size (int)
- Max content size of a group in the text file.
- Default value is "8192".
- Example 1.3. Set group_max_size parameter
- ...
- modparam("regex", "group_max_size", 16384)
- ...
- 3.4. pcre_caseless (int)
- If this option is set, matching is case-insensitive. It is equivalent to
- Perl's /i option, and it can be changed within a pattern by a (?i) or
- (?-i) option setting.
- Default value is "0".
- Example 1.4. Set pcre_caseless parameter
- ...
- modparam("regex", "pcre_caseless", 1)
- ...
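- As an illustrative sketch (not taken from the module sources), the
- following fragment assumes that the module-wide setting also applies to
- patterns passed to pcre_match, and uses (?-i) to restore case-sensitive
- matching for a single pattern. The log messages are hypothetical.
```cfg
modparam("regex", "pcre_caseless", 1)

route {
    # Assumption: the module-wide caseless setting applies here, so this
    # is expected to match "Twinkle/1.4" as well as "twinkle/1.4".
    if (pcre_match("$ua", "^twinkle")) {
        xlog("L_INFO", "User-Agent matches (caseless)\n");
    }

    # (?-i) switches case sensitivity back on for this pattern only.
    if (pcre_match("$ua", "(?-i)^Twinkle")) {
        xlog("L_INFO", "User-Agent matches (case sensitive)\n");
    }
}
```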
- 3.5. pcre_multiline (int)
- By default, PCRE treats the subject string as consisting of a single
- line of characters (even if it actually contains newlines). The "start
- of line" metacharacter (^) matches only at the start of the string,
- while the "end of line" metacharacter ($) matches only at the end of
- the string, or before a terminating newline.
- When this option is set, the "start of line" and "end of line"
- constructs match immediately following or immediately before internal
- newlines in the subject string, respectively, as well as at the very
- start and end. This is equivalent to Perl's /m option, and it can be
- changed within a pattern by a (?m) or (?-m) option setting. If there
- are no newlines in a subject string, or no occurrences of ^ or $ in a
- pattern, setting this option has no effect.
- Default value is "0".
- Example 1.5. Set pcre_multiline parameter
- ...
- modparam("regex", "pcre_multiline", 1)
- ...
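- A minimal sketch of the effect, assuming the message carries an SDP
- body and using the inline (?m) option rather than the module parameter.
- The choice of $rb (the message body) as the subject string is only an
- illustration.
```cfg
# Without multiline mode, ^ matches only at the very start of the body,
# so "^m=video" would succeed only if the body began with that text.
# With (?m), ^ also matches right after each internal newline, so any
# SDP line starting with "m=video" is found.
if (pcre_match("$rb", "(?m)^m=video")) {
    xlog("L_INFO", "Body contains a video media line\n");
}
```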
- 3.6. pcre_dotall (int)
- If this option is set, a dot metacharacter in the pattern matches all
- characters, including those that indicate newline. Without it, a dot
- does not match when the current position is at a newline. This option
- is equivalent to Perl's /s option, and it can be changed within a
- pattern by a (?s) or (?-s) option setting.
- Default value is "0".
- Example 1.6. Set pcre_dotall parameter
- ...
- modparam("regex", "pcre_dotall", 1)
- ...
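- The following fragment is a sketch of the same behaviour using the
- inline (?s) option, again assuming an SDP body in $rb as the subject
- string.
```cfg
# Without (?s), ".*" stops at the newline that ends the "v=0" line and
# can never reach "m=audio" on a later line. With (?s), the dot also
# matches newline characters, so the pattern can span several lines.
if (pcre_match("$rb", "(?s)^v=0.*m=audio")) {
    xlog("L_INFO", "SDP body offers an audio stream\n");
}
```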
- 3.7. pcre_extended (int)
- If this option is set, whitespace data characters in the pattern are
- totally ignored except when escaped or inside a character class.
- Whitespace does not include the VT character (code 11). In addition,
- characters between an unescaped # outside a character class and the
- next newline, inclusive, are also ignored. This is equivalent to Perl's
- /x option, and it can be changed within a pattern by a (?x) or (?-x)
- option setting.
- Default value is "0".
- Example 1.7. Set pcre_extended parameter
- ...
- modparam("regex", "pcre_extended", 1)
- ...
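- As a small sketch, the inline (?x) option can be used to space out the
- parts of a pattern. The number plan used here is hypothetical.
```cfg
# With (?x) the spaces only separate the parts of the pattern, which is
# then equivalent to "^0034[0-9]{9}".
if (pcre_match("$rU", "(?x) ^ 00 34 [0-9]{9}")) {
    xlog("L_INFO", "RURI username looks like an international number\n");
}
```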
- 4. Functions
- 4.1. pcre_match (string, pcre_regex)
- Matches the given string parameter against the regular expression
- pcre_regex, which is compiled at runtime into a PCRE object. Returns
- TRUE if it matches, FALSE otherwise.
- Meaning of the parameters is as follows:
- * string - String or pseudo-variable to compare.
- * pcre_regex - Regular expression to be compiled into a PCRE object. It
- can be a string or pseudo-variable.
- NOTE: To use the "end of line" symbol '$' in the pcre_regex parameter,
- write it as '$$'.
- This function can be used from REQUEST_ROUTE, FAILURE_ROUTE,
- ONREPLY_ROUTE, BRANCH_ROUTE and LOCAL_ROUTE.
- Example 1.8. pcre_match usage (forcing case insensitive)
- ...
- if (pcre_match("$ua", "(?i)^twinkle")) {
-     xlog("L_INFO", "User-Agent matches\n");
- }
- ...
- Example 1.9. pcre_match usage (using "end of line" symbol)
- ...
- if (pcre_match("$rU", "^user[1234]$$")) { # Will be converted to "^user[1234]$"
-     xlog("L_INFO", "RURI username matches\n");
- }
- ...
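- Since pcre_regex can also be a pseudo-variable, the pattern may be
- built or fetched at runtime. The fragment below is a sketch only: the
- pattern and the domain are hypothetical, and a character class ([.]) is
- used instead of an escaped dot to keep the string free of backslashes.
```cfg
# Hypothetical pattern stored in a script variable.
$var(pattern) = "^sip:[^@]+@example[.]com";

if (pcre_match("$ru", "$var(pattern)")) {
    xlog("L_INFO", "Request-URI matches $var(pattern)\n");
}
```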
- 4.2. pcre_match_group (string [, group])
- Tries to match the given string against a specific group in the text
- file (see Section 6.1, "File format"). Returns TRUE if it matches,
- FALSE otherwise.
- Meaning of the parameters is as follows:
- * string - String or pseudo-variable to compare.
- * group - Number of the group to use in the operation. If not specified,
- 0 (the first group) is used. A pseudo-variable containing an
- integer can also be used.
- This function can be used from REQUEST_ROUTE, FAILURE_ROUTE,
- ONREPLY_ROUTE, BRANCH_ROUTE and LOCAL_ROUTE.
- Example 1.10. pcre_match_group usage
- ...
- if (pcre_match_group("$rU", "2")) {
-     xlog("L_INFO", "RURI username matches group 2\n");
- }
- ...
- Example 1.11. pcre_match_group usage (using a pseudo-variable as group)
- ...
- $avp(i:10) = 5; # Maybe got from a DB query.
- if (pcre_match_group("$ua", "$avp(i:10)")) {
-     xlog("L_INFO", "User-Agent matches group 5\n");
- }
- ...
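- Because the group parameter is optional, the call can also be written
- with a single argument. A minimal sketch, with a hypothetical log
- message:
```cfg
# No group given: group 0 (the first group in the file) is used.
if (pcre_match_group("$ua")) {
    xlog("L_INFO", "User-Agent matches group 0\n");
}
```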
- 5. MI Commands
- 5.1. regex_reload
- Causes the regex module to re-read the content of the text file and
- re-compile the regular expressions. The number of groups in the file
- can be modified safely.
- Name: regex_reload
- Parameters: none
- MI FIFO Command Format:
- :regex_reload:_reply_fifo_file_
- _empty_line_
- 6. Installation and Running
- 6.1. File format
- The file contains regular expressions categorized in groups. Each group
- starts with a "[number]" line. Lines starting with a space, tab, CR, LF
- or # (comments) are ignored. Each regular expression must fit on a
- single line; it cannot be split across several lines.
- An example of the file format would be the following:
- Example 1.12. regex file
- ### List of User-Agents publishing presence status
- [0]
- # Softphones
- ^Twinkle/1
- ^X-Lite
- ^eyeBeam
- ^Bria
- ^SIP Communicator
- ^Linphone
- # Deskphones
- ^Snom
- # Others
- ^SIPp
- ^PJSUA
- ### Blacklisted source IP's
- [1]
- ^190\.232\.250\.226$
- ^122\.5\.27\.125$
- ^86\.92\.112\.
- ### Free PSTN destinations in Spain
- [2]
- ^1\d{3}$
- ^((\+|00)34)?900\d{6}$
- The module compiles the text above to the following regular
- expressions:
- group 0: ((^Twinkle/1)|(^X-Lite)|(^eyeBeam)|(^Bria)|(^SIP Communicator)|
- (^Linphone)|(^Snom)|(^SIPp)|(^PJSUA))
- group 1: ((^190\.232\.250\.226$)|(^122\.5\.27\.125$)|(^86\.92\.112\.))
- group 2: ((^1\d{3}$)|(^((\+|00)34)?900\d{6}$))
- The first group can be used to avoid auto-generated PUBLISH (pua_usrloc
- module) for UAs that already support presence:
- Example 1.13. Using with pua_usrloc
- route[REGISTER] {
-     if (! pcre_match_group("$ua", "0")) {
-         xlog("L_INFO", "Auto-generated PUBLISH for $fu ($ua)\n");
-         pua_set_publish();
-     }
-     save("location");
-     exit;
- }
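- Groups 1 and 2 of the example file could be used in a similar way. The
- fragment below is a sketch under assumptions: dropping blacklisted
- requests and logging free destinations are illustrative policies, not
- something the module prescribes.
```cfg
route {
    # Group 1: blacklisted source IPs.
    if (pcre_match_group("$si", "1")) {
        xlog("L_INFO", "Dropping request from blacklisted IP $si\n");
        drop;
    }

    # Group 2: free PSTN destinations in Spain.
    if (method == "INVITE" && pcre_match_group("$rU", "2")) {
        xlog("L_INFO", "Call to free destination $rU\n");
    }
}
```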
- NOTE: It's important to understand that the numbers in the group
- headers ([number]) must start at 0. Otherwise, the real group number
- will not match the number appearing in the file. For example, the
- following text file:
- Example 1.14. Incorrect groups file
- [1]
- ^aaa
- ^bbb
- [2]
- ^ccc
- ^ddd
- will generate the following regular expressions:
- group 0: ((^aaa)|(^bbb))
- group 1: ((^ccc)|(^ddd))
- Note that the real index doesn't match the group number in the file.
- That is, compiled group 0 always points to the first group in the file,
- regardless of its number in the file. In fact, the group number
- appearing in the file is used only to delimit the groups.
- NOTE: A line containing a regular expression cannot start with '[' since
- it would be treated as a new group. The same applies to lines starting
- with a space, tab, or '#' (they would be ignored by the parser). As a
- workaround, wrap the expression in parentheses:
- [0]
- ([0-9]{9})
- ( #abcde)
- ( qwerty)