<?xml version="1.0" encoding='ISO-8859-1'?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
<!-- Include general documentation entities -->
<!ENTITY % docentities SYSTEM "../../../docbook/entities.xml">
%docentities;
]>
<!-- Module User's Guide -->
<chapter>
<title>&adminguide;</title>
<section id="regex.overview">
<title>Overview</title>
<para>
This module offers matching operations using regular expressions based on the
powerful <ulink url="http://www.pcre.org/">PCRE</ulink> library.
</para>
<para>
A text file containing regular expressions categorized in groups is compiled
when the module is loaded, and the resulting PCRE objects are stored in an
array. A function to match a string or pseudo-variable against any of these
groups is provided. The text file can be modified and reloaded at any time via
an MI command. The module also offers a function to perform a PCRE matching
operation against a regular expression provided as a function parameter.
</para>
<para>
For a detailed list of PCRE features read the
<ulink url="http://www.pcre.org/pcre.txt">man page</ulink> of the library.
</para>
</section>
<section>
<title>Dependencies</title>
<section>
<title>&kamailio; Modules</title>
<para>
The following modules must be loaded before this module:
<itemizedlist>
<listitem>
<para>
<emphasis>No dependencies on other &kamailio; modules</emphasis>.
</para>
</listitem>
</itemizedlist>
</para>
</section>
<section>
<title>External Libraries or Applications</title>
<para>
The following libraries or applications must be installed before running
&kamailio; with this module loaded:
<itemizedlist>
<listitem>
<para>
<emphasis>libpcre - the libraries of <ulink url="http://www.pcre.org/">PCRE</ulink></emphasis>.
</para>
</listitem>
</itemizedlist>
</para>
</section>
</section>
<section>
<title>Parameters</title>
<section id="regex.p.file">
<title><varname>file</varname> (string)</title>
<para>
Text file containing the regular expression groups. It must be set in order
to enable the group matching function.
</para>
<para>
<emphasis>Default value is <quote>NULL</quote>.</emphasis>
</para>
<example>
<title>Set <varname>file</varname> parameter</title>
<programlisting format="linespecific">
...
modparam("regex", "file", "/etc/kamailio/regex_groups")
...
</programlisting>
</example>
</section>
<section id="regex.p.max_groups">
<title><varname>max_groups</varname> (int)</title>
<para>
Maximum number of regular expression groups in the text file.
</para>
<para>
<emphasis>Default value is <quote>20</quote>.</emphasis>
</para>
<example>
<title>Set <varname>max_groups</varname> parameter</title>
<programlisting format="linespecific">
...
modparam("regex", "max_groups", 40)
...
</programlisting>
</example>
</section>
<section id="regex.p.group_max_size">
<title><varname>group_max_size</varname> (int)</title>
<para>
Maximum content size of a group in the text file.
</para>
<para>
<emphasis>Default value is <quote>8192</quote>.</emphasis>
</para>
<example>
<title>Set <varname>group_max_size</varname> parameter</title>
<programlisting format="linespecific">
...
modparam("regex", "group_max_size", 16384)
...
</programlisting>
</example>
</section>
<section id="regex.p.pcre_caseless">
<title><varname>pcre_caseless</varname> (int)</title>
<para>
If this option is set, matching is case-insensitive. It is equivalent to
Perl's /i option, and it can be changed within a pattern by a (?i) or
(?-i) option setting.
</para>
<para>
<emphasis>Default value is <quote>0</quote>.</emphasis>
</para>
<example>
<title>Set <varname>pcre_caseless</varname> parameter</title>
<programlisting format="linespecific">
...
modparam("regex", "pcre_caseless", 1)
...
</programlisting>
</example>
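<para>
The in-pattern override mentioned above can be illustrated with a short
sketch. It assumes that <varname>pcre_caseless</varname> is enabled globally
and that the global option also applies to expressions passed to
pcre_match; the pattern and log message are illustrative only.
</para>
<example>
<title><varname>pcre_caseless</varname> with an in-pattern (?-i) override (sketch)</title>
<programlisting format="linespecific">
...
modparam("regex", "pcre_caseless", 1)
...
# (?-i) switches case-insensitive matching off for this pattern only
if (pcre_match("$ua", "(?-i)^Twinkle")) {
    xlog("L_INFO", "User-Agent starts with 'Twinkle' (exact case)\n");
}
...
</programlisting>
</example>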
</section>
<section id="regex.p.pcre_multiline">
<title><varname>pcre_multiline</varname> (int)</title>
<para>
By default, PCRE treats the subject string as consisting of a single line
of characters (even if it actually contains newlines). The "start of line"
metacharacter (^) matches only at the start of the string, while the "end
of line" metacharacter ($) matches only at the end of the string, or before
a terminating newline.
</para>
<para>
When this option is set, the "start of line" and "end of line" constructs
match immediately following or immediately before internal newlines in the
subject string, respectively, as well as at the very start and end. This is
equivalent to Perl's /m option, and it can be changed within a pattern by a
(?m) or (?-m) option setting. If there are no newlines in a subject string,
or no occurrences of ^ or $ in a pattern, setting this option has no effect.
</para>
<para>
<emphasis>Default value is <quote>0</quote>.</emphasis>
</para>
<example>
<title>Set <varname>pcre_multiline</varname> parameter</title>
<programlisting format="linespecific">
...
modparam("regex", "pcre_multiline", 1)
...
</programlisting>
</example>
</section>
<section id="regex.p.pcre_dotall">
<title><varname>pcre_dotall</varname> (int)</title>
<para>
If this option is set, a dot metacharacter in the pattern matches all characters,
including those that indicate newline. Without it, a dot does not match when
the current position is at a newline. This option is equivalent to Perl's /s
option, and it can be changed within a pattern by a (?s) or (?-s) option setting.
</para>
<para>
<emphasis>Default value is <quote>0</quote>.</emphasis>
</para>
<example>
<title>Set <varname>pcre_dotall</varname> parameter</title>
<programlisting format="linespecific">
...
modparam("regex", "pcre_dotall", 1)
...
</programlisting>
</example>
</section>
<section id="regex.p.pcre_extended">
<title><varname>pcre_extended</varname> (int)</title>
<para>
If this option is set, whitespace data characters in the pattern are totally
ignored except when escaped or inside a character class. Whitespace does not
include the VT character (code 11). In addition, characters between an
unescaped # outside a character class and the next newline, inclusive, are
also ignored. This is equivalent to Perl's /x option, and it can be changed
within a pattern by a (?x) or (?-x) option setting.
</para>
<para>
<emphasis>Default value is <quote>0</quote>.</emphasis>
</para>
<example>
<title>Set <varname>pcre_extended</varname> parameter</title>
<programlisting format="linespecific">
...
modparam("regex", "pcre_extended", 1)
...
</programlisting>
</example>
</section>
</section>
<section>
<title>Functions</title>
<section id="regex.p.pcre_match">
<title>
<function moreinfo="none">pcre_match (string, pcre_regex)</function>
</title>
<para>
Matches the given string parameter against the regular expression pcre_regex,
which is compiled at runtime into a PCRE object. Returns TRUE if it matches,
FALSE otherwise.
</para>
<para>Meaning of the parameters is as follows:</para>
<itemizedlist>
<listitem>
<para>
<emphasis>string</emphasis> - String or pseudo-variable to compare.
</para>
</listitem>
<listitem>
<para>
<emphasis>pcre_regex</emphasis> - Regular expression to be compiled
into a PCRE object. It can be a string or pseudo-variable.
</para>
</listitem>
</itemizedlist>
<para>
NOTE: To use the "end of line" symbol '$' in the pcre_regex parameter, write '$$'.
</para>
<para>
This function can be used from REQUEST_ROUTE, FAILURE_ROUTE, ONREPLY_ROUTE,
BRANCH_ROUTE and LOCAL_ROUTE.
</para>
<example>
<title>
<function>pcre_match</function> usage (forcing case insensitive)
</title>
<programlisting format="linespecific">
...
if (pcre_match("$ua", "(?i)^twinkle")) {
    xlog("L_INFO", "User-Agent matches\n");
}
...
</programlisting>
</example>
<example>
<title>
<function>pcre_match</function> usage (using "end of line" symbol)
</title>
<programlisting format="linespecific">
...
if (pcre_match("$rU", "^user[1234]$$")) { # Will be converted to "^user[1234]$"
    xlog("L_INFO", "RURI username matches\n");
}
...
</programlisting>
</example>
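<para>
Because pcre_regex may also be a pseudo-variable, the expression can be
built or fetched at runtime. The following is a minimal sketch; the script
variable name $var(re) and the pattern are hypothetical and chosen only for
illustration.
</para>
<example>
<title>
<function>pcre_match</function> usage (regular expression held in a pseudo-variable, sketch)
</title>
<programlisting format="linespecific">
...
# Hypothetical variable holding the expression, e.g. loaded from a DB query
$var(re) = "^(alice|bob)";
if (pcre_match("$fU", "$var(re)")) {
    xlog("L_INFO", "From username matches $var(re)\n");
}
...
</programlisting>
</example>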
</section>
<section id="regex.p.pcre_match_group">
<title>
<function moreinfo="none">pcre_match_group (string [, group])</function>
</title>
<para>
Tries to match the given string against a specific group in the text
file (see <xref linkend="file-format-id"/>). Returns TRUE if it matches,
FALSE otherwise.
</para>
<para>Meaning of the parameters is as follows:</para>
<itemizedlist>
<listitem>
<para>
<emphasis>string</emphasis> - String or pseudo-variable to compare.
</para>
</listitem>
<listitem>
<para>
<emphasis>group</emphasis> - Number of the group to use in the operation.
If not specified, 0 (the first group) is used. A pseudo-variable
containing an integer can also be used.
</para>
</listitem>
</itemizedlist>
<para>
This function can be used from REQUEST_ROUTE, FAILURE_ROUTE, ONREPLY_ROUTE,
BRANCH_ROUTE and LOCAL_ROUTE.
</para>
<example>
<title>
<function>pcre_match_group</function> usage
</title>
<programlisting format="linespecific">
...
if (pcre_match_group("$rU", "2")) {
    xlog("L_INFO", "RURI username matches group 2\n");
}
...
</programlisting>
</example>
<example>
<title>
<function>pcre_match_group</function> usage (using a pseudo-variable as group)
</title>
<programlisting format="linespecific">
...
$avp(i:10) = 5; # Maybe got from a DB query.
if (pcre_match_group("$ua", "$avp(i:10)")) {
    xlog("L_INFO", "User-Agent matches group 5\n");
}
...
</programlisting>
</example>
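<para>
As stated in the parameter description, the group argument is optional and
defaults to 0. A minimal sketch of the one-argument form (the log message is
illustrative only):
</para>
<example>
<title>
<function>pcre_match_group</function> usage (default group, sketch)
</title>
<programlisting format="linespecific">
...
# No group given: the first group (0) is used
if (pcre_match_group("$ua")) {
    xlog("L_INFO", "User-Agent matches group 0\n");
}
...
</programlisting>
</example>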
</section>
</section>
<section>
<title>MI Commands</title>
<section id="regex.m.regex_reload">
<title>
<function moreinfo="none">regex_reload</function>
</title>
<para>
Causes the regex module to re-read the content of the text file
and recompile the regular expressions. The number of groups
in the file can be modified safely.
</para>
<para>
Name: <emphasis>regex_reload</emphasis>
</para>
<para>Parameters: <emphasis>none</emphasis></para>
<para>
MI FIFO Command Format:
</para>
<programlisting format="linespecific">
:regex_reload:_reply_fifo_file_
_empty_line_
</programlisting>
</section>
</section>
<section>
<title>Installation and Running</title>
<section id="file-format-id">
<title>File format</title>
<para>
The file contains regular expressions categorized in groups. Each
group starts with a "[number]" line. Lines starting with a space, tab,
CR, LF or # (comments) are ignored. Each regular expression must
take up exactly one line; a regular expression cannot be split
across several lines.
</para>
<para>
An example of the file format would be the following:
</para>
<example>
<title>regex file</title>
<programlisting format="linespecific">
### List of User-Agents publishing presence status
[0]
# Softphones
^Twinkle/1
^X-Lite
^eyeBeam
^Bria
^SIP Communicator
^Linphone
# Deskphones
^Snom
# Others
^SIPp
^PJSUA
### Blacklisted source IPs
[1]
^190\.232\.250\.226$
^122\.5\.27\.125$
^86\.92\.112\.
### Free PSTN destinations in Spain
[2]
^1\d{3}$
^((\+|00)34)?900\d{6}$
</programlisting>
</example>
<para>
The module compiles the text above to the following regular
expressions:
</para>
<programlisting format="linespecific">
group 0: ((^Twinkle/1)|(^X-Lite)|(^eyeBeam)|(^Bria)|(^SIP Communicator)|
(^Linphone)|(^Snom)|(^SIPp)|(^PJSUA))
group 1: ((^190\.232\.250\.226$)|(^122\.5\.27\.125$)|(^86\.92\.112\.))
group 2: ((^1\d{3}$)|(^((\+|00)34)?900\d{6}$))
</programlisting>
<para>
The first group can be used to avoid auto-generated PUBLISH (pua_usrloc
module) for UAs already supporting presence:
</para>
<example>
<title>Using with pua_usrloc</title>
<programlisting format="linespecific">
route[REGISTER] {
    if (! pcre_match_group("$ua", "0")) {
        xlog("L_INFO", "Auto-generated PUBLISH for $fu ($ua)\n");
        pua_set_publish();
    }
    save("location");
    exit;
}
</programlisting>
</example>
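<para>
In the same way, the second group of the example file (blacklisted source
IPs) could be matched against the source address of a request. The following
is only a sketch: the route name BLACKLIST is hypothetical, and it assumes
the sl module is loaded so that sl_send_reply is available.
</para>
<example>
<title>Rejecting blacklisted source IPs with group 1 (sketch)</title>
<programlisting format="linespecific">
# Hypothetical route name; call it early in request processing
route[BLACKLIST] {
    # $si is the source IP address of the request; "1" selects group 1
    if (pcre_match_group("$si", "1")) {
        xlog("L_WARN", "Rejecting request from blacklisted source $si\n");
        sl_send_reply("403", "Forbidden"); # assumes the sl module is loaded
        exit;
    }
}
</programlisting>
</example>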
<para>
NOTE: It's important to understand that the numbers in the group
headers ([number]) must start at 0. If not, the real group number
will not match the number appearing in the file. For example, the
following text file:
</para>
<example>
<title>Incorrect groups file</title>
<programlisting format="linespecific">
[1]
^aaa
^bbb
[2]
^ccc
^ddd
</programlisting>
</example>
<para>
will generate the following regular expressions:
</para>
<programlisting format="linespecific">
group 0: ((^aaa)|(^bbb))
group 1: ((^ccc)|(^ddd))
</programlisting>
<para>
Note that the real index doesn't match the group number in the file. That
is, compiled group 0 always points to the first group in the file, regardless
of its number in the file. In fact, the group number appearing in the file is
used only to delimit the groups.
</para>
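<para>
For comparison, a version of the same file whose headers start at 0, so that
the numbers in the file agree with the compiled group indexes, would look like
this:
</para>
<example>
<title>Corrected groups file (sketch)</title>
<programlisting format="linespecific">
# Compiled as group 0: ((^aaa)|(^bbb))
[0]
^aaa
^bbb
# Compiled as group 1: ((^ccc)|(^ddd))
[1]
^ccc
^ddd
</programlisting>
</example>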
<para>
NOTE: A line containing a regular expression cannot start with '[' since it
would be treated as a new group. The same applies to lines starting with a
space, tab, or '#' (they would be ignored by the parser). As a workaround,
enclosing the expression in parentheses works:
</para>
<programlisting format="linespecific">
[0]
([0-9]{9})
( #abcde)
( qwerty)
</programlisting>
</section>
</section>
</chapter>
  452. </section>
  453. </section>
  454. </chapter>