<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" 
   "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">

<section id="hfname_parser" xmlns:xi="http://www.w3.org/2001/XInclude">
    <sectioninfo>
	<revhistory>
	    <revision>
		<revnumber>$Revision$</revnumber>
		<date>$Date$</date>
	    </revision>
	</revhistory>
    </sectioninfo>
    
    <title>The Header Field Name Parser</title>
    <para>
	The purpose of the header field name parser is to recognize the type
	of a header field. The following header field types are recognized:
    </para>
    <para>
	Via, To, From, CSeq, Call-ID, Contact, Max-Forwards, Route,
	Record-Route, Content-Type, Content-Length, Authorization, Expires,
	Proxy-Authorization, WWW-Authorization, Supported, Require,
	Proxy-Require, Unsupported, Allow, Event.
    </para>
    <para>
	All other header field types will be marked as HDR_OTHER.
    </para>
    <para>
	The main function of the header name parser is
	<function>parse_hname2</function>, which can be found in the file
	<filename>parse_hname2.c</filename>. The function accepts pointers to
	the beginning and the end of a header field and fills in a
	<structname>hdr_field</structname>
	structure. The <structfield>name</structfield> field will point to the
	header field name, the <structfield>body</structfield> field will point
	to the header field body, and the <structfield>type</structfield> field
	will contain the type of the header field if it is known and HDR_OTHER
	otherwise.
    </para>
    <para>
	The parser is 32-bit, which means that it processes 4 characters of
	the header field name at a time. The 4 characters are converted to an
	integer and the integer is then compared. This is much faster than
	comparing byte by byte. Because the server is compiled for
	architectures that are at least 32-bit, such a comparison can be
	compiled into one instruction instead of 4.
    </para>
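    <para>
	As a rough illustration of the idea (a minimal sketch, not the code
	of the parser itself), the helper below assembles 4 characters into
	one 32-bit integer with the first character in the least significant
	byte, so that "Max-" becomes 0x2d78614d. The names
	<function>pack4</function> and <function>starts_with_max</function>
	are assumptions made for this example, and the caller is assumed to
	guarantee that at least 4 bytes are readable.
    </para>
    <programlisting><![CDATA[
#include <stdint.h>

/* Hypothetical helper: pack 4 bytes into one 32-bit integer,
 * first byte in the least significant position. */
static uint32_t pack4(const char *p)
{
    const unsigned char *u = (const unsigned char *)p;

    return (uint32_t)u[0]
         | ((uint32_t)u[1] << 8)
         | ((uint32_t)u[2] << 16)
         | ((uint32_t)u[3] << 24);
}

/* One integer comparison replaces four character comparisons.
 * 0x2d78614d is 'M' 'a' 'x' '-' packed as described above. */
static int starts_with_max(const char *name)
{
    return pack4(name) == 0x2d78614dU;
}
]]></programlisting>
    <para>
	Assembling the integer with shifts keeps the sketch independent of
	the byte order of the host. A real parser may read the 4 bytes
	directly from memory instead.
    </para>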
    <para>
	We did some performance measurements, and 32-bit parsing is about 3
	times faster for a typical <acronym>SIP</acronym> message than the
	corresponding automaton comparing byte by byte. Performance may vary
	depending on the message size, the header fields parsed and their
	types. Tests showed that it was always at least as fast as the
	corresponding byte-by-byte automaton.
    </para>
    <para>
	Since header field names must be compared case-insensitively, a name
	has to be converted to lower case before it is compared. Converting
	byte by byte would slow the parser down considerably, so we have
	implemented a hash table that converts 4 bytes at once. The set of
	keys that need to be converted is known in advance (it consists of all
	possible 4-byte parts of all recognized header field names), so we can
	pre-calculate a hash table size for which the table is synonym-less,
	i.e. no two keys share a slot. This simplifies and speeds up the
	lookup considerably. The hash table must be initialized at server
	startup (function <function>init_hfname_parser</function>).
    </para>
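    <para>
	The following sketch shows one way a synonym-less table could be
	sized and queried. It is an assumption-laden illustration, not the
	table used by the server: the hash function (key modulo table size),
	the <varname>MAX_SLOTS</varname> bound, the
	<structname>slot</structname> layout and all function names are
	hypothetical.
    </para>
    <programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS 1024          /* illustrative upper bound on table size */

struct slot {
    uint32_t key;               /* 4-byte chunk as received */
    uint32_t val;               /* canonical spelling of the same chunk */
    int used;
};

/* Smallest size for which key % size gives every key its own slot.
 * Returns 0 if no such size <= MAX_SLOTS exists (e.g. duplicate keys). */
static size_t synonym_free_size(const uint32_t *keys, size_t n)
{
    size_t size, i;

    for (size = n; size > 0 && size <= MAX_SLOTS; size++) {
        unsigned char seen[MAX_SLOTS] = {0};

        for (i = 0; i < n; i++) {
            size_t h = keys[i] % size;

            if (seen[h])
                break;          /* two keys share a slot: try a bigger table */
            seen[h] = 1;
        }
        if (i == n)
            return size;
    }
    return 0;
}

static void table_insert(struct slot *t, size_t size,
                         uint32_t key, uint32_t val)
{
    struct slot *s = &t[key % size];

    s->key = key;
    s->val = val;
    s->used = 1;
}

/* Single probe, no chain to walk: either the slot holds the key or the
 * key is not in the table. Returns 1 and stores the value on a hit. */
static int table_lookup(const struct slot *t, size_t size,
                        uint32_t key, uint32_t *val)
{
    const struct slot *s = &t[key % size];

    if (!s->used || s->key != key)
        return 0;
    *val = s->val;
    return 1;
}
]]></programlisting>
    <para>
	Because the size is chosen so that no two keys collide, a lookup
	never has to resolve synonyms. The price is a table that may be
	larger than the number of keys.
    </para>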
    <para>
	The header name parser consists of several files, all of them in the
	<filename>parser</filename> subdirectory. The main file is
	<filename>parse_hname2.c</filename>; it contains the parser itself and
	the functions used to initialize and look up the hash table. The file
	<filename>keys.h</filename> contains an automatically generated set of
	macros. Each macro is a group of 4 bytes converted to an integer. The
	macros are used for comparison and for hash table initialization. For
	example, the following macros are defined in the file for the
	Max-Forwards header field name:
    </para>
    <programlisting>
#define _max__ 0x2d78616d   /* "max-" */
#define _maX__ 0x2d58616d   /* "maX-" */
#define _mAx__ 0x2d78416d   /* "mAx-" */
#define _mAX__ 0x2d58416d   /* "mAX-" */
#define _Max__ 0x2d78614d   /* "Max-" */
#define _MaX__ 0x2d58614d   /* "MaX-" */
#define _MAx__ 0x2d78414d   /* "MAx-" */
#define _MAX__ 0x2d58414d   /* "MAX-" */

#define _forw_ 0x77726f66   /* "forw" */
#define _forW_ 0x57726f66   /* "forW" */
#define _foRw_ 0x77526f66   /* "foRw" */
#define _foRW_ 0x57526f66   /* "foRW" */
#define _fOrw_ 0x77724f66   /* "fOrw" */
#define _fOrW_ 0x57724f66   /* "fOrW" */
#define _fORw_ 0x77524f66   /* "fORw" */
#define _fORW_ 0x57524f66   /* "fORW" */
#define _Forw_ 0x77726f46   /* "Forw" */
#define _ForW_ 0x57726f46   /* "ForW" */
#define _FoRw_ 0x77526f46   /* "FoRw" */
#define _FoRW_ 0x57526f46   /* "FoRW" */
#define _FOrw_ 0x77724f46   /* "FOrw" */
#define _FOrW_ 0x57724f46   /* "FOrW" */
#define _FORw_ 0x77524f46   /* "FORw" */
#define _FORW_ 0x57524f46   /* "FORW" */

#define _ards_ 0x73647261   /* "ards" */
#define _ardS_ 0x53647261   /* "ardS" */
#define _arDs_ 0x73447261   /* "arDs" */
#define _arDS_ 0x53447261   /* "arDS" */
#define _aRds_ 0x73645261   /* "aRds" */
#define _aRdS_ 0x53645261   /* "aRdS" */
#define _aRDs_ 0x73445261   /* "aRDs" */
#define _aRDS_ 0x53445261   /* "aRDS" */
#define _Ards_ 0x73647241   /* "Ards" */
#define _ArdS_ 0x53647241   /* "ArdS" */
#define _ArDs_ 0x73447241   /* "ArDs" */
#define _ArDS_ 0x53447241   /* "ArDS" */
#define _ARds_ 0x73645241   /* "ARds" */
#define _ARdS_ 0x53645241   /* "ARdS" */
#define _ARDs_ 0x73445241   /* "ARDs" */
#define _ARDS_ 0x53445241   /* "ARDS" */
    </programlisting>
    <para>
	As you can see, the Max-Forwards name was divided into three 4-byte
	chunks: "Max-", "Forw" and "ards". The file contains macros for every
	possible combination of lower- and upper-case characters in each
	chunk. Because the name (and therefore a chunk) can contain a colon
	(":"), a minus ("-") or a space, and these characters are not allowed
	in a macro name, they must be substituted: a colon is replaced by "1",
	a minus by an underscore ("_") and a space by "2".
    </para>
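    <para>
	The naming and enumeration rules above can be sketched as follows.
	This is not the generator used to produce
	<filename>keys.h</filename>; the functions
	<function>macro_name</function> and
	<function>print_variants</function> are hypothetical and only mirror
	the rules and the output layout shown in the listing.
    </para>
    <programlisting><![CDATA[
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Build the macro name for a 4-character chunk; out must hold 7 bytes. */
static void macro_name(const char chunk[4], char out[7])
{
    int i;

    out[0] = '_';
    for (i = 0; i < 4; i++) {
        char c = chunk[i];

        if (c == ':')
            c = '1';
        else if (c == '-')
            c = '_';
        else if (c == ' ')
            c = '2';
        out[i + 1] = c;
    }
    out[5] = '_';
    out[6] = '\0';
}

/* Print one #define per upper/lower case combination of the letters in
 * the chunk and return the number of lines printed. */
static int print_variants(const char chunk[4], FILE *fp)
{
    int letters[4];             /* positions that hold a letter */
    int n = 0, i, mask;

    for (i = 0; i < 4; i++)
        if (isalpha((unsigned char)chunk[i]))
            letters[n++] = i;

    for (mask = 0; mask < (1 << n); mask++) {
        char v[4], name[7];
        unsigned long value = 0;

        memcpy(v, chunk, 4);
        for (i = 0; i < n; i++) {
            int pos = letters[i];
            int upper = (mask >> (n - 1 - i)) & 1;

            v[pos] = (char)(upper ? toupper((unsigned char)v[pos])
                                  : tolower((unsigned char)v[pos]));
        }
        macro_name(v, name);
        for (i = 3; i >= 0; i--)    /* first character ends up lowest */
            value = (value << 8) | (unsigned char)v[i];
        fprintf(fp, "#define %s 0x%08lx   /* \"%.4s\" */\n", name, value, v);
    }
    return 1 << n;
}
]]></programlisting>
    <para>
	A chunk with three letters, such as "max-", yields 8 macros, and a
	chunk with four letters, such as "forw", yields 16, which matches the
	number of entries in the listing.
    </para>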
    <para>
	When the hash table is initialized, all these macros are used as keys.
	For each chunk, one of its upper- and lower-case combinations is used
	as the value. Which one?
    </para>
    <para>
	There is a convention that each word of a header field name starts with
	an upper-case character. For example, most user agents will send
	"Max-Forwards"; messages containing some other combination of upper-
	and lower-case characters (for example "max-forwards", "MAX-FORWARDS"
	or "mAX-fORWARDS") are possible but very rare.
    </para>
    <para>
	Considering the previous paragraph, we optimized the parser for the
	most common case: the value stored in the hash table is the chunk
	spelled according to the convention (for example "Max-"), and the
	parser compares against that spelling first. When all header field
	names follow the convention, no hash table lookups are needed, which
	is another speed-up.
    </para>
    <para>
	For example, suppose we are trying to determine whether the header
	field name is Max-Forwards, and the name is formed according to the
	convention (i.e. "Max-Forwards"):
	<itemizedlist>
	    <listitem>
		<para>
		    Get the first 4 bytes of the header field name ("Max-"),
		    convert them to an integer and compare it to the "_Max__"
		    macro. The comparison succeeds; continue with the next step.
		</para>
	    </listitem>
	    <listitem>
		<para>
		    Get the next 4 bytes of the header field name ("Forw"),
		    convert them to an integer and compare it to the "_Forw_"
		    macro. The comparison succeeds; continue with the next step.
		</para>
	    </listitem>
	    <listitem>
		<para>
		    Get the next 4 bytes of the header field name ("ards"),
		    convert them to an integer and compare it to the "_ards_"
		    macro. The comparison succeeds; continue with the next step.
		</para>
	    </listitem>
	    <listitem>
		<para>
		    If the following characters are spaces and tabs followed by
		    a colon (or a colon directly, without spaces and tabs), we
		    have found the Max-Forwards header field name and can set the
		    <structfield>type</structfield> field to
		    HDR_MAXFORWARDS. Otherwise (characters other than a colon,
		    spaces and tabs follow) it is some other header field, and the
		    <structfield>type</structfield> field is set to HDR_OTHER.
		</para>
	    </listitem>
	</itemizedlist>
    </para>
    <para>
	As you can see, no hash table lookups are needed if the header field
	name is formed according to the convention, and the match is very fast
	(only 3 comparisons are needed).
    </para>
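    <para>
	The four steps of the conventional case can be sketched in C as
	follows. This is a simplified illustration rather than the code in
	<filename>parse_hname2.c</filename>: the
	<varname>KEY_*</varname> constants are hypothetical stand-ins for the
	"_Max__", "_Forw_" and "_ards_" macros (with the values from the
	listing above), <function>pack4</function>,
	<function>classify</function> and the
	<varname>SKETCH_*</varname> values are invented for this example,
	and a failed chunk comparison simply reports "other" instead of
	falling back to the hash table.
    </para>
    <programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

#define KEY_MAX  0x2d78614dU    /* "Max-" */
#define KEY_FORW 0x77726f46U    /* "Forw" */
#define KEY_ARDS 0x73647261U    /* "ards" */

/* Hypothetical result values standing in for HDR_OTHER/HDR_MAXFORWARDS. */
enum sketch_type { SKETCH_OTHER, SKETCH_MAXFORWARDS };

/* Pack 4 bytes into an integer, first byte in the lowest position. */
static uint32_t pack4(const char *p)
{
    const unsigned char *u = (const unsigned char *)p;

    return (uint32_t)u[0] | ((uint32_t)u[1] << 8)
         | ((uint32_t)u[2] << 16) | ((uint32_t)u[3] << 24);
}

static enum sketch_type classify(const char *p, const char *end)
{
    if (end - p < 12)
        return SKETCH_OTHER;            /* too short for "Max-Forwards" */

    /* Steps 1-3: three integer comparisons, no hash table involved. */
    if (pack4(p) != KEY_MAX
        || pack4(p + 4) != KEY_FORW
        || pack4(p + 8) != KEY_ARDS)
        return SKETCH_OTHER;

    /* Step 4: optional spaces and tabs, then a colon. */
    p += 12;
    while (p < end && (*p == ' ' || *p == '\t'))
        p++;
    return (p < end && *p == ':') ? SKETCH_MAXFORWARDS : SKETCH_OTHER;
}
]]></programlisting>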
    <para>
	Now let's consider another example, in which the header field name is
	not formed according to the convention, for example "MAX-forwards":
	<itemizedlist>
	    <listitem>
		<para>
		    Get the first 4 bytes of the header field name ("MAX-"),
		    convert them to an integer and compare it to the "_Max__"
		    macro.
		</para>
		<para>
		    The comparison fails, so look up "MAX-" converted to an
		    integer in the hash table. It is found; the result is "Max-"
		    converted to an integer.
		</para>
		<para>
		    Compare the result from the hash table to the "_Max__"
		    macro. The comparison succeeds; continue with the next step.
		</para>
	    </listitem>
	    <listitem>
		<para>
		    Get the next 4 bytes of the header field name ("forw"),
		    convert them to an integer and compare it to the "_Forw_"
		    macro.
		</para>
		<para>
		    The comparison fails, so look up "forw" converted to an
		    integer in the hash table. It is found; the result is "Forw"
		    converted to an integer.
		</para>
		<para>
		    Compare the result from the hash table to the "_Forw_"
		    macro. The comparison succeeds; continue with the next step.
		</para>
	    </listitem>
	    <listitem>
		<para>
		    Get the next 4 bytes of the header field name ("ards"),
		    convert them to an integer and compare it to the "_ards_"
		    macro. The comparison succeeds; continue with the next step.
		</para>
	    </listitem>
	    <listitem>
		<para>
		    If the following characters are spaces and tabs followed by
		    a colon (or a colon directly, without spaces and tabs), we
		    have found the Max-Forwards header field name and can set the
		    <structfield>type</structfield> field to
		    HDR_MAXFORWARDS. Otherwise (characters other than a colon,
		    spaces and tabs follow) it is some other header field, and the
		    <structfield>type</structfield> field is set to HDR_OTHER.
		</para>
	    </listitem>
	</itemizedlist>
    </para>
    <para>
	In this example, we had to do 2 hash table lookups and 2 more
	comparisons. Even this variant is still very fast: because the hash
	table is synonym-less, the lookups are very fast.
    </para>
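    <para>
	The fallback can be sketched in the same style. To keep the example
	short, the hash table is replaced by a hypothetical stand-in,
	<function>lookup4</function>, which scans three canonical chunks and
	counts how often it is called. Everything named here
	(<varname>KEY_*</varname>, <function>pack4</function>,
	<function>lower4</function>, <function>chunk_is</function>,
	<function>is_max_forwards</function>, <varname>lookups</varname>) is
	an assumption made for illustration and does not reflect the actual
	implementation.
    </para>
    <programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

#define KEY_MAX  0x2d78614dU    /* "Max-" */
#define KEY_FORW 0x77726f46U    /* "Forw" */
#define KEY_ARDS 0x73647261U    /* "ards" */

static int lookups;             /* number of fallback lookups performed */

static uint32_t pack4(const char *p)
{
    const unsigned char *u = (const unsigned char *)p;

    return (uint32_t)u[0] | ((uint32_t)u[1] << 8)
         | ((uint32_t)u[2] << 16) | ((uint32_t)u[3] << 24);
}

/* Lower-case the ASCII letters of a packed chunk. */
static uint32_t lower4(uint32_t v)
{
    uint32_t out = 0;
    int i;

    for (i = 0; i < 4; i++) {
        uint32_t c = (v >> (8 * i)) & 0xffU;

        if (c >= 'A' && c <= 'Z')
            c += 0x20;
        out |= c << (8 * i);
    }
    return out;
}

/* Stand-in for the hash table: maps any case variant of a known chunk
 * to its conventional spelling. Returns 1 on a hit, 0 on a miss. */
static int lookup4(uint32_t key, uint32_t *val)
{
    static const uint32_t canon[] = { KEY_MAX, KEY_FORW, KEY_ARDS };
    size_t i;

    lookups++;
    for (i = 0; i < sizeof canon / sizeof canon[0]; i++) {
        if (lower4(key) == lower4(canon[i])) {
            *val = canon[i];
            return 1;
        }
    }
    return 0;
}

/* Compare directly first; consult the table only if that fails. */
static int chunk_is(const char *p, uint32_t expected)
{
    uint32_t raw = pack4(p), found;

    if (raw == expected)
        return 1;
    return lookup4(raw, &found) && found == expected;
}

static int is_max_forwards(const char *p, const char *end)
{
    if (end - p < 12)
        return 0;
    if (!chunk_is(p, KEY_MAX) || !chunk_is(p + 4, KEY_FORW)
        || !chunk_is(p + 8, KEY_ARDS))
        return 0;
    p += 12;
    while (p < end && (*p == ' ' || *p == '\t'))
        p++;
    return p < end && *p == ':';
}
]]></programlisting>
    <para>
	For "MAX-forwards" the counter ends at 2, one lookup for "MAX-" and
	one for "forw", while "ards" matches directly. For "Max-Forwards" it
	stays at 0.
    </para>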
</section>