HTTP Parser
===========

[Build Status](https://travis-ci.org/nodejs/http-parser)

This is a parser for HTTP messages written in C. It parses both requests and
responses. The parser is designed to be used in high-performance HTTP
applications. It makes no syscalls or allocations, does not buffer data, and
can be interrupted at any time. Depending on your architecture, it requires
only about 40 bytes of data per message stream (in a web server, that is per
connection).

Features:

* No dependencies
* Handles persistent streams (keep-alive)
* Decodes chunked encoding
* Upgrade support
* Defends against buffer overflow attacks

The parser extracts the following information from HTTP messages:

* Header fields and values
* Content-Length
* Request method
* Response status code
* Transfer-Encoding
* HTTP version
* Request URL
* Message body

Usage
-----

One `http_parser` object is used per TCP connection. Initialize the struct
using `http_parser_init()` and set the callbacks. For a request parser, that
might look something like this:

```c
#include <stdlib.h>
#include <string.h>
#include "http_parser.h"

http_parser_settings settings;
memset(&settings, 0, sizeof(settings)); /* unset callbacks must be NULL */
settings.on_url = my_url_callback;
settings.on_header_field = my_header_field_callback;
/* ... */

http_parser *parser = malloc(sizeof(http_parser));
if (parser == NULL) {
  /* Handle allocation failure. */
}
http_parser_init(parser, HTTP_REQUEST);
parser->data = my_socket;
```

When data is received on the socket, execute the parser and check for errors:

```c
char buf[80 * 1024];
size_t nparsed;
ssize_t recved;

recved = recv(fd, buf, sizeof(buf), 0);

if (recved < 0) {
  /* Handle error; never pass a negative length to the parser. */
}

/* Start up / continue the parser.
 * Note we pass recved == 0 to signal that EOF has been received.
 */
nparsed = http_parser_execute(parser, &settings, buf, (size_t)recved);

if (parser->upgrade) {
  /* Handle new protocol. */
} else if (nparsed != (size_t)recved) {
  /* Handle error. Usually just close the connection.
   * http_errno_name(HTTP_PARSER_ERRNO(parser)) names the failure. */
}
```

HTTP needs to know where the end of the stream is. For example, some
servers send responses without a Content-Length header and expect the client
to consume input (for the body) until EOF. To tell `http_parser` about EOF,
pass `0` as the fourth parameter (the length) to `http_parser_execute()`.
Callbacks and errors can still occur during this EOF call, so be prepared
to handle them.
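
The same rule in loop form, as a minimal sketch that compiles without http-parser. The hypothetical `feed_fn` has the shape of `http_parser_execute()` (returns bytes consumed), and the hypothetical `read_fn` has the shape of `recv()`. The names `pump`, `mem_read` and `log_feed` are illustrative only and are not library functions. The sketch shows that the zero-length read is still passed to the parser once before the loop stops.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-ins: feed_fn mirrors http_parser_execute() (returns
 * the number of bytes consumed), read_fn mirrors recv() (0 at EOF,
 * negative on error). */
typedef size_t (*feed_fn)(void *parser_ctx, const char *data, size_t len);
typedef ssize_t (*read_fn)(void *source, char *buf, size_t cap);

/* Reads until EOF, feeding every read to the parser, including the final
 * zero-length one. Returns 0 after a clean EOF, -1 on any error. */
int pump(read_fn read_chunk, void *source, feed_fn feed, void *parser_ctx) {
  char buf[4096];
  for (;;) {
    ssize_t n = read_chunk(source, buf, sizeof(buf));
    if (n < 0) {
      return -1; /* read error: never pass a negative length */
    }
    /* n == 0 is still fed: it tells the parser EOF was reached, which
     * may fire final callbacks or report a final error. */
    if (feed(parser_ctx, buf, (size_t)n) != (size_t)n) {
      return -1; /* parser stopped early (error or upgrade) */
    }
    if (n == 0) {
      return 0;
    }
  }
}

/* Example doubles for trying pump() without a socket or http-parser. */
struct mem_source { const char *data; size_t len, pos, step; };

ssize_t mem_read(void *source, char *buf, size_t cap) {
  struct mem_source *s = source;
  size_t n = s->len - s->pos;
  if (n > s->step) n = s->step; /* hand out small pieces, like a socket */
  if (n > cap) n = cap;
  memcpy(buf, s->data + s->pos, n);
  s->pos += n;
  return (ssize_t)n;
}

struct feed_log { size_t total; int calls; int saw_eof; };

size_t log_feed(void *parser_ctx, const char *data, size_t len) {
  struct feed_log *log = parser_ctx;
  (void)data;
  log->total += len;
  log->calls++;
  if (len == 0) log->saw_eof = 1;
  return len; /* pretend every byte was parsed */
}
```

With the real library, `feed` would wrap `http_parser_execute(parser, &settings, data, len)`. You would also check `parser->upgrade` before treating a short return value as an error.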

Scalar-valued message information such as `status_code`, `method`, and the
HTTP version (`http_major`, `http_minor`) is stored in the parser structure.
This data is only stored temporarily in `http_parser` and is reset on each
new message. If this information is needed later, copy it out of the
structure during the `on_headers_complete` callback.

The parser decodes the transfer encoding transparently for both requests
and responses. That is, chunked data is decoded before it is passed to the
`on_body` callback.
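
The sketch below makes that transformation concrete. It decodes a complete chunked body held in memory, so `4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n` becomes `Wikipedia`, which is the kind of data `on_body` receives (possibly split over several calls). It is an illustration only. http-parser already does this for you incrementally, and `dechunk` is a hypothetical helper, not its implementation. It skips chunk extensions and, as a simplifying assumption, does not support trailers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decodes a complete chunked body in `in` into `out` (capacity `cap`).
 * Returns the decoded length, or -1 if the input is malformed or too big.
 * Chunk extensions (";name=value") are skipped; trailers are unsupported. */
long dechunk(const char *in, size_t in_len, char *out, size_t cap) {
  size_t i = 0, o = 0;
  for (;;) {
    size_t size = 0;
    int digits = 0;
    /* Chunk size: one or more hex digits. */
    while (i < in_len) {
      char c = in[i];
      int v;
      if (c >= '0' && c <= '9') v = c - '0';
      else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
      else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
      else break;
      if (size > (SIZE_MAX - 15) / 16) return -1; /* overflow guard */
      size = size * 16 + (size_t)v;
      digits++;
      i++;
    }
    if (digits == 0) return -1;
    /* Skip any chunk extension, then require CRLF. */
    while (i < in_len && in[i] != '\r') i++;
    if (i + 1 >= in_len || in[i + 1] != '\n') return -1;
    i += 2;
    if (size == 0) {
      /* Last chunk: expect the terminating CRLF (no trailers here). */
      if (i + 1 >= in_len || in[i] != '\r' || in[i + 1] != '\n') return -1;
      return (long)o;
    }
    /* Chunk data must be present and followed by CRLF. */
    if (size > in_len - i || in_len - i - size < 2) return -1;
    if (size > cap - o) return -1;
    memcpy(out + o, in + i, size);
    o += size;
    i += size;
    if (in[i] != '\r' || in[i + 1] != '\n') return -1;
    i += 2;
  }
}
```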

The Special Problem of Upgrade
------------------------------

HTTP supports upgrading the connection to a different protocol. A common
example of this is the WebSocket protocol, which sends a request like

    GET /demo HTTP/1.1
    Upgrade: WebSocket
    Connection: Upgrade
    Host: example.com
    Origin: http://example.com
    WebSocket-Protocol: sample

followed by non-HTTP data.

(See [RFC6455](https://tools.ietf.org/html/rfc6455) for more information on the
WebSocket protocol.)

To support this, the parser treats this as a normal HTTP message without a
body, issuing both `on_headers_complete` and `on_message_complete` callbacks.
However, `http_parser_execute()` stops parsing at the end of the headers and
returns.

The user is expected to check whether `parser->upgrade` has been set to 1
after `http_parser_execute()` returns. Non-HTTP data begins in the supplied
buffer at the offset given by the return value of `http_parser_execute()`,
i.e. at `buf + nparsed`.
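
The following is a minimal sketch of that hand-off. The bytes from `buf + nparsed` to the end of the buffer already belong to the new protocol. The receive buffer is usually reused for the next `recv()`, so these bytes need to be kept somewhere.

`take_upgrade_tail` is a hypothetical helper, not part of http-parser, that copies them to the heap. It assumes:

* `buf` and `len` are what was passed to `http_parser_execute()`.
* `nparsed` is the value it returned when `parser->upgrade` was 1.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the non-HTTP bytes that follow the parsed
 * headers. Returns 0 on success (with *tail == NULL and *tail_len == 0
 * when nothing follows the headers yet), or -1 if allocation fails. */
int take_upgrade_tail(const char *buf, size_t len, size_t nparsed,
                      char **tail, size_t *tail_len) {
  *tail = NULL;
  *tail_len = 0;
  if (nparsed >= len) {
    return 0; /* the new protocol's data has not arrived yet */
  }
  size_t n = len - nparsed;
  *tail = malloc(n);
  if (*tail == NULL) {
    return -1;
  }
  memcpy(*tail, buf + nparsed, n);
  *tail_len = n;
  return 0;
}
```

The copied bytes are then handed to whatever code speaks the new protocol. That code owns the buffer and frees it.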

Callbacks
---------

During the `http_parser_execute()` call, the callbacks set in
`http_parser_settings` will be executed. The parser maintains state and
never looks behind, so buffering the data is not necessary. If you need to
save certain data for later use, you can do that from the callbacks.

There are two types of callbacks:

* Notification callbacks, `typedef int (*http_cb) (http_parser*);`:
  `on_message_begin`, `on_headers_complete`, `on_message_complete`.
* Data callbacks, `typedef int (*http_data_cb) (http_parser*, const char *at, size_t length);`:
  `on_url` (requests only), and `on_header_field`, `on_header_value`,
  `on_body` (common to requests and responses).

Callbacks must return 0 on success. Returning a non-zero value signals an
error to the parser, and `http_parser_execute()` stops and returns
immediately.
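
One use of this convention is a body size cap. The `on_body` callback returns non-zero once too many bytes have arrived, so `http_parser_execute()` stops early and returns fewer bytes than it was given.

The sketch below keeps the counting logic in a plain function so it compiles without http-parser. `struct body_limit` and `count_body_bytes` are hypothetical names. The comment shows the assumed one-line callback wrapper.

```c
#include <stddef.h>

/* Hypothetical per-connection state, e.g. reachable via parser->data. */
struct body_limit {
  size_t max;  /* largest body we are willing to accept */
  size_t seen; /* body bytes received so far (never exceeds max) */
};

/* Returns 0 to continue parsing, 1 to make the parser stop.
 * An assumed wrapper in real code:
 *   int on_body(http_parser *p, const char *at, size_t length) {
 *     (void)at;
 *     return count_body_bytes(p->data, length);
 *   }
 */
int count_body_bytes(struct body_limit *limit, size_t length) {
  if (length > limit->max - limit->seen) {
    return 1; /* accepting this piece would exceed the limit: abort */
  }
  limit->seen += length;
  return 0;
}
```

To apply the limit per message on a keep-alive connection, reset `seen` to 0 in `on_message_begin`.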

For cases where it is necessary to pass local information to or from a
callback, use the `data` field of the `http_parser` object.

One such case is using threads to handle a socket connection, parse a
request, and then give a response over that socket. Each thread
instantiates its own struct containing the relevant data, such as the
accepted socket and memory for callbacks to write into. Through
`parser->data`, the parser's callbacks can then exchange data with their
thread in a thread-safe manner, because neither the parser nor that struct
is shared between threads. This allows http-parser to be used in
multi-threaded contexts.

Example:
```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include "http_parser.h"

typedef struct {
  int sock;        /* accepted socket (a POSIX descriptor here) */
  char *buffer;    /* memory the callbacks write parsed data into */
  size_t buf_len;  /* number of bytes stored in buffer */
} custom_data_t;

int my_url_callback(http_parser *parser, const char *at, size_t length) {
  /* Access the thread-local custom_data_t struct. Use it to save parsed
   * data for later use in the thread-local buffer, or to communicate over
   * the socket. Here the URL bytes are appended to the buffer. */
  custom_data_t *my_data = parser->data;
  char *grown;

  if (length == 0) {
    return 0;
  }
  grown = realloc(my_data->buffer, my_data->buf_len + length);
  if (grown == NULL) {
    return 1; /* non-zero makes the parser stop with an error */
  }
  memcpy(grown + my_data->buf_len, at, length);
  my_data->buffer = grown;
  my_data->buf_len += length;
  return 0;
}

void http_parser_thread(int sock) {
  char buf[80 * 1024];
  ssize_t recved;
  size_t nparsed;

  /* Allocate memory for user data (calloc leaves buffer NULL, buf_len 0). */
  custom_data_t *my_data = calloc(1, sizeof(custom_data_t));
  /* Instantiate a thread-local parser. */
  http_parser *parser = malloc(sizeof(http_parser));
  if (my_data == NULL || parser == NULL) {
    free(my_data);
    free(parser);
    close(sock);
    return;
  }

  /* Some information for use by callbacks.
   * Achieves thread -> callback information flow. */
  my_data->sock = sock;

  http_parser_init(parser, HTTP_REQUEST); /* initialise parser */
  /* This custom data reference is accessible through the reference to the
   * parser supplied to callback functions. */
  parser->data = my_data;

  http_parser_settings settings; /* set up callbacks */
  memset(&settings, 0, sizeof(settings));
  settings.on_url = my_url_callback;

  /* Execute the parser on everything received, including the final
   * zero-length read that signals EOF. */
  do {
    recved = recv(sock, buf, sizeof(buf), 0);
    if (recved < 0) {
      break; /* handle error */
    }
    nparsed = http_parser_execute(parser, &settings, buf, (size_t)recved);
    if (parser->upgrade || nparsed != (size_t)recved) {
      break; /* new protocol or parse error */
    }
  } while (recved > 0);

  /* Parsed information copied from the callback: the thread can now act on
   * my_data->buffer (my_data->buf_len bytes of URL).
   * Achieves callback -> thread information flow. */
  /* ... */

  free(my_data->buffer);
  free(my_data);
  free(parser);
  close(sock);
}
```

If you parse an HTTP message in pieces (e.g. `read()` the request line from
the socket, parse it, read half of the headers, parse them, etc.), your data
callbacks may be called more than once for the same element. http-parser
only guarantees that the data pointer is valid for the lifetime of the
callback, so copy anything you need later. Alternatively, if it fits your
application, `read()` into a heap-allocated buffer that you keep alive, to
avoid copying memory around.

Reading headers can be tricky if you read and parse them partially. You need
to remember whether the last header callback was for a field or a value, and
apply the following logic:

(`on_header_field` and `on_header_value` are shortened to `on_h_*`.)

| State (prev. callback) | Callback     | Description/action                                                                                   |
|------------------------|--------------|------------------------------------------------------------------------------------------------------|
| nothing (first call)   | `on_h_field` | Allocate new buffer and copy callback data into it                                                   |
| value                  | `on_h_field` | New header started. Copy current name, value buffers to headers list and allocate new buffer for new name |
| field                  | `on_h_field` | Previous name continues. Reallocate name buffer and append callback data to it                      |
| field                  | `on_h_value` | Value for current header started. Allocate new buffer and copy callback data to it                  |
| value                  | `on_h_value` | Value continues. Reallocate value buffer and append callback data to it                             |
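
The table translates into a small state machine. The sketch below keeps it independent of http-parser:

* `on_h_field` and `on_h_value` take the same `(at, length)` pair that the real `on_header_field` and `on_header_value` callbacks receive.
* They would be called from those callbacks, with `parser->data` pointing at a `struct header_state`.

All type and function names here are hypothetical. The `MAX_HEADERS` cap is an arbitrary choice for the example, and headers are kept as NUL-terminated strings to keep it short.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_HEADERS 32 /* arbitrary cap chosen for this sketch */

enum last_cb { CB_NONE = 0, CB_FIELD, CB_VALUE };

struct header_state {
  enum last_cb last;        /* which header callback ran last */
  size_t count;             /* completed name/value pairs */
  char *names[MAX_HEADERS];
  char *values[MAX_HEADERS];
  char *name;               /* name being assembled, or NULL */
  char *value;              /* value being assembled, or NULL */
};

/* Appends at[0..length) to a NUL-terminated heap string; a NULL *dst
 * means "allocate a new buffer". Returns 0 on success, 1 on failure. */
static int hs_append(char **dst, const char *at, size_t length) {
  size_t old = *dst ? strlen(*dst) : 0;
  char *grown = realloc(*dst, old + length + 1);
  if (grown == NULL) return 1;
  memcpy(grown + old, at, length);
  grown[old + length] = '\0';
  *dst = grown;
  return 0;
}

/* Moves the pending name/value pair into the headers list. */
static int hs_flush(struct header_state *s) {
  if (s->name == NULL) return 0;         /* nothing pending */
  if (s->count == MAX_HEADERS) return 1; /* too many headers */
  if (s->value == NULL && hs_append(&s->value, "", 0)) return 1;
  s->names[s->count] = s->name;
  s->values[s->count] = s->value;
  s->count++;
  s->name = NULL;
  s->value = NULL;
  return 0;
}

/* Table rows whose callback is on_h_field. */
int on_h_field(struct header_state *s, const char *at, size_t length) {
  if (s->last == CB_VALUE && hs_flush(s)) return 1; /* new header started */
  s->last = CB_FIELD;
  /* After nothing/value, s->name is NULL, so a new buffer is allocated;
   * after field, the previous name continues and is reallocated. */
  return hs_append(&s->name, at, length);
}

/* Table rows whose callback is on_h_value. */
int on_h_value(struct header_state *s, const char *at, size_t length) {
  s->last = CB_VALUE;
  /* After field, s->value is NULL (value starts); after value it grows. */
  return hs_append(&s->value, at, length);
}

/* Stores the last pair; call once, e.g. from on_headers_complete. */
int headers_done(struct header_state *s) {
  s->last = CB_NONE;
  return hs_flush(s);
}

void header_state_free(struct header_state *s) {
  for (size_t i = 0; i < s->count; i++) {
    free(s->names[i]);
    free(s->values[i]);
  }
  free(s->name);
  free(s->value);
  memset(s, 0, sizeof(*s));
}
```

In the real callbacks each function would be called as, for instance, `return on_h_field(parser->data, at, length);`. The return values already follow the "non-zero aborts" convention.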

Parsing URLs
------------

A simplistic zero-copy URL parser is provided as `http_parser_parse_url()`.
Users of this library may wish to use it to parse URLs constructed from
consecutive `on_url` callbacks.
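
Because `on_url` may fire several times for one request, the URL has to be reassembled before it is handed to `http_parser_parse_url()`. Below is a minimal sketch of that accumulation, assuming per-connection state reachable through `parser->data`.

* `struct url_buf` and `url_append` are hypothetical names.
* `URL_MAX` is an arbitrary example cap, not an http-parser limit.
* The final parse is shown only as a comment, because it needs the library.

```c
#include <stdlib.h>
#include <string.h>

#define URL_MAX 8192 /* arbitrary example cap, not an http-parser limit */

struct url_buf {
  char *data; /* NUL-terminated once at least one piece has arrived */
  size_t len; /* bytes accumulated so far (never exceeds URL_MAX) */
};

/* Appends one on_url piece. Returns 0 on success, 1 to abort parsing
 * (URL too long or out of memory), matching the callback convention. */
int url_append(struct url_buf *u, const char *at, size_t length) {
  if (length > URL_MAX - u->len) return 1;
  char *grown = realloc(u->data, u->len + length + 1);
  if (grown == NULL) return 1;
  memcpy(grown + u->len, at, length);
  u->len += length;
  grown[u->len] = '\0';
  u->data = grown;
  return 0;
}

/* Once the URL is complete (e.g. in on_headers_complete), it can be parsed
 * in one go; with the library linked in, that might look like:
 *
 *   struct http_parser_url parsed;
 *   memset(&parsed, 0, sizeof(parsed));
 *   if (http_parser_parse_url(u->data, u->len,
 *                             parser->method == HTTP_CONNECT, &parsed)) {
 *     ... reject the request ...
 *   }
 */
```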

See examples of reading in headers:

* [partial example](http://gist.github.com/155877) in C
* [from http-parser tests](http://github.com/joyent/http-parser/blob/37a0ff8/test.c#L403) in C
* [from Node library](http://github.com/joyent/node/blob/842eaf4/src/http.js#L284) in JavaScript