|
- Additional functionality
- ------------------------
- .. _build_excerpts:
- BuildExcerpts
- ~~~~~~~~~~~~~
- **Prototype:** function BuildExcerpts ( $docs, $index, $words,
- $opts=array() )
- Excerpts (snippets) builder function. Connects to ``searchd``, asks it
- to generate excerpts (snippets) from given documents, and returns the
- results.
- ``$docs`` is a plain array of strings that carry the documents'
- contents. ``$index`` is an index name string. Different settings (such
- as charset, morphology, wordforms) from the given index will be used.
- ``$words`` is a string that contains the keywords to highlight. They
- will be processed with respect to index settings. For instance, if
- English stemming is enabled in the index, ``shoes`` will be highlighted
- even if the keyword is ``shoe``. Keywords can contain wildcards, which work
- similarly to the star-syntax available in queries. ``$opts`` is a hash that
- contains additional optional highlighting parameters:
- - ``before_match``:
- A string to insert before a keyword match. A ``%PASSAGE_ID%`` macro can
- be used in this string. The first match of the macro is replaced with
- an incrementing passage number within the current snippet. Numbering
- starts at 1 by default but can be overridden with the
- ``start_passage_id`` option. In a multi-document call, ``%PASSAGE_ID%``
- restarts at every given document. Default is ``<b>``.
- - ``after_match``:
- A string to insert after a keyword match. Starting with version
- 1.10-beta, a ``%PASSAGE_ID%`` macro can be used in this string. Default
- is ``</b>``.
- - ``chunk_separator``:
- A string to insert between snippet chunks (passages). Default is ``…``.
- - ``field_separator``:
- A string to insert between fields. Default is ``|``.
- - ``limit``:
- Maximum snippet size, in symbols (codepoints). Integer, default is
- 256.
- - ``around``:
- How many words to pick around each block of matching keywords. Integer,
- default is 5.
- - ``exact_phrase``:
- Whether to highlight exact query phrase matches only instead of
- individual keywords. Boolean, default is false.
- - ``use_boundaries``:
- Whether to additionally break passages by phrase boundary characters,
- as configured in index settings with
- :ref:`phrase_boundary <phrase_boundary>`
- directive. Boolean, default is false.
- - ``weight_order``:
- Whether to sort the extracted passages in order of relevance
- (decreasing weight), or in order of appearance in the document
- (increasing position). Boolean, default is false.
- - ``query_mode``:
- Whether to handle ``$words`` as a query in :ref:`extended
- syntax <extended_query_syntax>`, or as a bag of words
- (default behavior). For instance, in query mode,
- ``("one two" | "three four")`` will only highlight and include those
- occurrences of ``one two`` or ``three four`` where the two words from each
- pair are adjacent to each other. In default mode, any single occurrence of
- ``one``, ``two``, ``three``, or ``four`` would be highlighted. Boolean,
- default is false.
- - ``force_all_words``:
- Ignores the snippet length limit until it includes all the keywords.
- Boolean, default is false.
- - ``limit_passages``:
- Limits the maximum number of passages that can be included into the
- snippet. Integer, default is 0 (no limit).
- - ``limit_words``:
- Limits the maximum number of words that can be included into the
- snippet. Note the limit applies to any words, and not just the
- matched keywords to highlight. For example, if we are highlighting
- ``Mary`` and a passage ``Mary had a little lamb`` is selected, then it
- contributes 5 words to this limit, not just 1. Integer, default is 0
- (no limit).
- - ``start_passage_id``:
- Specifies the starting value of ``%PASSAGE_ID%`` macro (that gets
- detected and expanded in ``before_match``, ``after_match`` strings).
- Integer, default is 1.
- - ``load_files``:
- Whether to handle ``$docs`` as data to extract snippets from (default
- behavior), or to treat it as file names and load the data from the
- specified files on the server side. When this flag is enabled, up to
- :ref:`dist_threads <dist_threads>`
- worker threads per request will be created to parallelize the work.
- Boolean, default is false. Snippet building can also be parallelized
- between remote agents: set the
- :ref:`dist_threads <dist_threads>`
- param in the config to a value greater than 1, and then invoke
- snippet generation over a distributed index that contains only
- one(!) :ref:`local <local>` agent
- and several remote ones. The
- :ref:`snippets_file_prefix <snippets_file_prefix>`
- option also applies: the final file name is the plain
- concatenation of the prefix and the given name. In other words, when
- ``snippets_file_prefix`` is ``/var/data`` and the file name is ``text.txt``,
- Sphinx will try to generate the snippets from the file
- ``/var/datatext.txt``, which is exactly ``/var/data`` + ``text.txt``.
- - ``load_files_scattered``:
- Works only with distributed snippet generation over remote
- agents. The source files for snippets can be distributed among
- different agents, and the main daemon will merge together all
- non-erroneous results. So, if one agent of the distributed index has
- ``file1.txt``, another has ``file2.txt``, and you request snippets
- for both files, Sphinx will merge the results from the agents,
- and you will get the snippets from both ``file1.txt`` and
- ``file2.txt``. Boolean, default is false.
- If ``load_files`` is also set, the request will return an error
- if any of the files is not available anywhere. Otherwise (if
- ``load_files`` is not set) it will just return empty strings for
- all absent files. The master instance resets this flag when it
- distributes the snippets among agents. So, for agents the absence of
- a file is not a critical error, but for the master it might be. If
- you want to be sure that all snippets are actually created, set both
- ``load_files_scattered`` and ``load_files``. If the absence of some
- snippets caused by some agents is not critical for you, set just
- ``load_files_scattered``, leaving ``load_files`` not set.
- - ``html_strip_mode``:
- HTML stripping mode setting. Defaults to ``index``, which means that
- index settings will be used. The other values are ``none`` and ``strip``,
- which forcibly skip or apply stripping regardless of index settings,
- and ``retain``, which retains HTML markup and protects it from
- highlighting. The ``retain`` mode can only be used when highlighting
- full documents and thus requires that no snippet size limits are set.
- String, allowed values are ``none``, ``strip``, ``index``, and ``retain``.
- - ``allow_empty``:
- Allows empty string to be returned as highlighting result when a
- snippet could not be generated (no keywords match, or no passages fit
- the limit). By default, the beginning of original text would be
- returned instead of an empty string. Boolean, default is false.
- - ``passage_boundary``:
- Ensures that passages do not cross a sentence, paragraph, or zone
- boundary (when used with an index that has the respective indexing
- settings enabled). String, allowed values are ``sentence``,
- ``paragraph``, and ``zone``.
- - ``emit_zones``:
- Emits an HTML tag with an enclosing zone name before each passage.
- Boolean, default is false.
- - ``force_passages``:
- Whether to generate passages for the snippet even if the limits allow
- highlighting the whole text. Boolean, default is false.
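- The ``snippets_file_prefix`` rule described under ``load_files`` is plain
- string concatenation, with no path separator added. A minimal sketch of
- that rule follows; the helper name ``snippet_source_path`` is hypothetical
- and not part of the API, it only illustrates why a trailing slash in the
- prefix matters.

```php
<?php
// Hypothetical helper (not part of the API): mirrors the documented rule
// that the final file name is prefix + name, with no separator inserted.
function snippet_source_path(string $prefix, string $name): string
{
    return $prefix . $name;
}

// Without a trailing slash the two parts run together.
echo snippet_source_path('/var/data', 'text.txt'), "\n";   // /var/datatext.txt
// With a trailing slash the prefix behaves like a directory.
echo snippet_source_path('/var/data/', 'text.txt'), "\n";  // /var/data/text.txt
```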
- The snippet extraction algorithm currently favors better passages (with
- closer phrase matches), and then passages with keywords not yet in the
- snippet. Generally, it will try to highlight the best match with the
- query, and it will also try to highlight all the query keywords, as far as
- the limits allow. In case the document does not match the query, the
- beginning of the document, trimmed down according to the limits, will be
- returned by default. You can return an empty snippet instead by
- setting the ``allow_empty`` option to true.
- Returns false on failure. Returns a plain array of strings with excerpts
- (snippets) on success.
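- Usage sketch: the following shows one way the options above might be
- combined in a call. The helper names ``build_excerpt_options`` and
- ``highlight_docs``, the ``<mark>`` markup, and the chosen option values are
- illustrative assumptions; ``$cl`` is assumed to be a connected
- ``SphinxClient`` instance.

```php
<?php
// Hypothetical helper: collects a few of the documented options.
// Options that are not listed keep their defaults.
function build_excerpt_options(int $limit = 200, int $around = 3): array
{
    return array(
        'before_match'    => '<mark data-passage="%PASSAGE_ID%">',
        'after_match'     => '</mark>',
        'chunk_separator' => ' ... ',
        'limit'           => $limit,
        'around'          => $around,
        'query_mode'      => true,   // treat $words as an extended-syntax query
        'allow_empty'     => true,   // return '' instead of the document start
    );
}

// Hypothetical wrapper: turns the documented "false on failure" into an exception.
function highlight_docs($cl, array $docs, string $index, string $words): array
{
    $res = $cl->BuildExcerpts($docs, $index, $words, build_excerpt_options());
    if ($res === false) {
        throw new RuntimeException('BuildExcerpts failed: ' . $cl->GetLastError());
    }
    return $res;
}

// Usage with a live client (index name "test1" is an assumption):
// $snippets = highlight_docs($cl, array('I bought new shoes.'), 'test1', 'shoe');
```

- The returned array has one snippet per input document, in the same order
- as ``$docs``.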
- .. _build_keywords:
- BuildKeywords
- ~~~~~~~~~~~~~
- **Prototype:** function BuildKeywords ( $query, $index, $hits )
- Extracts keywords from query using tokenizer settings for given index,
- optionally with per-keyword occurrence statistics. Returns an array of
- hashes with per-keyword information.
- ``$query`` is a query to extract keywords from. ``$index`` is a name of
- the index to get tokenizing settings and keyword occurrence statistics
- from. ``$hits`` is a boolean flag that indicates whether keyword
- occurrence statistics are required.
- Usage example:
- .. code-block:: php
- $keywords = $cl->BuildKeywords ( "this.is.my query", "test1", false );
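- The returned per-keyword hashes can then be walked in a loop. The sketch
- below assumes each entry carries ``tokenized`` and ``normalized`` keys, plus
- ``docs`` and ``hits`` when ``$hits`` is true; these key names are an
- assumption, so check them against your API version. ``describe_keywords``
- is a hypothetical helper.

```php
<?php
// Hypothetical helper: formats each keyword entry as one printable line.
// Assumed keys: "tokenized", "normalized", and optionally "docs" and "hits".
function describe_keywords(array $keywords): array
{
    $lines = array();
    foreach ($keywords as $kw) {
        $line = $kw['tokenized'] . ' -> ' . $kw['normalized'];
        if (isset($kw['docs'], $kw['hits'])) {
            $line .= sprintf(' (docs=%d, hits=%d)', $kw['docs'], $kw['hits']);
        }
        $lines[] = $line;
    }
    return $lines;
}

// Usage with a live client ($cl is a connected SphinxClient); the check
// assumes that a failed call does not return an array:
// $keywords = $cl->BuildKeywords("this.is.my query", "test1", true);
// if (is_array($keywords)) {
//     echo implode("\n", describe_keywords($keywords)), "\n";
// }
```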
- .. _escape_string:
- EscapeString
- ~~~~~~~~~~~~
- **Prototype:** function EscapeString ( $string )
- Escapes characters that are treated as special operators by the query
- language parser. Returns an escaped string.
- ``$string`` is a string to escape.
- This function might seem redundant because it's trivial to implement in
- any calling application. However, as the set of special characters might
- change over time, it makes sense to have an API call that is guaranteed
- to escape all such characters at all times.
- Usage example:
- .. code-block:: php
- $escaped = $cl->EscapeString ( "escaping-sample@query/string" );
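- A typical use is to escape only the user-supplied part of a query and then
- add the operators the application itself needs. A minimal sketch, assuming
- a full-text field named ``title`` (hypothetical) and a hypothetical helper
- ``title_query``:

```php
<?php
// Escape only the user input; the "@title" field limit is added by the
// application and must stay unescaped to keep working as an operator.
function title_query($cl, string $userInput): string
{
    return '@title ' . $cl->EscapeString($userInput);
}

// Usage with a live client ($cl is a connected SphinxClient):
// $res = $cl->Query(title_query($cl, 'escaping-sample@query/string'), 'test1');
```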
- .. _flush_attributes:
- FlushAttributes
- ~~~~~~~~~~~~~~~
- **Prototype:** function FlushAttributes ()
- Forces ``searchd`` to flush pending attribute updates to disk, and
- blocks until completion. Returns a non-negative internal ``flush tag`` on
- success. Returns -1 and sets an error message on error.
- Attribute values updated using the
- :ref:`UpdateAttributes() <update_attributes>`
- API call are kept in a memory-mapped file, which means that the OS
- decides when the updates are actually written to disk. The
- ``FlushAttributes()`` call lets you enforce a flush, which writes all the
- changes to disk. The call will block
- until ``searchd`` finishes writing the data to disk, which might take
- seconds or even minutes depending on the total data size (.spa file
- size). All the currently updated indexes will be flushed.
- The flush tag should be treated as an ever-growing magic number that does
- not mean anything. It is guaranteed to be non-negative. It is guaranteed
- to grow over time, though not necessarily in a sequential fashion; for
- instance, two calls that return 10 and then 1000 respectively are a
- valid situation. If two calls to ``FlushAttributes()`` return the same tag,
- it means that there were no actual attribute updates in between them, and
- therefore the current flushed state remained the same (for all indexes).
- Usage example:
- .. code-block:: php
- $status = $cl->FlushAttributes ();
- if ( $status<0 )
- print "ERROR: " . $cl->GetLastError();
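- Because equal tags mean that no attribute updates happened in between, two
- consecutive tags can be compared to detect changes. A minimal sketch; the
- helper ``flush_and_compare`` is hypothetical, and treating a missing
- previous tag as "changed" is an assumption of this example.

```php
<?php
// Hypothetical helper: flushes and reports whether the flush tag moved.
// Returns array($tag, $changed).
function flush_and_compare($cl, ?int $previousTag): array
{
    $tag = (int) $cl->FlushAttributes();
    if ($tag < 0) {
        throw new RuntimeException('FlushAttributes failed: ' . $cl->GetLastError());
    }
    // No baseline yet: report a change so the caller records the first tag.
    $changed = ($previousTag === null) || ($tag !== $previousTag);
    return array($tag, $changed);
}

// Usage with a live client ($cl is a connected SphinxClient):
// list($tag, $changed) = flush_and_compare($cl, null);
// list($tag, $changed) = flush_and_compare($cl, $tag);
```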
- .. _Status:
- Status
- ~~~~~~
- **Prototype:** function Status ()
- Queries searchd status, and returns an array of status variable name and
- value pairs.
- Usage example:
- .. code-block:: php
- $status = $cl->Status ();
- foreach ( $status as $row )
- print join ( ": ", $row ) . "\n";
- .. _update_attributes:
- UpdateAttributes
- ~~~~~~~~~~~~~~~~
- **Prototype:** function UpdateAttributes ( $index, $attrs, $values,
- $type=SPH_UPDATE_INT, $ignorenonexistent=false )
- Instantly updates given attribute values in given documents. Returns
- number of actually updated documents (0 or more) on success, or -1 on
- failure.
- ``$index`` is a name of the index (or indexes) to be updated. ``$attrs``
- is a plain array with string attribute names, listing attributes that
- are updated.
- .. warning::
- Note that document ``id`` attribute cannot be updated.
- ``$values`` is a hash with document IDs as keys and new attribute values
- as values; its exact shape depends on ``$type``, see below.
- Optional ``$type`` parameter can have the following values:
- 1. ``SPH_UPDATE_INT``. This is the default value. The ``$values`` hash holds
- document IDs as keys and plain arrays of new attribute values.
- 2. ``SPH_UPDATE_MVA``. Indicates that MVA attributes are being updated. In this
- case ``$values`` must be a hash with document IDs as keys and arrays of
- arrays of int values (new MVA attribute values) as values.
- 3. ``SPH_UPDATE_STRING``. Indicates that string attributes are being updated.
- ``$values`` must be a hash with document IDs as keys and arrays of strings
- as values.
- 4. ``SPH_UPDATE_JSON``. Works the same as ``SPH_UPDATE_STRING``, but for
- JSON attribute updates.
- The optional boolean parameter ``$ignorenonexistent``
- indicates that the update should silently ignore any warnings about trying
- to update a column that does not exist in the current index schema.
- ``$index`` can be either a single index name or a list, like in
- ``Query()``. Unlike ``Query()``, wildcard is not allowed and all the
- indexes to update must be specified explicitly. The list of indexes can
- include distributed index names. Updates on distributed indexes will be
- pushed to all agents.
- Usage example:
- .. code-block:: php
- $cl->UpdateAttributes ( "test1", array("group_id"), array(1=>array(456)) );
- $cl->UpdateAttributes ( "products", array ( "price", "amount_in_stock" ),
- array ( 1001=>array(123,5), 1002=>array(37,11), 1003=>array(25,129) ) );
- The first sample statement will update document 1 in index ``test1``,
- setting ``group_id`` to 456. The second one will update documents 1001,
- 1002 and 1003 in index ``products``. For document 1001, the new price will
- be set to 123 and the new amount in stock to 5; for document 1002, the
- new price will be 37 and the new amount will be 11; etc.
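- For ``SPH_UPDATE_MVA``, each document needs an array of arrays of integers.
- The sketch below builds that shape for a single MVA attribute; the helper
- ``mva_values`` and the attribute name ``tags`` are hypothetical.

```php
<?php
// Hypothetical helper: wraps each document's value list into the
// array-of-arrays shape described for SPH_UPDATE_MVA (one inner array per
// updated MVA attribute; here there is a single attribute).
function mva_values(array $valuesByDoc): array
{
    $values = array();
    foreach ($valuesByDoc as $docId => $list) {
        $values[$docId] = array(array_map('intval', $list));
    }
    return $values;
}

// Usage with a live client; "tags" is an assumed MVA attribute name:
// $cl->UpdateAttributes ( "products", array("tags"),
//     mva_values ( array ( 1001=>array(3,7), 1002=>array() ) ), SPH_UPDATE_MVA );
```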
|