Adrian Nuta 6 years ago
commit 5826a4ff0b

+ 8 - 3
docs/conf_options_reference/index_configuration_options.rst

@@ -2112,9 +2112,14 @@ as follows:
 
 -  icu_chinese - apply Chinese text segmentation using ICU
 
-Additional values provided by libstemmer are in ‘libstemmer_XXX’
-format, where XXX is libstemmer algorithm codename (refer to
-``libstemmer_c/libstemmer/modules.txt`` for a complete list).
+Additional values provided by libstemmer are in 'libstemmer_XX' or 'libstemmer_XXX'
+format, where XX/XXX is the libstemmer algorithm codename.
+
+Current list includes: arabic (ar,ara), basque (eu,eus,baq), catalan (ca,cat), danish (da,dan), dutch (nl,dut,nld), english (en,eng), 
+finnish (fi,fin), french (fr,fre,fra), german (de,ger,deu), greek (el,gre,ell), hindi (hi,hin), hungarian (hu,hun), indonesian (id,ind), 
+irish (ga,gle), italian (it,ita), lithuanian (lt,lit), nepali (ne,nep), norwegian (no,nor), portuguese (pt,por), romanian (ro,rum,ron), 
+russian (ru,rus), spanish (es,esl,spa), swedish (sv,swe), tamil (ta,tam), turkish (tr,tur) (refer also to
+``libstemmer_c/libstemmer/modules.txt`` for an up-to-date complete list).
 
 Several stemmers can be specified (comma-separated). They will be
 applied to incoming words in the order they are listed, and the

+ 45 - 43
docs/getting-started/docker.rst

@@ -53,14 +53,14 @@ Now let's look at our RT index:
 .. code-block:: mysql
 
    mysql> DESCRIBE testrt;
-   +---------+--------+
-   | Field   | Type   |
-   +---------+--------+
-   | id      | bigint |
-   | title   | field  |
-   | content | field  |
-   | gid     | uint   |
-   +---------+--------+
+   +----------+--------+------------+
+   | Field    | Type   | Properties |
+   +----------+--------+------------+
+   | id       | bigint |            |
+   | title    | field  | stored     |
+   | content  | field  | stored     |
+   | gid      | uint   |            |
+   +----------+--------+------------+
    4 rows in set (0.00 sec)
 
 As the RT indexes start empty, let's add some data into it first   
@@ -91,14 +91,14 @@ Fulltext searches are done with the special clause MATCH, which is the main work
 .. code-block:: mysql
 
    mysql> SELECT * FROM testrt WHERE MATCH('list of laptops');
-   +------+------+
-   | id   | gid  |
-   +------+------+
-   |    1 |   10 |
-   |    2 |   10 |
-   |    3 |   20 |
-   |    5 |   30 |
-   +------+------+
+   +------+------+-------------------------------------+---------------------------+
+   | id   | gid  | title                               | content                   |
+   +------+------+-------------------------------------+---------------------------+
+   |    1 |   10 | List of HP business laptops         | Elitebook Probook         |
+   |    2 |   10 | List of Dell business laptops       | Latitude Precision Vostro |
+   |    3 |   20 | List of Dell gaming laptops         | Inspirion Alienware       |
+   |    5 |   30 | List of ASUS ultrabooks and laptops | Zenbook Vivobook          |
+   +------+------+-------------------------------------+---------------------------+
    4 rows in set (0.00 sec)
 
 
@@ -110,12 +110,12 @@ Now let's add some filtering and more ordering:
 .. code-block:: mysql
   
    mysql> SELECT *,WEIGHT() FROM testrt WHERE MATCH('list of laptops') AND gid>10  ORDER BY WEIGHT() DESC,gid DESC;
-   +------+------+----------+
-   | id   | gid  | weight() |
-   +------+------+----------+
-   |    5 |   30 |     2334 |
-   |    3 |   20 |     2334 |
-   +------+------+----------+
+   +------+------+-------------------------------------+---------------------+----------+
+   | id   | gid  | title                               | content             | weight() |
+   +------+------+-------------------------------------+---------------------+----------+
+   |    5 |   30 | List of ASUS ultrabooks and laptops | Zenbook Vivobook    |     2334 |
+   |    3 |   20 | List of Dell gaming laptops         | Inspirion Alienware |     2334 |
+   +------+------+-------------------------------------+---------------------+----------+
    2 rows in set (0.00 sec)
 
 
@@ -127,14 +127,14 @@ The search above does a simple matching, where all words need to be present. But
 .. code-block:: mysql
 
    mysql> SELECT *,WEIGHT() FROM testrt WHERE MATCH('"list of business laptops"/3');
-   +------+------+----------+
-   | id   | gid  | weight() |
-   +------+------+----------+
-   |    1 |   10 |     2397 |
-   |    2 |   10 |     2397 |
-   |    3 |   20 |     2375 |
-   |    5 |   30 |     2375 |
-   +------+------+----------+
+   +------+------+-------------------------------------+---------------------------+----------+
+   | id   | gid  | title                               | content                   | weight() |
+   +------+------+-------------------------------------+---------------------------+----------+
+   |    1 |   10 | List of HP business laptops         | Elitebook Probook         |     2397 |
+   |    2 |   10 | List of Dell business laptops       | Latitude Precision Vostro |     2397 |
+   |    3 |   20 | List of Dell gaming laptops         | Inspirion Alienware       |     2375 |
+   |    5 |   30 | List of ASUS ultrabooks and laptops | Zenbook Vivobook          |     2375 |
+   +------+------+-------------------------------------+---------------------------+----------+
    4 rows in set (0.00 sec)
    
    
@@ -179,6 +179,7 @@ To create a new RT index, you need to define it in the manticore.conf. A simple
          rt_attr_uint = attr2
    }
 
+Remember that rt_fields are only indexed and not stored by default. If you want their values back in the results, you need to add 'stored_fields = title'.
 To get the index online you need to either restart the daemon or send a HUP signal to it.
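+
+As a minimal sketch of the note above, assuming an RT index with ``title`` and ``content`` fields like the ``testrt`` sample (the index name and path are illustrative, not required values), a definition that returns both field values in results could look like this:
+
+.. code-block:: none
+
+    index testrt
+    {
+          type = rt
+          path = /var/lib/manticore/data/testrt
+          rt_field = title
+          rt_field = content
+          # assumption for illustration: store both declared fields
+          stored_fields = title, content
+          rt_attr_uint = gid
+    }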
 
 Using plain indexes
@@ -215,6 +216,7 @@ Add in your manticore.conf:
 
         source                  = src1
         path                    = /var/lib/manticore/data/test1
+        stored_fields           = title, content
         min_word_len            = 1
 
    }
@@ -293,14 +295,14 @@ Index is created and is ready to be used:
    3 rows in set (0.00 sec)
    
    mysql> SELECT * FROM test1;
-   +------+----------+------------+
-   | id   | group_id | date_added |
-   +------+----------+------------+
-   |    1 |        1 | 1507904567 |
-   |    2 |        1 | 1507904567 |
-   |    3 |        2 | 1507904567 |
-   |    4 |        2 | 1507904567 |
-   +------+----------+------------+
+   +------+----------+------------+-----------------+---------------------------------------------------------------------------+
+   | id   | group_id | date_added | title           | content                                                                   |
+   +------+----------+------------+-----------------+---------------------------------------------------------------------------+
+   |    1 |        1 | 1497982018 | test one        | this is my test document number one. also checking search within phrases. |
+   |    2 |        1 | 1497982018 | test two        | this is my test document number two                                       |
+   |    3 |        2 | 1497982018 | another doc     | this is another group                                                     |
+   |    4 |        2 | 1497982018 | doc number four | this is to test groups                                                    |
+   +------+----------+------------+-----------------+---------------------------------------------------------------------------+
    4 rows in set (0.00 sec)
    
 A quick test of a search which should match 2 terms, but not match another one:
@@ -308,9 +310,9 @@ A quick test of a search which should match 2 terms, but not match another one:
 .. code-block:: mysql
    
    mysql> SELECT * FROM test1 WHERE MATCH('test document -one');
-   +------+----------+------------+-------+
-   | id   | group_id | date_added | tag   |
-   +------+----------+------------+-------+
-   |    2 |        1 | 1519040667 | 2,4,6 |
-   +------+----------+------------+-------+
+   +------+----------+------------+----------+-------------------------------------+
+   | id   | group_id | date_added | title    | content                             |
+   +------+----------+------------+----------+-------------------------------------+
+   |    2 |        1 | 1497982018 | test two | this is my test document number two |
+   +------+----------+------------+----------+-------------------------------------+
    1 row in set (0.00 sec)

+ 3 - 45
docs/getting-started/indexes.rst

@@ -110,11 +110,12 @@ An example of Real-Time index configuration:
   index realtime {
     type           = rt
 	path           = /path/to/realtime
-    rt_field       = title
-    rt_field       = description
+	rt_field       = title
+	rt_field       = description
 	rt_attr_uint   = category_id
 	rt_attr_string = title
 	rt_attr_json   = metadata
+	stored_fields  = description
     ...
    }
    
@@ -167,46 +168,3 @@ Now we added mirrors, each shard is found on 2 servers. By default, the master (
 The mode used for picking mirrors can be set with ha_strategy. In addition to random, another simple method is round-robin selection (ha_strategy = roundrobin).
 
 The more interesting strategies are the ones based on latency-weighted probabilities. noerrors and nodeads not only take out mirrors with issues, but also monitor response times and balance the load accordingly. If a mirror responds more slowly (for example due to some operations running on it), it will receive fewer requests. When the mirror recovers and provides better times, it will get more requests.
-
-
-Replication and cluster
-~~~~~~~~~~~~~~~~~~~~~~~
-
-To use replication define one :ref:`listen <listen>` port for SphinxAPI protocol and one :ref:`listen <listen>` for
-replication address and port range in the config. Define :ref:`data_dir <data_dir>` folder for incoming indexes.
-
-.. code-block::  none
-
-  searchd {
-    listen   = 9312
-    listen   = 192.168.1.101:9360-9370:replication
-    data_dir = /var/lib/manticore/
-    ...
-   }
-
-Create a cluster (via SphinxQL) at the daemon that has local indexes that need to be replicated 
-
-.. code-block:: sql
-
-    CREATE CLUSTER posts
-	
-Add these local indexes to cluster
-
-.. code-block:: sql
-
-    ALTER CLUSTER posts ADD pq_title
-    ALTER CLUSTER posts ADD pq_clicks
-	
-All other nodes that want replica of cluster's indexes should join cluster as
-
-.. code-block:: sql
-
-    JOIN CLUSTER posts AT '192.168.1.101:9312'
-
-When running queries prepend the index name with the cluster name (``posts:``).
-
-.. code-block:: sql
-
-    INSERT INTO posts:pq_title VALUES ( 3, 'test me' )
-
-Now all such queries that modify indexes in the cluster are replicated to all nodes in the cluster.

+ 49 - 46
docs/getting-started/official-packages.rst

@@ -59,14 +59,14 @@ Now let's look at our RT index:
 .. code-block:: mysql
 
    mysql> DESCRIBE testrt;
-   +---------+--------+
-   | Field   | Type   |
-   +---------+--------+
-   | id      | bigint |
-   | title   | field  |
-   | content | field  |
-   | gid     | uint   |
-   +---------+--------+
+   +----------+--------+------------+
+   | Field    | Type   | Properties |
+   +----------+--------+------------+
+   | id       | bigint |            |
+   | title    | field  | stored     |
+   | content  | field  | stored     |
+   | gid      | uint   |            |
+   +----------+--------+------------+
    4 rows in set (0.00 sec)
 
 As the RT indexes start empty, let's add some data into it first   
@@ -97,14 +97,14 @@ Fulltext searches are done with the special clause MATCH, which is the main work
 .. code-block:: mysql
 
    mysql>  SELECT * FROM testrt WHERE MATCH('list of laptops');
-   +------+------+
-   | id   | gid  |
-   +------+------+
-   |    1 |   10 |
-   |    2 |   10 |
-   |    3 |   20 |
-   |    5 |   30 |
-   +------+------+
+   +------+------+-------------------------------------+---------------------------+
+   | id   | gid  | title                               | content                   |
+   +------+------+-------------------------------------+---------------------------+
+   |    1 |   10 | List of HP business laptops         | Elitebook Probook         |
+   |    2 |   10 | List of Dell business laptops       | Latitude Precision Vostro |
+   |    3 |   20 | List of Dell gaming laptops         | Inspirion Alienware       |
+   |    5 |   30 | List of ASUS ultrabooks and laptops | Zenbook Vivobook          |
+   +------+------+-------------------------------------+---------------------------+
    4 rows in set (0.00 sec)
 
 
@@ -116,15 +116,16 @@ Now let's add some filtering and more ordering:
 .. code-block:: mysql
   
    mysql>  SELECT *,WEIGHT() FROM testrt WHERE MATCH('list of laptops') AND gid>10  ORDER BY WEIGHT() DESC,gid DESC;
-   +------+------+----------+
-   | id   | gid  | weight() |
-   +------+------+----------+
-   |    5 |   30 |     2334 |
-   |    3 |   20 |     2334 |
-   +------+------+----------+
+   +------+------+-------------------------------------+---------------------+----------+
+   | id   | gid  | title                               | content             | weight() |
+   +------+------+-------------------------------------+---------------------+----------+
+   |    5 |   30 | List of ASUS ultrabooks and laptops | Zenbook Vivobook    |     2334 |
+   |    3 |   20 | List of Dell gaming laptops         | Inspirion Alienware |     2334 |
+   +------+------+-------------------------------------+---------------------+----------+
    2 rows in set (0.00 sec)
 
 
+
 The WEIGHT() function returns the calculated matching score. If no ordering is specified, the result is sorted in descending order by the score provided by WEIGHT().
 In this example we order first by weight and then by an integer attribute.
 
@@ -133,15 +134,16 @@ The search above does a simple matching, where all words need to be present. But
 .. code-block:: mysql
 
    mysql> SELECT *,WEIGHT() FROM testrt WHERE MATCH('"list of business laptops"/3');
-   +------+------+----------+
-   | id   | gid  | weight() |
-   +------+------+----------+
-   |    1 |   10 |     2397 |
-   |    2 |   10 |     2397 |
-   |    3 |   20 |     2375 |
-   |    5 |   30 |     2375 |
-   +------+------+----------+
+   +------+------+-------------------------------------+---------------------------+----------+
+   | id   | gid  | title                               | content                   | weight() |
+   +------+------+-------------------------------------+---------------------------+----------+
+   |    1 |   10 | List of HP business laptops         | Elitebook Probook         |     2397 |
+   |    2 |   10 | List of Dell business laptops       | Latitude Precision Vostro |     2397 |
+   |    3 |   20 | List of Dell gaming laptops         | Inspirion Alienware       |     2375 |
+   |    5 |   30 | List of ASUS ultrabooks and laptops | Zenbook Vivobook          |     2375 |
+   +------+------+-------------------------------------+---------------------------+----------+
    4 rows in set (0.00 sec)
+
    
    
    mysql> SHOW META;
@@ -249,14 +251,13 @@ In our example group_id and date_added are attributes:
       sql_attr_timestamp      = date_added
 
 
-If we want to also store the texts or enable some features (for example wildcarding), we have to edit the index configuration:
+If we also want to enable some features (for example wildcarding), we have to edit the index configuration:
 
 .. code-block:: none
 
       index test1
 	  {
 	  ...
-	      stored_fields           = title
           min_infix_len           = 3
 	  ...
 
@@ -290,26 +291,28 @@ Index is created and is ready to be used:
    3 rows in set (0.00 sec)
    
    mysql> SELECT * FROM test1;
-   +------+----------+------------+
-   | id   | group_id | date_added |
-   +------+----------+------------+
-   |    1 |        1 | 1507904567 |
-   |    2 |        1 | 1507904567 |
-   |    3 |        2 | 1507904567 |
-   |    4 |        2 | 1507904567 |
-   +------+----------+------------+
+   +------+----------+------------+-----------------+---------------------------------------------------------------------------+
+   | id   | group_id | date_added | title           | content                                                                   |
+   +------+----------+------------+-----------------+---------------------------------------------------------------------------+
+   |    1 |        1 | 1497982018 | test one        | this is my test document number one. also checking search within phrases. |
+   |    2 |        1 | 1497982018 | test two        | this is my test document number two                                       |
+   |    3 |        2 | 1497982018 | another doc     | this is another group                                                     |
+   |    4 |        2 | 1497982018 | doc number four | this is to test groups                                                    |
+   +------+----------+------------+-----------------+---------------------------------------------------------------------------+
    4 rows in set (0.00 sec)
+
    
 A quick test of a search which should match 2 terms, but not match another one:
 
 .. code-block:: mysql
    
-   mysql> SELECT * FROM test1 WHERE MATCH('test document -one');
-   +------+----------+------------+-------+
-   | id   | group_id | date_added | tag   |
-   +------+----------+------------+-------+
-   |    2 |        1 | 1519040667 | 2,4,6 |
-   +------+----------+------------+-------+
+   mysql>  SELECT * FROM test1 WHERE MATCH('test document -one');
+   +------+----------+------------+----------+-------------------------------------+
+   | id   | group_id | date_added | title    | content                             |
+   +------+----------+------------+----------+-------------------------------------+
+   |    2 |        1 | 1497982018 | test two | this is my test document number two |
+   +------+----------+------------+----------+-------------------------------------+
    1 row in set (0.00 sec)
 
+
    

+ 1 - 1
docs/installation.rst

@@ -430,7 +430,7 @@ Next step is to configure the building with cmake. Available list of configurati
 	* libstemmer_c folder in the source directory
 	* common system path. Please note that in this case, the linking is dynamic and libstemmer should be available system-wide on the installed systems
 	* libstemmer_c.tgz in  ``LIBS_BUNDLE`` folder.
-	* download from snowball project website. This is done by cmake and no additional tool is required
+	* download from snowball project website (https://snowballstem.org/dist/libstemmer_c.tgz). This is done by cmake and no additional tool is required.
 	* NOTE: if you have libstemmer in the system, but still want to use static version, say, to build a binary for a system without such lib, provide ``WITH_STEMMER_FORCE_STATIC=1`` in advance.
 	
 * ``WITH_RE2`` (bool) - specifies if the build should include the RE2 library. The library can be taken from the following locations:

+ 32 - 33
docs/real-time_indexes.rst

@@ -19,16 +19,14 @@ that a) data sources are not required and ignored, and b) you should
 explicitly enumerate all the text fields, not just attributes. Here's an
 example:
 
-Example 4.1. RT index declaration
-                                 
 
 .. code-block:: bash
 
 
     index rt
     {
-        type = rt
-        path = /usr/local/sphinx/data/rt
+        type = rt
+        path = /var/lib/manticore/data/testrt
         rt_field = title
         rt_field = content
         rt_attr_uint = gid
@@ -48,46 +46,44 @@ is an example session with the sample index above:
 
     Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
 
-    mysql> INSERT INTO rt VALUES ( 1, 'first record', 'test one', 123 );
+    mysql> INSERT INTO testrt VALUES ( 1, 'first record', 'test one', 123 );
     Query OK, 1 row affected (0.05 sec)
 
-    mysql> INSERT INTO rt VALUES ( 2, 'second record', 'test two', 234 );
+    mysql> INSERT INTO testrt VALUES ( 2, 'second record', 'test two', 234 );
     Query OK, 1 row affected (0.00 sec)
 
-    mysql> SELECT * FROM rt;
-    +------+--------+------+
-    | id   | weight | gid  |
-    +------+--------+------+
-    |    1 |      1 |  123 |
-    |    2 |      1 |  234 |
-    +------+--------+------+
-    2 rows in set (0.02 sec)
-
-    mysql> SELECT * FROM rt WHERE MATCH('test');
-    +------+--------+------+
-    | id   | weight | gid  |
-    +------+--------+------+
-    |    1 |   1643 |  123 |
-    |    2 |   1643 |  234 |
-    +------+--------+------+
-    2 rows in set (0.01 sec)
-
-    mysql> SELECT * FROM rt WHERE MATCH('@title test');
-    Empty set (0.00 sec)
+    mysql> SELECT * FROM testrt;
+    +------+------+---------------+----------+
+    | id   | gid  | title         | content  |
+    +------+------+---------------+----------+
+    |    1 |  123 | first record  | test one |
+    |    2 |  234 | second record | test two |
+    +------+------+---------------+----------+
+    2 rows in set (0.00 sec)
+
+
+    mysql> SELECT * FROM testrt WHERE MATCH('first one');
+    +------+------+--------------+----------+
+    | id   | gid  | title        | content  |
+    +------+------+--------------+----------+
+    |    1 |  123 | first record | test one |
+    +------+------+--------------+----------+
+    1 row in set (0.00 sec)
 
 Both partial and batch INSERT syntaxes are supported, ie. you can
 specify a subset of columns, and insert several rows at a time.
 Deletions are also possible using DELETE statement; the only currently
 supported syntax is DELETE FROM <index> WHERE id=<id>. REPLACE is also
 supported, enabling you to implement updates.
+Autoincrement values for the ID attribute are supported.
 
 .. code-block:: mysql
 
 
-    mysql> INSERT INTO rt ( id, title ) VALUES ( 3, 'third row' ), ( 4, 'fourth entry' );
+    mysql> INSERT INTO testrt ( id, title ) VALUES ( 3, 'third row' ), ( 4, 'fourth entry' );
     Query OK, 2 rows affected (0.01 sec)
 
-    mysql> SELECT * FROM rt;
+    mysql> SELECT * FROM testrt;
     +------+--------+------+
     | id   | weight | gid  |
     +------+--------+------+
@@ -98,10 +94,10 @@ supported, enabling you to implement updates.
     +------+--------+------+
     4 rows in set (0.00 sec)
 
-    mysql> DELETE FROM rt WHERE id=2;
+    mysql> DELETE FROM testrt WHERE id=2;
     Query OK, 0 rows affected (0.00 sec)
 
-    mysql> SELECT * FROM rt WHERE MATCH('test');
+    mysql> SELECT * FROM testrt WHERE MATCH('test');
     +------+--------+------+
     | id   | weight | gid  |
     +------+--------+------+
@@ -109,13 +105,13 @@ supported, enabling you to implement updates.
     +------+--------+------+
     1 row in set (0.00 sec)
 
-    mysql> INSERT INTO rt VALUES ( 1, 'first record on steroids', 'test one', 123 );
+    mysql> INSERT INTO testrt (title,content,gid) VALUES ('a new record','test three',100);
     ERROR 1064 (42000): duplicate id '1'
 
-    mysql> REPLACE INTO rt VALUES ( 1, 'first record on steroids', 'test one', 123 );
+    mysql> REPLACE INTO testrt VALUES ( 1, 'first record changed', 'test one', 123 );
     Query OK, 1 row affected (0.01 sec)
 
-    mysql> SELECT * FROM rt WHERE MATCH('steroids');
+    mysql> SELECT * FROM testrt WHERE MATCH('changed');
     +------+--------+------+
     | id   | weight | gid  |
     +------+--------+------+
@@ -155,6 +151,9 @@ known usage quirks. Those quirks are listed in this section.
 -  Multiple INSERTs grouped in a single transaction perform better than
    equivalent single-row transactions and are recommended for batch
    loading of data.
+   
+- Autoincrement ID values don't start from zero. They are generated by an algorithm that makes
+  sure Real-Time indexes replicated in a cluster don't generate the same IDs.
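+
+As a rough sketch of the autoincrement behaviour described above, assuming the ``testrt`` sample index used earlier in this document, an INSERT that omits ``id`` lets the daemon generate one. The generated value is deliberately not shown, since it depends on the daemon's ID generation and will differ between setups:
+
+.. code-block:: mysql
+
+    mysql> INSERT INTO testrt (title,content,gid) VALUES ('a new record','test three',100);
+
+    mysql> SELECT id, title FROM testrt WHERE MATCH('new record');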
 
    
 RT index internals

+ 2 - 0
manticore-min.conf.in

@@ -24,6 +24,7 @@ source src1
 index test1
 {
 	source			= src1
+	stored_fields = title,content
 	path			= @CONFDIR@/data/test1
 }
 
@@ -37,6 +38,7 @@ index testrt
 
 	rt_field		= title
 	rt_field		= content
+	stored_fields = title,content
 	rt_attr_uint		= gid
 }
 

+ 6 - 2
manticore.conf.in

@@ -349,7 +349,9 @@ index test1
 	# optional, default is 'keywords'
 	dict			= keywords
 
-
+	# store content of fields
+	stored_fields = title,content
+	
 	# a list of morphology preprocessors to apply
 	# optional, default is empty
 	#
@@ -744,7 +746,9 @@ index testrt
 	# multi-value, mandatory
 	rt_field		= title
 	rt_field		= content
-
+	
+	# store content of fields
+	stored_fields = title,content
 	# unsigned integer attribute declaration
 	# multi-value (an arbitrary number of attributes is allowed), optional
 	# declares an unsigned 32-bit attribute