  1. {
  2. "d518a6683de7295364461052f2fa57fd26166f489984a680cad15f4331de1219": {
  3. "original": "# Collations\n\nCollations primarily impact string attribute comparisons. They define both the character set encoding and the strategy Manticore employs for comparing strings when performing `ORDER BY` or `GROUP BY` with a string attribute involved.\n\nString attributes are stored as-is during indexing, and no character set or language information is attached to them. This is fine as long as Manticore only needs to store and return the strings to the calling application verbatim. However, when you ask Manticore to sort by a string value, the request immediately becomes ambiguous.\n\nFirst, single-byte (ASCII, ISO-8859-1, or Windows-1251) strings need to be processed differently than UTF-8 strings, which may encode each character with a variable number of bytes. Thus, we need to know the character set type to properly interpret the raw bytes as meaningful characters.\n\nSecond, we also need to know the language-specific string sorting rules. For example, when sorting according to US rules in the en_US locale, the accented character `\u00ef` (small letter `i` with diaeresis) should be placed somewhere after `z`. However, when sorting with French rules and the fr_FR locale in mind, it should be placed between `i` and `j`. Some other set of rules might choose to ignore accents altogether, allowing `\u00ef` and `i` to be mixed arbitrarily.\n\nThird, in some cases, we may require case-sensitive sorting, while in others, case-insensitive sorting is needed.\n\nCollations encapsulate all of the following: the character set, the language rules, and the case sensitivity. Manticore currently provides four collations:\n\n1. `libc_ci`\n2. `libc_cs`\n3. `utf8_general_ci`\n4. `binary`\n\nThe first two collations rely on several standard C library (libc) calls and can thus support any locale installed on your system. They provide case-insensitive (`_ci`) and case-sensitive (`_cs`) comparisons, respectively. 
By default, they use the C locale, effectively resorting to bytewise comparisons. To change that, you need to specify a different available locale using the [collation_libc_locale](../Server_settings/Searchd.md#collation_libc_locale) directive. The list of locales available on your system can usually be obtained with the `locale` command:\n\nCODE_BLOCK_0\n\nThe specific list of system locales may vary. Consult your OS documentation to install any additional locales you need.\n\nThe `utf8_general_ci` and `binary` collations are built into Manticore. The first one is a generic collation for UTF-8 data (without any so-called language tailoring); it should behave similarly to the `utf8_general_ci` collation in MySQL. The second one is a simple bytewise comparison.\n\nCollation can be overridden via SQL on a per-session basis using the `SET collation_connection` statement. All subsequent SQL queries in that session will use this collation. Otherwise, all queries use the server default collation, which can be set with the [collation_server](../Server_settings/Searchd.md#collation_server) configuration directive. Manticore currently defaults to the `libc_ci` collation.\n\nCollations affect all string attribute comparisons, including those within `ORDER BY` and `GROUP BY`, so differently ordered or grouped results can be returned depending on the collation chosen. Note that collations don't affect full-text searching; for that, use the [charset_table](../Creating_a_table/NLP_and_tokenization/Low-level_tokenization.md#charset_table).\n\n<!-- proofread -->\n\n",
  4. "translations": {
  5. "chinese": "# \u6392\u5e8f\u89c4\u5219\n\n\u6392\u5e8f\u89c4\u5219\u4e3b\u8981\u5f71\u54cd\u5b57\u7b26\u4e32\u5c5e\u6027\u7684\u6bd4\u8f83\u3002\u5b83\u4eec\u5b9a\u4e49\u4e86\u5b57\u7b26\u96c6\u7f16\u7801\u4ee5\u53ca Manticore \u5728\u6267\u884c\u6d89\u53ca\u5b57\u7b26\u4e32\u5c5e\u6027\u7684 `ORDER BY` \u6216 `GROUP BY` \u65f6\u7528\u4e8e\u6bd4\u8f83\u5b57\u7b26\u4e32\u7684\u7b56\u7565\u3002\n\n\u5b57\u7b26\u4e32\u5c5e\u6027\u5728\u7d22\u5f15\u8fc7\u7a0b\u4e2d\u6309\u539f\u6837\u5b58\u50a8\uff0c\u5e76\u4e14\u4e0d\u9644\u5e26\u5b57\u7b26\u96c6\u6216\u8bed\u8a00\u4fe1\u606f\u3002\u53ea\u8981 Manticore \u4ec5\u9700\u5c06\u5b57\u7b26\u4e32\u9010\u5b57\u5b58\u50a8\u548c\u8fd4\u56de\u7ed9\u8c03\u7528\u5e94\u7528\u7a0b\u5e8f\uff0c\u8fd9\u79cd\u65b9\u5f0f\u662f\u53ef\u4ee5\u7684\u3002\u7136\u800c\uff0c\u5f53\u60a8\u8981\u6c42 Manticore \u6309\u5b57\u7b26\u4e32\u503c\u6392\u5e8f\u65f6\uff0c\u8bf7\u6c42\u7acb\u5373\u53d8\u5f97\u6a21\u7cca\u3002\n\n\u9996\u5148\uff0c\u5355\u5b57\u8282\uff08ASCII\u3001ISO-8859-1 \u6216 Windows-1251\uff09\u5b57\u7b26\u4e32\u9700\u8981\u4e0e\u53ef\u80fd\u7528\u53ef\u53d8\u5b57\u8282\u6570\u7f16\u7801\u6bcf\u4e2a\u5b57\u7b26\u7684 UTF-8 \u5b57\u7b26\u4e32\u4e0d\u540c\u5730\u5904\u7406\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u9700\u8981\u77e5\u9053\u5b57\u7b26\u96c6\u7c7b\u578b\uff0c\u4ee5\u4fbf\u6b63\u786e\u5730\u5c06\u539f\u59cb\u5b57\u8282\u89e3\u91ca\u4e3a\u6709\u610f\u4e49\u7684\u5b57\u7b26\u3002\n\n\u5176\u6b21\uff0c\u6211\u4eec\u8fd8\u9700\u8981\u4e86\u89e3\u8bed\u8a00\u7279\u5b9a\u7684\u5b57\u7b26\u4e32\u6392\u5e8f\u89c4\u5219\u3002\u4f8b\u5982\uff0c\u5728 en_US \u5730\u533a\u8bbe\u7f6e\u7684\u7f8e\u56fd\u89c4\u5219\u4e0b\u6392\u5e8f\u65f6\uff0c\u5e26\u53d8\u97f3\u7b26\u53f7\u7684\u5b57\u7b26 `\u00ef`\uff08\u5e26\u53d8\u97f3\u7b26\u53f7\u7684\u5c0f\u5199\u5b57\u6bcd `i`\uff09\u5e94\u653e\u5728 `z` \u4e4b\u540e\u7684\u67d0\u4e2a\u4f4d\u7f6e\u3002\u4f46\u5728\u8003\u8651\u6cd5\u8bed\u89c4\u5219\u548c fr_FR 
\u5730\u533a\u8bbe\u7f6e\u65f6\uff0c\u5b83\u5e94\u653e\u5728 `i` \u548c `j` \u4e4b\u95f4\u3002\u5176\u4ed6\u89c4\u5219\u53ef\u80fd\u5b8c\u5168\u5ffd\u7565\u91cd\u97f3\u7b26\u53f7\uff0c\u4f7f\u5f97 `\u00ef` \u548c `i` \u53ef\u4ee5\u4efb\u610f\u6df7\u5408\u3002\n\n\u7b2c\u4e09\uff0c\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\uff0c\u6211\u4eec\u53ef\u80fd\u9700\u8981\u533a\u5206\u5927\u5c0f\u5199\u7684\u6392\u5e8f\uff0c\u800c\u5728\u5176\u4ed6\u60c5\u51b5\u4e0b\uff0c\u5219\u9700\u8981\u4e0d\u533a\u5206\u5927\u5c0f\u5199\u7684\u6392\u5e8f\u3002\n\n\u6392\u5e8f\u89c4\u5219\u5c01\u88c5\u4e86\u4ee5\u4e0b\u6240\u6709\u5185\u5bb9\uff1a\u5b57\u7b26\u96c6\u3001\u8bed\u8a00\u89c4\u5219\u4ee5\u53ca\u5927\u5c0f\u5199\u654f\u611f\u6027\u3002Manticore \u5f53\u524d\u63d0\u4f9b\u56db\u79cd\u6392\u5e8f\u89c4\u5219\uff1a\n\n1. `libc_ci`\n2. `libc_cs`\n3. `utf8_general_ci`\n4. `binary`\n\n\u524d\u4e24\u79cd\u6392\u5e8f\u89c4\u5219\u4f9d\u8d56\u4e8e\u51e0\u4e2a\u6807\u51c6 C \u5e93\uff08libc\uff09\u8c03\u7528\uff0c\u56e0\u6b64\u53ef\u4ee5\u652f\u6301\u7cfb\u7edf\u4e0a\u5b89\u88c5\u7684\u4efb\u4f55\u5730\u533a\u8bbe\u7f6e\u3002\u5b83\u4eec\u5206\u522b\u63d0\u4f9b\u4e0d\u533a\u5206\u5927\u5c0f\u5199\uff08`_ci`\uff09\u548c\u533a\u5206\u5927\u5c0f\u5199\uff08`_cs`\uff09\u7684\u6bd4\u8f83\u3002\u9ed8\u8ba4\u60c5\u51b5\u4e0b\uff0c\u5b83\u4eec\u4f7f\u7528 C \u5730\u533a\u8bbe\u7f6e\uff0c\u5b9e\u9645\u4e0a\u9000\u5316\u4e3a\u9010\u5b57\u8282\u6bd4\u8f83\u3002\u82e5\u8981\u66f4\u6539\u6b64\u8bbe\u7f6e\uff0c\u9700\u8981\u4f7f\u7528 [collation_libc_locale](../Server_settings/Searchd.md#collation_libc_locale) \u6307\u4ee4\u6307\u5b9a\u5176\u4ed6\u53ef\u7528\u5730\u533a\u8bbe\u7f6e\u3002\u7cfb\u7edf\u4e0a\u53ef\u7528\u7684\u5730\u533a\u8bbe\u7f6e\u5217\u8868\u901a\u5e38\u53ef\u901a\u8fc7 `locale` 
\u547d\u4ee4\u83b7\u5f97\uff1a\n\nCODE_BLOCK_0\n\n\u7cfb\u7edf\u5730\u533a\u8bbe\u7f6e\u7684\u5177\u4f53\u5217\u8868\u53ef\u80fd\u6709\u6240\u4e0d\u540c\u3002\u8bf7\u67e5\u9605\u64cd\u4f5c\u7cfb\u7edf\u6587\u6863\u4ee5\u5b89\u88c5\u6240\u9700\u7684\u989d\u5916\u5730\u533a\u8bbe\u7f6e\u3002\n\n`utf8_general_ci` \u548c `binary` \u5730\u533a\u8bbe\u7f6e\u5185\u7f6e\u4e8e Manticore \u4e2d\u3002\u7b2c\u4e00\u79cd\u662f UTF-8 \u6570\u636e\u7684\u901a\u7528\u6392\u5e8f\u89c4\u5219\uff08\u65e0\u6240\u8c13\u7684\u8bed\u8a00\u5b9a\u5236\uff09\uff0c\u5176\u884c\u4e3a\u5e94\u7c7b\u4f3c\u4e8e MySQL \u4e2d\u7684 `utf8_general_ci` \u6392\u5e8f\u89c4\u5219\u3002\u7b2c\u4e8c\u79cd\u5219\u662f\u7b80\u5355\u7684\u9010\u5b57\u8282\u6bd4\u8f83\u3002\n\n\u6392\u5e8f\u89c4\u5219\u53ef\u4ee5\u901a\u8fc7 SQL \u8bed\u53e5 `SET collation_connection` \u6309\u4f1a\u8bdd\u8986\u76d6\u3002\u6240\u6709\u540e\u7eed SQL \u67e5\u8be2\u5c06\u4f7f\u7528\u8be5\u6392\u5e8f\u89c4\u5219\u3002\u5426\u5219\uff0c\u6240\u6709\u67e5\u8be2\u5c06\u4f7f\u7528\u670d\u52a1\u5668\u9ed8\u8ba4\u6392\u5e8f\u89c4\u5219\u6216 [collation_server](../Server_settings/Searchd.md#collation_server) \u914d\u7f6e\u6307\u4ee4\u4e2d\u6307\u5b9a\u7684\u6392\u5e8f\u89c4\u5219\u3002Manticore \u5f53\u524d\u9ed8\u8ba4\u4f7f\u7528 `libc_ci` \u6392\u5e8f\u89c4\u5219\u3002\n\n\u6392\u5e8f\u89c4\u5219\u5f71\u54cd\u6240\u6709\u5b57\u7b26\u4e32\u5c5e\u6027\u7684\u6bd4\u8f83\uff0c\u5305\u62ec `ORDER BY` \u548c `GROUP BY` \u4e2d\u7684\u6bd4\u8f83\uff0c\u56e0\u6b64\u6839\u636e\u9009\u62e9\u7684\u6392\u5e8f\u89c4\u5219\uff0c\u8fd4\u56de\u7684\u7ed3\u679c\u987a\u5e8f\u6216\u5206\u7ec4\u53ef\u80fd\u4e0d\u540c\u3002\u8bf7\u6ce8\u610f\uff0c\u6392\u5e8f\u89c4\u5219\u4e0d\u5f71\u54cd\u5168\u6587\u641c\u7d22\uff1b\u5bf9\u6b64\uff0c\u8bf7\u4f7f\u7528 [charset_table](../Creating_a_table/NLP_and_tokenization/Low-level_tokenization.md#charset_table)\u3002\n\n<!-- proofread -->",
  6. "russian": "# \u0421\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0438\n\n\u0421\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0438 \u0432 \u043f\u0435\u0440\u0432\u0443\u044e \u043e\u0447\u0435\u0440\u0435\u0434\u044c \u0432\u043b\u0438\u044f\u044e\u0442 \u043d\u0430 \u0441\u0440\u0430\u0432\u043d\u0435\u043d\u0438\u044f \u0441\u0442\u0440\u043e\u043a\u043e\u0432\u044b\u0445 \u0430\u0442\u0440\u0438\u0431\u0443\u0442\u043e\u0432. \u041e\u043d\u0438 \u043e\u043f\u0440\u0435\u0434\u0435\u043b\u044f\u044e\u0442 \u043a\u0430\u043a \u043a\u043e\u0434\u0438\u0440\u043e\u0432\u043a\u0443 \u043d\u0430\u0431\u043e\u0440\u0430 \u0441\u0438\u043c\u0432\u043e\u043b\u043e\u0432, \u0442\u0430\u043a \u0438 \u0441\u0442\u0440\u0430\u0442\u0435\u0433\u0438\u044e, \u043a\u043e\u0442\u043e\u0440\u0443\u044e Manticore \u0438\u0441\u043f\u043e\u043b\u044c\u0437\u0443\u0435\u0442 \u0434\u043b\u044f \u0441\u0440\u0430\u0432\u043d\u0435\u043d\u0438\u044f \u0441\u0442\u0440\u043e\u043a \u043f\u0440\u0438 \u0432\u044b\u043f\u043e\u043b\u043d\u0435\u043d\u0438\u0438 `ORDER BY` \u0438\u043b\u0438 `GROUP BY` \u0441 \u0443\u0447\u0430\u0441\u0442\u0438\u0435\u043c \u0441\u0442\u0440\u043e\u043a\u043e\u0432\u043e\u0433\u043e \u0430\u0442\u0440\u0438\u0431\u0443\u0442\u0430.\n\n\u0421\u0442\u0440\u043e\u043a\u043e\u0432\u044b\u0435 \u0430\u0442\u0440\u0438\u0431\u0443\u0442\u044b \u0445\u0440\u0430\u043d\u044f\u0442\u0441\u044f \u0432 \u043d\u0435\u0438\u0437\u043c\u0435\u043d\u043d\u043e\u043c \u0432\u0438\u0434\u0435 \u0432\u043e \u0432\u0440\u0435\u043c\u044f \u0438\u043d\u0434\u0435\u043a\u0441\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u044f, \u0438 \u043a \u043d\u0438\u043c \u043d\u0435 \u043f\u0440\u0438\u043a\u0440\u0435\u043f\u043b\u044f\u0435\u0442\u0441\u044f \u0438\u043d\u0444\u043e\u0440\u043c\u0430\u0446\u0438\u044f \u043e \u043d\u0430\u0431\u043e\u0440\u0435 \u0441\u0438\u043c\u0432\u043e\u043b\u043e\u0432 \u0438\u043b\u0438 \u044f\u0437\u044b\u043a\u0435. 
\u042d\u0442\u043e \u043f\u0440\u0438\u0435\u043c\u043b\u0435\u043c\u043e, \u043f\u043e\u043a\u0430 Manticore \u043d\u0443\u0436\u043d\u043e \u043b\u0438\u0448\u044c \u0445\u0440\u0430\u043d\u0438\u0442\u044c \u0438 \u0432\u043e\u0437\u0432\u0440\u0430\u0449\u0430\u0442\u044c \u0441\u0442\u0440\u043e\u043a\u0438 \u0432\u044b\u0437\u044b\u0432\u0430\u044e\u0449\u0435\u043c\u0443 \u043f\u0440\u0438\u043b\u043e\u0436\u0435\u043d\u0438\u044e \u0434\u043e\u0441\u043b\u043e\u0432\u043d\u043e. \u041e\u0434\u043d\u0430\u043a\u043e, \u043a\u043e\u0433\u0434\u0430 \u0432\u044b \u043f\u0440\u043e\u0441\u0438\u0442\u0435 Manticore \u043e\u0442\u0441\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u0430\u0442\u044c \u043f\u043e \u0441\u0442\u0440\u043e\u043a\u043e\u0432\u043e\u043c\u0443 \u0437\u043d\u0430\u0447\u0435\u043d\u0438\u044e, \u0437\u0430\u043f\u0440\u043e\u0441 \u0441\u0440\u0430\u0437\u0443 \u0436\u0435 \u0441\u0442\u0430\u043d\u043e\u0432\u0438\u0442\u0441\u044f \u043d\u0435\u043e\u0434\u043d\u043e\u0437\u043d\u0430\u0447\u043d\u044b\u043c.\n\n\u0412\u043e-\u043f\u0435\u0440\u0432\u044b\u0445, \u0441\u0442\u0440\u043e\u043a\u0438 \u0441 \u043e\u0434\u043d\u0438\u043c \u0431\u0430\u0439\u0442\u043e\u043c (ASCII, ISO-8859-1 \u0438\u043b\u0438 Windows-1251) \u0442\u0440\u0435\u0431\u0443\u044e\u0442 \u0438\u043d\u043e\u0439 \u043e\u0431\u0440\u0430\u0431\u043e\u0442\u043a\u0438, \u0447\u0435\u043c \u0441\u0442\u0440\u043e\u043a\u0438 UTF-8, \u043a\u043e\u0442\u043e\u0440\u044b\u0435 \u043c\u043e\u0433\u0443\u0442 \u043a\u043e\u0434\u0438\u0440\u043e\u0432\u0430\u0442\u044c \u043a\u0430\u0436\u0434\u044b\u0439 \u0441\u0438\u043c\u0432\u043e\u043b \u043f\u0435\u0440\u0435\u043c\u0435\u043d\u043d\u044b\u043c \u0447\u0438\u0441\u043b\u043e\u043c \u0431\u0430\u0439\u0442. 
\u041f\u043e\u044d\u0442\u043e\u043c\u0443 \u043d\u0430\u043c \u043d\u0443\u0436\u043d\u043e \u0437\u043d\u0430\u0442\u044c \u0442\u0438\u043f \u043d\u0430\u0431\u043e\u0440\u0430 \u0441\u0438\u043c\u0432\u043e\u043b\u043e\u0432, \u0447\u0442\u043e\u0431\u044b \u043f\u0440\u0430\u0432\u0438\u043b\u044c\u043d\u043e \u0438\u043d\u0442\u0435\u0440\u043f\u0440\u0435\u0442\u0438\u0440\u043e\u0432\u0430\u0442\u044c \u0441\u044b\u0440\u044b\u0435 \u0431\u0430\u0439\u0442\u044b \u043a\u0430\u043a \u043e\u0441\u043c\u044b\u0441\u043b\u0435\u043d\u043d\u044b\u0435 \u0441\u0438\u043c\u0432\u043e\u043b\u044b.\n\n\u0412\u043e-\u0432\u0442\u043e\u0440\u044b\u0445, \u043d\u0430\u043c \u0442\u0430\u043a\u0436\u0435 \u043d\u0435\u043e\u0431\u0445\u043e\u0434\u0438\u043c\u043e \u0437\u043d\u0430\u0442\u044c \u043f\u0440\u0430\u0432\u0438\u043b\u0430 \u0441\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0438, \u0441\u043f\u0435\u0446\u0438\u0444\u0438\u0447\u043d\u044b\u0435 \u0434\u043b\u044f \u044f\u0437\u044b\u043a\u0430. \u041d\u0430\u043f\u0440\u0438\u043c\u0435\u0440, \u043f\u0440\u0438 \u0441\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0435 \u043f\u043e \u043f\u0440\u0430\u0432\u0438\u043b\u0430\u043c \u0421\u0428\u0410 \u0432 \u043b\u043e\u043a\u0430\u043b\u0438 en_US, \u0430\u043a\u0446\u0435\u043d\u0442\u0438\u0440\u043e\u0432\u0430\u043d\u043d\u044b\u0439 \u0441\u0438\u043c\u0432\u043e\u043b `\u00ef` (\u043c\u0430\u043b\u0430\u044f \u0431\u0443\u043a\u0432\u0430 `i` \u0441 \u0434\u0438\u0430\u0440\u0435\u0437\u043e\u0439) \u0434\u043e\u043b\u0436\u0435\u043d \u0440\u0430\u0437\u043c\u0435\u0449\u0430\u0442\u044c\u0441\u044f \u0433\u0434\u0435-\u0442\u043e \u043f\u043e\u0441\u043b\u0435 `z`. 
\u041e\u0434\u043d\u0430\u043a\u043e \u043f\u0440\u0438 \u0441\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0435 \u0441 \u0443\u0447\u0451\u0442\u043e\u043c \u0444\u0440\u0430\u043d\u0446\u0443\u0437\u0441\u043a\u0438\u0445 \u043f\u0440\u0430\u0432\u0438\u043b \u0438 \u043b\u043e\u043a\u0430\u043b\u0438 fr_FR, \u0435\u0433\u043e \u0441\u043b\u0435\u0434\u0443\u0435\u0442 \u043f\u043e\u043c\u0435\u0441\u0442\u0438\u0442\u044c \u043c\u0435\u0436\u0434\u0443 `i` \u0438 `j`. \u0414\u0440\u0443\u0433\u043e\u0439 \u043d\u0430\u0431\u043e\u0440 \u043f\u0440\u0430\u0432\u0438\u043b \u043c\u043e\u0436\u0435\u0442 \u0432\u043e\u0432\u0441\u0435 \u0438\u0433\u043d\u043e\u0440\u0438\u0440\u043e\u0432\u0430\u0442\u044c \u0430\u043a\u0446\u0435\u043d\u0442\u044b, \u043f\u043e\u0437\u0432\u043e\u043b\u044f\u044f `\u00ef` \u0438 `i` \u0441\u043c\u0435\u0448\u0438\u0432\u0430\u0442\u044c\u0441\u044f \u043f\u0440\u043e\u0438\u0437\u0432\u043e\u043b\u044c\u043d\u043e.\n\n\u0412-\u0442\u0440\u0435\u0442\u044c\u0438\u0445, \u0432 \u043d\u0435\u043a\u043e\u0442\u043e\u0440\u044b\u0445 \u0441\u043b\u0443\u0447\u0430\u044f\u0445 \u0442\u0440\u0435\u0431\u0443\u0435\u0442\u0441\u044f \u0447\u0443\u0432\u0441\u0442\u0432\u0438\u0442\u0435\u043b\u044c\u043d\u0430\u044f \u043a \u0440\u0435\u0433\u0438\u0441\u0442\u0440\u0443 \u0441\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0430, \u0430 \u0432 \u0434\u0440\u0443\u0433\u0438\u0445 \u2014 \u0440\u0435\u0433\u0438\u0441\u0442\u0440\u043e\u043d\u0435\u0437\u0430\u0432\u0438\u0441\u0438\u043c\u0430\u044f.\n\n\u0421\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0438 \u0438\u043d\u043a\u0430\u043f\u0441\u0443\u043b\u0438\u0440\u0443\u044e\u0442 \u0432\u0441\u0451 \u043f\u0435\u0440\u0435\u0447\u0438\u0441\u043b\u0435\u043d\u043d\u043e\u0435: \u043d\u0430\u0431\u043e\u0440 \u0441\u0438\u043c\u0432\u043e\u043b\u043e\u0432, \u044f\u0437\u044b\u043a\u043e\u0432\u044b\u0435 \u043f\u0440\u0430\u0432\u0438\u043b\u0430 \u0438 
\u0447\u0443\u0432\u0441\u0442\u0432\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0441\u0442\u044c \u043a \u0440\u0435\u0433\u0438\u0441\u0442\u0440\u0443. \u0412 \u043d\u0430\u0441\u0442\u043e\u044f\u0449\u0435\u0435 \u0432\u0440\u0435\u043c\u044f Manticore \u043f\u0440\u0435\u0434\u043e\u0441\u0442\u0430\u0432\u043b\u044f\u0435\u0442 \u0447\u0435\u0442\u044b\u0440\u0435 \u0441\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0438:\n\n1. `libc_ci`\n2. `libc_cs`\n3. `utf8_general_ci`\n4. `binary`\n\n\u041f\u0435\u0440\u0432\u044b\u0435 \u0434\u0432\u0435 \u0441\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0438 \u043e\u043f\u0438\u0440\u0430\u044e\u0442\u0441\u044f \u043d\u0430 \u043d\u0435\u0441\u043a\u043e\u043b\u044c\u043a\u043e \u0441\u0442\u0430\u043d\u0434\u0430\u0440\u0442\u043d\u044b\u0445 \u0432\u044b\u0437\u043e\u0432\u043e\u0432 \u0431\u0438\u0431\u043b\u0438\u043e\u0442\u0435\u043a\u0438 C (libc) \u0438, \u0442\u0430\u043a\u0438\u043c \u043e\u0431\u0440\u0430\u0437\u043e\u043c, \u043c\u043e\u0433\u0443\u0442 \u043f\u043e\u0434\u0434\u0435\u0440\u0436\u0438\u0432\u0430\u0442\u044c \u043b\u044e\u0431\u0443\u044e \u043b\u043e\u043a\u0430\u043b\u044c, \u0443\u0441\u0442\u0430\u043d\u043e\u0432\u043b\u0435\u043d\u043d\u0443\u044e \u0432 \u0432\u0430\u0448\u0435\u0439 \u0441\u0438\u0441\u0442\u0435\u043c\u0435. \u041e\u043d\u0438 \u043e\u0431\u0435\u0441\u043f\u0435\u0447\u0438\u0432\u0430\u044e\u0442 \u0440\u0435\u0433\u0438\u0441\u0442\u0440\u043e\u043d\u0435\u0437\u0430\u0432\u0438\u0441\u0438\u043c\u044b\u0435 (`_ci`) \u0438 \u0440\u0435\u0433\u0438\u0441\u0442\u0440\u043e\u0437\u0430\u0432\u0438\u0441\u0438\u043c\u044b\u0435 (`_cs`) \u0441\u0440\u0430\u0432\u043d\u0435\u043d\u0438\u044f \u0441\u043e\u043e\u0442\u0432\u0435\u0442\u0441\u0442\u0432\u0435\u043d\u043d\u043e. 
\u041f\u043e \u0443\u043c\u043e\u043b\u0447\u0430\u043d\u0438\u044e \u0438\u0441\u043f\u043e\u043b\u044c\u0437\u0443\u0435\u0442\u0441\u044f \u043b\u043e\u043a\u0430\u043b\u044c C, \u0447\u0442\u043e \u0444\u0430\u043a\u0442\u0438\u0447\u0435\u0441\u043a\u0438 \u0441\u0432\u043e\u0434\u0438\u0442 \u0441\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0443 \u043a \u043f\u043e\u0431\u0430\u0439\u0442\u043e\u0432\u043e\u043c\u0443 \u0441\u0440\u0430\u0432\u043d\u0435\u043d\u0438\u044e. \u0427\u0442\u043e\u0431\u044b \u0438\u0437\u043c\u0435\u043d\u0438\u0442\u044c \u044d\u0442\u043e, \u043d\u0435\u043e\u0431\u0445\u043e\u0434\u0438\u043c\u043e \u0443\u043a\u0430\u0437\u0430\u0442\u044c \u0434\u0440\u0443\u0433\u0443\u044e \u0434\u043e\u0441\u0442\u0443\u043f\u043d\u0443\u044e \u043b\u043e\u043a\u0430\u043b\u044c \u0441 \u043f\u043e\u043c\u043e\u0449\u044c\u044e \u0434\u0438\u0440\u0435\u043a\u0442\u0438\u0432\u044b [collation_libc_locale](../Server_settings/Searchd.md#collation_libc_locale). \u0421\u043f\u0438\u0441\u043e\u043a \u043b\u043e\u043a\u0430\u043b\u0435\u0439, \u0434\u043e\u0441\u0442\u0443\u043f\u043d\u044b\u0445 \u0432 \u0432\u0430\u0448\u0435\u0439 \u0441\u0438\u0441\u0442\u0435\u043c\u0435, \u043e\u0431\u044b\u0447\u043d\u043e \u043c\u043e\u0436\u043d\u043e \u043f\u043e\u043b\u0443\u0447\u0438\u0442\u044c \u0441 \u043f\u043e\u043c\u043e\u0449\u044c\u044e \u043a\u043e\u043c\u0430\u043d\u0434\u044b `locale`:\n\nCODE_BLOCK_0\n\n\u041a\u043e\u043d\u043a\u0440\u0435\u0442\u043d\u044b\u0439 \u0441\u043f\u0438\u0441\u043e\u043a \u0441\u0438\u0441\u0442\u0435\u043c\u043d\u044b\u0445 \u043b\u043e\u043a\u0430\u043b\u0435\u0439 \u043c\u043e\u0436\u0435\u0442 \u0432\u0430\u0440\u044c\u0438\u0440\u043e\u0432\u0430\u0442\u044c\u0441\u044f. 
\u041e\u0431\u0440\u0430\u0442\u0438\u0442\u0435\u0441\u044c \u043a \u0434\u043e\u043a\u0443\u043c\u0435\u043d\u0442\u0430\u0446\u0438\u0438 \u0432\u0430\u0448\u0435\u0439 \u041e\u0421 \u0434\u043b\u044f \u0443\u0441\u0442\u0430\u043d\u043e\u0432\u043a\u0438 \u0434\u043e\u043f\u043e\u043b\u043d\u0438\u0442\u0435\u043b\u044c\u043d\u044b\u0445 \u043d\u0443\u0436\u043d\u044b\u0445 \u043b\u043e\u043a\u0430\u043b\u0435\u0439.\n\n\u041b\u043e\u043a\u0430\u043b\u0438 `utf8_general_ci` \u0438 `binary` \u0432\u0441\u0442\u0440\u043e\u0435\u043d\u044b \u0432 Manticore. \u041f\u0435\u0440\u0432\u0430\u044f \u2014 \u044d\u0442\u043e \u0443\u043d\u0438\u0432\u0435\u0440\u0441\u0430\u043b\u044c\u043d\u0430\u044f \u0441\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0430 \u0434\u043b\u044f UTF-8 \u0434\u0430\u043d\u043d\u044b\u0445 (\u0431\u0435\u0437 \u0442\u0430\u043a \u043d\u0430\u0437\u044b\u0432\u0430\u0435\u043c\u043e\u0439 \u044f\u0437\u044b\u043a\u043e\u0432\u043e\u0439 \u0430\u0434\u0430\u043f\u0442\u0430\u0446\u0438\u0438); \u043e\u043d\u0430 \u0434\u043e\u043b\u0436\u043d\u0430 \u0432\u0435\u0441\u0442\u0438 \u0441\u0435\u0431\u044f \u0430\u043d\u0430\u043b\u043e\u0433\u0438\u0447\u043d\u043e \u0441\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0435 `utf8_general_ci` \u0432 MySQL. \u0412\u0442\u043e\u0440\u0430\u044f \u2014 \u043f\u0440\u043e\u0441\u0442\u043e\u0435 \u043f\u043e\u0431\u0430\u0439\u0442\u043e\u0432\u043e\u0435 \u0441\u0440\u0430\u0432\u043d\u0435\u043d\u0438\u0435.\n\n\u0421\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0430 \u043c\u043e\u0436\u0435\u0442 \u0431\u044b\u0442\u044c \u043f\u0435\u0440\u0435\u043e\u043f\u0440\u0435\u0434\u0435\u043b\u0435\u043d\u0430 \u0447\u0435\u0440\u0435\u0437 SQL \u043d\u0430 \u0443\u0440\u043e\u0432\u043d\u0435 \u0441\u0435\u0441\u0441\u0438\u0438 \u0441 \u043f\u043e\u043c\u043e\u0449\u044c\u044e \u043e\u043f\u0435\u0440\u0430\u0442\u043e\u0440\u0430 `SET collation_connection`. 
\u0412\u0441\u0435 \u043f\u043e\u0441\u043b\u0435\u0434\u0443\u044e\u0449\u0438\u0435 SQL-\u0437\u0430\u043f\u0440\u043e\u0441\u044b \u0431\u0443\u0434\u0443\u0442 \u0438\u0441\u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u0442\u044c \u044d\u0442\u0443 \u0441\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0443. \u0412 \u043f\u0440\u043e\u0442\u0438\u0432\u043d\u043e\u043c \u0441\u043b\u0443\u0447\u0430\u0435 \u0432\u0441\u0435 \u0437\u0430\u043f\u0440\u043e\u0441\u044b \u0431\u0443\u0434\u0443\u0442 \u0438\u0441\u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u0442\u044c \u0441\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0443, \u0443\u0441\u0442\u0430\u043d\u043e\u0432\u043b\u0435\u043d\u043d\u0443\u044e \u043f\u043e \u0443\u043c\u043e\u043b\u0447\u0430\u043d\u0438\u044e \u043d\u0430 \u0441\u0435\u0440\u0432\u0435\u0440\u0435, \u0438\u043b\u0438 \u0443\u043a\u0430\u0437\u0430\u043d\u043d\u0443\u044e \u0432 \u043a\u043e\u043d\u0444\u0438\u0433\u0443\u0440\u0430\u0446\u0438\u043e\u043d\u043d\u043e\u0439 \u0434\u0438\u0440\u0435\u043a\u0442\u0438\u0432\u0435 [collation_server](../Server_settings/Searchd.md#collation_server). 
\u0412 \u043d\u0430\u0441\u0442\u043e\u044f\u0449\u0435\u0435 \u0432\u0440\u0435\u043c\u044f \u043f\u043e \u0443\u043c\u043e\u043b\u0447\u0430\u043d\u0438\u044e \u0432 Manticore \u0438\u0441\u043f\u043e\u043b\u044c\u0437\u0443\u0435\u0442\u0441\u044f \u0441\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0430 `libc_ci`.\n\n\u0421\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0438 \u0432\u043b\u0438\u044f\u044e\u0442 \u043d\u0430 \u0432\u0441\u0435 \u0441\u0440\u0430\u0432\u043d\u0435\u043d\u0438\u044f \u0441\u0442\u0440\u043e\u043a\u043e\u0432\u044b\u0445 \u0430\u0442\u0440\u0438\u0431\u0443\u0442\u043e\u0432, \u0432\u043a\u043b\u044e\u0447\u0430\u044f \u0441\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0438 \u0432 `ORDER BY` \u0438 `GROUP BY`, \u043f\u043e\u044d\u0442\u043e\u043c\u0443 \u043c\u043e\u0436\u043d\u043e \u043f\u043e\u043b\u0443\u0447\u0438\u0442\u044c \u0440\u0435\u0437\u0443\u043b\u044c\u0442\u0430\u0442\u044b, \u043e\u0442\u0441\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u0430\u043d\u043d\u044b\u0435 \u0438\u043b\u0438 \u0441\u0433\u0440\u0443\u043f\u043f\u0438\u0440\u043e\u0432\u0430\u043d\u043d\u044b\u0435 \u043f\u043e-\u0440\u0430\u0437\u043d\u043e\u043c\u0443 \u0432 \u0437\u0430\u0432\u0438\u0441\u0438\u043c\u043e\u0441\u0442\u0438 \u043e\u0442 \u0432\u044b\u0431\u0440\u0430\u043d\u043d\u043e\u0439 \u0441\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0438. 
\u041e\u0431\u0440\u0430\u0442\u0438\u0442\u0435 \u0432\u043d\u0438\u043c\u0430\u043d\u0438\u0435, \u0447\u0442\u043e \u0441\u043e\u0440\u0442\u0438\u0440\u043e\u0432\u043a\u0438 \u043d\u0435 \u0432\u043b\u0438\u044f\u044e\u0442 \u043d\u0430 \u043f\u043e\u043b\u043d\u043e\u0442\u0435\u043a\u0441\u0442\u043e\u0432\u044b\u0439 \u043f\u043e\u0438\u0441\u043a; \u0434\u043b\u044f \u044d\u0442\u043e\u0433\u043e \u0438\u0441\u043f\u043e\u043b\u044c\u0437\u0443\u0439\u0442\u0435 [charset_table](../Creating_a_table/NLP_and_tokenization/Low-level_tokenization.md#charset_table).\n\n<!-- proofread -->\n\n"
  7. },
  8. "is_code_or_comment": false,
  9. "model": "openai:gpt-4.1-mini",
  10. "updated_at": 1766339791
  11. },
  12. "__meta": {
  13. "source_text": "# Collations\n\nCollations primarily impact string attribute comparisons. They define both the character set encoding and the strategy Manticore employs for comparing strings when performing `ORDER BY` or `GROUP BY` with a string attribute involved.\n\nString attributes are stored as-is during indexing, and no character set or language information is attached to them. This is fine as long as Manticore only needs to store and return the strings to the calling application verbatim. However, when you ask Manticore to sort by a string value, the request immediately becomes ambiguous.\n\nFirst, single-byte (ASCII, ISO-8859-1, or Windows-1251) strings need to be processed differently than UTF-8 strings, which may encode each character with a variable number of bytes. Thus, we need to know the character set type to properly interpret the raw bytes as meaningful characters.\n\nSecond, we also need to know the language-specific string sorting rules. For example, when sorting according to US rules in the en_US locale, the accented character `\u00ef` (small letter `i` with diaeresis) should be placed somewhere after `z`. However, when sorting with French rules and the fr_FR locale in mind, it should be placed between `i` and `j`. Some other set of rules might choose to ignore accents altogether, allowing `\u00ef` and `i` to be mixed arbitrarily.\n\nThird, in some cases, we may require case-sensitive sorting, while in others, case-insensitive sorting is needed.\n\nCollations encapsulate all of the following: the character set, the language rules, and the case sensitivity. Manticore currently provides four collations:\n\n1. `libc_ci`\n2. `libc_cs`\n3. `utf8_general_ci`\n4. `binary`\n\nThe first two collations rely on several standard C library (libc) calls and can thus support any locale installed on your system. They provide case-insensitive (`_ci`) and case-sensitive (`_cs`) comparisons, respectively. 
By default, they use the C locale, effectively resorting to bytewise comparisons. To change that, you need to specify a different available locale using the [collation_libc_locale](../Server_settings/Searchd.md#collation_libc_locale) directive. The list of locales available on your system can usually be obtained with the `locale` command:\n\n```bash\n$ locale -a\nC\nen_AG\nen_AU.utf8\nen_BW.utf8\nen_CA.utf8\nen_DK.utf8\nen_GB.utf8\nen_HK.utf8\nen_IE.utf8\nen_IN\nen_NG\nen_NZ.utf8\nen_PH.utf8\nen_SG.utf8\nen_US.utf8\nen_ZA.utf8\nen_ZW.utf8\nes_ES\nfr_FR\nPOSIX\nru_RU.utf8\nru_UA.utf8\n```\n\nThe specific list of system locales may vary. Consult your OS documentation to install any additional locales you need.\n\nThe `utf8_general_ci` and `binary` collations are built into Manticore. The first one is a generic collation for UTF-8 data (without any so-called language tailoring); it should behave similarly to the `utf8_general_ci` collation in MySQL. The second one is a simple bytewise comparison.\n\nCollation can be overridden via SQL on a per-session basis using the `SET collation_connection` statement. All subsequent SQL queries in that session will use this collation. Otherwise, all queries use the server default collation, which can be set with the [collation_server](../Server_settings/Searchd.md#collation_server) configuration directive. Manticore currently defaults to the `libc_ci` collation.\n\nCollations affect all string attribute comparisons, including those within `ORDER BY` and `GROUP BY`, so differently ordered or grouped results can be returned depending on the collation chosen. Note that collations don't affect full-text searching; for that, use the [charset_table](../Creating_a_table/NLP_and_tokenization/Low-level_tokenization.md#charset_table).\n\n<!-- proofread -->\n\n",
  14. "updated_at": 1768530797,
  15. "source_md5": "633d74b6d31f3eaef70812ac32ad8836"
  16. }
  17. }