Browse Source

fixed documentation to include details for the real vs wall time and node vs per table time counters

Stas 2 weeks ago
parent
commit
86d033eb39

+ 4 - 4
manual/english/Logging/Query_logging.md

@@ -61,8 +61,6 @@ The features of the Manticore SQL log format compared to the [plain format](../L
 ```
 <!-- end -->
 
-In each entry, **real** is the elapsed time from start to finish of the query (wall-clock). **wall** is the accumulated query time: for a single local table it equals real; for [distributed](../Creating_a_table/Creating_a_distributed_table/Creating_a_distributed_table.md) queries it is the sum of local search time plus each agent's reported query time. Because agents run in parallel, wall can exceed real (e.g. real 10s, wall 50s with 5 agents each taking ~10s). Use real for "how long the user waited"; use wall to see total work done across agents.
-
 ### Plain log format
 
 <!-- example plain_log -->
@@ -84,8 +82,8 @@ The log format is as follows:
 ```
 
 where:
-* `real-time` is the elapsed time from the start to the finish of the query (wall-clock).
-* `wall-time` is the accumulated query time: for a single local table it matches real-time; for [distributed](../Creating_a_table/Creating_a_distributed_table/Creating_a_distributed_table.md) queries it is the sum of local search time plus each agent's query time, so it can exceed real-time when agents run in parallel.
+* `real-time` is the end-to-end time from the start to the finish of the query. In SphinxQL logs it corresponds to the `real` field.
+* `wall-time` is Manticore's internal query wall-time metric. In SphinxQL logs it corresponds to the `wall` field, and this same value is used by `query_log_min_msec`. For distributed and multi-source queries, `wall-time` can differ from `real-time`.
 * `perf-stats` includes CPU/IO stats when Manticore is started with `--cpustats` (or it was enabled via `SET GLOBAL cpustats=1`) and/or `--iostats` (or it was enabled via `SET GLOBAL iostats=1`):
   - `ios` is the number of file I/O operations carried out;
   - `kb` is the amount of data in kilobytes read from the table files;
@@ -108,6 +106,8 @@ where:
 
 Note: the `SPH*` modes are specific to the `sphinx` legacy interface. SQL and JSON interfaces will log, in most cases, `ext2` as `match-mode` and `ext` and `rel` as `sort-mode`.
 
+For distributed queries, use `SHOW STATUS` counters `dist_wall`, `dist_local`, and `dist_wait` to analyze where time is spent. These counters are complementary and not direct substitutes for query log `real`/`wall`.
+
 <!-- intro -->
 Query log example:
 <!-- request Example -->

+ 6 - 0
manual/english/Node_info_and_management/Node_status.md

@@ -216,6 +216,12 @@ The `SHOW STATUS` command gives a detailed report on various performance metrics
 
 These metrics help track performance over specific time intervals, making it easier to spot trends in query response times and find possible bottlenecks.
 
+For search timings:
+- `search_stats_ms_*` in `SHOW STATUS` is based on the same query wall-time family that is shown as `wall` in query logs.
+- `query_wall` / `avg_query_wall` are cumulative/average node counters over executed queries.
+- `dist_wall`, `dist_local`, and `dist_wait` are distributed-only dimensions and should be interpreted together, not as direct replacements for `search_stats_ms_*`.
+- Per-table timing counters (`SHOW TABLE ... STATUS`) may differ from node-level counters, especially for distributed tables.
+
 The following metrics are part of the `SHOW STATUS` output:
 - `*_avg`: The average query time for each type of query over the last 1, 5, and 15 minutes.
 - `*_min`: The shortest query time recorded for each query type.

+ 1 - 1
manual/english/Node_info_and_management/Table_settings_and_status/SHOW_TABLE_STATUS.md

@@ -49,7 +49,7 @@ Here is the meaning of these values:
 * `query_time_*`, `exact_query_time_*`: query execution time statistics for the last 1 minute, 5 minutes, 15 minutes, and total since server start; data is encapsulated as a JSON object, including the number of queries and min, max, avg, 95, and 99 percentile values.
 * `found_rows_*`: statistics of rows found by queries; provided for the last 1 minute, 5 minutes, 15 minutes, and total since server start; data is encapsulated as a JSON object, including the number of queries and min, max, avg, 95, and 99 percentile values.
 * `command_*`: counters for the total number of times a specific command has been successfully executed against this table.
-* `search_stats_ms_*`: statistics on the execution time (in milliseconds) for search queries. The * indicates the time window (e.g., 1min, 5min, 15min, total). These stats are calculated over sliding windows of 1, 5, and 15 minutes, showing average, minimum, maximum, and 95th/99th percentile values for query times.
+* `search_stats_ms_*`: statistics on the execution time (in milliseconds) for search queries. The * indicates the time window (e.g., 1min, 5min, 15min, total). These stats are calculated over sliding windows of 1, 5, and 15 minutes, showing average, minimum, maximum, and 95th/99th percentile values for query times. For distributed tables, these values reflect aggregated local and remote table contributions and can be higher than a single node-level query wall/real value.
 * `insert_replace_stats_ms_*`: statistics on the execution time (in milliseconds) for insert and replace queries. The * indicates the time window (e.g., 1min, 5min, 15min, total). These stats are calculated over sliding windows of 1, 5, and 15 minutes, showing average, minimum, maximum, and 95th/99th percentile values for query times.
 * `update_stats_ms_*`: statistics on the execution time (in milliseconds) for update queries. The * indicates the time window (e.g., 1min, 5min, 15min, total). These stats are calculated over sliding windows of 1, 5, and 15 minutes, showing average, minimum, maximum, and 95th/99th percentile values for query times.
 

+ 4 - 2
src/daemon/query_log.cpp

@@ -667,7 +667,8 @@ static void LogQuerySphinxql ( const CSphQuery & q, const CSphQuery & tJoinOptio
 	QuotationEscapedBuilder tBuf;
 	int iCompactIN = (g_bLogCompactIn ? LOG_COMPACT_IN : 0);
 
-	// real = elapsed wall-clock; wall = accumulated (for distributed: sum of local + agents, so wall can exceed real)
+	// real = elapsed wall-clock; wall = internal query wall-time metric used by query logging.
+	// In distributed/multi-source queries, wall and real may differ.
 	int iQueryTime = Max ( tMeta.m_iQueryTime, 0 );
 	int iRealTime = Max ( tMeta.m_iRealQueryTime, 0 );
 
@@ -796,7 +797,8 @@ void LogBuddyQuery ( Str_t sQuery, BuddyQuery_e tType )
 
 	QuotationEscapedBuilder tBuf;
 
-	// real = elapsed wall-clock; wall = accumulated (for distributed: sum of local + agents, so wall can exceed real)
+	// real = elapsed wall-clock; wall = internal query wall-time metric used by query logging.
+	// In distributed/multi-source queries, wall and real may differ.
 	int iQueryTime = Max ( tMeta.m_iQueryTime, 0 );
 	int iRealTime = Max ( tMeta.m_iRealQueryTime, 0 );
 

+ 1 - 1
src/sphinx.h

@@ -703,7 +703,7 @@ struct ExpansionStats_t
 class CSphQueryResultMeta
 {
 public:
-	int						m_iQueryTime = 0;		///< query time, ms. For distributed: sum of local + all agents' query times (can exceed m_iRealQueryTime when agents run in parallel)
+	int						m_iQueryTime = 0;		///< query wall-time metric, milliseconds; in distributed/multi-source queries it can differ from m_iRealQueryTime
 	int						m_iRealQueryTime = 0;	///< elapsed wall-clock time from start to finish of the query, milliseconds
 	int64_t					m_iCpuTime = 0;			///< user time, microseconds
 	int						m_iMultiplier = 1;		///< multi-query multiplier, -1 to indicate error