diff --git a/docs.json b/docs.json
index 9d643be49f..ba8c8b6b09 100644
--- a/docs.json
+++ b/docs.json
@@ -288,7 +288,8 @@
     "learn/indexing/indexing_best_practices",
     "learn/indexing/ram_multithreading_performance",
     "learn/indexing/tokenization",
-    "learn/indexing/multilingual-datasets"
+    "learn/indexing/multilingual-datasets",
+    "learn/indexing/optimize_indexing_performance"
   ]
 },
 {
diff --git a/learn/indexing/optimize_indexing_performance.mdx b/learn/indexing/optimize_indexing_performance.mdx
new file mode 100644
index 0000000000..98a3eff585
--- /dev/null
+++ b/learn/indexing/optimize_indexing_performance.mdx
@@ -0,0 +1,119 @@
+---
+title: Optimize indexing performance with batch statistics
+description: Learn how to analyze the `progressTrace` to identify and resolve indexing bottlenecks in Meilisearch.
+---
+
+# Optimize indexing performance by analyzing batch statistics
+
+Indexing performance can vary significantly depending on your dataset, index settings, and hardware. The [batch object](/reference/api/batches) provides information about the progress of asynchronous indexing operations.
+
+The `progressTrace` field within the batch object offers a detailed breakdown of where time is spent during the indexing process. Use this data to identify bottlenecks and improve indexing speed.
+
+## Understanding the `progressTrace`
+
+`progressTrace` is a hierarchical trace showing each phase of indexing and how long it took.
+Each entry follows the structure:
+
+```json
+"processing tasks > indexing > extracting word proximity": "33.71s"
+```
+
+This means:
+
+- The step occurred during **indexing**.
+- The subtask was **extracting word proximity**.
+- It took **33.71 seconds**.
+
+Focus on the **longest-running steps** and investigate which index settings or data characteristics influence them.
+
+## Key phases and how to optimize them
+
+### `computing document changes` and `extracting documents`
+
+| Description | Optimization |
+|--------------|--------------|
+| Meilisearch compares incoming documents to existing ones. | No direct optimization possible. Process duration scales with the number and size of incoming documents. |
+
+### `extracting facets` and `merging facet caches`
+
+| Description | Optimization |
+|--------------|--------------|
+| Extracts and merges filterable attributes. | Keep the number of [**filterable attributes**](/reference/api/settings#filterable-attributes) to a minimum. |
+
+### `extracting words` and `merging word caches`
+
+| Description | Optimization |
+|--------------|--------------|
+| Tokenizes text and builds the inverted index. | Ensure the [searchable attributes](/reference/api/settings#searchable-attributes) list only includes the fields you want to be checked for query word matches. |
+
+### `extracting word proximity` and `merging word proximity`
+
+| Description | Optimization |
+|--------------|--------------|
+| Builds data structures for phrase and attribute ranking. | Lower the precision of this operation by setting [proximity precision](/reference/api/settings#proximity-precision) to `byAttribute`. |
+
+### `waiting for database writes`
+
+| Description | Optimization |
+|--------------|--------------|
+| Time spent writing data to disk. | No direct optimization possible. Either the disk is too slow or you are writing too much data in a single operation. Avoid HDDs (Hard Disk Drives). |
+
+### `waiting for extractors`
+
+| Description | Optimization |
+|--------------|--------------|
+| Time spent waiting for CPU-bound extraction. | No direct optimization possible. Indicates a CPU bottleneck. Use more cores or scale horizontally with [sharding](/learn/advanced/sharding). |
+
+### `post processing facets > strings bulk` / `numbers bulk`
+
+| Description | Optimization |
+|--------------|--------------|
+| Processes equality or comparison filters. | - Disable unused [**filter features**](/reference/api/settings#features), such as comparison operators on string values.<br />- Reduce the number of [**sortable attributes**](/reference/api/settings#sortable-attributes). |
+
+### `post processing facets > facet search`
+
+| Description | Optimization |
+|--------------|--------------|
+| Builds structures for the [facet search API](/reference/api/facet_search). | If you don’t use the facet search API, [disable it](/reference/api/settings#update-facet-search-settings). |
+
+### Embeddings
+
+| Trace key | Description | Optimization |
+|------------|--------------|--------------|
+| `writing embeddings to database` | Time spent saving vector embeddings. | - Use embedding vectors with fewer dimensions.<br />- [Disable embedding regeneration on document update](/reference/api/documents#vectors).<br />- Consider enabling [binary quantization](/reference/api/settings#binaryquantized). |
+
+### `post processing words > word prefix *`
+
+| Description | Optimization |
+|--------------|--------------|
+| Builds prefix data for autocomplete. Allows matching documents containing words that begin with a query term, instead of only exact matches. | Disable [**prefix search**](/reference/api/settings#prefix-search) (`prefixSearch: disabled`). _This can severely impact search result relevancy._ |
+
+### `post processing words > word fst`
+
+| Description | Optimization |
+|--------------|--------------|
+| Builds the word FST (finite state transducer). | No direct action possible, as FST size reflects the number of different words in the database. Using documents with fewer searchable words may improve operation speed. |
+
+## Example analysis
+
+If you see:
+
+```json
+"processing tasks > indexing > post processing facets > facet search": "1763.06s"
+```
+
+[Facet searching](/learn/filtering_and_sorting/search_with_facet_filters#searching-facet-values) is taking significant indexing time. If your application doesn’t search facet values, disable the feature:
+
+```bash
+curl \
+  -X PUT 'MEILISEARCH_URL/indexes/INDEX_UID/settings/facet-search' \
+  -H 'Content-Type: application/json' \
+  --data-binary 'false'
+```
+
+## Learn more
+
+- [Indexing best practices](/learn/indexing/indexing_best_practices)
+- [Impact of RAM and multi-threading on indexing performance
+](/learn/indexing/ram_multithreading_performance)
+- [Configuring index settings](/learn/configuration/configuring_index_settings)
diff --git a/snippets/samples/code_samples_compact_index_1.mdx b/snippets/samples/code_samples_compact_index_1.mdx
new file mode 100644
index 0000000000..3ae5de23df
--- /dev/null
+++ b/snippets/samples/code_samples_compact_index_1.mdx
@@ -0,0 +1,7 @@
+
+
+```bash cURL
+curl \
+  -X POST 'MEILISEARCH_URL/indexes/INDEX_UID/compact'
+```
+
\ No newline at end of file
diff --git a/snippets/samples/code_samples_webhooks_delete_1.mdx b/snippets/samples/code_samples_webhooks_delete_1.mdx
index 7b81de8e4c..d27883400d 100644
--- a/snippets/samples/code_samples_webhooks_delete_1.mdx
+++ b/snippets/samples/code_samples_webhooks_delete_1.mdx
@@ -12,4 +12,8 @@ client.deleteWebhook(WEBHOOK_UUID)
 ```go Go
 client.DeleteWebhook("WEBHOOK_UUID");
 ```
+
+```rust Rust
+client.delete_webhook("WEBHOOK_UUID").await.unwrap();
+```
 
\ No newline at end of file
diff --git a/snippets/samples/code_samples_webhooks_get_1.mdx b/snippets/samples/code_samples_webhooks_get_1.mdx
index a9be7d1ba0..dad2b7c704 100644
--- a/snippets/samples/code_samples_webhooks_get_1.mdx
+++ b/snippets/samples/code_samples_webhooks_get_1.mdx
@@ -12,4 +12,8 @@ client.getWebhooks()
 ```go Go
 client.ListWebhooks();
 ```
+
+```rust Rust
+let webhooks = client.get_webhooks().await.unwrap();
+```
 
\ No newline at end of file
diff --git a/snippets/samples/code_samples_webhooks_get_single_1.mdx b/snippets/samples/code_samples_webhooks_get_single_1.mdx
index 11990ec4cb..345224939f 100644
--- a/snippets/samples/code_samples_webhooks_get_single_1.mdx
+++ b/snippets/samples/code_samples_webhooks_get_single_1.mdx
@@ -12,4 +12,8 @@ client.getWebhook(WEBHOOK_UUID)
 ```go Go
 client.GetWebhook("WEBHOOK_UUID");
 ```
+
+```rust Rust
+let webhook = client.get_webhook("WEBHOOK_UUID").await.unwrap();
+```
 
\ No newline at end of file
diff --git a/snippets/samples/code_samples_webhooks_patch_1.mdx b/snippets/samples/code_samples_webhooks_patch_1.mdx
index 4b9012dfde..b7b6d8aba7 100644
--- a/snippets/samples/code_samples_webhooks_patch_1.mdx
+++ b/snippets/samples/code_samples_webhooks_patch_1.mdx
@@ -26,4 +26,13 @@ client.UpdateWebhook("WEBHOOK_UUID", &meilisearch.UpdateWebhookRequest{
   },
 });
 ```
+
+```rust Rust
+let mut update = meilisearch_sdk::webhooks::WebhookUpdate::new();
+update.remove_header("referer");
+let webhook = client
+    .update_webhook("WEBHOOK_UUID", &update)
+    .await
+    .unwrap();
+```
 
\ No newline at end of file
diff --git a/snippets/samples/code_samples_webhooks_post_1.mdx b/snippets/samples/code_samples_webhooks_post_1.mdx
index 05598865ca..ffe892ed05 100644
--- a/snippets/samples/code_samples_webhooks_post_1.mdx
+++ b/snippets/samples/code_samples_webhooks_post_1.mdx
@@ -32,4 +32,12 @@ client.AddWebhook(&meilisearch.AddWebhookRequest{
   },
 });
 ```
+
+```rust Rust
+let mut payload = meilisearch_sdk::webhooks::WebhookCreate::new("WEBHOOK_TARGET_URL");
+payload
+    .insert_header("authorization", "SECURITY_KEY")
+    .insert_header("referer", "https://example.com");
+let webhook = client.create_webhook(&payload).await.unwrap();
+```
 
\ No newline at end of file
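
The `progressTrace` entries described in the new `optimize_indexing_performance.mdx` page can be ranked by duration with a short script. The following is a minimal Python sketch and is not part of this change. It assumes the trace has already been fetched from the batches API and decoded into a dictionary, that durations carry one of the suffixes `s`, `ms`, `µs`, or `ns` (only `s` appears in the page), and that parent entries include the time of their children. The names `parse_duration` and `slowest_steps` are hypothetical, and every duration in the sample trace other than the two quoted in the page is invented for illustration.

```python
import re

# Assumed unit suffixes; only the "s" form is shown in the documentation page.
_UNIT_SECONDS = {"ns": 1e-9, "µs": 1e-6, "us": 1e-6, "ms": 1e-3, "s": 1.0}
_DURATION = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*(ns|µs|us|ms|s)\s*$")


def parse_duration(text: str) -> float:
    """Convert a trace duration such as "33.71s" into seconds."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SECONDS[unit]


def slowest_steps(progress_trace: dict, limit: int = 5) -> list:
    """Return (step, seconds) pairs for the slowest leaf steps, longest first.

    An entry is treated as a parent when another key extends it with " > ".
    Parents are skipped on the assumption that they already include the
    time of their children.
    """
    keys = list(progress_trace)
    leaves = [
        key for key in keys
        if not any(other.startswith(key + " > ") for other in keys)
    ]
    timed = [(key, parse_duration(progress_trace[key])) for key in leaves]
    return sorted(timed, key=lambda item: item[1], reverse=True)[:limit]


# Hypothetical trace: keys follow the page, most durations are made up.
sample_trace = {
    "processing tasks": "1800.00s",
    "processing tasks > indexing": "1799.50s",
    "processing tasks > indexing > extracting word proximity": "33.71s",
    "processing tasks > indexing > post processing facets > facet search": "1763.06s",
    "processing tasks > indexing > waiting for database writes": "250ms",
}

for step, seconds in slowest_steps(sample_trace, limit=3):
    print(f"{seconds:10.2f}s  {step}")
```

The step at the top of the printed list is the one to look up in the "Key phases and how to optimize them" tables.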