From f7726d78f338e96002a65651fc20a77861c29200 Mon Sep 17 00:00:00 2001 From: matt Date: Mon, 27 Oct 2025 15:15:52 +0000 Subject: [PATCH 1/6] add link to cuda container toolkit --- pages/advanced-algorithms/install-mage.mdx | 3 +++ 1 file changed, 3 insertions(+) diff --git a/pages/advanced-algorithms/install-mage.mdx b/pages/advanced-algorithms/install-mage.mdx index 909415e0d..8e51bbe27 100644 --- a/pages/advanced-algorithms/install-mage.mdx +++ b/pages/advanced-algorithms/install-mage.mdx @@ -36,6 +36,9 @@ The following tags are available on Docker Hub: - `x.y-relwithdebinfo-cuda` - Memgraph built with CUDA support* - available since version `3.6.1`. *To run GPU-accelerated algorithms, you need to launch the container with the `--gpus all` flag. +This requires the installation of NVIDIA Container Toolkit. See the +[NVIDIA Container Toolkit documentation](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) +for more details. For versions prior to `3.2`, MAGE image tags included both MAGE and Memgraph versions, e.g. From 418b58f1a8520888a66bb1786fa1132aa459411b Mon Sep 17 00:00:00 2001 From: matt Date: Mon, 27 Oct 2025 16:13:30 +0000 Subject: [PATCH 2/6] update embeddings page --- .../available-algorithms/embeddings.mdx | 98 +++++++++++++++---- 1 file changed, 81 insertions(+), 17 deletions(-) diff --git a/pages/advanced-algorithms/available-algorithms/embeddings.mdx b/pages/advanced-algorithms/available-algorithms/embeddings.mdx index 6e70c835b..e99d67ae0 100644 --- a/pages/advanced-algorithms/available-algorithms/embeddings.mdx +++ b/pages/advanced-algorithms/available-algorithms/embeddings.mdx @@ -7,6 +7,7 @@ description: Calculate sentence embeddings on node strings using pytorch. import { Cards } from 'nextra/components' import GitHub from '/components/icons/GitHub' +import { Callout } from 'nextra/components' The embeddings module provides tools for calculating sentence embeddings on node strings using pytorch. 
@@ -35,23 +36,37 @@ created as a property of the nodes in the graph. {

Input:

} - `input_nodes: List[Vertex]` (**OPTIONAL**) ➡ The list of nodes to compute the embeddings for. If not provided, the embeddings are computed for all nodes in the graph. -- `embedding_property: string` ➡ The name of the property to store the embeddings in. This property is `embedding` by default. -- `excluded_properties: List[string]` ➡ The list of properties to exclude from the embeddings computation. This list is empty by default. -- `model_name: string` ➡ The name of the model to use for the embeddings computation, buy default this module uses the `all-MiniLM-L6-v2` model provided by the `sentence-transformers` library. -- `batch_size: int` ➡ The batch size to use for the embeddings computation. This is set to `2000` by default. -- `chunk_size: int` ➡ The number of batches per "chunk". This is used when computing embeddings across multiple GPUs, as this has to be done by spawning multiple processes. Each spawned process computes the embeddings for a single chunk. This is set to 48 by default. -- `device: string|int|List[string|int]` ➡ The device to use for the embeddings computation. This can be any of the following: +- `'configuration`: (`mgp.Map`, **OPTIONAL**)`: User defined parameters from query module. Defaults to {}. + +| Name | Type | Default | Description | +|----------------------------|--------------|-------------------|----------------------------------------------------------------------------------------------------------| +| `embedding_property` | string | `"embedding"` | The name of the property to store the embeddings in. | +| `excluded_properties` | List[string] | `[]` | The list of properties to exclude from the embeddings computation. | +| `model_name` | string | `"all-MiniLM-L6-v2"` | The name of the model to use for the embeddings computation, provided by the `sentence-transformers` library. | +| `return_embeddings` | bool | `False` | Whether to return the embeddings as an additional output or not. 
| +| `batch_size` | int | `2000` | The batch size to use for the embeddings computation. | +| `chunk_size` | int | `48` | The number of batches per "chunk". This is used when computing embeddings across multiple GPUs, as this has to be done by spawning multiple processes. Each spawned process computes the embeddings for a single chunk. | +| `device` | NULL\|string\| int\|List[string\|int] | `NULL` | The device to use for the embeddings computation (see below). | + + +The `device` parameter can be one of the following: + - `NULL` (default) - Use first GPU if available, otherwise use CPU. - `"cpu"` - Use CPU for computation. - `"cuda"` or `"all"` - Use all available CUDA devices for computation. - `"cuda:id"` - Use a specific CUDA device for computation. - `id` - Use a specific device for computation. - `[id1, id2, ...]` - Use a list of device ids for computation. - `["cuda:id1", "cuda:id2", ...]` - Use a list of CUDA devices for computation. -by default, the first device (`0`) is used. + + + + {

Output:

} - `success: bool` ➡ Whether the embeddings computation was successful. +- `embeddings: List[List[float]]|NULL` ➡ The list of embeddings. Only returned if the +`return_embeddings` parameter is set to `true` in the configuration, otherwise `NULL`. {

Usage:

} @@ -77,18 +92,50 @@ YIELD success; To run the computation on specific device(s), use the following query: ```cypher -CALL embeddings.compute( - NULL, - "embedding", - NULL, - "all-MiniLM-L6-v2", - 2000, - 48, - "cuda:1" -) +WITH {device: "cuda:1"} AS configuration +CALL embeddings.compute(NULL, configuration) YIELD success; ``` +To return the embeddings as an additional output, use the following query: + +```cypher +WITH {return_embeddings: True} AS configuration +CALL embeddings.compute(NULL, configuration) +YIELD success, embeddings; +``` + + +### `embed()` + +This procedure cna be used to return a list of embeddings when given a list of strings. + +{

Input:

} + +- `strings: List[string]` ➡ The list of strings to compute the embeddings for. +- `configuration: mgp.Map` (**OPTIONAL**) ➡ User defined parameters from query module. Defaults to {}. + +| Name | Type | Default | Description | +|----------------------------|--------------|-------------------|----------------------------------------------------------------------------------------------------------| +| `model_name` | string | `"all-MiniLM-L6-v2"` | The name of the model to use for the embeddings computation, provided by the `sentence-transformers` library. | +| `batch_size` | int | `2000` | The batch size to use for the embeddings computation. | +| `chunk_size` | int | `48` | The number of batches per "chunk". This is used when computing embeddings across multiple GPUs, as this has to be done by spawning multiple processes. Each spawned process computes the embeddings for a single chunk. | +| `device` | NULL\|string\| int\|List[string\|int] | `NULL` | The device to use for the embeddings computation. | + + +{

Output:

} + +- `success: bool` ➡ Whether the embeddings computation was successful. +- `embeddings: List[List[float]]` ➡ The list of embeddings. + +{

Usage:

} + +To compute the embeddings for a list of strings, use the following query: + +```cypher +CALL embeddings.embed(["Hello", "World"]) +YIELD success, embeddings; +``` ## Example @@ -132,4 +179,21 @@ Results: | "Parmesan" | [-0.0755439, 0.00906182, -0.010977, 0.0208911, -0.0527448, 0.0085... | | "Red Leicester" | [-0.0244318, -0.0280038, -0.0373183, 0.0284436, -0.0277753, 0.066... | +----------------------------------------------------------------------+----------------------------------------------------------------------+ -``` \ No newline at end of file +``` + +To compute the embeddings for a list of strings, use the following query: + +```cypher +CALL embeddings.embed(["Hello", "World"]) +YIELD success, embeddings; +``` + +Results: + +```plaintext ++----------------------------------------------------------+----------------------------------------------------------------------------------+ +| success | embeddings | ++----------------------------------------------------------+----------------------------------------------------------------------------------+ +| true | [[-0.0627718, 0.0549588, 0.0521648, 0.08579, -0.0827489, -0.074573, 0.0685547... 
| ++----------------------------------------------------------+----------------------------------------------------------------------------------+ +``` From 4abcae8974baf677d7eb9a3c10f7ba5a954cd9e8 Mon Sep 17 00:00:00 2001 From: matt Date: Mon, 27 Oct 2025 18:03:58 +0000 Subject: [PATCH 3/6] updated function names --- .../available-algorithms/embeddings.mdx | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/pages/advanced-algorithms/available-algorithms/embeddings.mdx b/pages/advanced-algorithms/available-algorithms/embeddings.mdx index e99d67ae0..f99dd16d6 100644 --- a/pages/advanced-algorithms/available-algorithms/embeddings.mdx +++ b/pages/advanced-algorithms/available-algorithms/embeddings.mdx @@ -28,7 +28,7 @@ The embeddings module provides tools for calculating sentence embeddings on node ## Procedures -### `compute()` +### `compute_node_sentence()` The procedure computes the sentence embeddings on the string properties of nodes. Embeddings are created as a property of the nodes in the graph. 
@@ -73,7 +73,7 @@ The `device` parameter can be one of the following: To compute the embeddings across the entire graph with the default parameters, use the following query: ```cypher -CALL embeddings.compute() +CALL embeddings.compute_node_sentence() YIELD success; ``` @@ -85,7 +85,7 @@ MATCH (n) WITH n ORDER BY id(n) LIMIT 5 WITH collect(n) AS subset -CALL embeddings.compute(subset) +CALL embeddings.compute_node_sentence(subset) YIELD success; ``` @@ -93,7 +93,7 @@ To run the computation on specific device(s), use the following query: ```cypher WITH {device: "cuda:1"} AS configuration -CALL embeddings.compute(NULL, configuration) +CALL embeddings.compute_node_sentence(NULL, configuration) YIELD success; ``` @@ -101,14 +101,14 @@ To return the embeddings as an additional output, use the following query: ```cypher WITH {return_embeddings: True} AS configuration -CALL embeddings.compute(NULL, configuration) +CALL embeddings.compute_node_sentence(NULL, configuration) YIELD success, embeddings; ``` -### `embed()` +### `compute_text()` -This procedure cna be used to return a list of embeddings when given a list of strings. +This procedure can be used to return a list of embeddings when given a list of strings. {

Input:

} @@ -133,7 +133,7 @@ This procedure cna be used to return a list of embeddings when given a list of s To compute the embeddings for a list of strings, use the following query: ```cypher -CALL embeddings.embed(["Hello", "World"]) +CALL embeddings.compute_text(["Hello", "World"]) YIELD success, embeddings; ``` @@ -153,7 +153,7 @@ CREATE (a:Node {id: 1, Title: "Stilton", Description: "A stinky cheese from the Run the following query to compute the embeddings: ```cypher -CALL embeddings.compute() +CALL embeddings.compute_node_sentence() YIELD success; MATCH (n) @@ -184,7 +184,7 @@ Results: To compute the embeddings for a list of strings, use the following query: ```cypher -CALL embeddings.embed(["Hello", "World"]) +CALL embeddings.compute_text(["Hello", "World"]) YIELD success, embeddings; ``` From bb63905059c5560db933f63333ceec43906dbf0b Mon Sep 17 00:00:00 2001 From: matea16 Date: Tue, 28 Oct 2025 11:09:12 +0100 Subject: [PATCH 4/6] update callout --- .../available-algorithms/embeddings.mdx | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/pages/advanced-algorithms/available-algorithms/embeddings.mdx b/pages/advanced-algorithms/available-algorithms/embeddings.mdx index f99dd16d6..16a730322 100644 --- a/pages/advanced-algorithms/available-algorithms/embeddings.mdx +++ b/pages/advanced-algorithms/available-algorithms/embeddings.mdx @@ -36,7 +36,9 @@ created as a property of the nodes in the graph. {

Input:

} - `input_nodes: List[Vertex]` (**OPTIONAL**) ➡ The list of nodes to compute the embeddings for. If not provided, the embeddings are computed for all nodes in the graph. -- `'configuration`: (`mgp.Map`, **OPTIONAL**)`: User defined parameters from query module. Defaults to {}. +- `configuration`: (`mgp.Map`, **OPTIONAL**): User defined parameters from query module. Defaults to `{}`. + +**Configuration options:** | Name | Type | Default | Description | |----------------------------|--------------|-------------------|----------------------------------------------------------------------------------------------------------| @@ -57,9 +59,12 @@ The `device` parameter can be one of the following: - `id` - Use a specific device for computation. - `[id1, id2, ...]` - Use a list of device ids for computation. - `["cuda:id1", "cuda:id2", ...]` - Use a list of CUDA devices for computation. - - +**Note**: If you're running on a GPU device, make sure to start your container +with the `--gpus=all` flag. +For more details, see the [Install MAGE +documentation](/advanced-algorithms/install-mage). + {

Output:

} @@ -70,7 +75,8 @@ The `device` parameter can be one of the following: {

Usage:

} -To compute the embeddings across the entire graph with the default parameters, use the following query: +To compute the embeddings across the entire graph with the default parameters, +use the following query: ```cypher CALL embeddings.compute_node_sentence() @@ -113,7 +119,7 @@ This procedure can be used to return a list of embeddings when given a list of s {

Input:

} - `strings: List[string]` ➡ The list of strings to compute the embeddings for. -- `configuration: mgp.Map` (**OPTIONAL**) ➡ User defined parameters from query module. Defaults to {}. +- `configuration: mgp.Map` (**OPTIONAL**) ➡ User defined parameters from query module. Defaults to `{}`. | Name | Type | Default | Description | |----------------------------|--------------|-------------------|----------------------------------------------------------------------------------------------------------| From a0e3d4299e0be2783861c06b3fe9fac5d70421f8 Mon Sep 17 00:00:00 2001 From: matt Date: Tue, 28 Oct 2025 13:02:35 +0000 Subject: [PATCH 5/6] updated embeddings page --- .../available-algorithms/embeddings.mdx | 56 ++++++++++++++++--- 1 file changed, 47 insertions(+), 9 deletions(-) diff --git a/pages/advanced-algorithms/available-algorithms/embeddings.mdx b/pages/advanced-algorithms/available-algorithms/embeddings.mdx index 16a730322..6b8c17a6b 100644 --- a/pages/advanced-algorithms/available-algorithms/embeddings.mdx +++ b/pages/advanced-algorithms/available-algorithms/embeddings.mdx @@ -28,7 +28,7 @@ The embeddings module provides tools for calculating sentence embeddings on node ## Procedures -### `compute_node_sentence()` +### `node_sentence()` The procedure computes the sentence embeddings on the string properties of nodes. Embeddings are created as a property of the nodes in the graph. @@ -72,6 +72,7 @@ documentation](/advanced-algorithms/install-mage). - `success: bool` ➡ Whether the embeddings computation was successful. - `embeddings: List[List[float]]|NULL` ➡ The list of embeddings. Only returned if the `return_embeddings` parameter is set to `true` in the configuration, otherwise `NULL`. +- `dimension: int` ➡ The dimension of the embeddings. {

Usage:

} @@ -79,7 +80,7 @@ To compute the embeddings across the entire graph with the default parameters, use the following query: ```cypher -CALL embeddings.compute_node_sentence() +CALL embeddings.node_sentence() YIELD success; ``` @@ -91,7 +92,7 @@ MATCH (n) WITH n ORDER BY id(n) LIMIT 5 WITH collect(n) AS subset -CALL embeddings.compute_node_sentence(subset) +CALL embeddings.node_sentence(subset) YIELD success; ``` @@ -99,7 +100,7 @@ To run the computation on specific device(s), use the following query: ```cypher WITH {device: "cuda:1"} AS configuration -CALL embeddings.compute_node_sentence(NULL, configuration) +CALL embeddings.node_sentence(NULL, configuration) YIELD success; ``` @@ -107,12 +108,12 @@ To return the embeddings as an additional output, use the following query: ```cypher WITH {return_embeddings: True} AS configuration -CALL embeddings.compute_node_sentence(NULL, configuration) +CALL embeddings.node_sentence(NULL, configuration) YIELD success, embeddings; ``` -### `compute_text()` +### `text()` This procedure can be used to return a list of embeddings when given a list of strings. @@ -133,16 +134,36 @@ This procedure can be used to return a list of embeddings when given a list of s - `success: bool` ➡ Whether the embeddings computation was successful. - `embeddings: List[List[float]]` ➡ The list of embeddings. +- `dimension: int` ➡ The dimension of the embeddings. {

Usage:

} To compute the embeddings for a list of strings, use the following query: ```cypher -CALL embeddings.compute_text(["Hello", "World"]) +CALL embeddings.text(["Hello", "World"]) YIELD success, embeddings; ``` +### `model_info()` + +The procedure returns the information about the model used for the embeddings computation. + +{

Input:

} + +- `configuration: mgp.Map` (**OPTIONAL**) ➡ User defined parameters from query module. Defaults to `{}`. +The key `model_name` is used to specify the name of the model to use for the embeddings computation. + +{

Output:

} + +- `info: mgp.Map` ➡ The information about the model used for the embeddings computation. + +| Name | Type | Default | Description | +|----------------------------|--------------|-------------------|----------------------------------------------------------------------------------------------------------| +| `model_name` | string | `"all-MiniLM-L6-v2"` | The name of the model to use for the embeddings computation, provided by the `sentence-transformers` library. | +| `dimension` | int | `384` | The dimension of the embeddings. | +| `max_sequence_length` | int | `256` | The maximum sequence length. | + ## Example Create the following graph: @@ -159,7 +180,7 @@ CREATE (a:Node {id: 1, Title: "Stilton", Description: "A stinky cheese from the Run the following query to compute the embeddings: ```cypher -CALL embeddings.compute_node_sentence() +CALL embeddings.node_sentence() YIELD success; MATCH (n) @@ -190,7 +211,7 @@ Results: To compute the embeddings for a list of strings, use the following query: ```cypher -CALL embeddings.compute_text(["Hello", "World"]) +CALL embeddings.text(["Hello", "World"]) YIELD success, embeddings; ``` @@ -203,3 +224,20 @@ Results: | true | [[-0.0627718, 0.0549588, 0.0521648, 0.08579, -0.0827489, -0.074573, 0.0685547... 
| +----------------------------------------------------------+----------------------------------------------------------------------------------+ ``` + +To get the information about the model used for the embeddings computation, use the following query: + +```cypher +CALL embeddings.model_info() +YIELD info; +``` + +Results: + +```plaintext ++----------------------------------------------------------------------------+ +| info | ++----------------------------------------------------------------------------+ +| {dimension: 384, max_sequence_length: 256, model_name: "all-MiniLM-L6-v2"} | ++----------------------------------------------------------------------------+ +``` From 30f6c4f03afe6cecb666153b1d81a7677dd6d268 Mon Sep 17 00:00:00 2001 From: Matea Pesic <80577904+matea16@users.noreply.github.com> Date: Tue, 28 Oct 2025 14:32:44 +0100 Subject: [PATCH 6/6] Update pages/advanced-algorithms/available-algorithms/embeddings.mdx --- pages/advanced-algorithms/available-algorithms/embeddings.mdx | 2 ++ 1 file changed, 2 insertions(+) diff --git a/pages/advanced-algorithms/available-algorithms/embeddings.mdx b/pages/advanced-algorithms/available-algorithms/embeddings.mdx index 6b8c17a6b..fd3ce9cac 100644 --- a/pages/advanced-algorithms/available-algorithms/embeddings.mdx +++ b/pages/advanced-algorithms/available-algorithms/embeddings.mdx @@ -122,6 +122,8 @@ This procedure can be used to return a list of embeddings when given a list of s - `strings: List[string]` ➡ The list of strings to compute the embeddings for. - `configuration: mgp.Map` (**OPTIONAL**) ➡ User defined parameters from query module. Defaults to `{}`. 
+**Configuration options:** + | Name | Type | Default | Description | |----------------------------|--------------|-------------------|----------------------------------------------------------------------------------------------------------| | `model_name` | string | `"all-MiniLM-L6-v2"` | The name of the model to use for the embeddings computation, provided by the `sentence-transformers` library. |