mattkjames7 commented Oct 27, 2025

  • Add dimension as return value

  • Remove the compute_ prefix from the procedure names

  • Add embeddings.model_info(configuration) -> Map

  • (Breaking) Renamed compute() -> node_sentence()

  • (Breaking) Simplified the arguments of node_sentence(): it now takes only an optional list of nodes and an optional configuration map, e.g.

WITH {device: "cuda:0"} as configuration
CALL embeddings.node_sentence(NULL, configuration)
YIELD success, embeddings, dimension
RETURN success, embeddings, dimension;
  • Added a dimension output to node_sentence() that reports the embedding length for the current model.

  • Added option to return embeddings list from node_sentence() alongside the success parameter (using return_embeddings parameter inside the configuration map).

  • Added text() function for computing embeddings directly on lists of strings, e.g.:

WITH {device: "cuda:0"} as configuration
CALL embeddings.text(["Extra", "Cheese", "Please"], configuration)
YIELD success, embeddings, dimension
RETURN success, embeddings, dimension;
  • Added model_info() procedure to return information about the model being used as a Map, e.g.:
WITH {model_name: "all-MiniLM-L6-v2"} AS configuration
CALL embeddings.model_info(configuration)
YIELD info
RETURN info;
  • Fixed default device selection. Previously, the device defaulted to 0 (i.e. cuda:0), so users of CPU-only images had to specify explicitly that embeddings should be computed on the CPU. Now, if no device is specified, the module checks whether CUDA is available: if so, it uses the first available GPU (cuda:0); otherwise, it falls back to the CPU.
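
For reviewers, the device fallback described in the last bullet can be sketched roughly as follows. This is an illustration only, not the module's actual code: the `resolve_device` helper and its `cuda_available` argument are hypothetical names, with `cuda_available` standing in for a runtime check such as `torch.cuda.is_available()`.

```python
from typing import Optional


def resolve_device(requested: Optional[str], cuda_available: bool) -> str:
    """Choose a compute device (hypothetical helper, not the module's API).

    requested      -- the `device` value from the configuration map, or None
    cuda_available -- stand-in for a check like torch.cuda.is_available()
    """
    if requested is not None:
        # An explicit choice always wins, e.g. "cpu" or "cuda:1".
        return requested
    # No device given: prefer the first GPU, otherwise fall back to CPU.
    return "cuda:0" if cuda_available else "cpu"


# A CPU-only image with no device configured resolves to the CPU.
print(resolve_device(None, cuda_available=False))
```

Passing the availability flag in as an argument keeps the sketch runnable on machines without a GPU library installed; the real module presumably queries the runtime directly.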

mattkjames7 self-assigned this Oct 27, 2025
mattkjames7 added Docs needed and feature labels Oct 27, 2025
mattkjames7 commented Oct 27, 2025

Description

  • (Breaking) Renamed compute() -> node_sentence()
  • (Breaking) Simplified the arguments of node_sentence(): it now takes only an optional list of nodes and an optional configuration map, e.g.
WITH {device: "cuda:0"} as configuration
CALL embeddings.node_sentence(NULL, configuration)
YIELD success, embeddings, dimension
RETURN success, embeddings, dimension;
  • Added a dimension output to node_sentence() that reports the embedding length for the current model.

  • Added option to return embeddings list from node_sentence() alongside the success parameter (using return_embeddings parameter inside the configuration map).

  • Added text() function for computing embeddings directly on lists of strings, e.g.:

WITH {device: "cuda:0"} as configuration
CALL embeddings.text(["Extra", "Cheese", "Please"], configuration)
YIELD success, embeddings, dimension
RETURN success, embeddings, dimension;
  • Added model_info() procedure to return information about the model being used as a Map, e.g.:
WITH {model_name: "all-MiniLM-L6-v2"} AS configuration
CALL embeddings.model_info(configuration)
YIELD info
RETURN info;
  • Fixed default device selection. Previously, the device defaulted to 0 (i.e. cuda:0), so users of CPU-only images had to specify explicitly that embeddings should be computed on the CPU. Now, if no device is specified, the module checks whether CUDA is available: if so, it uses the first available GPU (cuda:0); otherwise, it falls back to the CPU.

Pull request type

  • Bugfix
  • Algorithm/Module
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

Related issues

Reviewer checklist (the reviewer checks this part)

Module/Algorithm

  • Core algorithm/module implementation
  • Query module implementation
  • Tests provided (unit / e2e)
  • Code documentation
  • README short description

Documentation checklist

  • Add the documentation label tag
  • Add the bug / feature label tag
  • Add the milestone for which this feature is intended
    • If not known, set for a later milestone
  • Write a release note, including added/changed clauses
    • Breaking: renamed the compute() function to node_sentence() and simplified its input arguments so that it now accepts a configuration Map to define parameters such as device and batch_size. Added the configuration option return_embeddings to node_sentence() so that a list of embeddings is returned. node_sentence() now returns dimension, the length of the embedding model's output. Added the text() function to embed lists of strings directly. Added the model_info() function to return a Map of information about the model being used. Fixed the default device so that CPU-only containers fall back to CPU compute without having to specify device="cpu". #686
  • Link the documentation PR here

mattkjames7 requested a review from gitbuda October 27, 2025 16:54
mattkjames7 marked this pull request as ready for review October 27, 2025 16:55
mattkjames7 added this to the mage-v3.7.0 milestone Oct 28, 2025
mattkjames7 added this pull request to the merge queue Oct 29, 2025
Merged via the queue into main with commit 1a47072 Oct 29, 2025
29 checks passed
mattkjames7 deleted the embeddings-improvements branch October 29, 2025 11:45
mattkjames7 modified the milestones: mage-v3.7.0, mage-v3.6.2 Nov 3, 2025
mattkjames7 added a commit that referenced this pull request Nov 3, 2025