Embeddings improvements #686
…pick CPU/first GPU depending upon availability
Description

WITH {device: "cuda:0"} as configuration
CALL embeddings.node_sentence(NULL, configuration)
YIELD success, embeddings, dimension
RETURN success, embeddings, dimension;

WITH {device: "cuda:0"} as configuration
CALL embeddings.text(["Extra", "Cheese", "Please"], configuration)
YIELD success, embeddings, dimension
RETURN success, embeddings, dimension;

WITH {model_name: "all-MiniLM-L6-v2"} AS configuration
CALL embeddings.model_info(configuration)
YIELD info
RETURN info;
- [x] Add `dimension` as return value
- [x] Remove the `compute_` prefix from the procedure names
- [x] Add `embeddings.model_info(configuration) -> Map`
- (Breaking) Renamed `compute()` -> `node_sentence()`
- (Breaking) Simplified `node_sentence()` function arguments such that
it now only takes the optional list of nodes and an optional
configuration map, e.g.
```cypher
WITH {device: "cuda:0"} as configuration
CALL embeddings.node_sentence(NULL, configuration)
YIELD success, embeddings, dimension
RETURN success, embeddings, dimension;
```
- Added `dimension` output to `node_sentence()` to report the embedding
length for the current model.
- Added option to return the embeddings list from `node_sentence()`
alongside the `success` output (set via the `return_embeddings` key
inside the configuration map).
- Added `text()` function for computing embeddings directly on lists of
strings, e.g.:
```cypher
WITH {device: "cuda:0"} as configuration
CALL embeddings.text(["Extra", "Cheese", "Please"], configuration)
YIELD success, embeddings, dimension
RETURN success, embeddings, dimension;
```
- Added `model_info()` procedure to return information about the model
being used as a `Map`, e.g.:
```cypher
WITH {model_name: "all-MiniLM-L6-v2"} AS configuration
CALL embeddings.model_info(configuration)
YIELD info
RETURN info;
```
- Fixed default `device` selection. Previously, the device was set to
`0` (i.e. `cuda:0`), which meant that users of CPU-only images had to
manually specify that the embeddings should be computed on CPU. Now, if
no device is specified, the module checks whether CUDA is available; if
so, it uses the first available GPU (`cuda:0`); otherwise it falls
back to CPU.
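
For reviewers, the new default `device` behaviour can be sketched roughly as below. This is a minimal illustration, not the module's actual code: `resolve_device` is a hypothetical helper name, and the CUDA check is passed in as a plain boolean so the sketch runs without PyTorch (in practice it would presumably come from `torch.cuda.is_available()`).

```python
from typing import Optional


def resolve_device(requested: Optional[str], cuda_available: bool) -> str:
    """Pick the device string to compute embeddings on.

    Hypothetical helper for illustration only; the real module may
    structure this differently.
    """
    if requested is not None:
        # An explicit `device` in the configuration map always wins.
        return requested
    # No device given: prefer the first GPU, otherwise fall back to CPU.
    return "cuda:0" if cuda_available else "cpu"


if __name__ == "__main__":
    # `cuda_available` is supplied by hand here; with PyTorch installed
    # it could be replaced by torch.cuda.is_available().
    cases = [({}, True), ({}, False), ({"device": "cpu"}, True)]
    for configuration, has_cuda in cases:
        device = resolve_device(configuration.get("device"), has_cuda)
        print(configuration, has_cuda, "->", device)
```

Keeping the availability check as an argument makes the fallback easy to unit-test on CPU-only images, which is the case this change is meant to fix.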


