chore: support for pre-filters in $vectorSearch MCP-240 #689
Pull Request Overview
This PR adds support for pre-filters in the $vectorSearch stage of the aggregate tool, along with new matcher utilities to improve accuracy test flexibility.
Key Changes:
- Enhanced the $vectorSearch filter field description to distinguish between pre-filtering (using indexed filter fields) and post-filtering (using $match stages)
- Added three new matcher utilities: caseInsensitiveString for case-insensitive string comparisons, not for negating matchers, and arrayOrSingle for matching either array or single values
- Expanded accuracy tests to validate pre-filter and post-filter scenarios in vector search queries
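To make the pre-filter versus post-filter distinction concrete, here is a minimal sketch of two pipelines written as plain TypeScript objects. The index name, field names, and query vector are hypothetical placeholders, and the shape shown is standard MQL rather than the exact input schema of the aggregate tool, which may differ.

```typescript
// Hypothetical names throughout: "plot_index", "plot_embedding", "genre", "year".
type Stage = Record<string, unknown>;

// Placeholder embedding; a real query vector has as many entries as the index dimensions.
const queryVector: number[] = [0.12, -0.03, 0.98];

// Pre-filter: the condition sits inside $vectorSearch.filter and narrows the
// candidates before the similarity search. It assumes "genre" is declared as a
// filter field in the vector search index.
const preFiltered: Stage[] = [
  {
    $vectorSearch: {
      index: "plot_index",
      path: "plot_embedding",
      queryVector,
      numCandidates: 100,
      limit: 10,
      filter: { genre: { $eq: "sci-fi" } },
    },
  },
];

// Post-filter: $vectorSearch runs unfiltered and a later $match stage trims its
// results, so fewer than `limit` documents may come back.
const postFiltered: Stage[] = [
  {
    $vectorSearch: {
      index: "plot_index",
      path: "plot_embedding",
      queryVector,
      numCandidates: 100,
      limit: 10,
    },
  },
  { $match: { year: { $gte: 2000 } } },
];

// Small helper used only for this illustration.
function hasPreFilter(pipeline: Stage[]): boolean {
  const first = pipeline[0]?.["$vectorSearch"] as Record<string, unknown> | undefined;
  return first !== undefined && "filter" in first;
}
```

In both pipelines $vectorSearch stays the first stage; only the location of the condition changes.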
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/accuracy/sdk/matcher.ts | Added new matcher classes (CaseInsensitiveStringMatcher, NotMatcher, ArrayOrSingleValueMatching) and corresponding factory methods |
| tests/accuracy/aggregate.test.ts | Added new test cases for pre-filter and post-filter scenarios, refactored common embedding parameters, and fixed typo "sci-fy" → "sci-fi" |
| src/tools/mongodb/read/aggregate.ts | Updated pipeline description with detailed guidance on pre-filtering vs post-filtering in $vectorSearch stages |
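As a rough illustration of what the three matcher utilities do, the sketch below implements them against a simplified, hypothetical `Matcher` interface with a boolean `matches` method. The real classes in tests/accuracy/sdk/matcher.ts are not shown here and may use a different interface; only the names caseInsensitiveString, not, and arrayOrSingle come from this PR.

```typescript
// Hypothetical, simplified interface; the SDK's actual Matcher type may differ.
interface Matcher {
  matches(actual: unknown): boolean;
}

// Equal to the expected string, ignoring case.
const caseInsensitiveString = (expected: string): Matcher => ({
  matches: (actual) =>
    typeof actual === "string" && actual.toLowerCase() === expected.toLowerCase(),
});

// Succeeds exactly when the wrapped matcher fails.
const not = (inner: Matcher): Matcher => ({
  matches: (actual) => !inner.matches(actual),
});

// Accepts either a bare value or a one-element array wrapping that value.
const arrayOrSingle = (inner: Matcher): Matcher => ({
  matches: (actual) =>
    Array.isArray(actual)
      ? actual.length === 1 && inner.matches(actual[0])
      : inner.matches(actual),
});

// Usage: the genre may come back as "Sci-Fi" or ["sci-fi"], but never as "horror".
const genre = arrayOrSingle(caseInsensitiveString("sci-fi"));
const notHorror = not(caseInsensitiveString("horror"));
```

Composing small matchers this way keeps each one trivial to reason about, at the cost of a slightly more verbose test setup.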
Pull Request Test Coverage Report for Build 18784361545 (Coveralls)
Voyage will reject all requests with extra parameters
📊 Accuracy Test Results
Report generated on: 10/24/2025, 4:13:07 PM
himanshusinghs left a comment
Looks good. I left minor concerns/questions.
.preprocess(
    unboxNumber,
    z.union([z.literal(256), z.literal(512), z.literal(1024), z.literal(2048), z.literal(4096)])
)
Is it the SDK that is boxing Number to Int32, etc.? I'm wondering whether that is something we should be handling everywhere else as well.
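For readers unfamiliar with the pattern in the snippet above, the following is a minimal sketch of the "unbox, then validate" idea without any schema library. The behavior of `unboxNumber` is an assumption (the PR does not show its body), and `BoxedInt` is a hypothetical stand-in for a wrapper type such as an Int32 object.

```typescript
// Hypothetical stand-in for a boxed numeric wrapper (for example an Int32-like object).
class BoxedInt {
  constructor(private readonly inner: number) {}
  valueOf(): number {
    return this.inner;
  }
}

// Assumed behavior: return the primitive number when the input wraps one,
// otherwise pass the input through unchanged so validation can reject it.
function unboxNumber(input: unknown): unknown {
  if (typeof input === "number") return input;
  if (typeof input === "object" && input !== null) {
    const primitive = (input as { valueOf(): unknown }).valueOf();
    if (typeof primitive === "number") return primitive;
  }
  return input;
}

// The literal values accepted by the union in the reviewed snippet.
const allowedDimensions = [256, 512, 1024, 2048, 4096] as const;

// Mirrors preprocess-then-validate: unbox first, then check against the literals.
function isAllowedDimension(input: unknown): boolean {
  const unboxed = unboxNumber(input);
  return allowedDimensions.some((dimension) => dimension === unboxed);
}
```

Without the unboxing step, a wrapped 1024 would fail a strict literal comparison even though it represents an accepted value.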
),
pipeline: z.array(z.union([AnyStage, VectorSearchStage])).describe(
    `An array of aggregation stages to execute.
\`$vectorSearch\` **MUST** be the first stage of the pipeline, or the first stage of a \`$unionWith\` subpipeline.
Unless accuracy tests prove otherwise, I fear this reads as if $vectorSearch always has to be the first stage, whether or not the pipeline needs one.
Would it make sense to frame it like this:
If $vectorSearch is to be used, it **MUST** ... rest of the content
Proposed changes
Adds support for pre-filters in the aggregate tool when using the $vectorSearch stage. We are also adding more matchers for better accuracy tests, namely:
- Matcher.caseInsensitiveString: checks whether a string equals another, ignoring case. Some LLMs can change the casing of values, and we don't have control over that unless the user specifically prompts for it, so for our tests we will assume they are correct.
- Matcher.not: negates a matcher. For example, to ensure an array does not contain a specific value.
- Matcher.arrayOrSingle: matches either [ value ] or value. This is important because MQL queries sometimes support both forms.
Checklist