Description
Describe the bug
When using the IEmbeddingGenerator implementation for Google AI (AddGoogleAIEmbeddingGenerator), providing a task_type within EmbeddingGenerationOptions.AdditionalProperties does not add the taskType field to the outgoing HTTP request body sent to the Google API.
This prevents the use of task-specific embeddings (e.g., RETRIEVAL_DOCUMENT, RETRIEVAL_QUERY), which is a critical feature for optimizing search and RAG applications. The generator silently discards the option, so default embeddings are generated for all tasks.
To Reproduce
The issue can be reproduced by intercepting the outgoing HttpClient request.
- Set up a logging handler:

    public class LoggingDelegatingHandler : DelegatingHandler
    {
        protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
        {
            // Log the serialized request body before it is sent.
            if (request.Content != null)
            {
                var requestBody = await request.Content.ReadAsStringAsync(cancellationToken);
                Console.WriteLine($"--> REQUEST BODY:\n{requestBody}");
            }
            return await base.SendAsync(request, cancellationToken);
        }
    }
- Register the services:

    services.AddTransient<LoggingDelegatingHandler>();
    services.AddHttpClient("GoogleAIClient").AddHttpMessageHandler<LoggingDelegatingHandler>();

    // Register the generator, ensuring it uses the instrumented HttpClient
    services.AddGoogleAIEmbeddingGenerator(
        modelId: "embedding-001",
        apiKey: "YOUR_API_KEY",
        httpClient: services.BuildServiceProvider().GetRequiredService<IHttpClientFactory>().CreateClient("GoogleAIClient")
    );
- Execute the call with task_type:

    var embeddingGenerator = serviceProvider.GetRequiredService<IEmbeddingGenerator<string, Embedding<float>>>();
    var text = "This is a document for retrieval.";
    var options = new EmbeddingGenerationOptions()
    {
        AdditionalProperties = new() { { "task_type", "RETRIEVAL_DOCUMENT" } }
    };
    await embeddingGenerator.GenerateVectorAsync(text, options);
- Observe the logged request body. The output shows a request body without the taskType field:

    --> REQUEST BODY:
    {"requests":[{"model":"models/gemini-embedding-001","content":{"parts":[{"text":"This is a document for retrieval."}]},"outputDimensionality":1998}]}
Expected behavior
The logged HTTP request body should include the taskType field inside each entry of requests, as specified in the Google API documentation.
Expected request body:
--> REQUEST BODY:
{"requests":[{"model":"models/gemini-embedding-001","content":{"parts":[{"text":"This is a document for retrieval."}]},"outputDimensionality":1998,"taskType":"RETRIEVAL_DOCUMENT"}]}
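To make the comparison between the observed and expected bodies mechanical, the captured body can be parsed instead of inspected by eye. The following is a minimal sketch that assumes the body has the requests array shape shown in the logs above; RequestBodyChecks and AllRequestsHaveTaskType are hypothetical helper names, not part of Semantic Kernel or the Google connector.

```csharp
using System.Text.Json;

// Hypothetical helper for the repro: checks a captured request body.
public static class RequestBodyChecks
{
    // Returns true only if "requests" is a non-empty array and every
    // entry in it carries the expected "taskType" string.
    public static bool AllRequestsHaveTaskType(string requestBody, string expectedTaskType)
    {
        using var doc = JsonDocument.Parse(requestBody);
        var root = doc.RootElement;

        if (root.ValueKind != JsonValueKind.Object ||
            !root.TryGetProperty("requests", out var requests) ||
            requests.ValueKind != JsonValueKind.Array ||
            requests.GetArrayLength() == 0)
        {
            return false;
        }

        foreach (var request in requests.EnumerateArray())
        {
            if (request.ValueKind != JsonValueKind.Object ||
                !request.TryGetProperty("taskType", out var taskType) ||
                taskType.ValueKind != JsonValueKind.String ||
                taskType.GetString() != expectedTaskType)
            {
                return false;
            }
        }

        return true;
    }
}
```

Called from the logging handler with the captured body and "RETRIEVAL_DOCUMENT", this check would return false for the body currently produced and true once taskType is emitted per request.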
Platform
- Language: C#
- Source: Microsoft.SemanticKernel.Connectors.Google, version 1.66.0-alpha (and likely earlier versions)
- AI model: Google gemini-embedding-001
- IDE: Visual Studio 2022
- OS: Windows
Additional context
The root cause appears to be that the EmbeddingGenerationOptions are not being passed down through the internal call stack.
- GoogleAIEmbeddingGenerator.GenerateAsync calls an internal generator.
- This internal generator is an adapter for the obsolete GoogleAITextEmbeddingGenerationService.
- The GenerateEmbeddingsAsync method on GoogleAITextEmbeddingGenerationService does not have a parameter for EmbeddingGenerationOptions, so the options are discarded at this point.
- Consequently, the deeper GoogleAIEmbeddingRequest.FromData method, which builds the request, is never supplied with the taskType value.
This functionality is crucial for building effective search systems, as the performance difference between default embeddings and specialized RETRIEVAL_DOCUMENT/RETRIEVAL_QUERY embeddings is significant.
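One possible direction for a fix is to carry the task_type value from the options down to the code that builds the request. The sketch below only illustrates that idea and is not the connector's actual implementation: EmbeddingRequestSketch and BuildBody are hypothetical names, the IReadOnlyDictionary parameter stands in for EmbeddingGenerationOptions.AdditionalProperties, and the "models/" prefix and JSON shape are copied from the logged body above.

```csharp
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical stand-in for the request-building step; not the real
// GoogleAIEmbeddingRequest.FromData.
public static class EmbeddingRequestSketch
{
    public static string BuildBody(
        IEnumerable<string> texts,
        string modelId,
        IReadOnlyDictionary<string, object?>? additionalProperties = null)
    {
        // Read the optional "task_type" entry supplied by the caller.
        string? taskType = null;
        if (additionalProperties is not null &&
            additionalProperties.TryGetValue("task_type", out var value))
        {
            taskType = value?.ToString();
        }

        var requests = new JsonArray();
        foreach (var text in texts)
        {
            var request = new JsonObject
            {
                ["model"] = $"models/{modelId}",
                ["content"] = new JsonObject
                {
                    ["parts"] = new JsonArray(new JsonObject { ["text"] = text })
                }
            };

            // Emit taskType on each request only when the caller provided one.
            if (!string.IsNullOrEmpty(taskType))
            {
                request["taskType"] = taskType;
            }

            requests.Add((JsonNode)request);
        }

        return new JsonObject { ["requests"] = requests }.ToJsonString();
    }
}
```

With this shape, a call that passes { "task_type", "RETRIEVAL_DOCUMENT" } yields a taskType field on every request entry, while a call without the property produces the same body as today.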