feat: Add openai support for semantic parse_pdf #253
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking.
Running tests with 200 pages of financial documentation, o3 and gpt-4o-mini did the best at reproducing the table structure. Results were compared across gpt-4o-mini, o3, gpt-5-nano, gpt-5-mini, and gpt-5.
bcallender left a comment
I'm just a bit confused as to why the token limit needs to be set in each client when they're all using the same constant -- I get the simplification of not trying to calculate it right now.
src/fenic/_inference/google/gemini_native_chat_completions_client.py
bcallender left a comment
lgtm! separating out the concepts of output limit vs output estimate across all of the clients is a nice way of allowing us to implement provider-specific behavior in a standardized way.
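To make that distinction concrete, here is a minimal sketch of the limit-vs-estimate split, assuming hypothetical names (`CompletionsClient`, `max_output_tokens`, `estimate_output_tokens`); it is not fenic's actual client API.

```python
from abc import ABC, abstractmethod
from typing import Optional


class CompletionsClient(ABC):
    """Illustrative base class separating a hard output limit from an output estimate."""

    @abstractmethod
    def max_output_tokens(self, request) -> Optional[int]:
        """User-provided hard cap forwarded to the provider, or None if unset."""

    @abstractmethod
    def estimate_output_tokens(self, request) -> int:
        """Best-effort estimate used only for cost accounting and throttling."""


class IllustrativeOpenAIClient(CompletionsClient):
    # Assumed fallback for illustration; the PR description cites 8000 tokens
    # (the smallest supported VLM's output limit) for page parsing.
    DEFAULT_OUTPUT_ESTIMATE = 8000

    def max_output_tokens(self, request) -> Optional[int]:
        # None means max_completion_tokens is simply omitted from the API request.
        return getattr(request, "max_output_tokens", None)

    def estimate_output_tokens(self, request) -> int:
        # Use the caller's limit as the estimate when present; otherwise fall back.
        return self.max_output_tokens(request) or self.DEFAULT_OUTPUT_ESTIMATE
```

Keeping the two methods separate lets each provider override the estimate without changing how user-facing limits are enforced.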

TL;DR
Added PDF file support for OpenAI models with proper token counting and estimation.
What changed?
- Made `max_completion_tokens` optional in OpenAI chat completions requests
- Separated the `_max_output_tokens` user limit concept from `_estimate_output_tokens` for cost estimation and throttling
  - Or, for page parsing specifically, use an upper limit based on the output limit of the smallest supported VLM (8000 tokens)
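As a rough illustration of the bullets above (not the PR's actual code), the sketch below assumes a hypothetical request builder and estimator; only the OpenAI `max_completion_tokens` field name and the 8000-token figure come from the description.

```python
from typing import Optional

# Assumed constant: the output limit of the smallest supported VLM, per the description.
PAGE_PARSE_OUTPUT_TOKEN_LIMIT = 8000


def build_openai_request(messages, model: str, max_completion_tokens: Optional[int] = None) -> dict:
    """Hypothetical request builder: max_completion_tokens is sent only when set."""
    payload = {"model": model, "messages": messages}
    if max_completion_tokens is not None:
        payload["max_completion_tokens"] = max_completion_tokens
    return payload


def estimate_output_tokens(max_output_tokens: int, parsing_pdf_page: bool = False) -> int:
    """Hypothetical estimator used for cost accounting and request throttling."""
    if parsing_pdf_page:
        # Page parsing: bound the estimate by the smallest supported VLM's output limit.
        return min(max_output_tokens, PAGE_PARSE_OUTPUT_TOKEN_LIMIT)
    # Otherwise the caller-supplied limit doubles as the estimate's upper bound.
    return max_output_tokens
```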
Out of scope
The token estimation should happen at the semantic operator level, since it has the context of what it's expecting from the model. Currently, the semantic operator only passes a 'max tokens' limit to the client, and we use that upper limit in our estimates. As a future improvement, we should refactor so that the semantic operator decides on the output token limit for the request.
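One possible shape for that future refactor, sketched under the assumption of hypothetical types (`CompletionRequest`, `tokens_for_throttling`) rather than fenic's real interfaces:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompletionRequest:
    messages: list
    max_output_tokens: Optional[int] = None
    # Future idea from the description: the semantic operator fills this in,
    # since it knows roughly how much output it expects from the model.
    expected_output_tokens: Optional[int] = None


def tokens_for_throttling(request: CompletionRequest, default_estimate: int = 8000) -> int:
    """Prefer the operator's own estimate, then the user limit, then a default."""
    if request.expected_output_tokens is not None:
        return request.expected_output_tokens
    if request.max_output_tokens is not None:
        return request.max_output_tokens
    return default_estimate
```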
How to test?
`pytest tests/_inference/test_openai_token_counter.py`
`pytest tests/_backends/local/functions/test_semantic_parse_pdf.py`