feat: Add openai support for semantic parse_pdf #253
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking.
Running tests with 200 pages of financial documentation, o3 and gpt-4o-mini did the best at reproducing the table structure. Results were compared across gpt-4o-mini, o3, gpt-5-nano, gpt-5-mini, and gpt-5.
bcallender left a comment
I'm just a bit confused as to why the token limit needs to be set in each client when they're all using the same constant -- I get the simplification of not trying to calculate it right now.
src/fenic/_inference/google/gemini_native_chat_completions_client.py
bcallender left a comment
lgtm! separating out the concepts of output limit vs output estimate across all of the clients is a nice way of allowing us to implement provider-specific behavior in a standardized way.
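To make that distinction concrete, here is a minimal sketch of the limit-vs-estimate split, assuming hypothetical names (`CompletionsClient`, `max_output_tokens`, `estimate_output_tokens`); it is not fenic's actual client API.

```python
from abc import ABC, abstractmethod
from typing import Optional


class CompletionsClient(ABC):
    """Illustrative base class separating a hard output limit from an output estimate."""

    @abstractmethod
    def max_output_tokens(self, request) -> Optional[int]:
        """User-provided hard cap forwarded to the provider, or None if unset."""

    @abstractmethod
    def estimate_output_tokens(self, request) -> int:
        """Best-effort estimate used only for cost accounting and throttling."""


class IllustrativeOpenAIClient(CompletionsClient):
    # Assumed fallback for illustration; the PR description cites 8000 tokens
    # (the smallest supported VLM's output limit) for page parsing.
    DEFAULT_OUTPUT_ESTIMATE = 8000

    def max_output_tokens(self, request) -> Optional[int]:
        # None means max_completion_tokens is simply omitted from the API request.
        return getattr(request, "max_output_tokens", None)

    def estimate_output_tokens(self, request) -> int:
        # Use the caller's limit as the estimate when present; otherwise fall back.
        return self.max_output_tokens(request) or self.DEFAULT_OUTPUT_ESTIMATE
```

Keeping the two methods separate lets each provider override the estimate without changing how user-facing limits are enforced.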

TL;DR
Added PDF file support for OpenAI models with proper token counting and estimation.
What changed?
- Made `max_completion_tokens` optional in OpenAI chat completions requests
- Separated the `_max_output_tokens` user limit concept from `_estimate_output_tokens` for cost estimation and throttling
  - Or, for page parsing specifically, use an upper limit based on the output limit of the smallest supported VLM (8000 tokens)
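As a rough illustration of the bullets above (not the PR's actual code), the sketch below assumes a hypothetical request builder and estimator; only the OpenAI `max_completion_tokens` field name and the 8000-token figure come from the description.

```python
from typing import Optional

# Assumed constant: the output limit of the smallest supported VLM, per the description.
PAGE_PARSE_OUTPUT_TOKEN_LIMIT = 8000


def build_openai_request(messages, model: str, max_completion_tokens: Optional[int] = None) -> dict:
    """Hypothetical request builder: max_completion_tokens is sent only when set."""
    payload = {"model": model, "messages": messages}
    if max_completion_tokens is not None:
        payload["max_completion_tokens"] = max_completion_tokens
    return payload


def estimate_output_tokens(max_output_tokens: int, parsing_pdf_page: bool = False) -> int:
    """Hypothetical estimator used for cost accounting and request throttling."""
    if parsing_pdf_page:
        # Page parsing: bound the estimate by the smallest supported VLM's output limit.
        return min(max_output_tokens, PAGE_PARSE_OUTPUT_TOKEN_LIMIT)
    # Otherwise the caller-supplied limit doubles as the estimate's upper bound.
    return max_output_tokens
```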
Out of scope
The token estimation should happen at the semantic operator level, since it has the context of what it's expecting from the model. Currently, the semantic operator only passes a 'max tokens' limit to the client, and we use that upper limit in our estimates. As a future improvement, we should refactor so that the semantic operator decides on the output token limit for the request.
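One possible shape for that future refactor, sketched under the assumption of hypothetical types (`CompletionRequest`, `tokens_for_throttling`) rather than fenic's real interfaces:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompletionRequest:
    messages: list
    max_output_tokens: Optional[int] = None
    # Future idea from the description: the semantic operator fills this in,
    # since it knows roughly how much output it expects from the model.
    expected_output_tokens: Optional[int] = None


def tokens_for_throttling(request: CompletionRequest, default_estimate: int = 8000) -> int:
    """Prefer the operator's own estimate, then the user limit, then a default."""
    if request.expected_output_tokens is not None:
        return request.expected_output_tokens
    if request.max_output_tokens is not None:
        return request.max_output_tokens
    return default_estimate
```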
How to test?
`pytest tests/_inference/test_openai_token_counter.py`
`pytest tests/_backends/local/functions/test_semantic_parse_pdf.py`