Feature/enhanced adaptive scraper #1018
Open
LYordanovClearware wants to merge 16 commits into ScrapeGraphAI:main from LYordanovClearware:feature/enhanced-adaptive-scraper
- Fixed model ID bug (strip openai/ prefix)
- Made max_tokens configurable for image extraction
- Enhanced screenshot scrolling to capture full pages
- Merged SmartScraperGraph + ScreenshotScraperGraph results
- Added hallucination filter for fake speakers
- Improved prompt to work with OpenAI content policies
- Added lazy-load scrolling support (timeout-based)
- Created FastAPI backend with web UI
- Added Excel export with metadata

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
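The model ID fix in the first bullet can be illustrated with a minimal sketch. The helper name `strip_provider_prefix` and the example model ID are hypothetical and are not taken from the PR's diff; the only detail from the commit message is that the `openai/` prefix is stripped.

```python
def strip_provider_prefix(model: str, prefix: str = "openai/") -> str:
    """Return the bare model ID, dropping a leading provider prefix if present.

    Hypothetical helper: the PR only states that the "openai/" prefix is stripped.
    """
    # str.removeprefix leaves the string unchanged when the prefix is absent.
    return model.removeprefix(prefix)


print(strip_provider_prefix("openai/gpt-4o"))  # gpt-4o
print(strip_provider_prefix("gpt-4o"))         # gpt-4o
```

Stripping only a leading prefix, rather than splitting on every `/`, keeps IDs that legitimately contain slashes intact.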
tiktoken==0.7.0 requires a Rust compiler on Python 3.13 (no prebuilt wheels). Using Python 3.11 to ensure smooth deployment on Streamlit Cloud. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Streamlit Cloud may not recognize .streamlit/config.toml python version. Using .python-version file as fallback to force Python 3.11. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Trying runtime.txt as Streamlit Cloud standard for Python version. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Changed requires-python from '>=3.10,<4.0' to '>=3.10,<3.13'. This forces Streamlit Cloud to use Python 3.12 or below, which has prebuilt tiktoken wheels (no Rust compiler needed). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
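To make the effect of the narrowed constraint concrete, here is a small sketch of which interpreter versions `>=3.10,<3.13` admits. The function name is hypothetical, and the check is a simplified tuple comparison, not the full version-specifier logic that installers apply.

```python
import sys


def satisfies_requires_python(version: tuple) -> bool:
    """Simplified check for the specifier '>=3.10,<3.13'.

    Assumption: plain release versions only (no pre-release handling).
    """
    return (3, 10) <= version < (3, 13)


print(satisfies_requires_python((3, 12, 5)))  # True
print(satisfies_requires_python((3, 13, 0)))  # False

# Checking the running interpreter:
current_ok = satisfies_requires_python(tuple(sys.version_info[:3]))
```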
Installing the rust-all system package to compile tiktoken on Python 3.13 in case the pyproject.toml constraint doesn't force an earlier Python version. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Fixed incorrect import in generate_code_node.py that was causing ModuleNotFoundError. langchain_classic doesn't exist; it should be langchain_community. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
The code uses langchain_classic but it wasn't in dependencies. Added langchain-classic>=1.0.0 to pyproject.toml and reverted generate_code_node.py to use langchain_classic (the correct import). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
langchain_classic is bundled inside langchain starting from version 1.0.0. Removed separate langchain-classic dependency and bumped langchain min version. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
… issue CodeGeneratorGraph requires langchain_classic which has packaging issues. Since we don't use CodeGeneratorGraph for speaker scraping, commenting it out is the simplest workaround. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Added langchain-classic as explicit dependency to fix import errors on Streamlit Cloud deployment. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Added try/except block to gracefully fall back to langchain.output_parsers if langchain_classic is not available. This ensures compatibility across different deployment environments. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
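The fallback described in this commit might look roughly like the sketch below. It is not the PR's code: `import_first_available` is a hypothetical helper, and the dotted path `langchain_classic.output_parsers` is an assumption inferred from the commit message. The demo uses a standard-library module so that it runs without langchain installed.

```python
import importlib
from types import ModuleType
from typing import Sequence


def import_first_available(candidates: Sequence[str]) -> ModuleType:
    """Import and return the first module in `candidates` that is installed."""
    last_error = None
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            last_error = exc
    raise ImportError(f"none of {list(candidates)} could be imported") from last_error


# Assumed usage in the scraper (module paths are assumptions):
# output_parsers = import_first_available(
#     ["langchain_classic.output_parsers", "langchain.output_parsers"]
# )

# Runnable demo: the first candidate is missing, so the second is used.
demo = import_first_available(["module_that_does_not_exist_12345", "json"])
print(demo.__name__)  # json
```

An inline `try`/`except ImportError` around two `from ... import ...` statements achieves the same thing; the helper form only makes the ordering of candidates explicit.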
Added fallback to load OPENAI_API_KEY from Streamlit secrets for hosted deployments. Also added langchain-classic to requirements.txt. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
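A minimal sketch of this kind of key lookup, assuming the environment variable takes precedence and the secrets store is consulted second. The function `resolve_api_key` and its parameters are hypothetical; in a Streamlit app the `secrets` argument would presumably be `st.secrets`, which behaves like a mapping. Plain dictionaries are used here so the example runs without Streamlit.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    name: str = "OPENAI_API_KEY",
    env: Optional[Mapping[str, str]] = None,
    secrets: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the key from the environment, else from a secrets mapping, else None."""
    env = os.environ if env is None else env
    value = env.get(name)
    if value:
        return value
    # Fallback for hosted deployments (assumed to be a mapping such as st.secrets).
    if secrets is not None and name in secrets:
        return secrets[name]
    return None


print(resolve_api_key(env={}, secrets={"OPENAI_API_KEY": "sk-from-secrets"}))
# sk-from-secrets
```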
No description provided.