feat: performance evaluation #61
base: feat/block-media
Thank you for implementing the changes. There was no need to start a new crawler for blocking media.
I'm not sure why the `blockRequests` function doesn't work, but `page.route` seems to be the way to go for this use case (outside of Crawlee).
LGTM 👍 And thank you for fixing this, I hadn't noticed that it spawns another crawler instance.
Let's test the perf a bit more
 * Only blocks resources if blockMedia is true.
 */
async function blockMediaResourcesHook({ page, request }: PlaywrightCrawlingContext<ContentCrawlerUserData>) {
    await page.route('**/*', async (route) => {
`page.route` disables the native browser cache, which is why `blockRequests` is normally recommended (it is a native Chromium CDP call). Disabling the cache only hurts if you make multiple requests to the same site. I would run a perf test on more URLs of the same site, and on more sites, because this could slow us down as well.
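One way to summarize such a perf test is to group page-load timings by site and compare the median with and without media blocking. The sketch below is only an illustration: the `Sample` shape, the function names, and the choice of the median are assumptions, and no timings from this PR are used.

```typescript
// Hypothetical shape of one measurement; not taken from the PR.
interface Sample {
    site: string;        // e.g. the hostname of the crawled URL
    blockMedia: boolean; // whether media blocking was enabled for this load
    durationMs: number;  // measured page-load time
}

interface SiteComparison {
    blockedMs: number;
    unblockedMs: number;
}

// Median is less sensitive to a single slow page load than the mean.
function median(values: number[]): number {
    const sorted = values.slice().sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Groups samples by site and reports the median duration for each mode.
// Sites that lack samples for either mode are skipped, since they cannot be compared.
function compareBySite(samples: Sample[]): Record<string, SiteComparison> {
    const grouped: Record<string, { blocked: number[]; unblocked: number[] }> = {};
    for (const sample of samples) {
        const entry = grouped[sample.site] || { blocked: [], unblocked: [] };
        (sample.blockMedia ? entry.blocked : entry.unblocked).push(sample.durationMs);
        grouped[sample.site] = entry;
    }

    const result: Record<string, SiteComparison> = {};
    for (const site of Object.keys(grouped)) {
        const { blocked, unblocked } = grouped[site];
        if (blocked.length === 0 || unblocked.length === 0) continue;
        result[site] = { blockedMs: median(blocked), unblockedMs: median(unblocked) };
    }
    return result;
}
```

Feeding it several URLs per site matters here, because the cache penalty of `page.route` only shows up on repeated requests to the same site.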
Based on @matyascimbulka's suggestion, I refactored the code and moved `preNavigationHooks` to a separate function so that selecting `blockMedia: true/false` does not create a new instance of the crawler.
There may be a better way to block media, but it didn't work for me. Perhaps @metalwarrior665 can help here?
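A minimal sketch of that idea, assuming the flag travels with each request as `request.userData.blockMedia` (an assumption; only the hook's signature is visible in this PR). The `*Like` interfaces are structural stand-ins for the Playwright and Crawlee types so the example compiles on its own, and the list of blocked resource types is hypothetical.

```typescript
// Structural stand-ins for the Playwright/Crawlee objects the hook touches (assumption).
interface RouteLike {
    request(): { resourceType(): string };
    abort(): Promise<void>;
    continue(): Promise<void>;
}
interface PageLike {
    route(url: string, handler: (route: RouteLike) => Promise<void>): Promise<void>;
}
interface ContextLike {
    page: PageLike;
    request: { userData: { blockMedia?: boolean } };
}

// Hypothetical list; which resource types count as "media" is not stated in the PR.
const BLOCKED_RESOURCE_TYPES = ['image', 'media', 'font'];

function shouldBlock(resourceType: string): boolean {
    return BLOCKED_RESOURCE_TYPES.indexOf(resourceType) !== -1;
}

// One hook shared by every request: the decision is made per request,
// so the crawler and its hook list never need to be rebuilt.
async function blockMediaResourcesHook({ page, request }: ContextLike): Promise<void> {
    // When blocking is off, no route is installed at all.
    if (!request.userData.blockMedia) return;

    await page.route('**/*', async (route) => {
        if (shouldBlock(route.request().resourceType())) {
            await route.abort();
        } else {
            await route.continue();
        }
    });
}

// Built once and reused for all requests.
const preNavigationHooks = [blockMediaResourcesHook];
```

Returning early when the flag is off also means requests that do not ask for blocking never go through `page.route`.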
Another issue (#60) in standby mode causes multiple crawlers to be created without reason. I’ll leave this for a separate PR.
And some numbers: not as good as I hoped for, but still an improvement.
