scrapy-datadog-extension is a Scrapy extension that sends metrics (Scrapy stats) from your spider executions to Datadog.
There is no public pre-packaged version yet. To use it, you
will have to clone the project and make it easily installable from your
requirements.txt.
First, add the extension to the EXTENSIONS dict located
in your settings.py file. For example:
EXTENSIONS = {
    'scrapy-datadog-extension': 1,
}
Then you need to provide the following variables, directly from the Scrapinghub settings of your jobs:
- DATADOG_API_KEY: Your Datadog API key.
- DATADOG_APP_KEY: Your Datadog APP key.
- DATADOG_CUSTOM_TAGS: List of tags to bind to the metrics.
- DATADOG_CUSTOM_METRICS: Sublist of metrics to send to Datadog.
- DATADOG_METRICS_PREFIX: The prefix you want to apply to all of your metrics, e.g.: kp.
- DATADOG_HOST_NAME: The hostname you want your metrics to be associated with, e.g.: app.scrapinghub.com.
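
For a local run, the same names could also be declared in settings.py. The fragment below is only a sketch: every value is a placeholder, and the value types (lists of strings for the tags and metrics) are assumptions inferred from the descriptions above, not something this README confirms.

```python
import os

# Hypothetical settings.py fragment; all values are placeholders.
# Reading the keys from the environment avoids committing secrets.
DATADOG_API_KEY = os.environ.get("DATADOG_API_KEY", "")
DATADOG_APP_KEY = os.environ.get("DATADOG_APP_KEY", "")

# Assumed type: a list of statsd-style tags.
DATADOG_CUSTOM_TAGS = ["team:data", "env:staging"]

# Assumed type: a list of Scrapy stat names to forward.
DATADOG_CUSTOM_METRICS = ["item_scraped_count", "response_received_count"]

# Whether the prefix needs a trailing dot depends on how the extension
# joins prefix and metric name, which is not confirmed here.
DATADOG_METRICS_PREFIX = "kp"
DATADOG_HOST_NAME = "app.scrapinghub.com"
```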
Sometimes you might need to set tags at runtime, for example to compute
them from the spider arguments. To allow for this, set a
tags attribute on your spider with a list of statsd-compatible keys
(i.e. ["foo", ...] or ["foo:bar", ...]). Note that all metrics will
then carry these tags as well.
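
As an illustration of computing tags from spider arguments, here is a minimal sketch. `tags_from_arguments` and `ExampleSpider` are hypothetical names, and the class is a plain stand-in so the snippet runs without Scrapy installed; in a real project it would subclass `scrapy.Spider`.

```python
def tags_from_arguments(arguments):
    """Turn spider arguments into statsd-style "key:value" tags."""
    return ["{}:{}".format(key, value) for key, value in sorted(arguments.items())]


class ExampleSpider:
    """Stand-in for a scrapy.Spider subclass (hypothetical, for illustration)."""

    name = "example"

    def __init__(self, **kwargs):
        # Scrapy hands spider arguments (-a key=value) to __init__ as kwargs.
        self.tags = tags_from_arguments(kwargs)


spider = ExampleSpider(country="fr", category="vessels")
print(spider.tags)  # ['category:vessels', 'country:fr']
```

Sorting the arguments keeps the tag order stable between runs, which makes the result easier to compare in logs.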
Basically, when the spider_closed signal fires, this extension collects
the Scrapy stats associated with a given project/spider/job and extracts the
variables listed in a stats_to_collect list. Two custom variables are also
added:
- elapsed_time: a simple computation of finish_time - start_time.
- done: a simple counter, acting like a ping to indicate that a job runs regularly.
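
The extraction step described above can be sketched as follows. This is not the extension's actual code: `extract_metrics` is a hypothetical helper, and reporting elapsed_time in seconds is an assumption. It relies on Scrapy storing start_time and finish_time as datetime objects.

```python
from datetime import datetime, timedelta


def extract_metrics(stats, stats_to_collect):
    """Keep the requested stats and add the two custom variables."""
    metrics = {name: stats[name] for name in stats_to_collect if name in stats}

    # start_time and finish_time are datetime objects in Scrapy's stats.
    if "start_time" in stats and "finish_time" in stats:
        elapsed = stats["finish_time"] - stats["start_time"]
        metrics["elapsed_time"] = elapsed.total_seconds()  # unit is an assumption

    # Acts like a ping: one point per finished job.
    metrics["done"] = 1
    return metrics


start = datetime(2020, 1, 1, 12, 0, 0)
stats = {
    "start_time": start,
    "finish_time": start + timedelta(seconds=90),
    "item_scraped_count": 42,
    "log_count/DEBUG": 7,
}
metrics = extract_metrics(stats, ["item_scraped_count"])
print(metrics)  # {'item_scraped_count': 42, 'elapsed_time': 90.0, 'done': 1}
```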
In the end, we have a list of metrics with associated tags (to enable better filtering in Datadog):
- project: The Scrapinghub project ID.
- spider_name: The Scrapinghub spider name as defined in the spider class.
Then, everything is sent to Datadog, using the Datadog API.
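
To make the tagging and sending step more concrete, the sketch below builds one payload entry per metric. `build_series` and its parameters are hypothetical; the entry shape (metric, points, tags, host) follows what the official `datadog` Python client accepts in `api.Metric.send(metrics=...)`, but whether the extension builds its payload this way is an assumption.

```python
import time


def build_series(metrics, project, spider_name, prefix="", host=None,
                 extra_tags=(), timestamp=None):
    """Build one Datadog-style series entry per metric (hypothetical helper)."""
    tags = ["project:{}".format(project), "spider_name:{}".format(spider_name)]
    tags.extend(extra_tags)
    if timestamp is None:
        timestamp = int(time.time())

    series = []
    for name, value in sorted(metrics.items()):
        entry = {
            "metric": prefix + name,
            "points": [(timestamp, value)],
            "tags": list(tags),
        }
        if host is not None:
            entry["host"] = host
        series.append(entry)
    return series


series = build_series(
    {"done": 1, "elapsed_time": 90.0},
    project=123,
    spider_name="example",
    prefix="kp.",
    host="app.scrapinghub.com",
    timestamp=1000,
)
print(series[0]["metric"], series[0]["tags"])
```

With the `datadog` client installed, a list like this could be passed to `datadog.api.Metric.send(metrics=series)` after calling `datadog.initialize(api_key=..., app_key=...)`.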
- Sometimes, when spider_closed is executed right after the job completes, some Scrapy stats are missing, so an incomplete list of metrics is sent. This prevents us from relying 100% on this extension.
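
One way to at least notice the incomplete case is to compare the collected stats against the names you expect before sending. This is only a sketch of a possible check, not something the extension is known to do; `missing_stats` is a hypothetical helper and the stat names are examples.

```python
def missing_stats(stats, expected):
    """Return the expected stat names that are absent from the collected stats."""
    return [name for name in expected if name not in stats]


# Example stats dict where finish_time has not been recorded yet.
stats = {"start_time": "2020-01-01T12:00:00", "item_scraped_count": 42}
missing = missing_stats(stats, ["start_time", "finish_time", "item_scraped_count"])
if missing:
    print("incomplete stats, missing: " + ", ".join(missing))
```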
By the way we're hiring across the world 👇
Join our engineering team to help us build data-intensive projects! We are looking for people who love their craft and are the best at it.
- Data Engineers in Singapore and Paris
- Data Support Engineers in Singapore
- Data Engineer interns in Singapore and Paris
This code is MIT licensed.
Designed & built by Kpler engineers with a 💻 and some 🍣.

