An HTML scraper microservice based on x-ray and micro.
Request
Send a GET request to the `/scrape` endpoint with one of the following sets of query string parameters:
- Scraping a single text value
| Params | Required | Description |
|---|---|---|
| s-url | yes | destination website url to be scraped |
| s-selector | yes | css selector of data to be extracted |
- Scraping multiple data objects
| Params | Required | Description |
|---|---|---|
| s-url | yes | destination website url to be scraped |
| s-scope | yes | css selector of data's scope |
| s-limit | no | limit number of objects returned |
| [selector] | yes | css selector of each data to be extracted |
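
As a minimal sketch of how the parameters above combine into a request, the following builds a `/scrape` URL from a plain object. The helper name `buildScrapeUrl` is hypothetical and not part of this project; it only percent-encodes each key and value, so CSS selectors containing `#` or spaces survive the query string.

```javascript
// Hypothetical helper (not part of this project): builds a /scrape URL
// from a base address and an object of query string parameters.
function buildScrapeUrl(base, params) {
  const query = Object.entries(params)
    // Percent-encode keys and values so '#', spaces, etc. are transmitted safely.
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join('&');
  return `${base}/scrape?${query}`;
}

// Scraping a single text value: s-url and s-selector are required.
const textUrl = buildScrapeUrl('https://scraper.fun', {
  's-url': 'https://coinmarketcap.com',
  's-selector': '#id-bitcoin .price',
});

// Scraping multiple objects: s-url and s-scope are required, s-limit is optional,
// and every other key (here `name` and `price`) is a field selector.
const listUrl = buildScrapeUrl('https://scraper.fun', {
  's-url': 'https://coinmarketcap.com',
  's-scope': 'table#currencies tbody tr',
  'name': '.currency-name .currency-name-container',
  'price': '.price',
  's-limit': 3,
});

console.log(textUrl);
console.log(listUrl);
```

Unlike the hand-written example URLs in this README, this sketch also encodes the `s-url` value; both forms decode to the same parameters.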
Response
A text value, or a JSON array of objects whose keys are the selector names specified in the request's query string.
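
A client may want to handle both response shapes uniformly. The sketch below is one way to do that, assuming the response body is available as a string; the helper name `parseScrapeBody` and the `{ kind, value }` result shape are hypothetical, and the exact content type returned by the service is not stated in this README.

```javascript
// Hypothetical helper (not part of this project): normalizes a raw response
// body into { kind: 'objects' | 'text', value }.
function parseScrapeBody(body) {
  try {
    const parsed = JSON.parse(body);
    // A JSON array is treated as the "multiple data objects" response.
    if (Array.isArray(parsed)) return { kind: 'objects', value: parsed };
    // A JSON-encoded string is unwrapped and treated as text.
    if (typeof parsed === 'string') return { kind: 'text', value: parsed };
  } catch (err) {
    // Not valid JSON: fall through and treat the body as plain text.
  }
  return { kind: 'text', value: body };
}

// Placeholder bodies for illustration only, not real scraped data.
console.log(parseScrapeBody('[{"name":"A","price":"1"}]'));
console.log(parseScrapeBody('some text'));
```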
Scraping Bitcoin price in USD from CoinMarketCap
- Request (URI encoded):

    https://scraper.fun/scrape?s-url=https://coinmarketcap.com&s-selector=%23id-bitcoin%20.price

- Response: the matched price as a text value
- Request (URI encoded):

    https://scraper.fun/scrape?s-url=https://coinmarketcap.com&s-scope=table%23currencies%20tbody%20tr&name=.currency-name%20.currency-name-container&price=.price&s-limit=3

- Response: a JSON array of at most 3 objects, each with `name` and `price` keys
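
To make the encoded request above easier to read, this sketch decodes its query string and separates the `s-` options from the field selectors. The helper name `describeScrapeRequest` is hypothetical, and treating every `s-` prefixed key as a reserved option is an assumption drawn from the parameter tables, not from the service's source code.

```javascript
// Hypothetical helper (not part of this project): splits a /scrape request URL
// into reserved `s-` options and user-defined field selectors.
function describeScrapeRequest(requestUrl) {
  const options = {};
  const fields = {};
  // searchParams yields decoded [key, value] pairs.
  for (const [key, value] of new URL(requestUrl).searchParams) {
    if (key.startsWith('s-')) options[key] = value;
    else fields[key] = value;
  }
  return { options, fields };
}

const described = describeScrapeRequest(
  'https://scraper.fun/scrape?s-url=https://coinmarketcap.com' +
    '&s-scope=table%23currencies%20tbody%20tr' +
    '&name=.currency-name%20.currency-name-container&price=.price&s-limit=3'
);

console.log(described.options);
console.log(described.fields);
```

The keys of `fields` (`name` and `price`) are the keys that each object in the response array is expected to carry.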
Make sure Node.js (9.0.0 or newer) and either Yarn or npm are installed on your local machine. Then install the project dependencies and start the service by running:

    yarn
    yarn start

The service will be up at 127.0.0.1:9000 by default.
We use ESLint to lint source code. Simply run:

    yarn test

To serve the app on port 80, run the command:

    PORT=80 yarn serve

The app will be up at 127.0.0.1.
You can use the existing docker image from https://hub.docker.com/r/phatpham9/scraper by running:

    docker pull phatpham9/scraper
    docker run -d -p 80:80 phatpham9/scraper

The app will be up at 127.0.0.1.
CaptainDuckDuck is a Heroku-like tool for deploying apps easily. Install the CaptainDuckDuck client on your local machine by following its installation instructions, then run:

    captainduckduck deploy

That's it!
Click the button below to deploy to a Heroku dyno.
- Fork this repository to your own GitHub account and then clone it to your local machine
- Follow the Development guide or simply run: `yarn start`
- Lint code by running: `yarn test`
- Create a pull request for us
- Phat Pham (@phatpham9)

