This repository has been archived by the owner on May 19, 2020. It is now read-only.

scrapy-chromedebugproto

Example of how to integrate Scrapy with the Chrome Debugging Protocol

WARNING: highly toxic code!! Not production-ready, not at all!! You've been warned.

Getting started

Get a recent Chrome, with headless mode if you can

Run it with, for example:

$ google-chrome-unstable --disable-gpu --headless --remote-debugging-port=9223
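Once Chrome is running with a remote debugging port, it serves a JSON list of debuggable targets over HTTP at `/json`, and each target carries a `webSocketDebuggerUrl` to connect to. The snippet below is a minimal sketch for checking that the browser is reachable. It is not part of this repository: the function names and the `DEBUGGER_HTTP_URL` constant are made up for illustration, and the port is assumed to be the 9223 used above.

```python
import json
from urllib.request import urlopen

# Assumption: Chrome was started with --remote-debugging-port=9223 on this machine.
DEBUGGER_HTTP_URL = "http://localhost:9223"


def pick_page_ws_url(targets):
    """Return the WebSocket debugger URL of the first 'page' target, or None."""
    for target in targets:
        if target.get("type") == "page" and "webSocketDebuggerUrl" in target:
            return target["webSocketDebuggerUrl"]
    return None


def fetch_targets(base_url=DEBUGGER_HTTP_URL):
    """Fetch the target list from Chrome's /json endpoint (Chrome must be running)."""
    with urlopen(base_url + "/json") as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with Chrome running:
#   print(pick_page_ws_url(fetch_targets()))
```

If this prints a `ws://` URL, the debugger is up and a WebSocket client can attach to it.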

Install Python dependencies

  • scrapy
  • treq
  • twisted
  • autobahn

Add the downloader middleware

DOWNLOADER_MIDDLEWARES = {
    'middleware.HeadlesschromeDownloaderMiddleware': 543,
}
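For orientation, the Chrome Debugging Protocol exchanges JSON messages over a WebSocket: commands carry an `id`, a `method` and `params`, and events arrive with a `method` but no `id`. The sketch below only shows what such messages could look like for loading a page. It is not the middleware's actual code; `CommandBuilder` and `is_load_event` are hypothetical helpers, and the real middleware may use different commands.

```python
import itertools
import json


class CommandBuilder:
    """Builds protocol commands with increasing ids (hypothetical helper)."""

    def __init__(self):
        self._ids = itertools.count(1)

    def build(self, method, params=None):
        # Each command needs a unique id so its reply can be matched to it.
        return json.dumps(
            {"id": next(self._ids), "method": method, "params": params or {}}
        )

    def navigate(self, url):
        # Page.enable turns on page events; Page.navigate loads the URL.
        return [
            self.build("Page.enable"),
            self.build("Page.navigate", {"url": url}),
        ]


def is_load_event(raw_message):
    """True if the incoming message is the Page.loadEventFired event."""
    return json.loads(raw_message).get("method") == "Page.loadEventFired"
```

Each string returned by `navigate` would be sent as one WebSocket text frame. After `Page.loadEventFired` arrives, a client can ask for the rendered document, for instance through `Runtime.evaluate`.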

Todo

  • Handle non-HTTP 200 responses
  • Switch debug logs on/off
  • Configurable debugger URL
  • A proper state machine
  • Make a Python package out of it
  • Check how load and request concurrency is handled
  • Add tests
  • ...

Inspiration