
go-scrapy

A Scrapy implementation in Go. (Work in progress)

Overview

go-scrapy is a web crawling framework for Go, used to crawl websites and extract structured data from the parsed pages.

Requirements

  • Go 1.x - 1.9.x
  • Works on Linux, Windows, macOS, BSD

Installation

Install:

go get github.com/kabelsea/go-scrapy

Import:

import scrapy "github.com/kabelsea/go-scrapy/scrapy"

Quickstart

package main

import (
  "log"

  scrapy "github.com/kabelsea/go-scrapy/scrapy"
)

func main() {
  // Init spider configuration
  config := &scrapy.SpiderConfig{
    Name:               "HabraBot",
    MaxDepth:           5,
    ConcurrentRequests: 20,
    StartUrls: []string{
      "https://habrahabr.ru/",
    },
    Rules: []scrapy.Rule{
      {
        LinkExtractor: &scrapy.LinkExtractor{
          Allow:        []string{`^/post/\d+/$`},
          AllowDomains: []string{`^habrahabr\.ru`},
        },
        Follow: true,
      },
      {
        LinkExtractor: &scrapy.LinkExtractor{
          Allow:        []string{`^/users/[^/]+/$`},
          AllowDomains: []string{`^habrahabr\.ru`},
        },
        Handler: ProcessItem,
      },
    },
  }

  // Create new spider
  spider, err := scrapy.NewSpider(config)
  if err != nil {
    panic(err)
  }

  // Run spider and wait
  spider.Wait()
}

// Process crawled page
func ProcessItem(resp *scrapy.Response) {
  log.Println("Process item:", resp.Url, resp.StatusCode)
}
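
The handler is where extraction logic goes. Below is a minimal sketch of a richer handler (imports: log, regexp). It is hedged: the resp.Body field used here is an assumption, not a confirmed part of the go-scrapy Response API, so check the package source for the actual accessor before relying on it.

// Sketch only: resp.Body ([]byte) is an assumed field, not confirmed
// by the go-scrapy API; resp.Url and resp.StatusCode are real.
var titleRe = regexp.MustCompile(`(?is)<title>(.*?)</title>`)

func ProcessUser(resp *scrapy.Response) {
  // Pull the page <title> out with a regexp (crude, but dependency-free).
  if m := titleRe.FindSubmatch(resp.Body); m != nil {
    log.Println("User page title:", string(m[1]), "from", resp.Url)
  } else {
    log.Println("No title found on", resp.Url, "status", resp.StatusCode)
  }
}

To wire it in, set Handler: ProcessUser on the matching rule in SpiderConfig, just as ProcessItem is set in the Quickstart above.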

Howto

Please go through the examples to get an idea of how to use this package.

Roadmap

  • Middlewares
  • More examples
  • Tests
