Spiders

Spiders are the central component of Skyscraper. To do targeted scraping, you need to implement one spider for each scraping task. Each spider is a standard Scrapy spider. To create your own spider, subclass scrapy.Spider:

import scrapy

class ExampleSpider(scrapy.Spider):
    name = 'example'
    start_urls = ['http://example.com/']

    def parse(self, response):
        pass

You have to set the name field to exactly the same value as the name of your spider in your storage. For PostgreSQL storage, the name must match the spider's name in your PostgreSQL database; for file storage, the spider's file must have exactly the same name.
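For file storage, the naming rule above could be checked with a small helper like the following. This is only a sketch: filename_matches_spider is a hypothetical function, not part of Skyscraper or Scrapy, and it assumes that a spider named example is stored in a file called example.py.

```python
from pathlib import Path


def filename_matches_spider(path, spider_name):
    """Return True if the file is a .py file whose base name equals spider_name.

    Hypothetical helper for illustration; assumes the layout <name>.py.
    """
    p = Path(path)
    return p.suffix == ".py" and p.stem == spider_name
```

For example, filename_matches_spider("spiders/example.py", "example") holds, while a file named example_spider.py would not match a spider named example.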

Warning

Because Python module names cannot contain dashes, do not use dash characters in your spider names. Use underscores instead.
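One way to guard against dashes is to normalize and validate a name before using it. The sketch below uses only the Python standard library; to_spider_name and is_valid_spider_name are hypothetical helpers for illustration, not part of Skyscraper, and the validity check (a legal, non-keyword Python identifier) is an assumption that is stricter than just forbidding dashes.

```python
import keyword


def to_spider_name(raw):
    """Replace dashes with underscores so the name can be used as a module name."""
    return raw.strip().replace("-", "_")


def is_valid_spider_name(name):
    """Return True if name is a legal Python identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)
```

A name such as my-news-site fails the check as written, while to_spider_name("my-news-site") gives my_news_site, which passes.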