Starting a Spider from Redis

The scrapy_redis.spiders module provides two classes, RedisSpider and RedisCrawlSpider, that let a spider read its start_urls from Redis.

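The spider reads a start URL from the Redis start_urls key and crawls it. If the crawl produces further requests, the spider keeps going until all of them have completed, then reads the next URL from the Redis start_urls key. This cycle repeats.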
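
The read-crawl-repeat cycle can be illustrated with a small standard-library simulation. This is only a sketch of the control flow described here, not scrapy-redis code: the in-memory deque stands in for the Redis list, and `crawl_from_queue`, `fetch_links`, `LINKS`, and the `example` URLs are hypothetical names introduced for illustration.

```python
from collections import deque


def crawl_from_queue(start_urls, fetch_links):
    """Simulate the read-crawl-repeat loop with an in-memory queue."""
    visited = []
    seen = set()
    while start_urls:
        # Take the next start URL (stand-in for reading from the Redis list).
        pending = deque([start_urls.popleft()])
        # Finish every request spawned by this start URL before reading the next one.
        while pending:
            url = pending.popleft()
            if url in seen:
                continue
            seen.add(url)
            visited.append(url)
            pending.extend(fetch_links(url))
    return visited


# Hypothetical link graph standing in for real HTTP responses.
LINKS = {
    "http://a.example": ["http://a.example/1", "http://a.example/2"],
    "http://a.example/1": ["http://a.example/2"],
}

queue = deque(["http://a.example", "http://b.example"])
order = crawl_from_queue(queue, lambda url: LINKS.get(url, []))
print(order)
```

In this sketch the second start URL is only read once every request that came from the first one has been handled, which mirrors the behaviour described in the paragraph.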

RedisSpider

Example: myspider_redis.py

  1. run the spider:

    scrapy runspider example/spiders/myspider_redis.py
    
  2. push urls to redis:

    redis-cli lpush myspider:start_urls http://baidu.com
    

RedisCrawlSpider

Example: mycrawler_redis.py

  1. run the spider:

    scrapy runspider example/spiders/mycrawler_redis.py
    
  2. push urls to redis:

    redis-cli lpush mycrawler:start_urls http://baidu.com