Error Notes

Scrapy URLError

The error message is as follows:

2015-12-03 16:05:08 [scrapy] INFO: Scrapy 1.0.3 started (bot: LabelCrawler)
2015-12-03 16:05:08 [scrapy] INFO: Optional features available: ssl, http11, boto
2015-12-03 16:05:08 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'LabelCrawler.spiders', 'SPIDER_MODULES': ['LabelCrawler.spiders'], 'BOT_NAME': 'LabelCrawler'}
2015-12-03 16:05:08 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2015-12-03 16:05:09 [boto] DEBUG: Retrieving credentials from metadata server.
2015-12-03 16:05:09 [boto] ERROR: Caught exception reading instance data
Traceback (most recent call last):
  File "D:\Anaconda\lib\site-packages\boto\utils.py", line 210, in retry_url
    r = opener.open(req, timeout=timeout)
  File "D:\Anaconda\lib\urllib2.py", line 431, in open
    response = self._open(req, data)
  File "D:\Anaconda\lib\urllib2.py", line 449, in _open
    '_open', req)
  File "D:\Anaconda\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "D:\Anaconda\lib\urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "D:\Anaconda\lib\urllib2.py", line 1197, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 10051] >

Cause:

  That particular error message is being generated by boto (boto 2.38.0 py27_0), which is used to connect to Amazon S3. Scrapy doesn't have this enabled by default.

Solution:

1. In settings.py, add (note the handler key is the lowercase URI scheme `s3`):

   DOWNLOAD_HANDLERS = {'s3': None,}

2. Alternatively, in settings.py, add:

   AWS_ACCESS_KEY_ID = ""
   AWS_SECRET_ACCESS_KEY = ""

Even if the error is still logged, it does not affect the crawler.
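The two workarounds above can be sketched together in the project's settings.py; either one alone is usually sufficient. This is a minimal sketch, not the full settings file:

```python
# settings.py -- sketch of the two workarounds for the boto URLError

# Option 1: disable Scrapy's s3 download handler entirely,
# so boto is never asked to fetch AWS credentials.
DOWNLOAD_HANDLERS = {'s3': None}

# Option 2: give boto empty AWS credentials, so it does not
# try to contact the EC2 metadata server at startup.
AWS_ACCESS_KEY_ID = ""
AWS_SECRET_ACCESS_KEY = ""
```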

(error) MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk.

The following error occurred while running Redis today:

(error) MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.

In other words: the last RDB background save failed, so Redis is refusing all commands that modify the data set; the Redis logs contain the details of the failure.

Cause:

The Redis snapshot (background save) was forcibly interrupted, so the data could not be persisted to disk.

Solution:

Run config set stop-writes-on-bgsave-error no to turn off the stop-writes-on-bgsave-error option; writes are then accepted again. Note that this only disables the safety check: the underlying reason the background save failed should still be investigated in the Redis logs.

root@ubuntu:/usr/local/redis/bin# ./redis-cli
127.0.0.1:6379> config set stop-writes-on-bgsave-error no
OK
127.0.0.1:6379> lpush myColour "red"
(integer) 1
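CONFIG SET only changes the running instance, so the setting is lost on restart. To make it permanent, the same directive can go into redis.conf (the file's location varies per installation):

```
# redis.conf -- keep accepting writes even when the background save fails
stop-writes-on-bgsave-error no
```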

redis-scrapy

In settings.py, never add:

LOG_STDOUT = True

(LOG_STDOUT = True redirects all of the process's standard output into the Scrapy log, which causes problems when running with scrapy-redis.)