Scrapy Gets Stuck Crawling A Long List Of URLs
I am scraping a large list of URLs (around 1,000), and after a set time the crawler gets stuck, crawling 0 pages/min. The problem always occurs at the same spot in the crawl.
Solution 1:
The reason Scrapy says 0 items is that it only counts data you yield, and you are not yielding anything, just inserting into your database directly.
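A minimal sketch of that fix, with a made-up spider, URL, and SQLite table for illustration: yield the scraped data from parse() so the log stats count it, and move the database insert into an item pipeline (registered via ITEM_PIPELINES in settings.py).

    import sqlite3

    import scrapy


    class PageSpider(scrapy.Spider):
        name = "pages"                        # hypothetical spider name
        start_urls = ["https://example.com"]  # placeholder URL

        def parse(self, response):
            # Yielding items is what Scrapy's log stats count; inserting into
            # the database here without yielding leaves "scraped 0 items".
            yield {"url": response.url, "title": response.css("title::text").get()}


    class SQLitePipeline:
        # Register in settings.py, e.g.
        # ITEM_PIPELINES = {"myproject.pipelines.SQLitePipeline": 300}

        def open_spider(self, spider):
            self.conn = sqlite3.connect("pages.db")
            self.conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT, title TEXT)")

        def close_spider(self, spider):
            self.conn.commit()
            self.conn.close()

        def process_item(self, item, spider):
            # The insert happens here, after the item has been counted.
            self.conn.execute(
                "INSERT INTO pages (url, title) VALUES (?, ?)",
                (item["url"], item["title"]),
            )
            return item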
Solution 2:
I just had this happen to me, so I wanted to share what caused the bug, in case someone encounters the exact same issue.
Apparently, if you don't specify a callback for a Request, it defaults to the spider's parse method as a callback (my intention was to not have a callback at all for those requests).
In my spider, I used the parse method to make most of the Requests, so this behavior caused many unnecessary requests that eventually led to Scrapy crashing.
Simply adding an empty callback function (lambda a: None) for those requests solved my issue, as in the sketch below.
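For reference, a minimal sketch of what that looks like (the spider and URL are made up for illustration). Without the explicit callback, every one of these requests would be fed back into parse(), since that is Scrapy's default callback.

    import scrapy


    class LinkSpider(scrapy.Spider):
        name = "links"                        # hypothetical spider name
        start_urls = ["https://example.com"]  # placeholder URL

        def parse(self, response):
            for href in response.css("a::attr(href)").getall():
                # callback defaults to self.parse when omitted; pass a no-op
                # callback when you only want the request's side effect and
                # do not want its response re-entering parse().
                yield scrapy.Request(response.urljoin(href), callback=lambda resp: None)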