Scrapy Gets Stuck Crawling A Long List Of URLs
I am scraping a large list of URLs (around 1,000), and after a set time the crawler gets stuck, crawling 0 pages/min. The problem always occurs at the same spot in the crawl.
Solution 1:
The reason Scrapy says 0 items is that it only counts data you yield, and you are not yielding anything, just inserting into your database directly.
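A minimal sketch of that fix, with a made-up spider, URL, and SQLite table for illustration: yield the scraped data from parse() so the log stats count it, and move the database insert into an item pipeline (registered via ITEM_PIPELINES in settings.py).

    import sqlite3

    import scrapy


    class PageSpider(scrapy.Spider):
        name = "pages"                        # hypothetical spider name
        start_urls = ["https://example.com"]  # placeholder URL

        def parse(self, response):
            # Yielding items is what Scrapy's log stats count; inserting into
            # the database here without yielding leaves "scraped 0 items".
            yield {"url": response.url, "title": response.css("title::text").get()}


    class SQLitePipeline:
        # Register in settings.py, e.g.
        # ITEM_PIPELINES = {"myproject.pipelines.SQLitePipeline": 300}

        def open_spider(self, spider):
            self.conn = sqlite3.connect("pages.db")
            self.conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT, title TEXT)")

        def close_spider(self, spider):
            self.conn.commit()
            self.conn.close()

        def process_item(self, item, spider):
            # The insert happens here, after the item has been counted.
            self.conn.execute(
                "INSERT INTO pages (url, title) VALUES (?, ?)",
                (item["url"], item["title"]),
            )
            return item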
Solution 2:
I just had this happen to me, so I wanted to share what caused the bug, in case someone encounters the exact same issue.
Apparently, if you don't specify a callback for a Request, it defaults to the spider's parse method as a callback (my intention was to not have a callback at all for those requests).
In my spider, I used the parse method to make most of the Requests, so this behavior caused many unnecessary requests that eventually led to Scrapy crashing.
Simply adding an empty callback function (lambda a: None) for those requests solved my issue, as in the sketch below.
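For reference, a minimal sketch of what that looks like (the spider and URL are made up for illustration). Without the explicit callback, every one of these requests would be fed back into parse(), since that is Scrapy's default callback.

    import scrapy


    class LinkSpider(scrapy.Spider):
        name = "links"                        # hypothetical spider name
        start_urls = ["https://example.com"]  # placeholder URL

        def parse(self, response):
            for href in response.css("a::attr(href)").getall():
                # callback defaults to self.parse when omitted; pass a no-op
                # callback when you only want the request's side effect and
                # do not want its response re-entering parse().
                yield scrapy.Request(response.urljoin(href), callback=lambda resp: None)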