
Scrapy - Doesn't Crawl

I'm trying to get a recursive crawl running, and since the one I wrote wasn't working, I pulled an example from the web and tried that instead. I really don't know where the problem is, but th…

Solution 1:

  1. Modify your SgmlLinkExtractor as payala suggested
  2. Remove the restrict_xpaths section of the link extractor

These changes will fix the issue. I'd also suggest the following change to the XPath used to select titles, as it will remove the empty items that occur because the next-page links are also being selected.

from scrapy.selector import HtmlXPathSelector

def parse_items(self, response):
    hxs = HtmlXPathSelector(response)
    titles = hxs.select("//p[@class='row']")

Solution 2:

Try substituting "d00\.html" in your SgmlLinkExtractor with ".*00\.html" or "index\d+00\.html".
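The difference between those patterns can be checked with plain `re` on some made-up pagination URLs (the URLs below are hypothetical, not from the question). The literal "d00\.html" only matches filenames containing that exact substring, while ".*00\.html" matches any page ending in "00.html" and "index\d+00\.html" matches only the numbered index pages.

```python
# Sketch: how the suggested allow patterns behave. URLs are made up.
import re

urls = ["index100.html", "index200.html", "d00.html", "about.html"]

literal  = [u for u in urls if re.search(r"d00\.html", u)]
broad    = [u for u in urls if re.search(r".*00\.html", u)]
numbered = [u for u in urls if re.search(r"index\d+00\.html", u)]

print(literal)   # only the exact "d00.html" name matches
print(broad)     # every page ending in 00.html
print(numbered)  # only the indexNNN00.html pagination pages
```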
