
Scrapy - Doesn't Crawl

I'm trying to get a recursive crawl running, and since the one I wrote wasn't working, I pulled an example from the web and tried that instead. I really don't know where the problem is, but th…

Solution 1:

  1. Modify your SgmlLinkExtractor as payala suggested
  2. Remove the restrict_xpaths section of the link extractor

These changes will fix the issue. I'd also suggest the following change to the XPath used to select titles, as it will remove the empty items that occur because the next-page links are also being selected.

from scrapy.selector import HtmlXPathSelector

def parse_items(self, response):
    hxs = HtmlXPathSelector(response)
    titles = hxs.select("//p[@class='row']")

Solution 2:

Try substituting "d00\.html" in your SgmlLinkExtractor with ".*00\.html" or "index\d+00\.html".
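The difference between those patterns can be checked with plain `re` on some made-up pagination URLs (the URLs below are hypothetical, not from the question). The literal "d00\.html" only matches filenames containing that exact substring, while ".*00\.html" matches any page ending in "00.html" and "index\d+00\.html" matches only the numbered index pages.

```python
# Sketch: how the suggested allow patterns behave. URLs are made up.
import re

urls = ["index100.html", "index200.html", "d00.html", "about.html"]

literal  = [u for u in urls if re.search(r"d00\.html", u)]
broad    = [u for u in urls if re.search(r".*00\.html", u)]
numbered = [u for u in urls if re.search(r"index\d+00\.html", u)]

print(literal)   # only the exact "d00.html" name matches
print(broad)     # every page ending in 00.html
print(numbered)  # only the indexNNN00.html pagination pages
```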
