
How To Crawl Data From The Linked Webpages On A Webpage We Are Crawling

I am crawling the names of the colleges on this webpage, but I also want to crawl the number of faculties in these colleges, which is only available if you open each college's own page.

Solution 1:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "student"
    start_urls = [
        'http://www.engineering.careers360.com/colleges/list-of-engineering-colleges-in-karnataka?sort_filter=alpha',
    ]

    def parse(self, response):
        for students in response.css('li.search-result'):
            # SELECT_URL is a placeholder for the CSS selector that
            # extracts the link to the college's own page
            url = response.urljoin(students.css(SELECT_URL).get())
            req = scrapy.Request(url, callback=self.parse_student)
            # carry the name along in the request's meta dict
            req.meta['name'] = students.css('div.title a::text').get()
            yield req

    def parse_student(self, response):
        yield {
            'name': response.meta.get('name'),
            # SELECTOR is a placeholder for whatever you want to
            # extract from the college's page, e.g. faculty count
            'other data': response.css(SELECTOR).get(),
        }

It should be something like this: you send the name in the meta data of the request, which makes it available to the callback that handles the next response.

If the data is also available on the last page you scrape in parse_student, you might want to consider not sending it in the meta data and instead scraping it directly from that page.
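The meta-passing pattern itself is framework-independent, so it can be sketched in plain Python without Scrapy. The sketch below fakes the fetching with a dict of pages; the names, URLs, and the `faculties` field are made up for illustration, and in a real spider the engine would perform the HTTP requests and dispatch the callbacks:

```python
# Minimal sketch of carrying data across chained requests via a "meta" dict.
# The HTTP layer is faked; Scrapy's engine normally does the fetching.

def parse(listing, schedule):
    # listing: (college_name, detail_url) pairs scraped from the index page
    for name, url in listing:
        # attach the name to the request so the next callback can read it
        schedule({'url': url, 'meta': {'name': name}}, callback=parse_student)

def parse_student(response, meta):
    # combine the carried-over name with data from the detail page
    return {'name': meta['name'], 'faculties': response['faculties']}

def crawl(listing, pages):
    # tiny driver standing in for the Scrapy engine
    pending, results = [], []
    schedule = lambda req, callback: pending.append((req, callback))
    parse(listing, schedule)
    for req, callback in pending:
        results.append(callback(pages[req['url']], req['meta']))
    return results

items = crawl([('ABC College', '/abc')], {'/abc': {'faculties': 42}})
# items == [{'name': 'ABC College', 'faculties': 42}]
```

The key point is that the index page's data (the name) travels with the request rather than being looked up again, exactly what `req.meta['name'] = ...` does in the spider above.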
