Scrapy Results Are Repeating
I am trying to get names of the songs from this site https://pagalworld.me/category/11598/Latest%20Bollywood%20Hindi%20Mp3%20Songs%20-%202017.html using link extractor but the resu
Solution 1:
Your spider itself looks correct. However, if you look at a song page, it offers two versions of each song:
$ scrapy shell "https://pagalworld.me/files/12450/Babumoshai%20Bandookbaaz%20(2017)%20Movie%20Mp3%20Songs.html"
>[1]: response.xpath('//li/b/a/text()').extract()
<[1]:
['03 Aye Saiyan - Babumoshai Bandookbaaz 190Kbps.mp3',
'03 Aye Saiyan - Babumoshai Bandookbaaz 320Kbps.mp3',
'01 Barfani - Male (Armaan Malik) 190Kbps.mp3',
'01 Barfani - Male (Armaan Malik) 320Kbps.mp3',
'02 Barfani - Female (Orunima Bhattacharya) 190Kbps.mp3',
'02 Barfani - Female (Orunima Bhattacharya) 320Kbps.mp3']
One version is the lower-quality 190Kbps file and the other is the higher-quality 320Kbps file. In this case you probably want to keep just one of them:
>[2]: response.xpath('//li/b/a/text()[contains(.,"320Kb")]').extract()
<[2]:
['03 Aye Saiyan - Babumoshai Bandookbaaz 320Kbps.mp3',
'01 Barfani - Male (Armaan Malik) 320Kbps.mp3',
'02 Barfani - Female (Orunima Bhattacharya) 320Kbps.mp3']
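If you would rather end up with one clean title per song than pick a bitrate in the XPath, the same result can be reached after extraction. The following is only a sketch: the helper name unique_titles and the suffix pattern are assumptions based on the filenames shown above, not something the site guarantees.

```python
import re

# Assumed pattern: matches the trailing bitrate tag seen in the output above,
# e.g. " 190Kbps.mp3" or " 320Kbps.mp3".
BITRATE_SUFFIX = re.compile(r"\s*\d+Kbps\.mp3$")


def unique_titles(names):
    """Strip the bitrate suffix and drop repeats, keeping first-seen order."""
    seen = set()
    titles = []
    for name in names:
        title = BITRATE_SUFFIX.sub("", name)
        if title not in seen:
            seen.add(title)
            titles.append(title)
    return titles


# The list returned by response.xpath('//li/b/a/text()').extract() above.
names = [
    "03 Aye Saiyan - Babumoshai Bandookbaaz 190Kbps.mp3",
    "03 Aye Saiyan - Babumoshai Bandookbaaz 320Kbps.mp3",
    "01 Barfani - Male (Armaan Malik) 190Kbps.mp3",
    "01 Barfani - Male (Armaan Malik) 320Kbps.mp3",
    "02 Barfani - Female (Orunima Bhattacharya) 190Kbps.mp3",
    "02 Barfani - Female (Orunima Bhattacharya) 320Kbps.mp3",
]

titles = unique_titles(names)
print(titles)
```

This keeps each song once even if a page happens to list only the 190Kbps version, which the contains(.,"320Kb") filter would skip.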
Edit:
There also seems to be a separate duplication issue. Try disabling follow=True
on the rule that uses your link extractor, since in this case you don't want the crawler to keep following links from the pages it has already extracted.