
Getting TypeError When Trying To Concatenate 'str' And 'NoneType' Objects

I have the following code:

import requests
from bs4 import BeautifulSoup

def hltvmatch_spider(max_offset):
    offset = 0
    while offset < max_offset:
        url = 'h

Solution 1:

Pass href=True so that you only get anchors that have an href attribute. Calling .get("href") returns None when an anchor has no href attribute, and concatenating that None with a str is what raises the TypeError:

for link in soup.findAll('a', href=True):
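To make the difference concrete, here is a small sketch on made-up markup. The HTML string and the variable names are assumptions for illustration and are not taken from the hltv.org page. It shows .get("href") returning None for an anchor without an href, the TypeError you get when concatenating it, and how href=True filters that anchor out:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second anchor has no href attribute.
html = '<a href="/?matchid=31342">match</a><a name="top">no link</a>'
soup = BeautifulSoup(html, "html.parser")

# Without the filter, .get("href") yields None for the second anchor.
all_hrefs = [a.get("href") for a in soup.find_all("a")]

# With href=True, only anchors that carry an href attribute are returned.
only_hrefs = [a["href"] for a in soup.find_all("a", href=True)]

# Concatenating a str with None reproduces the error from the question.
try:
    "http://www.hltv.org" + all_hrefs[1]
    error = None
except TypeError as exc:
    error = exc

print(all_hrefs)
print(only_hrefs)
print(type(error).__name__)
```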

As I commented, getting every anchor is probably not what you want, because joining each one to the base URL is not going to work. This gets all the anchor tags from the main content instead:

cont = soup.select_one("div.covMainBoxContent")

print([a["href"] for a in cont.select("a[href]")])

Which would give you:

['/?pageid=188&matchid=31342', '/?pageid=179&teamid=4411', '/?pageid=179&teamid=5995', '/?pageid=188&eventid=2135', '/?pageid=188&matchid=31343', '/?pageid=179&teamid=4548', '/?pageid=179&teamid=6865', '/?pageid=188&eventid=2254', '/?pageid=188&matchid=31339', '/?pageid=179&teamid=4548', '/?pageid=179&teamid=6865', '/?pageid=188&eventid=2254', '/?pageid=188&matchid=31338', '/?pageid=179&teamid=5995', '/?pageid=179&teamid=6736', '/?pageid=188&eventid=2135', '/?pageid=188&matchid=31341', '/?pageid=179&teamid=6620', '/?pageid=179&teamid=6807', '/?pageid=188&eventid=2238', '/?pageid=188&matchid=31340', '/?pageid=179&teamid=6807', '/?pageid=179&teamid=6620', '/?pageid=188&eventid=2238', '/?pageid=188&matchid=31329', '/?pageid=179&teamid=5995', '/?pageid=179&teamid=6736', '/?pageid=188&eventid=2135', '/?pageid=188&matchid=31336', '/?pageid=179&teamid=6998', '/?pageid=179&teamid=4602', '/?pageid=188&eventid=2262', '/?pageid=188&matchid=31334', '/?pageid=179&teamid=4602', '/?pageid=179&teamid=6998', '/?pageid=188&eventid=2262', '/?pageid=188&matchid=31331', '/?pageid=179&teamid=4674', '/?pageid=179&teamid=6133', '/?pageid=188&eventid=2254', '/?pageid=188&matchid=31330', '/?pageid=179&teamid=6133', '/?pageid=179&teamid=4674', '/?pageid=188&eventid=2254', '/?pageid=188&matchid=31333', '/?pageid=179&teamid=4501', '/?pageid=179&teamid=6686', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31332', '/?pageid=179&teamid=4501', '/?pageid=179&teamid=6686', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31319', '/?pageid=179&teamid=4411', '/?pageid=179&teamid=6615', '/?pageid=188&eventid=2135', '/?pageid=188&matchid=31321', '/?pageid=179&teamid=6133', '/?pageid=179&teamid=5929', '/?pageid=188&eventid=2262', '/?pageid=188&matchid=31320', '/?pageid=179&teamid=5929', '/?pageid=179&teamid=6133', '/?pageid=188&eventid=2262', '/?pageid=188&matchid=31318', '/?pageid=179&teamid=6615', '/?pageid=179&teamid=4411', '/?pageid=188&eventid=2135', '/?pageid=188&matchid=31328', 
'/?pageid=179&teamid=6222', '/?pageid=179&teamid=6408', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31327', '/?pageid=179&teamid=6408', '/?pageid=179&teamid=6222', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31326', '/?pageid=179&teamid=6621', '/?pageid=179&teamid=6968', '/?pageid=188&eventid=2252', '/?pageid=188&matchid=31325', '/?pageid=179&teamid=6621', '/?pageid=179&teamid=6968', '/?pageid=188&eventid=2252', '/?pageid=188&matchid=31324', '/?pageid=179&teamid=6619', '/?pageid=179&teamid=6785', '/?pageid=188&eventid=2252', '/?pageid=188&matchid=31322', '/?pageid=179&teamid=6619', '/?pageid=179&teamid=6785', '/?pageid=188&eventid=2252', '/?pageid=188&matchid=31317', '/?pageid=179&teamid=4548', '/?pageid=179&teamid=6407', '/?pageid=188&eventid=2254', '/?pageid=188&matchid=31316', '/?pageid=179&teamid=4548', '/?pageid=179&teamid=6407', '/?pageid=188&eventid=2254', '/?pageid=188&matchid=31315', '/?pageid=179&teamid=6995', '/?pageid=179&teamid=7009', '/?pageid=188&eventid=2253', '/?pageid=188&matchid=31306', '/?pageid=179&teamid=6995', '/?pageid=179&teamid=7009', '/?pageid=188&eventid=2253', '/?pageid=188&matchid=31314', '/?pageid=179&teamid=4501', '/?pageid=179&teamid=6686', '/?pageid=188&eventid=2262', '/?pageid=188&matchid=31310', '/?pageid=179&teamid=6686', '/?pageid=179&teamid=4501', '/?pageid=188&eventid=2262', '/?pageid=188&matchid=31304', '/?pageid=179&teamid=6889', '/?pageid=179&teamid=6994', '/?pageid=188&eventid=2253', '/?pageid=188&matchid=31302', '/?pageid=179&teamid=6994', '/?pageid=179&teamid=6889', '/?pageid=188&eventid=2253', '/?pageid=188&matchid=31313', '/?pageid=179&teamid=6865', '/?pageid=179&teamid=4688', '/?pageid=188&eventid=2255', '/?pageid=188&matchid=31312', '/?pageid=179&teamid=4688', '/?pageid=179&teamid=6865', '/?pageid=188&eventid=2255', '/?pageid=188&matchid=31311', '/?pageid=179&teamid=4688', '/?pageid=179&teamid=6865', '/?pageid=188&eventid=2255', '/?pageid=188&matchid=31309', '/?pageid=179&teamid=4602', 
'/?pageid=179&teamid=6408', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31308', '/?pageid=179&teamid=4602', '/?pageid=179&teamid=6408', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31307', '/?pageid=179&teamid=4602', '/?pageid=179&teamid=6408', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31305', '/?pageid=179&teamid=6133', '/?pageid=179&teamid=4548', '/?pageid=188&eventid=2254', '/?pageid=188&matchid=31303', '/?pageid=179&teamid=4548', '/?pageid=179&teamid=6133', '/?pageid=188&eventid=2254', '/?pageid=188&matchid=31294', '/?pageid=179&teamid=4869', '/?pageid=179&teamid=6137', '/?pageid=188&eventid=2176', '/?pageid=188&matchid=31293', '/?pageid=179&teamid=6137', '/?pageid=179&teamid=4869', '/?pageid=188&eventid=2176', '/?pageid=188&matchid=31292', '/?pageid=179&teamid=4869', '/?pageid=179&teamid=6137', '/?pageid=188&eventid=2176', '/?pageid=188&matchid=31291', '/?pageid=179&teamid=4548', '/?pageid=179&teamid=6407', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31290', '/?pageid=179&teamid=4548', '/?pageid=179&teamid=6407', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31289', '/?pageid=179&teamid=4548', '/?pageid=179&teamid=6407', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31301', '/?pageid=179&teamid=5996', '/?pageid=179&teamid=6981', '/?pageid=188&eventid=2273', '/?pageid=188&matchid=31300', '/?pageid=179&teamid=5996', '/?pageid=179&teamid=6981', '/?pageid=188&eventid=2273', '/?pageid=188&matchid=31299', '/?pageid=179&teamid=5996', '/?pageid=179&teamid=6869', '/?pageid=188&eventid=2273', '/?pageid=188&matchid=31298', '/?pageid=179&teamid=5996', '/?pageid=179&teamid=6869', '/?pageid=188&eventid=2273', '/?pageid=188&matchid=31297', '/?pageid=179&teamid=6981', '/?pageid=179&teamid=6792', '/?pageid=188&eventid=2273']

If you are not more specific, you will get links like:

 http://static.hltv.org//images/category/5.gif

Concatenating that with http://www.hltv.org using + is not going to work well, because it is already an absolute URL.
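As a rough illustration of why plain + is fragile here, the sketch below compares string concatenation with urljoin from the standard library. The variable names are mine, and the two hrefs are the ones shown in this answer:

```python
from urllib.parse import urljoin

base = "http://www.hltv.org"
absolute = "http://static.hltv.org//images/category/5.gif"
relative = "/?pageid=188&matchid=31342"

# Plain concatenation glues two absolute URLs together into nonsense.
concatenated = base + absolute

# urljoin leaves an already absolute URL alone and resolves a relative one.
joined_absolute = urljoin(base, absolute)
joined_relative = urljoin(base, relative)

print(concatenated)
print(joined_absolute)
print(joined_relative)
```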

To get only the links under the first column (the date column), we can add a[href*="matchid="] to our select. The HTML layout is not very nice to parse, and there is not what I would call a reliable way to pull the data out, but since matchid= appears only in the hrefs of that column, this works.

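import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("http://www.hltv.org/?pageid=188&offset=1").content, "html.parser")
# Quote the attribute value: it contains "=", which current bs4 rejects unquoted.
cont = soup.select('div.covMainBoxContent a[href*="matchid="]')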

print([a["href"] for a in cont])

To get the full URL you need to join the base URL to the href:

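import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin  # Python 3; Python 2 used "from urlparse import urljoin"

base = "http://www.hltv.org/"
soup = BeautifulSoup(requests.get("http://www.hltv.org/?pageid=188&offset=1").content, "html.parser")
cont = soup.select('div.covMainBoxContent a[href*="matchid="]')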

print([urljoin(base, a["href"]) for a in cont])

Which gives you:

['http://www.hltv.org/?pageid=188&matchid=31342', 'http://www.hltv.org/?pageid=188&matchid=31343', 'http://www.hltv.org/?pageid=188&matchid=31339', 'http://www.hltv.org/?pageid=188&matchid=31338', 'http://www.hltv.org/?pageid=188&matchid=31341', 'http://www.hltv.org/?pageid=188&matchid=31340', 'http://www.hltv.org/?pageid=188&matchid=31329', 'http://www.hltv.org/?pageid=188&matchid=31336', 'http://www.hltv.org/?pageid=188&matchid=31334', 'http://www.hltv.org/?pageid=188&matchid=31331', 'http://www.hltv.org/?pageid=188&matchid=31330', 'http://www.hltv.org/?pageid=188&matchid=31333', 'http://www.hltv.org/?pageid=188&matchid=31332', 'http://www.hltv.org/?pageid=188&matchid=31319', 'http://www.hltv.org/?pageid=188&matchid=31321', 'http://www.hltv.org/?pageid=188&matchid=31320', 'http://www.hltv.org/?pageid=188&matchid=31318', 'http://www.hltv.org/?pageid=188&matchid=31328', 'http://www.hltv.org/?pageid=188&matchid=31327', 'http://www.hltv.org/?pageid=188&matchid=31326', 'http://www.hltv.org/?pageid=188&matchid=31325', 'http://www.hltv.org/?pageid=188&matchid=31324', 'http://www.hltv.org/?pageid=188&matchid=31322', 'http://www.hltv.org/?pageid=188&matchid=31317', 'http://www.hltv.org/?pageid=188&matchid=31316', 'http://www.hltv.org/?pageid=188&matchid=31315', 'http://www.hltv.org/?pageid=188&matchid=31306', 'http://www.hltv.org/?pageid=188&matchid=31314', 'http://www.hltv.org/?pageid=188&matchid=31310', 'http://www.hltv.org/?pageid=188&matchid=31304', 'http://www.hltv.org/?pageid=188&matchid=31302', 'http://www.hltv.org/?pageid=188&matchid=31313', 'http://www.hltv.org/?pageid=188&matchid=31312', 'http://www.hltv.org/?pageid=188&matchid=31311', 'http://www.hltv.org/?pageid=188&matchid=31309', 'http://www.hltv.org/?pageid=188&matchid=31308', 'http://www.hltv.org/?pageid=188&matchid=31307', 'http://www.hltv.org/?pageid=188&matchid=31305', 'http://www.hltv.org/?pageid=188&matchid=31303', 'http://www.hltv.org/?pageid=188&matchid=31294', 
'http://www.hltv.org/?pageid=188&matchid=31293', 'http://www.hltv.org/?pageid=188&matchid=31292', 'http://www.hltv.org/?pageid=188&matchid=31291', 'http://www.hltv.org/?pageid=188&matchid=31290', 'http://www.hltv.org/?pageid=188&matchid=31289', 'http://www.hltv.org/?pageid=188&matchid=31301', 'http://www.hltv.org/?pageid=188&matchid=31300', 'http://www.hltv.org/?pageid=188&matchid=31299', 'http://www.hltv.org/?pageid=188&matchid=31298', 'http://www.hltv.org/?pageid=188&matchid=31297']
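If you want to reuse this inside your spider loop, the pieces above could be wrapped in a helper along these lines. This is only a sketch: the function name extract_match_links and the sample markup are hypothetical, and it takes the HTML as a string so it can be tried without hitting the site. It assumes the page still uses the div.covMainBoxContent container.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_match_links(html, base="http://www.hltv.org/"):
    """Return absolute URLs for anchors whose href contains 'matchid='."""
    soup = BeautifulSoup(html, "html.parser")
    # a[href*="matchid="] only matches anchors that have an href,
    # so anchors without one can never produce None here.
    anchors = soup.select('div.covMainBoxContent a[href*="matchid="]')
    return [urljoin(base, a["href"]) for a in anchors]


# Hypothetical sample markup standing in for the real page.
sample = """
<div class="covMainBoxContent">
  <a href="/?pageid=188&amp;matchid=31342">match</a>
  <a href="/?pageid=179&amp;teamid=4411">team</a>
  <a name="anchor-without-href">skip</a>
</div>
<a href="http://static.hltv.org//images/category/5.gif">outside</a>
"""

links = extract_match_links(sample)
print(links)
```

In the spider you would pass requests.get(url).content for each offset instead of the sample string.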
