How To Go To Next Page Using Beautiful Soup?

I have to extract information from 5 pages of a website. At the end of every page there is a 'NEXT PAGE' button. This is the HTML code of the next button -
  • Solution 1:

    If we run the code for a few iterations, you can see we get each page:

    In [1]: import requests
       ...: from bs4 import BeautifulSoup
       ...: start = "https://colleges.niche.com/?degree=4-year&sort=best"
       ...: url = "https://colleges.niche.com/?degree=4-year&sort=best&page={}"
       ...: soup = BeautifulSoup(requests.get(start).content, "html.parser")
       ...: # Take the last <option> of the page selector; its second word is the page count.
       ...: pages = int(soup.select("select.pagination__pages__selector option")[-1].text.split(None, 1)[1])
       ...: print([a.text for a in soup.select("a.search__results__list__item__entity")])
       ...: # Page 1 was fetched above, so continue from page 2 through the last page.
       ...: for page in range(2, pages + 1):
       ...:     soup = BeautifulSoup(requests.get(url.format(page)).content, "html.parser")
       ...:     print([a.text for a in soup.select("a.search__results__list__item__entity")])
       ...:
    [u'Stanford University', u'Massachusetts Institute of Technology', u'Yale University', u'Harvard University', u'Princeton University', u'Rice University', u'Bowdoin College', u'University of Pennsylvania', u'Washington University in St. Louis', u'Brown University', u'Duke University', u'Columbia University', u'Dartmouth College', u'Vanderbilt University', u'Pomona College', u'California Institute of Technology', u'University of Southern California', u'University of Notre Dame', u'University of Chicago', u'Washington & Lee University', u'Carleton College', u'Colgate University', u'University of Michigan - Ann Arbor', u'Northwestern University', u'Tufts University']
    [u'Williams College', u'Georgetown University', u'Amherst College', u'Cornell University', u'Thomas Jefferson University', u'University of Texas - Health Science Center at Houston', u'Barnard College', u'Haverford College', u'Carnegie Mellon University', u'Emory University', u'University of California - Los Angeles', u'Harvey Mudd College', u'Medical University of South Carolina', u'Franklin W. Olin College of Engineering', u'Claremont McKenna College', u'Middlebury College', u'Swarthmore College', u'Bates College', u'University of Virginia', u'University of Texas - Austin', u'University of California - Berkeley', u'Virginia Tech', u'University of North Carolina at Chapel Hill', u'University of Texas - Medical Branch at Galveston', u'Davidson College']
    [u'Colby College', u'Hamilton College', u'Samuel Merritt University', u'Georgia Institute of Technology', u'University of Richmond', u'Lehigh University', u'Grinnell College', u'Northeastern University', u'University of Illinois at Urbana-Champaign', u'New York University', u'University of Wisconsin', u'Wake Forest University', u'Reed College', u'Bucknell University', u'Oregon Health & Science University', u'Johns Hopkins University', u'Lafayette College', u'University of Texas - Health Science Center at San Antonio', u'Smith College', u'Wellesley College', u'University of Rochester', u'Scripps College', u'College of William & Mary', u'University of Florida', u'The Curtis Institute of Music']
    [u'United States Coast Guard Academy', u'College of the Holy Cross', u'Penn State', u'Bryn Mawr College', u'Wesleyan University', u'Ohio State University', u'Colorado School of Mines', u'Texas A&M University', u'University of Maryland - Baltimore', u'Purdue University', u'University of California - Santa Barbara', u'University of Georgia', u'University of Miami', u'Tulane University', u'University of Tulsa', u'Boston College', u'The Juilliard School', u'Texas Tech University Health Sciences Center', u'Worcester Polytechnic Institute', u'Franklin & Marshall College', u'Brigham Young University', u'Southern Methodist University', u'Mount Holyoke College', u'Kenyon College', u'University of Washington']
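
    The page URLs in the loop above differ only in the `page` query parameter. As a rough sketch (the helper name `page_urls` is made up for illustration and is not part of the answer), the URLs could be built with `urllib.parse` instead of a hand-written format string:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_urls(start_url, last_page):
    """Yield start_url for page 1, then the same URL with page=N for N in 2..last_page."""
    parts = urlsplit(start_url)
    base_query = parse_qsl(parts.query)
    yield start_url
    for page in range(2, last_page + 1):
        # Append the page number to the existing query parameters.
        query = urlencode(base_query + [("page", page)])
        yield urlunsplit(parts._replace(query=query))


urls = list(page_urls("https://colleges.niche.com/?degree=4-year&sort=best", 3))
```

    Each URL can then be passed to `requests.get` exactly as in the loop above.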
    

    If you were to mimic the POST request the page makes, the following would work. Depending on what data you want, this may actually be preferable, as you get JSON back:

    import requests
    from bs4 import BeautifulSoup
    
    start = "https://colleges.niche.com/?degree=4-year&sort=best"
    post = "https://colleges.niche.com/entity-search/"
    
    data = {"degreeType": ["4-year"], "sort": "best", "page": 1, "vertical": "colleges"}
    
    soup = BeautifulSoup(requests.get(start).content, "html.parser")
    pages = int(soup.select("select.pagination__pages__selector option")[-1].text.split(None, 1)[1])
    for page in range(1, pages + 1):
        data["page"] = page
        r = requests.post(post, json=data)
        print(r.json())
    

    That gives you data like:

    {u'count': 2854, u'results': [{u'reviewCount': 258, u'netPrice': 20315, u'reviewAvg': 3.7713178294573644, u'totalStudents': 2034, u'grade': 4.33, u'tagline': u'4 Year · Williamstown, MA', u'SATRange': u'1350-1560', u'label': u'Williams College', u'url': u'https://colleges.niche.com/williams-college/', u'ACTRange': u'31-34', u'location': {u'lat': 42.7117, u'lng': -73.2059}, u'guid': u'465D4A73-875C-498E-9C8F-E47568E156F2', u'type': u'College'}, {u'reviewCount': 1081, u'netPrice': 25786, u'reviewAvg': 3.698427382053654, u'totalStudents': 7226, u'grade': 4.33, u'tagline': u'4 Year · Washington, DC', u'SATRange': u'1320-1520', u'label': u'Georgetown University', u'url': u'https://colleges.niche.com/georgetown-university/', u'ACTRange': u'30-33', u'location': {u'lat': 38.9088, u'lng': -77.0735}, u'guid': u'34AF6312-6F20-4D90-B512-AC5CD720AB25', u'type': u'College'}, {u'reviewCount': 247, u'netPrice': 14687, u'reviewAvg': 3.8259109311740893, u'totalStudents': 1792, u'grade': 4.33, u'tagline': u'4 Year · Amherst, MA', u'SATRange': u'1350-1548', u'label': u'Amherst College', u'url': u'https://colleges.niche.com/amherst-college/', u'ACTRange': u'30-34', u'location': {u'lat': 42.3725, u'lng': -72.5185}, u'guid': u'127EC524-4BAC-4A5C-A7F5-1EAD9C309F44', u'type': u'College'}, {u'reviewCount': 1730, u'netPrice': 28537, u'reviewAvg': 3.654913294797688, u'totalStudents': 14269, u'grade': 4.33, u'tagline': u'4 Year · Ithaca, NY', u'SATRange': u'1330-1510', u'label': u'Cornell University', u'url': u'https://colleges.niche.com/cornell-university/', u'ACTRange': u'30-34', u'location': {u'lat': 42.4453, u'lng': -76.4827}, u'guid': u'C35E497B-10BC-4482-92E5-F27941433B02', u'type': u'College'}, {u'reviewCount': 254, u'netPrice': None, u'reviewAvg': 3.8149606299212597, u'totalStudents': 649, u'grade': 4.33, u'tagline': u'4 Year · Philadelphia, PA', u'SATRange': None, u'label': u'Thomas Jefferson University', u'url': u'https://colleges.niche.com/thomas-jefferson-university/', 
u'ACTRange': None, u'location': {u'lat': 39.9491, u'lng': -75.1581}, u'guid': u'E8C9EBC6-90C5-4CDF-A324-2CCE16060B61', u'type': u'College'}, {u'reviewCount': 131, u'netPrice': None, u'reviewAvg': 3.740458015267176, u'totalStudents': 539, u'grade': 4.33, u'tagline': u'4 Year · Houston, TX', u'SATRange': None, u'label': u'University of Texas - Health Science Center at Houston', u'url': u'https://colleges.niche.com/university-of-texas----health-science-center-at-houston/', u'ACTRange': None, u'location': {u'lat': 29.7029, u'lng': -95.4032}, u'guid': u'43EEDD7D-8204-4014-961B-BEDDBD4C6417', u'type': u'College'}, {u'reviewCount': 390, u'netPrice': 21791, u'reviewAvg': 3.776923076923077, u'totalStudents': 2537, u'grade': 4.33, u'tagline': u'4 Year · New York, NY', u'SATRange': u'1250-1440', u'label': u'Barnard College', u'url': u'https://colleges.niche.com/barnard-college/', u'ACTRange': u'28-32', u'location': {u'lat': 40.8091, u'lng': -73.964}, u'guid': u'DD4FCD82-8E4E-4F4C-A7DC-FADCEBB49681', u'type': u'College'}, {u'reviewCount': 190, u'netPrice': 22409, u'reviewAvg': 3.789473684210526, u'totalStudents': 1189, u'grade': 4.33, u'tagline': u'4 Year · Haverford, PA', u'SATRange': u'1330-1490', u'label': u'Haverford College', u'url': u'https://colleges.niche.com/haverford-college/', u'ACTRange': u'31-34', u'location': {u'lat': 40.0134, u'lng': -75.3026}, u'guid': u'271075B3-07A0-450B-B4F3-78EB1FC7C03A', u'type': u'College'}, {u'reviewCount': 1310, u'netPrice': 33670, u'reviewAvg': 3.6068702290076335, u'totalStudents': 5699, u'grade': 4.33, u'tagline': u'4 Year · Pittsburgh, PA', u'SATRange': u'1340-1540', u'label': u'Carnegie Mellon University', u'url': u'https://colleges.niche.com/carnegie-mellon-university/', u'ACTRange': u'30-34', u'location': {u'lat': 40.4446, u'lng': -79.9429}, u'guid': u'D8A17C0F-CC25-4D2A-B231-0303EA016427', u'type': u'College'}, {u'reviewCount': 1392, u'netPrice': 28203, u'reviewAvg': 3.757183908045977, u'totalStudents': 7732, u'grade': 4.33, 
u'tagline': u'4 Year · Atlanta, GA', u'SATRange': u'1280-1460', u'label': u'Emory University', u'url': u'https://colleges.niche.com/emory-university/', u'ACTRange': u'29-32', u'location': {u'lat': 33.7988, u'lng': -84.3258}, u'guid': u'86AD5853-ED72-4EFD-855C-4746FF698941', u'type': u'College'}, {u'reviewCount': 4465, u'netPrice': 12510, u'reviewAvg': 3.838521836506159, u'totalStudents': 29033, u'grade': 4.33, u'tagline': u'4 Year · Los Angeles, CA', u'SATRange': u'1190-1460', u'label': u'University of California - Los Angeles', u'url': u'https://colleges.niche.com/university-of-california----los-angeles/', u'ACTRange': u'27-33', u'location': {u'lat': 34.0689, u'lng': -118.444}, u'guid': u'1D1D82CF-C659-49F0-A526-7AFB85BD3A4F', u'type': u'College'}, {u'reviewCount': 122, u'netPrice': 33137, u'reviewAvg': 3.6639344262295084, u'totalStudents': 802, u'grade': 4.33, u'tagline': u'4 Year · Claremont, CA', u'SATRange': u'1418-1570', u'label': u'Harvey Mudd College', u'url': u'https://colleges.niche.com/harvey-mudd-college/', u'ACTRange': u'33-35', u'location': {u'lat': 34.1061, u'lng': -117.711}, u'guid': u'20D662BE-8428-4DE2-BF0D-72D22F0A04B5', u'type': u'College'}, {u'reviewCount': 71, u'netPrice': None, u'reviewAvg': 4.014084507042253, u'totalStudents': 281, u'grade': 4.33, u'tagline': u'4 Year · Charleston, SC', u'SATRange': None, u'label': u'Medical University of South Carolina', u'url': u'https://colleges.niche.com/medical-university-of-south-carolina/', u'ACTRange': None, u'location': {u'lat': 32.786, u'lng': -79.9469}, u'guid': u'7CD7C977-D16A-4399-8D7E-3B1FA0DFAB7D', u'type': u'College'}, {u'reviewCount': 115, u'netPrice': 29979, u'reviewAvg': 4.095652173913043, u'totalStudents': 350, u'grade': 4.33, u'tagline': u'4 Year · Needham, MA', u'SATRange': u'1410-1550', u'label': u'Franklin W. 
Olin College of Engineering', u'url': u'https://colleges.niche.com/franklin-w-olin-college-of-engineering/', u'ACTRange': u'32-34', u'location': {u'lat': 42.2928, u'lng': -71.264}, u'guid': u'88A3438F-9304-481E-8022-0AE353991161', u'type': u'College'}, {u'reviewCount': 399, u'netPrice': 23982, u'reviewAvg': 3.87468671679198, u'totalStudents': 1298, u'grade': 4.33, u'tagline': u'4 Year · Claremont, CA', u'SATRange': u'1350-1520', u'label': u'Claremont McKenna College', u'url': u'https://colleges.niche.com/claremont-mckenna-college/', u'ACTRange': u'30-33', u'location': {u'lat': 34.1023, u'lng': -117.707}, u'guid': u'DAE7241A-4D00-4C50-B1A5-F33BAF3A6C3B', u'type': u'College'}, {u'reviewCount': 458, u'netPrice': 20903, u'reviewAvg': 3.7139737991266375, u'totalStudents': 2492, u'grade': 4.33, u'tagline': u'4 Year · Middlebury, VT', u'SATRange': u'1260-1470', u'label': u'Middlebury College', u'url': u'https://colleges.niche.com/middlebury-college/', u'ACTRange': u'30-33', u'location': {u'lat': 44.0091, u'lng': -73.1761}, u'guid': u'0E72BF23-A3CF-4995-9585-33B5BD0F9222', u'type': u'College'}, {u'reviewCount': 401, u'netPrice': 22557, u'reviewAvg': 3.56857855361596, u'totalStudents': 1534, u'grade': 4.33, u'tagline': u'4 Year · Swarthmore, PA', u'SATRange': u'1360-1540', u'label': u'Swarthmore College', u'url': u'https://colleges.niche.com/swarthmore-college/', u'ACTRange': u'29-34', u'location': {u'lat': 39.9041, u'lng': -75.3561}, u'guid': u'891F20E2-4B6F-4626-83F3-15D502B2E7C1', u'type': u'College'}, {u'reviewCount': 320, u'netPrice': 22062, u'reviewAvg': 3.878125, u'totalStudents': 1773, u'grade': 4.33, u'tagline': u'4 Year · Lewiston, ME', u'SATRange': None, u'label': u'Bates College', u'url': u'https://colleges.niche.com/bates-college/', u'ACTRange': None, u'location': {u'lat': 44.1053, u'lng': -70.2033}, u'guid': u'2C036559-5EBB-4C00-B3B8-6679A91FB040', u'type': u'College'}, {u'reviewCount': 1995, u'netPrice': 14069, u'reviewAvg': 3.800501253132832, 
u'totalStudents': 15622, u'grade': 4.33, u'tagline': u'4 Year · Charlottesville, VA', u'SATRange': u'1250-1460', u'label': u'University of Virginia', u'url': u'https://colleges.niche.com/university-of-virginia/', u'ACTRange': u'28-33', u'location': {u'lat': 38.0365, u'lng': -78.5026}, u'guid': u'9EA86CB5-E8A6-47E6-A219-FDCABC31AE51', u'type': u'College'}, {u'reviewCount': 5513, u'netPrice': 16832, u'reviewAvg': 3.8824596408489027, u'totalStudents': 36309, u'grade': 4.33, u'tagline': u'4 Year · Austin, TX', u'SATRange': u'1170-1410', u'label': u'University of Texas - Austin', u'url': u'https://colleges.niche.com/university-of-texas----austin/', u'ACTRange': u'26-32', u'location': {u'lat': 30.2847, u'lng': -97.7373}, u'guid': u'BC90E2B6-E112-43ED-AC5C-3548829EA3DD', u'type': u'College'}, {u'reviewCount': 3718, u'netPrice': 16655, u'reviewAvg': 3.5922538999462077, u'totalStudents': 26320, u'grade': 4.33, u'tagline': u'4 Year · Berkeley, CA', u'SATRange': u'1240-1500', u'label': u'University of California - Berkeley', u'url': u'https://colleges.niche.com/university-of-california----berkeley/', u'ACTRange': u'29-34', u'location': {u'lat': 37.8715, u'lng': -122.26}, u'guid': u'09E8CD9A-F401-4C8B-A79C-F02E10AC0201', u'type': u'College'}, {u'reviewCount': 3382, u'netPrice': 18398, u'reviewAvg': 3.8793613246599645, u'totalStudents': 23685, u'grade': 4.33, u'tagline': u'4 Year · Blacksburg, VA', u'SATRange': u'1110-1320', u'label': u'Virginia Tech', u'url': u'https://colleges.niche.com/virginia-tech/', u'ACTRange': None, u'location': {u'lat': 37.2286, u'lng': -80.4233}, u'guid': u'EEB0E829-996A-45B1-9671-3EF4AF096423', u'type': u'College'}, {u'reviewCount': 2138, u'netPrice': 10936, u'reviewAvg': 3.7787652011225443, u'totalStudents': 17570, u'grade': 4.33, u'tagline': u'4 Year · Chapel Hill, NC', u'SATRange': u'1220-1420', u'label': u'University of North Carolina at Chapel Hill', u'url': u'https://colleges.niche.com/university-of-north-carolina-at-chapel-hill/', u'ACTRange': 
u'28-32', u'location': {u'lat': 35.9122, u'lng': -79.051}, u'guid': u'5712B0C1-3A40-4EA1-A324-9C4F76FEFD10', u'type': u'College'}, {u'reviewCount': 110, u'netPrice': None, u'reviewAvg': 3.8545454545454545, u'totalStudents': 586, u'grade': 4.33, u'tagline': u'4 Year · Galveston, TX', u'SATRange': None, u'label': u'University of Texas - Medical Branch at Galveston', u'url': u'https://colleges.niche.com/university-of-texas----medical-branch-at-galveston/', u'ACTRange': None, u'location': {u'lat': 29.3113, u'lng': -94.7764}, u'guid': u'5FEEDB69-A566-4671-B821-28304A74F474', u'type': u'College'}, {u'reviewCount': 264, u'netPrice': 22457, u'reviewAvg': 3.8333333333333335, u'totalStudents': 1770, u'grade': 4.33, u'tagline': u'4 Year · Davidson, NC', u'SATRange': u'1230-1440', u'label': u'Davidson College', u'url': u'https://colleges.niche.com/davidson-college/', u'ACTRange': u'28-32', u'location': {u'lat': 35.5, u'lng': -80.8452}, u'guid': u'1AD50A05-6325-4392-B428-A08C944E61EF', u'type': u'College'}], u'page': 1, u'pageSize': 25, u'pageCount': 40}
    

    That probably includes dynamically created content that you would not get in the HTML source returned.
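
    The JSON shown above carries its own `pageCount`, so the page total does not have to be parsed out of the HTML. A minimal sketch of that idea follows; `iter_results` and `fake_fetch` are hypothetical names, and `fake_fetch` is only a stand-in for the `requests.post(post, json=data).json()` call so the loop can be tried without touching the site:

```python
def iter_results(fetch_page):
    """Yield every entry in 'results', reading 'pageCount' from the first response."""
    first = fetch_page(1)
    yield from first["results"]
    for page in range(2, first["pageCount"] + 1):
        yield from fetch_page(page)["results"]


def fake_fetch(page):
    # Stand-in for a real request; returns a tiny response with the same keys.
    names = {1: ["Williams College", "Georgetown University"], 2: ["Amherst College"]}
    return {"page": page, "pageCount": 2, "results": [{"label": n} for n in names[page]]}


labels = [entry["label"] for entry in iter_results(fake_fetch)]
```

    With the real endpoint, `fetch_page` would set `data["page"]` and return `r.json()`.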

    For the reviews URL https://colleges.niche.com/williams-college/reviews, you need to parse a token from the source and then query the reviews API much like before, this time with a GET request and query parameters:

    import requests
    import re
    from bs4 import BeautifulSoup
    
    patt = re.compile('"entityGuid":"(.*?)"')
    url = "https://colleges.niche.com/williams-college/reviews/"
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    data_tag = patt.search(soup.select_one("#dataLayerTag").text).group(1)
    params = {"e": data_tag, "page": 2, "limit": "20"}
    url = "https://niche.com/api/entity-reviews/"
    resp = requests.get(url, params=params)
    print(resp.json())
    

    Which gives you:

    {u'reviews': [{u'body': u'I enjoy being in classes here, but the work gets overwhelming. People are great but very cliquy.', u'rating': 4, u'guid': u'35b6faeb-95b2-4385-b3ee-19e6c7984e1b', u'created': u'2016-04-20T22:24:56Z', u'author': u'College Sophomore'}, {u'body': u'The alumni network is great. Easy to use. But the career center sucks.', u'rating': 4, u'guid': u'beddcae1-d860-4a8a-a431-45bf7e7087e6', u'created': u'2016-04-20T22:24:56Z', u'author': u'College Sophomore'}, {u'body': u"It's hard for sophomores to get good housing. Even as a senior, the good housings are far away from campus. But almost everyone has singles, even freshman.", u'rating': 3, u'guid': u'fff99560-0b4f-499d-a95b-7b3b3f9826f0', u'created': u'2016-04-20T22:19:27Z', u'author': u'College Sophomore'}, {u'body': u"We don't have greek life.", u'rating': 1, u'guid': u'69e60cf0-ff3c-4b34-acf1-6315d878c205', u'created': u'2016-04-20T22:17:35Z', u'author': u'College Sophomore'}, {u'body': u"There's not a lot of team spirit here. Athletes are nice, but they tend to hang among themselves.", u'rating': 3, u'guid': u'b31ee366-1b68-4c0f-b262-ff628243887c', u'created': u'2016-04-20T22:17:02Z', u'author': u'College Sophomore'}, {u'body': u'Williams offer a lot of chances to study abroad, but the social scene is very very limited.', u'rating': 4, u'guid': u'11a3feb2-21fa-45d9-8ee0-e6e1e8cea0c0', u'created': u'2016-04-20T22:15:35Z', u'author': u'College Sophomore'}, {u'body': u"Most people will live on campus all four years. It's not a bad deal!", u'rating': 4, u'guid': u'4a845124-7cfd-4059-8d63-cb1d414ce0cc', u'created': u'2016-04-08T13:58:30Z', u'author': u'College Senior'}, {u'body': u'The facilities have everything you could need as a varsity or non-varsity athlete. With our new football/lacrosse field and track, we have it made! 
Still, with an active there is always competition for prime field time, and IM sports are relegated either to early/late hours or ungroomed fields.', u'rating': 4, u'guid': u'31c89c4d-91ee-4b92-a198-3e12c304d7e1', u'created': u'2016-04-08T13:55:12Z', u'author': u'College Senior'}, {u'body': u'I have loved my time at Williams! The best part of my experience has been the people here, and as a senior trying to figure out post graduate plans, I am comforted by the willingness to help and commitment to the College from alumni. Go Ephs!', u'rating': 4, u'guid': u'4458ed87-4183-4784-908a-6ae67582e82c', u'created': u'2016-04-08T13:51:51Z', u'author': u'College Senior'}, {u'body': u'Could be better but overall good.', u'rating': 4, u'guid': u'08327955-2698-4fe6-ac1f-13108327cc21', u'created': u'2016-01-01T22:51:16Z', u'author': u'College Junior'}, {u'body': u'Better this year than past years.', u'rating': 3, u'guid': u'1892de02-eb45-42b5-b728-34912499e5eb', u'created': u'2016-01-01T22:43:54Z', u'author': u'College Junior'}, {u'body': u'Could have better facilities. Otherwise, great.', u'rating': 4, u'guid': u'2dc48cb2-d21f-4fd6-a9c7-19a5e513e6d6', u'created': u'2016-01-01T22:40:45Z', u'author': u'College Junior'}, {u'body': u'Awesome experience. Very community-oriented school. I love this place. Great people. Everyone wants to help you, the professors are amazing.', u'rating': 5, u'guid': u'5fa28a31-9391-4db7-b70d-5e2aa58708b3', u'created': u'2016-01-01T22:39:06Z', u'author': u'College Junior'}, {u'body': u"Williams has been the perfect place for me. My professors have been incredible mentors--I've gone to three professors' houses for dinner. The location is beautiful, and perfect for focusing on academics. I've been able to get very involved in all my clubs and really find what makes me passionate. But best of all is the people. They're all smart and talented and wonderful. 
I am so lucky.", u'rating': 5, u'guid': u'81ff499b-4721-4625-bee1-acf1e9b21916', u'created': u'2015-08-25T13:08:28Z', u'author': u'College Junior'}, {u'body': u"I don't know much, only seniors can live off campus.", u'rating': 3, u'guid': u'd9dc2e2f-a08d-4a01-8fe2-410623f93d7a', u'created': u'2015-04-27T19:31:06Z', u'author': u'College Freshman'}, {u'body': u"Everything closes really early, but there's some good food. No chains really.", u'rating': 3, u'guid': u'5993a99e-a936-40c8-ae0d-4581c8d089ef', u'created': u'2015-04-27T19:30:01Z', u'author': u'College Freshman'}, {u'body': u"It's kind of sad. There's never more than a handful of things happening on fridays or satudays and there's nothing for the rest of the week", u'rating': 3, u'guid': u'65c83983-2f6f-4b08-b870-06c35fd2b0e9', u'created': u'2015-04-27T19:27:34Z', u'author': u'College Freshman'}, {u'body': u"Having visitors is pretty easy. One of the officers is the worst but otherwise they're generally lenient about weed and alcohol.", u'rating': 4, u'guid': u'bcd95788-22b7-4a23-b942-2493206d1734', u'created': u'2015-04-27T19:21:34Z', u'author': u'College Freshman'}, {u'body': u"They usually give you a good package, but a lot of it is work-study and students don't have the free time for that here.", u'rating': 3, u'guid': u'1a87483c-952c-479b-9a57-65fb09895e75', u'created': u'2015-04-27T19:19:35Z', u'author': u'College Freshman'}, {u'body': u"Food is kind of repetitive. Pretty much all the kitchens are very wasteful. We can't use meal plans anywhere off campus.", u'rating': 3, u'guid': u'361b725f-bedc-4452-843d-5dc284c18dcd', u'created': u'2015-04-27T19:17:22Z', u'author': u'College Freshman'}], u'total': 246, u'limit': 20, u'page': 2}
    

    You should be able to figure the rest out yourself based on the other parts of the answer.
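
    One piece of "the rest" is knowing how many review pages to request. The response above reports `total` and `limit`, so the count can be derived from those. The sketch below also isolates the token extraction; the function names and the sample script text (with a placeholder GUID) are assumptions made for illustration:

```python
import math
import re

GUID_PATTERN = re.compile(r'"entityGuid":"(.*?)"')


def extract_entity_guid(script_text):
    """Return the entityGuid found in the given script text, or None if absent."""
    match = GUID_PATTERN.search(script_text)
    return match.group(1) if match else None


def review_page_count(total, limit):
    """Number of pages needed to cover `total` reviews at `limit` per page."""
    return math.ceil(total / limit)


# Made-up script text; the real page content may look different.
sample = '{"entityGuid":"00000000-0000-0000-0000-000000000000","vertical":"colleges"}'
guid = extract_entity_guid(sample)
pages = review_page_count(246, 20)
```

    The values 246 and 20 are the `total` and `limit` from the response shown above.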

  • Solution 2:

    BeautifulSoup is an HTML parser, not a web browser, so it can't navigate or download pages. For that you'd typically use an HTTP library like urllib or requests to fetch the HTML from a particular URL in order to feed it to BeautifulSoup. In your case, mechanize could be used to do this.

    Unfortunately, the HTML supplied for your pagination button isn't a link, so it doesn't have an href attribute. If it did, you could easily parse the URL from it and tell your HTTP library to go fetch it.

    Instead, you'll need to use mechanize to simulate a click event on that button, wait a short amount of time, then assume that the new page has loaded and then pass the resulting HTML to BeautifulSoup.
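
    For the simpler case where the next button is an ordinary link with an href, the lookup could be sketched as below. To stay dependency-free this uses the standard library's `html.parser` rather than BeautifulSoup; the class name `next`, the sample HTML, and the helper names are all assumptions:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Record the href of the first <a> whose class list contains css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "a" and self.href is None and self.css_class in classes:
            self.href = attributes.get("href")


def next_page_url(html, page_url, css_class="next"):
    """Return the absolute URL of the next page, or None if no link is found."""
    finder = NextLinkFinder(css_class)
    finder.feed(html)
    return urljoin(page_url, finder.href) if finder.href else None


html = '<div><a class="pagination next" href="/list?page=2">NEXT PAGE</a></div>'
url = next_page_url(html, "https://www.example.com/list?page=1")
```

    With BeautifulSoup, the same lookup would be roughly `soup.select_one("a.next")["href"]` followed by `urljoin`.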

  • Solution 3:

    If the "next page" button involves JavaScript, then yes, you can only automate a browser. You can do it with selenium:

    import time

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    
    # Selenium 4 style: point the Service at a local chromedriver binary.
    browser = webdriver.Chrome(service=Service(executable_path='./chromedriver'))
    
    url = "https://www.example.com"
    browser.get(url)
    # Wait until you see some element that signals the page is completely loaded
    WebDriverWait(browser, timeout=10).until(lambda x: x.find_element(By.CLASS_NAME, 'Even'))
    
    # Do your things with the first page
    content = browser.page_source.encode('ascii', 'ignore').decode("utf-8")
    
    # Now, if you are sure there is a next page
    next_button_class = 'icon-arrowright-thin--pagination'  # insert the class of the 'next' button here
    browser.find_element(By.CLASS_NAME, next_button_class).click()
    time.sleep(3)
    
    # Wait until you see some element that signals the page is completely loaded
    WebDriverWait(browser, timeout=10).until(lambda x: x.find_element(By.CLASS_NAME, 'Even'))
    
    content = browser.page_source.encode('ascii', 'ignore').decode("utf-8")
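
    The steps above handle one click. Since the question asks for 5 pages, the click-and-read cycle can be wrapped in a loop. The sketch below keeps the browser behind two callables so the loop can be tried without Selenium; `collect_pages` and `FakeBrowser` are hypothetical names, not part of any library:

```python
def collect_pages(get_source, click_next, max_pages):
    """Collect up to max_pages HTML strings, clicking 'next' between pages."""
    pages = [get_source()]
    # Stop at max_pages, or earlier if click_next reports there is no next page.
    while len(pages) < max_pages and click_next():
        pages.append(get_source())
    return pages


class FakeBrowser:
    """Stand-in for a Selenium driver that has exactly three pages."""

    def __init__(self):
        self.index = 0
        self.sources = ["<p>page 1</p>", "<p>page 2</p>", "<p>page 3</p>"]

    def source(self):
        return self.sources[self.index]

    def click_next(self):
        if self.index + 1 >= len(self.sources):
            return False  # no next button on the last page
        self.index += 1
        return True


fake = FakeBrowser()
collected = collect_pages(fake.source, fake.click_next, max_pages=5)
```

    With a real driver, `get_source` would be `lambda: browser.page_source`, and `click_next` would wrap the find, click, and wait steps, returning `False` when the button cannot be found.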
    
