I'm trying to create a program that will go through a bunch of tumblr photos and extract the username of the person who uploaded them.
http://www.tumblr.com/tagged/food
If you look there, you can see multiple pictures of food from multiple different uploaders, and if you scroll down you will see even more pictures from even more uploaders. However, if you right-click in your browser to view the page source and search for "username", it only yields 10 results, every time, no matter how far down you scroll.
Is there any way to counter this and instead have it display the source for all images, or for X number of images, or for however far you scrolled?
Here is my code to show what I'm doing:
# Imports
import requests
from bs4 import BeautifulSoup

# Start of code
r = requests.get('http://www.tumblr.com/tagged/skateboard')
soup = BeautifulSoup(r.content, 'html.parser')  # pass an explicit parser

arrayDiv = []
for anchor in soup.findAll("div", {"class": "post_info"}):
    # Each post_info div wraps the uploader's name in an <a> tag,
    # so pull the link text directly instead of string-replacing markup
    link = anchor.find('a')
    if link:
        arrayDiv.append(link.get_text().strip())
print(arrayDiv)
I solved a similar problem using BeautifulSoup. What I did was loop through the paged results: use BeautifulSoup to check whether there is a "continue" element (on the Tumblr page, for example, this is an element with the id "next_page_link"). If there is one, run the photo-scraping code again, changing the URL fetched by requests each time. You would need to encapsulate all the code in a function, of course.
Good luck.
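The approach above can be sketched roughly like this. Note this is an untested outline, not verified against Tumblr's live markup: the `post_info` and `next_page_link` selectors come from the question and answer, and the base URL joining is an assumption you may need to adjust.

```python
import requests
from bs4 import BeautifulSoup


def extract_usernames(html):
    """Pull uploader names out of one page's post_info divs."""
    soup = BeautifulSoup(html, 'html.parser')
    names = []
    for div in soup.find_all('div', {'class': 'post_info'}):
        link = div.find('a')  # the uploader's name is the link text
        if link:
            names.append(link.get_text(strip=True))
    return names


def find_next_url(html, base='http://www.tumblr.com'):
    """Return the URL behind the next_page_link element, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    nxt = soup.find(id='next_page_link')
    return base + nxt['href'] if nxt else None


def scrape_usernames(url, max_pages=5):
    """Follow next_page_link up to max_pages times, collecting usernames."""
    usernames = []
    while url and max_pages > 0:
        html = requests.get(url).content
        usernames += extract_usernames(html)
        url = find_next_url(html)  # None stops the loop
        max_pages -= 1
    return usernames
```

Calling `scrape_usernames('http://www.tumblr.com/tagged/skateboard')` would then walk up to five pages instead of just the first one, which is what makes more than 10 usernames show up.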