I am trying to get the number of citations for a specific Google Scholar profile, using Python and BeautifulSoup.
The values are in the citation indices table. My code returns only nine elements, even though more elements with the same tag appear when you click on the graph.
What is the problem?
from urllib import urlopen  # Python 2; on Python 3 import urlopen from urllib.request
from bs4 import BeautifulSoup
from lista_url import *
url = 'https://scholar.google.gr/citations?user=aFYdVoYAAAAJ&hl=el'  # Scholar profile
webpage = urlopen(url)
soup = BeautifulSoup(webpage)
# Print the text of every citation-count span found on the page
for t in soup.findAll('span', {"class": "gsc_g_al"}):
    a = t.text
    print a
The larger citations table you appear to be looking for is loaded asynchronously by JavaScript (an AJAX request), so it is not part of the HTML returned for the profile page. You'll have to make that request yourself in your own code.
The URL for the AJAX request simply adds a view_op=citations_histogram parameter:
url='https://scholar.google.gr/citations?user=aFYdVoYAAAAJ&hl=el&view_op=citations_histogram'
This produces 24 entries:
>>> url='https://scholar.google.gr/citations?user=aFYdVoYAAAAJ&hl=el&view_op=citations_histogram'
>>> webpage=urlopen(url)
>>> soup=BeautifulSoup(webpage)
>>> len(soup.find_all('span', class_='gsc_g_al'))
24
>>> [el.string for el in soup.find_all('span', class_='gsc_g_al')]
[u'2', u'5', u'1', u'4', u'9', u'6', u'2', u'2', u'2', u'7', u'23', u'15', u'21', u'12', u'26', u'20', u'38', u'32', u'6', u'38', u'38', u'39', u'87', u'10']
>>> [el.string for el in soup.find_all('span', class_='gsc_g_t')]
[u'1992', u'1993', u'1994', u'1995', u'1996', u'1997', u'1998', u'1999', u'2000', u'2001', u'2002', u'2003', u'2004', u'2005', u'2006', u'2007', u'2008', u'2009', u'2010', u'2011', u'2012', u'2013', u'2014', u'2015']
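If you want the counts keyed by year rather than as two separate lists, you can zip the two results together. This is only a minimal sketch: the helper name citations_by_year is my own, and it assumes the gsc_g_t and gsc_g_al spans line up one to one (as the 24 and 24 above do), which the page does not guarantee, so it raises an error when the lengths differ.

```python
def citations_by_year(years, counts):
    """Pair year labels with citation counts (hypothetical helper).

    Both arguments are lists of strings, e.g. the .string values of the
    gsc_g_t (years) and gsc_g_al (counts) spans.
    """
    # Assumption: one count per year, in the same order. Fail loudly if not.
    if len(years) != len(counts):
        raise ValueError("expected one count per year, got %d years and %d counts"
                         % (len(years), len(counts)))
    return dict((int(year), int(count)) for year, count in zip(years, counts))


# The last three entries from the output above.
sample = citations_by_year([u'2013', u'2014', u'2015'], [u'39', u'87', u'10'])
print(sample[2014])  # 87
```

With the soup from above, you would pass in the two list comprehensions already shown, the gsc_g_t one as years and the gsc_g_al one as counts.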