I'm trying to write a script that checks whether someone attended this school, using its alumni directory. (You can search for François Hollande, the current French president, to see how it behaves.)
As far as I can tell, for the "Nom" ("name") field I need to access this part of the HTML code:
```html
<div class="annuaireRecherche-v2" style="">
<span>Nom</span>
<div class="select2-container select2-allowclear autocomplete" id="s2id_PersonneNom" style="min-width: 0;">
<a href="javascript:void(0)" class="select2-choice" tabindex="-1">
<span class="select2-chosen" id="select2-chosen-3">Derez</span>
<abbr class="select2-search-choice-close"></abbr>
<span class="select2-arrow" role="presentation">
<b role="presentation"></b>
</span>
</a>
<label for="s2id_autogen3" class="select2-offscreen"></label>
<input class="select2-focusser select2-offscreen" type="text" aria-haspopup="true" role="button" aria-labelledby="select2-chosen-3" id="s2id_autogen3" tabindex="0">
</div>
<input type="hidden" name="PersonneNom" id="PersonneNom" class="autocomplete" style="min-width: 0px; display: none;" data-placeholder="Saisir un nom" data-multiple="" data-libelle="" value="Hollande" data-limit="" tabindex="-1" title="">
</div>
```
and in the last `<input>` tag (the hidden one with `id="PersonneNom"`), change the `value` attribute to the name I want to check.
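To see which field a plain form submission would carry, here is a minimal sketch that parses a trimmed copy of the snippet above with the `XML` package and reads the hidden input's `name` and `value`. It only inspects a local copy: changing this attribute inside R sends nothing to the server, and whether the site's search actually reads `PersonneNom` from a request (rather than through its JavaScript) is an assumption.

```r
library(XML)

# Trimmed copy of the hidden input from the snippet above
snippet <- '<div class="annuaireRecherche-v2">
  <input type="hidden" name="PersonneNom" id="PersonneNom" class="autocomplete"
         data-placeholder="Saisir un nom" value="Hollande">
</div>'

# Parse the fragment into an internal document that can be queried with XPath
page <- htmlParse(snippet, asText = TRUE)

# Select the hidden input by its id and read the two attributes a form would submit
field <- getNodeSet(page, "//input[@id='PersonneNom']")[[1]]
field_name  <- xmlGetAttr(field, "name")
field_value <- xmlGetAttr(field, "value")

# The name=value pair a browser would send if this input were part of a submitted form
form_pair <- paste0(field_name, "=", field_value)
```

Here `form_pair` is `"PersonneNom=Hollande"`.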
Then I'll have to somehow "click" the "Afficher les résultats" button (translation: "Show results") on the right. Relevant HTML code:
```html
<div class="showResultsButton" style="text-align: center; display: block;">
<a href="#" class="jqueryButton ui-button ui-widget ui-state-default ui-corner-all ui-button-text-icon-primary" onclick="showResultList($('.shortResults')); return false;" role="button">
<span class="ui-button-icon-primary ui-icon ui-icon-search"></span>
<span class="ui-button-text">
Afficher les résultats
</span>
</a>
</div>
```
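Note that this button is not a form submit: its `href` is `#`, and its `onclick` calls the page's JavaScript function `showResultList()` and then `return false;`. The sketch below reads both attributes from a shortened copy of the snippet (class list abbreviated) to make that visible. Since `httr` and `XML` do not execute JavaScript, "clicking" it from R means either driving a real browser or reproducing whatever request `showResultList()` makes, which the snippet does not show.

```r
library(XML)

# Shortened copy of the "Afficher les résultats" link from the snippet above
button_html <- '<div class="showResultsButton">
  <a href="#" class="jqueryButton ui-button" role="button"
     onclick="showResultList($(\'.shortResults\')); return false;">
    <span class="ui-button-text">Afficher les résultats</span>
  </a>
</div>'

page <- htmlParse(button_html, asText = TRUE, encoding = "UTF-8")
link <- getNodeSet(page, "//div[@class='showResultsButton']/a")[[1]]

# href="#" points back at the current page, so following it requests nothing new
link_href <- xmlGetAttr(link, "href")

# The actual work happens in JavaScript, which an HTTP client alone never runs
link_onclick <- xmlGetAttr(link, "onclick")
```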
And then I'll have to get to the `<div class="people clearfix">` tag and retrieve the `<a href="...">` tags:
```html
<div class="people clearfix">
<div class="tab_result" style="clear:both">
<div class="ppl">
<div class="ppl-wrap clearfix" style="clear:both">
<div class="ppl-image">
<a href="/profil/francois.hollande74" target="_blank">
<img alt="" src="/ressources/temp/100_120t121_153006959_inconnu.jpeg">
</a>
</div>
<div class="ppl-content">
<h3>
<a href="/profil/francois.hollande74" target="_blank">Hollande François</a>
</h3>
<p class="meta">D Service Public Promo 1974</p>
<p></p>
</div>
<div class="ppl-content" style="float:right"></div>
</div>
<p class="buttons">
<a class="button " href="/profil/francois.hollande74" target="_blank">
Voir le profil
</a>
</p>
</div>
</div>
</div>
```
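Once the results markup is available as a string, extracting the links is a plain XPath query. A minimal sketch with the `XML` package, run on a trimmed copy of the snippet above (image `src` and some attributes omitted). The absolute-URL step assumes the profile paths are relative to the host in the question's URL.

```r
library(XML)

# Trimmed copy of the results markup above
results_html <- '<div class="people clearfix">
  <div class="tab_result">
    <div class="ppl">
      <div class="ppl-wrap clearfix">
        <div class="ppl-image"><a href="/profil/francois.hollande74" target="_blank"></a></div>
        <div class="ppl-content">
          <h3><a href="/profil/francois.hollande74" target="_blank">Hollande François</a></h3>
          <p class="meta">D Service Public Promo 1974</p>
        </div>
      </div>
      <p class="buttons"><a class="button " href="/profil/francois.hollande74">Voir le profil</a></p>
    </div>
  </div>
</div>'

page <- htmlParse(results_html, asText = TRUE, encoding = "UTF-8")

# Each person's name link sits in <h3> inside div.ppl-content; using only these links
# avoids collecting the same href three times (thumbnail, name, "Voir le profil" button)
xp <- "//div[contains(@class, 'people')]//div[@class='ppl-content']/h3/a"
hrefs       <- xpathSApply(page, xp, xmlGetAttr, "href")
found_names <- xpathSApply(page, xp, xmlValue)

# The "meta" paragraph holds the programme and graduation year
promo <- xpathSApply(page, "//div[contains(@class, 'people')]//p[@class='meta']", xmlValue)

# Assumption: profile links are relative to the host used in the question's URL
profile_urls <- paste0("http://www.sciences-po.asso.fr", hrefs)
```

If a search returns nobody, `xpathSApply()` returns an empty list, so `length(hrefs) == 0` works as a simple "not found" check.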
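Here is my code so far:

```r
library(XML)
library(httr)
library(foreach)

url <- "http://www.sciences-po.asso.fr/gene/main.php?base=1244"
response <- GET(url)

# Get the body as a string; httr re-encodes it from ISO-8859-1 to UTF-8
doc <- content(response, as = "text", encoding = "ISO-8859-1")
# htmlParse() needs a string (with asText = TRUE), not the xml2 document
# that content(type = "text/html") returns in current httr versions
parseddoc <- htmlParse(doc, asText = TRUE, encoding = "UTF-8")

# I have to modify the content of this attribute
xpathApply(parseddoc, "//*[@id='PersonneNom']/@value")
# then make sure it is sent to the server, retrieve the code sent back, etc.
```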
Thanks for any help you can give.
In case anyone stumbles upon this question: I found two other packages for crawling websites, `rvest` and `RSelenium`. I went with `RSelenium` as it seemed the most straightforward: it opens your browser, and you can watch live what your code is doing on the web page.
Here are also two links that I found very useful, the second one being a good introduction to `RSelenium`:
http://ikkyle.com/webscraping_with_r.html
https://www.datacamp.com/community/tutorials/scraping-javascript-generated-data-with-r
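For reference, a rough sketch of that RSelenium workflow applied to this page. `search_alumni` is a hypothetical helper, and everything page-specific is an assumption: that setting the hidden `#PersonneNom` input through JavaScript is enough for `showResultList()` to pick up the name (the visible select2 widget may have to be driven instead), the CSS selector for the button, and the fixed wait. It expects an already-connected RSelenium client.

```r
# Hypothetical helper: look up one surname in the directory through a live browser session.
# `remDr` is an already-connected RSelenium client, e.g. (not run):
#   rD <- RSelenium::rsDriver(browser = "firefox")
#   search_alumni(rD$client, "Hollande")
search_alumni <- function(remDr, surname,
                          url = "http://www.sciences-po.asso.fr/gene/main.php?base=1244",
                          wait = 2) {
  remDr$navigate(url)

  # Assumption: filling the hidden #PersonneNom input is enough for the search;
  # the visible select2 widget may have to be driven instead
  remDr$executeScript(
    "document.getElementById('PersonneNom').value = arguments[0];",
    list(surname)
  )

  # "Click" the "Afficher les résultats" link, which runs the page's showResultList()
  button <- remDr$findElement(using = "css selector", ".showResultsButton a")
  button$clickElement()

  # Crude fixed pause for the results to render; polling for div.people would be more robust
  Sys.sleep(wait)

  # Parse the rendered page and return the profile links found in the results
  page <- XML::htmlParse(remDr$getPageSource()[[1]], asText = TRUE, encoding = "UTF-8")
  XML::xpathSApply(page, "//div[contains(@class, 'people')]//h3/a", XML::xmlGetAttr, "href")
}
```

Parsing the page source with `XML` keeps the extraction step identical to the static approach in the question; only fetching and clicking move into the browser.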