
Modifying html/javascript code with httr


I'm trying to write a script that checks whether someone attended this school, using its alumni directory. (You can use the name Francois Hollande, the current French president, to see how it behaves.)

As far as I can tell, for the "Nom" ("name") field I need to access this part of the HTML code:

<div class="annuaireRecherche-v2" style="">
  <span>Nom</span>
  <div class="select2-container select2-allowclear autocomplete" id="s2id_PersonneNom" style="min-width: 0;">
    <a href="javascript:void(0)" class="select2-choice" tabindex="-1">   
      <span class="select2-chosen" id="select2-chosen-3">Derez</span>
      <abbr class="select2-search-choice-close"></abbr>   
      <span class="select2-arrow" role="presentation">
        <b role="presentation"></b>
      </span>
    </a>
    <label for="s2id_autogen3" class="select2-offscreen"></label>
    <input class="select2-focusser select2-offscreen" type="text" aria-haspopup="true" role="button" aria-labelledby="select2-chosen-3" id="s2id_autogen3" tabindex="0">
  </div>
  <input type="hidden" name="PersonneNom" id="PersonneNom" class="autocomplete" style="min-width: 0px; display: none;" data-placeholder="Saisir un nom" data-multiple="" data-libelle="" value="Hollande" data-limit="" tabindex="-1" title="">
</div>

and in the last <input> tag, change the value attribute to the name I want to check.

Then I'll have to somehow "click" the "Afficher les résultats" (translation: "show results") button on the right. Relevant HTML code:

<div class="showResultsButton" style="text-align: center; display: block;">
  <a href="#" class="jqueryButton  ui-button ui-widget ui-state-default ui-corner-all ui-button-text-icon-primary" onclick="showResultList($('.shortResults')); return false;" role="button">
    <span class="ui-button-icon-primary ui-icon ui-icon-search"></span>
    <span class="ui-button-text">
      Afficher les résultats
    </span>
  </a>
</div>

And then I'll have to get to the <div class="people clearfix"> tag and retrieve the <a href="..."> tags:

<div class="people clearfix">
    <div class="tab_result" style="clear:both">
        <div class="ppl">       
            <div class="ppl-wrap clearfix" style="clear:both">
                <div class="ppl-image">
                    <a href="/profil/francois.hollande74" target="_blank">
                        <img alt="" src="/ressources/temp/100_120t121_153006959_inconnu.jpeg">
                    </a>
                </div>
                <div class="ppl-content">
                    <h3>
                        <a href="/profil/francois.hollande74" target="_blank">Hollande  François</a>
                    </h3>
                    <p class="meta">D Service Public Promo 1974</p>
                    <p></p>
                </div>
                    <div class="ppl-content" style="float:right"></div>
                </div>
            <p class="buttons">
                <a class="button " href="/profil/francois.hollande74" target="_blank">
                    Voir le profil
                </a>
            </p>
        </div>
    </div>
</div>
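
For that last step, a minimal sketch with the XML package might look like the following. It assumes the results HTML is already available as a string; `results_html` is a trimmed copy of the markup above, and the names `results_doc`, `profile_paths` and `profile_urls` are only illustrative. Prefixing the site root to the relative links is also an assumption.

```r
library(XML)

# Trimmed copy of the results markup quoted above
results_html <- '
<div class="people clearfix">
  <div class="ppl-content">
    <h3><a href="/profil/francois.hollande74" target="_blank">Hollande  François</a></h3>
    <p class="meta">D Service Public Promo 1974</p>
  </div>
</div>'

# Parse the string (asText = TRUE: it is content, not a file name)
results_doc <- htmlParse(results_html, asText = TRUE, encoding = "UTF-8")

# One <a> per person: the link inside each <h3> of the results container
profile_paths <- xpathSApply(
  results_doc,
  "//div[contains(@class, 'people')]//h3/a",
  xmlGetAttr, "href"
)

# Assumption: the relative links resolve against the site root
profile_urls <- paste0("http://www.sciences-po.asso.fr", profile_paths)
print(profile_urls)
```

Selecting only the `<h3>` link avoids collecting the same profile three times, since the image, the name and the "Voir le profil" button all point to the same href.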

Here is my code so far:

library(XML)
library(httr)
library(foreach)

url        <- "http://www.sciences-po.asso.fr/gene/main.php?base=1244"    
response   <- GET(url)
# Fetch the body as text so that htmlParse() receives a character string
doc        <- content(response, as = "text", encoding = "ISO-8859-1")
parseddoc  <- htmlParse(doc, asText = TRUE)

# I have to modify the value of this attribute...
xpathApply(parseddoc, "//*[@id='PersonneNom']/@value")
# ...then make sure it is sent to the server, retrieve the response sent back, et cetera.

Thanks for any help you can give.


Solution

  • In case anyone stumbles upon this question, I found two other packages for crawling websites: rvest and RSelenium. I went with RSelenium as it seemed to be the most straightforward: it opens your browser and you can watch live what your code is doing on the webpage.

    Moreover, here are two links that I found very useful, the second one being a good intro to RSelenium:

    http://ikkyle.com/webscraping_with_r.html

    https://www.datacamp.com/community/tutorials/scraping-javascript-generated-data-with-r
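
For anyone wanting a starting point, here is a rough sketch of how the three steps from the question (set the name, click the button, collect the links) could look with RSelenium. It is not the exact code I ended up with: the function name `find_alumni_profiles` is made up for illustration, it assumes a local Firefox install, and it assumes the page still uses the markup quoted in the question. In particular, writing straight into the hidden `PersonneNom` input may not be enough to trigger the search if the select2 widget keeps its own state.

```r
# Sketch only: assumes the RSelenium package, a local Firefox install, and
# that the page still uses the markup quoted in the question.
find_alumni_profiles <- function(name,
                                 url = "http://www.sciences-po.asso.fr/gene/main.php?base=1244") {
  # Start a Selenium server plus a browser session
  driver <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  remote <- driver$client

  # Always close the browser and stop the server, even if a step fails
  on.exit({
    remote$close()
    driver$server$stop()
  }, add = TRUE)

  remote$navigate(url)

  # Assumption: setting the hidden input's value is enough for the search form
  remote$executeScript(
    "document.getElementById('PersonneNom').value = arguments[0];",
    args = list(name)
  )

  # Click the "Afficher les résultats" button
  button <- remote$findElement(using = "css selector",
                               value = ".showResultsButton a")
  button$clickElement()
  Sys.sleep(2)  # crude fixed wait for the results to render

  # Collect the profile links found inside the results container
  links <- remote$findElements(using = "css selector",
                               value = "div.people h3 a")
  hrefs <- vapply(
    links,
    function(link) link$getElementAttribute("href")[[1]],
    character(1)
  )
  unique(hrefs)
}

# Usage (opens a real browser, so it is left commented out):
# find_alumni_profiles("Hollande")
```

An empty result from the function would mean no profile link was found for that name, which is the "has not been to this school" case, provided the selectors above still match the page.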