
How to get Nokogiri to scrape text from span in Ruby


I'm trying to scrape information from a website using Nokogiri and Curb, but I can't work out which selector to use. I want the API key, which appears near the bottom of the HTML below as "xxxxxxxxx".

The HTML code is:

    <body class="html not-front logged-in no-sidebars page-app page-app- page-app-8383900 page-app-keys i18n-en" data-twttr-rendered="true">

    <div id="skip-link"></div>
    <div id="page-wrapper">
        <!--

         Code for the global nav 

        -->
        <nav id="globalnav" class="without-subnav"></nav>
        <nav id="subnav"></nav>
        <section id="hero" class="hero-short"></section>

<section id="gaz-content">

    <div class="container">
        ::before
        <div id="messages"></div>
        <div id="gaz-content-wrap-outer" class="row">
            ::before
            <div id="gaz-content-wrap-inner" class="span12">
                <div class="row">
                    ::before
                    <div class="article-wrap span12">
                        <article id="gaz-content-body" class="content">
                            <header></header>
                            <div class="header-action"></div>
                            <div class="tabs"></div>

lass="d-block d-block-system g-main">

    <div class="app-details">
        <h2>

            Application Settings

        </h2>
        <div class="description"></div>
        <div class="app-settings">
            <div class="row">
                ::before
                <span class="heading">

                    Consumer Key (API Key)

                </span>
                <span>

                    xxxxxxxxx

                </span>

All I can seem to get is the "content" text.

My code looks like:

    consumer = html.at("#gaz-content-body")['class']
    puts consumer

I'm not sure what selector to use to reach the class and/or span and then read its text. All I can get Nokogiri to print is "content".


Solution

  • In this case we need the second span, the one right after span class="heading", inside div class="app-settings". That selector is fairly general, but specific enough here. Using search instead of at returns both spans, and we take the second one:

    # Gets every span under <div class='app-settings'>; in the HTML shown,
    # that is the heading span followed by the value span.
    res = html.search('#gaz-content-body .app-settings span')
    
    # Use .text to get the contents of the 2nd element.
    res[1].text.strip
    # => "xxxxxxxxx"
    

    You can also use at with span:nth-child(2) to target the same element directly:

    res = html.at("#gaz-content-body .app-settings span:nth-child(2)")
    res.text.strip
    # => "xxxxxxxxx"