Tags: jquery, html, ajax, yql, yahoo-api

Extracting HTML from Yahoo YQL htmlstring XML or JSON output to get Open Graph details


In my application I'm going to use Yahoo YQL's htmlstring table to extract a website's HTML from the XML or JSON output it returns.

Example XML output: https://query.yahooapis.com/v1/public/yql?q=select%20%2A%20from%20htmlstring%20where%20url%3D%27http%3A%2F%2Fstackoverflow.com%2F%27&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys

Example JSON output: https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20htmlstring%20where%20url%3D%22http%3A%2F%2Fstackoverflow.com%2F%22&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=

The reason I'm doing this is to get the page's property="og:title", property="og:description" and property="og:image" values.

Currently I'm doing it like this:

XML OUTPUT:

  $(function () {
    var apiUrl;

    $("button.click").click(function () {
      apiUrl = "https://query.yahooapis.com/v1/public/yql?q=select * from htmlstring where url='http://stackoverflow.com/'&diagnostics=true&env=store://datatables.org/alltableswithkeys";

      $('p.extract').toggle();
      $.get(apiUrl, function (data) {
        $('p.extract').addClass('none');
        // Look for an <html> element inside the returned XML document
        var html = $(data).find('html');
        $("input.title").val(html.find("meta[property='og:title']").attr('content') || 'no title found');
        $("textarea.description").val(html.find("meta[property='og:description']").attr('content') || 'no description found');
        $("input.image").val(html.find("meta[property='og:image']").attr('content') || 'no image found');
      });
    });
  });
input {
    width: 100%;
    margin-bottom: 20px;
    padding: 10px;
}

.none{display:none;}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.0/jquery.min.js"></script>
<button class="click">Click Me</button>
<br>
<p class="extract" style="display:none;">Extracting html</p>
<input type="text" class="title">
<br>
<textarea name="" id="" cols="30" rows="5" class="description"></textarea>
<br>
<input type="text" class="image">
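In the XML version, jQuery hands the callback an XML document. Assuming the htmlstring table returns the page as entity-escaped text inside a single `<result>` element (an assumption about the response shape), `find('html')` has no `<html>` element to match, because the markup is text rather than nodes. In the browser, `$(data).find('result').text()` would give back the decoded HTML string. The sketch below does the same extraction without a DOM so it can run anywhere; `htmlFromYqlXml` and `decodeXmlEntities` are hypothetical helper names, and the CDATA branch only covers the case where the response wraps the HTML that way.

```javascript
// Hypothetical helper: pulls the page HTML out of a YQL htmlstring XML response.
// Assumes the HTML is the text content of one <result> element (escaped or CDATA).
function htmlFromYqlXml(xmlText) {
  // \b stops "<results>" from matching "<result"
  const match = xmlText.match(/<result\b[^>]*>([\s\S]*?)<\/result>/);
  if (!match) {
    return null;
  }
  const body = match[1];
  // If the HTML is wrapped in CDATA it is already raw markup
  const cdata = body.match(/^\s*<!\[CDATA\[([\s\S]*?)\]\]>\s*$/);
  return cdata ? cdata[1] : decodeXmlEntities(body);
}

// Decodes the five predefined XML entities plus numeric references in one pass,
// so "&amp;lt;" becomes "&lt;" rather than being double-decoded to "<".
function decodeXmlEntities(text) {
  const named = { lt: "<", gt: ">", amp: "&", quot: '"', apos: "'" };
  return text.replace(/&(#[xX][0-9a-fA-F]+|#\d+|[a-zA-Z]+);/g, (whole, ent) => {
    if (ent[0] === "#") {
      const isHex = ent[1] === "x" || ent[1] === "X";
      const code = isHex ? parseInt(ent.slice(2), 16) : parseInt(ent.slice(1), 10);
      // Leave out-of-range references untouched instead of throwing
      return code <= 0x10ffff ? String.fromCodePoint(code) : whole;
    }
    return Object.prototype.hasOwnProperty.call(named, ent) ? named[ent] : whole;
  });
}

// Usage with a made-up, trimmed-down response
const sampleXml =
  '<?xml version="1.0" encoding="UTF-8"?>' +
  "<query><results><result>&lt;html&gt;&lt;head&gt;" +
  '&lt;meta property="og:title" content="Stack Overflow"&gt;' +
  "&lt;/head&gt;&lt;/html&gt;</result></results></query>";
console.log(htmlFromYqlXml(sampleXml));
```

The decoded string still has to be parsed as HTML before any `meta[property='og:title']` lookup can succeed.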

JSON OUTPUT:

  $(function () {
    var apiUrl;

    $("button.click").click(function () {
      apiUrl = "https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20htmlstring%20where%20url%3D%22http%3A%2F%2Fstackoverflow.com%2F%22&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=";

      $('p.extract').toggle();
      $.get(apiUrl, function (data) {
        $('p.extract').addClass('none');
        // Look for an <html> element inside the returned data
        var html = $(data).find('html');
        $("input.title").val(html.find("meta[property='og:title']").attr('content') || 'no title found');
        $("textarea.description").val(html.find("meta[property='og:description']").attr('content') || 'no description found');
        $("input.image").val(html.find("meta[property='og:image']").attr('content') || 'no image found');
      });
    });
  });
input {
    width: 100%;
    margin-bottom: 20px;
    padding: 10px;
}

.none{display:none;}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<button class="click">Click Me</button>
<br>
<p class="extract" style="display:none;">Extracting html</p>
<input type="text" class="title">
<br>
<textarea name="" id="" cols="30" rows="5" class="description"></textarea>
<br>
<input type="text" class="image">
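In the JSON version, `data` is a plain object (or a string), not a document, so `$(data).find('html')` has nothing to search. A minimal sketch, assuming the page HTML sits as a string at `query.results.result` (an assumption about the htmlstring JSON shape, not confirmed here); `htmlFromYqlJson` is a hypothetical name.

```javascript
// Hypothetical helper: returns the raw page HTML from a YQL htmlstring JSON
// response, or null. The query.results.result path is an assumed response shape.
function htmlFromYqlJson(data) {
  // $.get may pass a string if the response is not recognized as JSON
  const payload = typeof data === "string" ? JSON.parse(data) : data;
  const results = payload && payload.query && payload.query.results;
  // Guard against a missing or null results object (e.g. no rows returned)
  if (!results || typeof results.result !== "string") {
    return null;
  }
  return results.result;
}

// Usage with a made-up, trimmed-down response
const sampleResponse = {
  query: {
    count: 1,
    results: {
      result: '<html><head><meta property="og:image" content="https://example.com/a.png"></head></html>'
    }
  }
};
console.log(htmlFromYqlJson(sampleResponse));
```

Unlike the XML case, the JSON string needs no entity decoding. In the browser, the returned string could then be parsed with `new DOMParser().parseFromString(html, 'text/html')` and queried with `querySelector("meta[property='og:image']")`; wrapping a full document string in `$()` is less reliable because the `<html>` and `<head>` wrappers are dropped during parsing.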

What I'm doing currently doesn't give me the details I want: every field shows its "not found" fallback, even though I can see the meta tags inside the output.

Any help is appreciated as I don't know what I'm doing wrong.
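Once the page HTML is available as a string, the Open Graph values can be read from its `<meta>` tags. The following is a minimal, dependency-free sketch that runs in Node as well as a browser; `extractOpenGraph` is a hypothetical helper, and because it uses regular expressions rather than a real HTML parser it assumes quoted attribute values with no `>` inside them and does not decode HTML entities in the values.

```javascript
// Hypothetical helper: collects og:* <meta> values from an HTML string.
// Regex-based, so it is a brittle sketch, not a full HTML parser.
function extractOpenGraph(html) {
  const og = {};
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of metaTags) {
    // Read quoted attributes in any order: name="value" or name='value'
    const attrs = {};
    const attrRe = /([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
    let m;
    while ((m = attrRe.exec(tag)) !== null) {
      attrs[m[1].toLowerCase()] = m[2] !== undefined ? m[2] : m[3];
    }
    // Keep only Open Graph properties that carry a content attribute
    if (attrs.property && attrs.property.startsWith("og:") && attrs.content !== undefined) {
      og[attrs.property] = attrs.content;
    }
  }
  return og;
}

// Usage with a made-up page; note the attribute order varies between tags
const sample =
  "<html><head>" +
  '<meta property="og:title" content="Stack Overflow">' +
  "<meta content='A question and answer site' property='og:description'>" +
  '<meta property="og:image" content="https://example.com/logo.png">' +
  "</head></html>";
const og = extractOpenGraph(sample);
console.log(og["og:title"], og["og:description"], og["og:image"]);
```

In a browser, `DOMParser` plus `querySelector("meta[property='og:title']")` is the more robust route; the regex version only keeps the sketch self-contained.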


Solution

  • Since I couldn't trust Yahoo's APIs anymore, and they might shut down other APIs they host, I went for a server-side solution built into my application.

    My application is based on Ruby on Rails, so I used Nokogiri on the server and an AJAX call to the server when a link is submitted, to show the results in real time.