Let's say I want to extract data from a web page with the following markup:
<table>
  <tr>
    <td><a href="Link 1">Column 1 Text</a></td>
    <td>Column 2 Text</td>
    <td>Column 3 Text</td>
  </tr>
  <tr>
    <td><a href="Link 2">Column 1 Text</a></td>
    <td>Column 2 Text</td>
    <td>Column 3 Text</td>
  </tr>
  ...
</table>
into JSON in the following format:
[
  {
    link: 'Link 1',
    text: 'Column 1 Text',
    data: 'Column 3 Text'
  },
  {
    link: 'Link 2',
    text: 'Column 1 Text',
    data: 'Column 3 Text'
  }
]
Can this be done with YQL? If so, please give me an example query.
Any help would be appreciated!
Here's a query that makes a good starting point. It uses YQL's html table together with an XPath expression (see Extracting HTML Content With XPath for more details on this technique):
select * from html where url="http://cantoni.org/test/table.html" and xpath='//table/tr'
This produces JSON results like the following:
{
  "query": {
    "count": 2,
    "created": "2012-01-06T20:16:46Z",
    "lang": "en-US",
    "results": {
      "tr": [
        {
          "td": [
            {
              "a": {
                "href": "Link%201",
                "content": "Column 1 Text"
              }
            },
            {
              "p": "Column 2 Text"
            },
            {
              "p": "Column 3 Text"
            }
          ]
        },
        {
          "td": [
            {
              "a": {
                "href": "Link%202",
                "content": "Column 1 Text"
              }
            },
            {
              "p": "Column 2 Text"
            },
            {
              "p": "Column 3 Text"
            }
          ]
        }
      ]
    }
  }
}
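That result isn't quite the shape asked for in the question, so a little post-processing is still needed on the client side. Here is a minimal sketch in Python of one way to do it. The function name reshape_rows and the embedded SAMPLE_RESPONSE are hypothetical names of my own, not part of YQL. It assumes every row has the three cells shown above, and that you want the percent-encoded href ("Link%201") decoded back to "Link 1".

```python
import json
from urllib.parse import unquote

# Trimmed copy of the YQL response shown above (one row only, for brevity).
SAMPLE_RESPONSE = """
{
  "query": {
    "count": 1,
    "results": {
      "tr": [
        {
          "td": [
            {"a": {"href": "Link%201", "content": "Column 1 Text"}},
            {"p": "Column 2 Text"},
            {"p": "Column 3 Text"}
          ]
        }
      ]
    }
  }
}
"""


def reshape_rows(response):
    """Turn the 'tr' results into a list of {link, text, data} dicts."""
    rows = response["query"]["results"]["tr"]
    # Defensive assumption: tolerate a single row arriving as an object
    # instead of a one-element list.
    if isinstance(rows, dict):
        rows = [rows]
    reshaped = []
    for row in rows:
        # Assumes exactly the three-cell layout from the sample markup.
        first, _second, third = row["td"][:3]
        reshaped.append({
            "link": unquote(first["a"]["href"]),  # "Link%201" -> "Link 1"
            "text": first["a"]["content"],
            "data": third["p"],
        })
    return reshaped


if __name__ == "__main__":
    print(json.dumps(reshape_rows(json.loads(SAMPLE_RESPONSE)), indent=2))
```

If the real table has rows with missing cells or cells without a link, the unpacking and key lookups would need guarding; the sketch only covers the layout in the question.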