Does anybody have any idea how they do it? I currently use OffLiberty.com to parse Mixcloud links and get the raw MP3 URL, which I then use in a custom HTML5 player for iOS compatibility. I was wondering if anyone knows exactly how their process works, so I could create something similar that would 'cut out the middleman', so to speak: my end users wouldn't have to go to an external site to get a link to the MP3 for the mix they want to post. Just a thought really; it's not terribly important if it can't be done, but it would be a nice touch :)
Note that I'm against content scraping, and you should ask those websites for permission before scraping their MP3 URLs. Otherwise, if I were them, I'd block you right now and ad vitam æternam.
Anyway, you can parse the page's HTML using DOMDocument.
For example:
<?php
// suppress libxml warnings caused by malformed HTML, remembering the previous setting
$internal_errors = libxml_use_internal_errors(true);
// initialize the document
$doc = new DOMDocument();
// load a page; loadHTMLFile() returns false on failure
if (!$doc->loadHTMLFile('http://www.mixcloud.com/LaidBackRadio/le-motel-on-the-road/')) {
    exit('Could not load the page');
}
// initialize XPath for the document
$xpath = new DOMXPath($doc);
// a span with a "data-preview-url" attribute seems to contain the MP3 URL
// query() returns a DOMNodeList: http://www.php.net/manual/en/class.domnodelist.php
$mp3 = $xpath->query('//span[@data-preview-url]');
foreach ($mp3 as $m) {
    // print the attribute value, escaped for HTML output
    echo htmlspecialchars($m->getAttribute('data-preview-url')) . '<br/>';
}
// discard the collected parse errors and restore the previous libxml setting
libxml_clear_errors();
libxml_use_internal_errors($internal_errors);
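If you want to call this from your own posting form, the same idea can be wrapped in a small function. This is only a rough sketch: extractPreviewUrls() is a name I made up, the sample HTML is invented for illustration, and it assumes the page still exposes the URL in a "data-preview-url" attribute on a span, which may not hold if the markup changes.

```php
<?php
// Hypothetical helper (not part of any library): returns every
// data-preview-url value found on <span> elements in an HTML string.
function extractPreviewUrls(string $html): array
{
    // Suppress libxml warnings for malformed HTML, remembering the old setting
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $urls = [];

    // Guard against an empty string, which loadHTML() rejects
    if ($html !== '' && $doc->loadHTML($html)) {
        $xpath = new DOMXPath($doc);
        // Assumption: the MP3 URL lives in a span's data-preview-url attribute
        foreach ($xpath->query('//span[@data-preview-url]') as $node) {
            $urls[] = $node->getAttribute('data-preview-url');
        }
    }

    // Discard collected parse errors and restore the previous setting
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Drop duplicates and reindex
    return array_values(array_unique($urls));
}

// Usage with an invented snippet; a real page would be fetched first
$sample = '<div><span data-preview-url="http://example.com/preview.mp3">Play</span></div>';
print_r(extractPreviewUrls($sample));
```

Taking the HTML as a string keeps the fetching separate from the parsing, so you can download the page however you like (and only with the site's permission) and test the parsing offline.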