php parsing mp3

How does OffLiberty.com parse links to get files?


Does anybody have any idea how they do it? I currently use OffLiberty.com to parse Mixcloud links and get the raw MP3 URL, which I use in a custom HTML5 player for iOS compatibility. I was wondering if anyone knows exactly how their process works, so that I could build something similar and cut out the middleman: my end users wouldn't have to visit an external site to get a link to the MP3 for the mix they want to post. It's just a thought, and not terribly important if it can't be done, but it would be a nice touch :)


Solution

  • Note that I'm against content scraping, and you should ask the website for permission before scraping their MP3 URLs. Otherwise, if I were them, I'd block you right now and ad vitam æternam.

    Anyway, you can parse the Mixcloud page's HTML yourself using DOMDocument.

    For example:

    <?php
    // collect libxml warnings internally so malformed HTML doesn't print parse errors
    $internal_errors = libxml_use_internal_errors(true);
    // initialize the document
    $doc = new DOMDocument();
    // load a page (fetching a remote URL requires allow_url_fopen to be enabled)
    $doc->loadHTMLFile('http://www.mixcloud.com/LaidBackRadio/le-motel-on-the-road/');
    // initialize XPath for the document
    $xpath = new DOMXPath($doc);
    // spans with a "data-preview-url" attribute seem to contain the MP3 URL
    // the query returns a DOMNodeList: http://www.php.net/manual/en/class.domnodelist.php
    $mp3 = $xpath->query('//span[@data-preview-url]');

    foreach ($mp3 as $m) {
        // print the attribute value, escaped for HTML output
        echo htmlspecialchars($m->getAttribute('data-preview-url')) . '<br/>';
    }
    // discard the collected warnings and restore the previous error handling
    libxml_clear_errors();
    libxml_use_internal_errors($internal_errors);
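
To try the same XPath idea without hitting the live site, here is a minimal sketch that runs the query against an inline HTML string. The function name `extractPreviewUrls` and the sample markup are made up for illustration. The real page may not use `data-preview-url` at all (the answer above only says it "seems to"), so treat this as a starting point and not as a description of how OffLiberty.com actually works.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the value of
// every data-preview-url attribute found on <span> elements in $html.
function extractPreviewUrls(string $html): array
{
    // Collect libxml warnings internally; real-world HTML is rarely valid.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $urls = [];
    // loadHTML() rejects an empty string, so guard against it explicitly.
    if ($html !== '' && $doc->loadHTML($html)) {
        $xpath = new DOMXPath($doc);
        foreach ($xpath->query('//span[@data-preview-url]') as $span) {
            $urls[] = $span->getAttribute('data-preview-url');
        }
    }

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $urls;
}

// Made-up sample markup; the live page may be structured differently.
$sample = '<div><span data-preview-url="http://example.com/a.mp3">A</span>'
        . '<span class="title">no attribute here</span>'
        . '<span data-preview-url="http://example.com/b.mp3">B</span></div>';

$urls = extractPreviewUrls($sample);
print_r($urls);
```

Keeping the parsing in a function that takes a string means the download step (`file_get_contents`, cURL, or whatever the site's terms allow) can be swapped out or stubbed independently of the extraction logic.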