regex, twitter, yahoo-pipes

Regex affecting all URLs other than a specific domain in Yahoo Pipes


While working with a Twitter Search RSS feed in Yahoo Pipes, I'm trying to clean up long Twitter links and replace their link text with the shortened URL. To that end, I want to match any link that is NOT on a Twitter domain. Usually, those are t.co links.

Here's an example of what I want to do:

turn

<a href="http://t.co/AiyTQKaAoU">http://www.denverpost.com/environment/ci_26064841/colorado-coal-mine-mulls-appeal-after-federal-court ...</a>

into

<a href="http://t.co/AiyTQKaAoU">http://t.co/AiyTQKaAoU</a>

My regex started as <a .*?href=['""](.+?)['""].*?>(.+?)</a> which matched all links.

Then I tried <a .*?href=['""]!(www\.twitter\.com\/?)['""].*?>(.+?)</a> to remove twitter.com from the results, but it's not working. What am I doing wrong?

P.S. I need to not touch Twitter links because that will mess up all '@' and '#' links.

Addition: Solution by @Avinash-Raj works in the demo but not inside the Yahoo Pipe. Anyone familiar with regex inside Yahoo Pipes?


Solution

  • In Yahoo Pipes, something like this should do:

    • pattern: href="(http://t.co[^"]*)"[^>]*>http://[^<]*
    • replacement: href="$1">$1

    Here's a demo pipe, and here's another, based on your pipe.

    PS: you know you can put multiple regex replacements in a single Regex operator. It's easier to read that way.
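
To check the pattern outside Yahoo Pipes, here is a minimal sketch in Python. It assumes Python's `re` module behaves like the Pipes Regex operator for this pattern, which may not hold in every detail. The function name `shorten_link_text` and the `twitter_link` sample are made up for illustration. Note that Python writes the first capture group as `\1`, where Pipes uses `$1`.

```python
import re

# Pattern from the answer above: it only matches links whose href starts with http://t.co
PATTERN = re.compile(r'href="(http://t.co[^"]*)"[^>]*>http://[^<]*')

# Yahoo Pipes refers to the first capture group as $1; Python's re module uses \1.
REPLACEMENT = r'href="\1">\1'


def shorten_link_text(html: str) -> str:
    """Replace the visible text of t.co links with the t.co URL itself (hypothetical helper)."""
    return PATTERN.sub(REPLACEMENT, html)


# Sample link from the question.
sample = (
    '<a href="http://t.co/AiyTQKaAoU">'
    'http://www.denverpost.com/environment/ci_26064841/'
    'colorado-coal-mine-mulls-appeal-after-federal-court ...</a>'
)

# Illustrative hashtag link on a Twitter domain (not taken from the real feed).
twitter_link = '<a href="https://twitter.com/search?q=%23coal">#coal</a>'

print(shorten_link_text(sample))
print(shorten_link_text(twitter_link))
```

Because the pattern requires the href to start with http://t.co, the '@' and '#' links that point at twitter.com never match and are left untouched, so no explicit "not twitter.com" test is needed.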