I am trying to set up a way for my mom to be notified when Diet Rite, a drink she buys at Stater Bros, goes on sale. My thinking is that I could create a Yahoo Pipe that takes in the weekly ad feed and filters for the string "Diet Rite". The Yahoo Pipe is itself an RSS feed, so I would subscribe to the Pipe in Google Reader and my mom would see whenever Stater Bros is having a sale.
Since Stater Bros publishes a searchable PDF of its weekly ad, I thought it would be a simple matter of having Yahoo Pipes search through it. However, Yahoo Pipes does not support PDF.
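To make the idea concrete, the filtering step I have in mind amounts to roughly the following Python sketch. It assumes the weekly ad is available as a standard RSS 2.0 feed with `title` and `description` elements, which is my assumption and not something I have confirmed; `filter_feed` is a hypothetical name used only for illustration.

```python
import xml.etree.ElementTree as ET

def filter_feed(rss_xml, needle="Diet Rite"):
    """Return the titles of feed items that mention `needle` (case-insensitive)."""
    root = ET.fromstring(rss_xml)
    hits = []
    # Assumption: items follow the usual RSS 2.0 layout (<item><title/><description/></item>).
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        if needle.lower() in (title + " " + description).lower():
            hits.append(title)
    return hits
```

An empty list would mean no sale that week; a non-empty one is what should end up in the feed my mom reads.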
I then decided to run the PDF through an online PDF-to-HTML converter and feed the result to the Fetch Page module in Yahoo Pipes. The converter was in fact quite successful: the resulting HTML preserved the text, and I could search it and find what I needed. However, it turns out that it delivers its output in frames, so I can't use it. I can't find any other online PDF-to-HTML converters.
Even if I were able to get the HTML version of the PDF into Yahoo Pipes, I am not sure that would do any good, since Yahoo Pipes doesn't seem to provide a way to search or filter HTML. It mostly works on feeds.
So I am stuck. Any ideas on how to achieve what I am trying to do?
If you are not using it already, you may want to look at Google's caching system:
http://webcache.googleusercontent.com/search?q=cache:http://www.staterbros.com/Images/PDFs/weekly.aspx
It's not widely known outside SEO circles but the Googlebot does actually perform a crude PDF to HTML & text conversion. If you cannot wait for Google to convert the PDF file there are also a couple of free PHP scripts that can perform the same function.
Because there is no equivalent of PHP's preg_match in Pipes, you have to work backwards, by removing everything that is NOT what you are looking for.
The regex for the replace module looks something like this: ^(.+?)Diet Rite(.+?)$ First replace everything from the start of the string up to 'Diet Rite' with nothing, then replace everything after 'Diet Rite' to the end of the string with nothing.
That way, if 'Diet Rite' exists on the page, it will show up in the pipe and can be added to an RSS feed; otherwise the pipe returns a blank.
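The same "strip everything except the match" idea can be tried outside Pipes. Here is a minimal Python sketch, not the Pipes module itself; `keep_only_match` is a hypothetical helper name, and the explicit check for a missing match is my addition so that a page without 'Diet Rite' really does come back blank.

```python
import re

def keep_only_match(page_text, needle="Diet Rite"):
    """Strip everything before and after `needle`; return "" if it is absent."""
    # DOTALL lets '.' cross line breaks, since the converted page spans many lines.
    pattern = re.compile(r"^.*?(" + re.escape(needle) + r").*$", re.DOTALL)
    if pattern.match(page_text) is None:
        # No match: return a blank so nothing reaches the feed.
        return ""
    # Replace the whole page with just the captured needle.
    return pattern.sub(r"\1", page_text, count=1)
```

If the function returns a non-empty string, the item is on the page that week and can be turned into a feed entry.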