I don’t get to play around with code much these days, but being around geeks that had html jokes tattooed on their arms and were running around doing cool things at SXSW made me feel lame.
So as soon as I got home and had a “wouldn’t it be cool if I could…” moment, I vowed to build it. Now what I describe below will look like driving screws with a hammer to those of you who script PHP as your mother tongue, but with a few hours squeezed in between work and kids, I decided to do this all with the shortcuts I know rather than take the time to freshen up on PHP skills or –heaven forbid– finally learn Python.

So thank you NoteTab Pro, thank you Yahoo Pipes, thank you Google Spreadsheets

which enabled me to fulfil the wish I wished:

“As a fan of the hyper-intelligent and hyper-creative folks who attend SXSW, I wish I could execute a search limited to the sites and blogs from people who attended and presented, so I will see only high-quality returns.”

The whole thing had its beginnings in a campaign goal: see what kind of buzz we managed to generate about the Green my Apple campaign with this crowd. (Answer: not bad! A surprising number of people had ALREADY blogged about the campaign before coming to SXSW.)
Here’s my homebrew recipe for creating a SXSW-flavoured search. Or you can just visit the pipe I finally made if all you want is the result.


First step, thank Jason Patton for making Conferenceer, without which none of this would be possible. Conferenceer ports all the panelist’s websites and blogs to an OPML file, along with those of everyone who registered with Conferenceer. (GREAT piece of work. Thanks, Jason!! Chapeau bas!)

Now I thought I would just be able to upload the OPML to my own server and get Yahoo Pipes to parse that file like an RSS feed. But nooooooooo. Nothing is ever so simple. Yahoo Pipes doesn’t speak OPML.

Now Chris Lott has written a little php script to get around that. But try as I might, I could not get the “link” values generated by this port of the opml to work as site limiters on a Yahoo search in Pipes, even after fixing user input errors so it validated. (www. without the http:// in about a dozen urls.) No idea why it worked not, this looked like the answer to my problem.

OK, so back to the drawing board. A further search on OPML to RSS conversion revealed this handy note from Techbrew, which pointed out that if you can port your data into a Google Spreadsheet, for Pete’s sake, you can create an RSS on the fly with a single column spreadsheet. Awesome.
Right, surely Google Spreadsheet will import structured data like OPML, right? WRONG!!! If I want to get a single column of URLs, I can’t give Google Spreadsheet XML. I can’t give it OPML. I have to give it a CSV. Now come on, Google. Shouldn’t XML imports be an obvious win?

So how am I going to strip down that OPML file to create a CSV?
OK, now here I know somebody will jump in and say there’s an easier way to do the following, but you knows what you knows, and since I stopped writing Pascal in my dark early days of self-taught code, my tool of choice for offline string manipulation is NoteTab Pro scripting. (Sneer not, Geek friend, it has never let me down yet!)
So in about five minutes I whipped up this:

;NoteTab Pro clipboard script for retrieving the htmlUrl fields from an OPML file
;and writing them to a csv file
^!Goto Process

:P rocess
^!Find “htmlUrl=” TIS
;htmlUrl= is the tag which precedes all URLs in the file
^!IfError Finish
;this breaks out the loop when the file is done
^!MoveCursor +1
;this moves the cursor focus into the position of the ” double quote
^!Set %URL%=^$GetQuoteBlock$^p
;this captures the text contained in the Doublequotes and writes it to variable ;called “URL” with a linebreak
^!AppendToFile “c:\temp\output.csv” ^%URL%
;this writes the linebreak-delimited item to the output file
^!Goto Process
;loop back to the top until you hit the EOF implied by no further “find” results

:Finish
^!Close “c:\temp\output.csv”
;Jane! Stop this Crazy thing!

And voila. Run that over the OPML and it nicely extracts all the URLs and puts them into a handy CSV file. Imported it into this here Google Spread Sheet. (sorry, you need a Google account to view it. Information doesn’t want to be unlocked by free email accounts, it wants to be FREE Thanks JR, for correcting this! And THANKS for Google Spreadsheets!)

Now that creates a handy rss feed (which IS freely available) And can be plugged into a Yahoo Pipe which Takes User input as a search term:

Then sequentially runs a site-limited search on each of the URLs in the RSS feed, aggregating as it goes.

It may not have been elegant in its making, but it gave me joy, and it works.

Share:
  • Facebook
  • Twitter
  • del.icio.us
  • Mixx
  • Google Bookmarks
  • FriendFeed
  • Digg
  • Reddit