Regex Golf

I was going to name this post Vim Golf, but it turned out to be more of a regex experiment than an exercise in vim keystrokes.

So I found this cool IFTTT recipe that logs all of your tweets to a Dropbox file. It puts them all in a single file in this format:

At a drive in. Wat?  @ Valle Drive-In
Jun 30, 2012
- - - - -

I see blurry apps
Jun 30, 2012
- - - - -
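Each entry is the tweet text, then a date line, then a row of dashes, so the file is easy to read back with a script. Here's a minimal Python sketch of that, assuming every entry follows the layout above. parse_tweet_log is a made-up name, not anything IFTTT provides:

```python
SEPARATOR = "- - - - -"


def parse_tweet_log(text: str) -> list[tuple[str, str]]:
    """Split an IFTTT-style tweet log into (tweet, date) pairs.

    Assumes each entry is: tweet text (possibly several lines),
    then a date line, then a line holding only the separator.
    """
    entries = []
    for block in text.split(SEPARATOR):
        lines = block.strip().splitlines()
        if not lines:
            continue  # the empty tail after the last separator
        *tweet_lines, date = lines  # the last line of each block is the date
        entries.append(("\n".join(tweet_lines), date.strip()))
    return entries


sample = """At a drive in. Wat?  @ Valle Drive-In
Jun 30, 2012
- - - - -

I see blurry apps
Jun 30, 2012
- - - - -
"""

for tweet, date in parse_tweet_log(sample):
    print(date, "|", tweet)
```

It would trip over a tweet that itself contains "- - - - -", which seems unlikely enough for a sketch.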

Since the Twitter API only lets you get at 3200 of your tweets, I thought it would be a good idea to get the rest of the tweets I can still reach into that file as well. The quickest way I know to grab my latest tweets is Allmytweets, which pulls down all your tweets and shows them on its page. The HTML ends up looking like:

8"><img src="css/extlink.png"></a></li><li>Straw in the wro
ng hole.  @ High Life Lounge <a href=""></a> <span class="created_at">Jun 12, 20
12</span> <a href="!/darrinholst/status
/212644435689881600"><img src="css/extlink.png"></a></li><li
>I can see @dwolla from here! <span class="created_at">Jun 1
2, 2012</span> <a href="!/darrinholst/s
tatus/212639918541910016"><img src="css/extlink.png"></a></l
i><li>Simplify Design With  Zero, One, Many rules <a href="h
ttp://"></a> <span class="c

All the data is there, but it's not very useful in a text file. Here are the commands I threw at it to format it:

Split the li tags onto separate lines


Preserve newlines inside the tweets with a token


Change newline tokens between li tags back to newlines
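The vim commands for these first three steps aren't shown, so the following is only a rough Python sketch of the same idea, not the original commands. It assumes every tweet sits in its own li tag, and TOKEN and one_tweet_per_line are made-up names:

```python
import re

# Hypothetical placeholder; any string that never appears in a tweet works.
TOKEN = "__NEWLINE__"


def one_tweet_per_line(html: str) -> list[str]:
    """Return one <li>...</li> chunk per line, keeping in-tweet newlines as TOKEN."""
    # Step 1: start every <li> on a new line.
    text = html.replace("<li>", "\n<li>")
    # Step 2: hide every newline behind the token; the page is now one long line.
    text = text.replace("\n", TOKEN)
    # Step 3: a run of tokens right before <li> marks a tweet boundary,
    # so turn it back into a single real newline.
    text = re.sub("(?:" + re.escape(TOKEN) + ")+<li>", "\n<li>", text)
    # Drop anything that isn't a tweet (e.g. the empty string before the first <li>).
    return [line for line in text.split("\n") if line.startswith("<li>")]


for line in one_tweet_per_line("<li>first line\nsecond line</li><li>single</li>"):
    print(line)
```

In this sketch, step 1 is what guarantees at least one token in front of every <li>, which is exactly what step 3 looks for when it restores the tweet boundaries.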


Reverse the order of the tweets, since I want them in chronological order (the only non-replace command)
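The command for this step isn't shown. One well-known vim idiom for reversing a buffer is :g/^/m0, which moves every line to the top in turn. Once there's one tweet per line, the Python equivalent is a one-liner. to_chronological is a hypothetical name, and it assumes the page lists the newest tweet first, which the decreasing status IDs in the snippet above suggest:

```python
def to_chronological(text: str) -> str:
    """Turn a newest-first, one-tweet-per-line listing into oldest-first order."""
    # Same effect as vim's :g/^/m0 on a one-tweet-per-line buffer.
    return "\n".join(reversed(text.splitlines()))


print(to_chronological("newest\nmiddle\noldest"))
```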


Remove li start tags


Replace li end tags with a separator

%s/<\/li>/\r- - - - -\r/g
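One vim detail makes this work: in the replacement part of :s, \r splits the line, while \n would insert a NUL character instead. The / in </li> also has to be escaped because / is the delimiter; vim accepts other delimiters too, so %s#</li>#\r- - - - -\r#g should do the same thing. For comparison, here's the same substitution as a small Python function (mark_tweet_ends is a made-up name), where an ordinary "\n" plays the role of vim's \r:

```python
SEPARATOR = "- - - - -"


def mark_tweet_ends(text: str) -> str:
    # Equivalent of :%s/<\/li>/\r- - - - -\r/g, with "\n" standing in for vim's \r.
    # str.replace swaps every occurrence, much like the /g flag.
    return text.replace("</li>", "\n" + SEPARATOR + "\n")


print(mark_tweet_ends("I see blurry apps</li>"))
```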

Turn the newline tokens back into newlines


Replace the opening span tag for the date with a line break

%s/<span class="created_at">/\r/g

Replace the closing span tag and the start of the tweet link with a line break, and change https to http

%s/<\/span> <a href="https/\rhttp/g
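Together, these two substitutions put the date and the tweet link on their own lines. Here's a Python sketch of both; split_date_and_link is a hypothetical name, and example.com stands in for the real status URL, which isn't shown above:

```python
def split_date_and_link(text: str) -> str:
    # :%s/<span class="created_at">/\r/g  -> the date starts on its own line
    text = text.replace('<span class="created_at">', "\n")
    # :%s/<\/span> <a href="https/\rhttp/g  -> the link starts on its own line, as http
    return text.replace('</span> <a href="https', "\nhttp")


line = ('I can see @dwolla from here! <span class="created_at">Jun 12, 2012'
        '</span> <a href="https://example.com/status/1')
print(split_date_and_link(line))
```

Note the space left after "here!": that's what the trailing-space cleanup at the end takes care of.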

Get rid of those stupid #!s


Get rid of the link image

%s/"><img src="css\/extlink.png"><\/a>//g
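In a vim pattern an unescaped . matches any character, so css\/extlink.png would technically also match something like extlinkXpng; that's harmless here. If you port the step to Python's re module, re.escape keeps every character literal. A small sketch (drop_link_icon and LINK_ICON are made-up names):

```python
import re

# Everything from the end of the tweet URL through the closing </a>.
LINK_ICON = '"><img src="css/extlink.png"></a>'


def drop_link_icon(text: str) -> str:
    # re.escape makes the "." in extlink.png match only a literal dot.
    return re.sub(re.escape(LINK_ICON), "", text)


print(drop_link_icon('http://example.com/status/1"><img src="css/extlink.png"></a>'))
```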

HTML decode

%s/&nbsp;/ /g
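That substitution only handles &nbsp;. If tweets also contain entities like &amp; or &lt; (the post doesn't say whether they did), Python's standard html.unescape decodes all of them. One catch: &nbsp; decodes to a no-break space (U+00A0), not a regular space, so this sketch (decode_entities is a made-up name) converts it afterwards to match what the vim command produces:

```python
import html


def decode_entities(text: str) -> str:
    # Decode every HTML entity (&amp;, &lt;, &quot;, &nbsp;, ...).
    decoded = html.unescape(text)
    # &nbsp; became U+00A0; turn it into a plain space like :%s/&nbsp;/ /g does.
    return decoded.replace("\u00a0", " ")


print(decode_entities("Fish &amp; chips&nbsp;&lt;3"))
```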

Get rid of the remaining HTML

%s/<a href="//g

Clean up trailing spaces

%s/ *$//g
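The Python counterpart needs re.MULTILINE so that $ matches at the end of every line rather than only at the end of the text, mirroring how %s runs on each line. This sketch (strip_trailing_spaces is a made-up name) uses " +" instead of " *" so only lines that actually end in spaces are changed:

```python
import re


def strip_trailing_spaces(text: str) -> str:
    # With re.MULTILINE, $ matches just before each newline and at the very end.
    return re.sub(r" +$", "", text, flags=re.MULTILINE)


print(repr(strip_trailing_spaces("I can see @dwolla from here! \nJun 12, 2012  \n")))
```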

EDITING TEXT IS FUHHHH UHN! I wish I'd thought of this 3800 tweets ago, though.