Automatic Data Compression With DBIx::Class::CompressColumns
UPDATE: The DBIx::Class::CompressColumns module is now available on CPAN, and you can also install it via the CPAN command-line shell.
I’m going to get geeky on you for a minute, but you should find this interesting. One of the challenges I’ve had with SocialToo recently has been the massive amount of Social Graph data we’ve had to store, process, and track. We cache a lot of the data so we don’t have to hit Twitter’s servers as often, and also so we can regularly track new follows and unfollows on behalf of our users.
If you are a SocialToo user you may have noticed that your data hasn’t been as accurate lately as it should be. The reason is that a) we have 20,000+ users who all want to auto-follow or have their follower base tracked, and b) each of those users has anywhere from 100 to nearly 1 million followers that we have to store and process. It’s not an easy task! And our database, set up in a relational manner with one row per follower-to-user relationship, just wasn’t cutting it when it came to retrieving and processing so many followers at a time.
So I took a cue from Bret Taylor and FriendFeed, who talk about how they denormalized their database and now reference “bags” of data that they process in their code. I went for a hybrid model: each user entry now has a single column, in BLOB format, which contains all the social graph data for that user. In Perl, I simply create a hash structure of the data, freeze it, and then store it in the database in our social graph column. To retrieve it, we pull it from the database, thaw it, and we have an entire social graph we can play around with and do with as we please.
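Here is a minimal sketch of that freeze/thaw round trip using Perl’s core Storable module. It is an illustration only, not the SocialToo code: the hash keys and follower IDs are made up, and it uses nfreeze (the portable, network-byte-order variant of freeze) as an assumption about what you would want for data shared between machines.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# Hypothetical social graph for one user; the keys and IDs are made up.
my %social_graph = (
    followers => [ 1001, 1002, 1003 ],
    following => [ 1002, 2001 ],
);

# Freeze: serialize the whole structure into one binary string,
# ready to be stored in a BLOB column.
my $frozen = nfreeze(\%social_graph);

# Thaw: turn the string read back from the database into a hash reference.
my $graph = thaw($frozen);

printf "%d followers, %d following\n",
    scalar @{ $graph->{followers} },
    scalar @{ $graph->{following} };
```

The database only ever sees one opaque string per user, so all of the graph logic lives in your code rather than in joins.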
The issue I was running into, however, is that uncompressed data stored in a single column for a user with 1 million followers gets to be quite a large amount of data to pull through the pipes. I needed an easy way to compress the data before inserting it into the database, store it in binary format, and decompress it again when reading it back. I also wanted it to be automatic, so no coder would ever have to worry about this extra step - it would just happen magically.
So today I’m releasing DBIx::Class::CompressColumns for all you Perl coders out there. This module sits on top of Perl’s DBIx::Class database abstraction library and lets you monitor a single column. Any inserts or updates into that column get compressed with zlib, and any reads of that data through get_column (you must use get_column) get decompressed. That means you don’t have to worry about the extra step at all, the data has a significantly smaller footprint, and far less data travels between your code and the database, which puts much less load on the database. For one million followers, I measured just 4 megabytes that has to go in and out of MySQL.
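To make the compress-on-write, decompress-on-read idea concrete, here is a rough sketch using only core Perl modules (Storable and Compress::Zlib). It is not the module’s source and does not use its API: pack_column and unpack_column are hypothetical helper names, and the follower IDs are generated just to have something to compress.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);
use Compress::Zlib qw(compress uncompress);

# Hypothetical helpers, NOT part of DBIx::Class::CompressColumns.
# They mimic what happens around the monitored column.

# On insert/update: serialize the structure, then zlib-compress it.
sub pack_column {
    my ($data) = @_;
    return compress( nfreeze($data) );
}

# On read: decompress the stored bytes, then rebuild the structure.
sub unpack_column {
    my ($blob) = @_;
    return thaw( uncompress($blob) );
}

# Made-up follower IDs standing in for a real social graph.
my $graph = { followers => [ map { 1_000_000 + $_ } 1 .. 50_000 ] };

my $raw  = nfreeze($graph);       # what would be stored without compression
my $blob = pack_column($graph);   # what would actually go into the BLOB column

printf "frozen: %d bytes, compressed: %d bytes\n", length($raw), length($blob);

my $copy = unpack_column($blob);
```

How much you save depends entirely on the data, so treat the printed sizes as something to measure on your own graph rather than a promise.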
Approaching Graph optimization in this manner has significantly sped up our processes, and I’m already seeing huge benefits from it. There is much less load on the database, it’s much faster to retrieve and process the data, and we’re getting through our users’ followers much faster now.
The module namespace is currently being applied for on CPAN, and I’ll post a link there as soon as it is approved, but for now you can download the Makefile-compatible gzipped library here. I hope some of you find this useful, and please feel free to modify it or send me any updates or bugs you think I missed!
The link for the download is http://socialtoo.com/DBIx-Class-CompressColumns-0.01000.tar.gz
Oh, and TMTOWTDI, so if you have better ways of approaching this, I’d love to hear your ideas!
Photo courtesy rp72

Jesse consults with his business, Stay N' Alive Productions, LLC, and runs a social relationship management company called 

