social graph – Stay N Alive

Twitter Announces Live Social Graph Streams

In a keynote at Chirp, Ryan Sarver, Project Manager over the Twitter API, announced a new, full API around live content streaming that just saved me thousands.  The new API enables a real-time layer around not just the Tweets and search that Twitter has enabled in the past, but now direct messages, follows, favorites, and retweets.  As users follow, direct message, or favorite, developers will now be able to pull these actions for each user in real time.

One of my biggest headaches on SocialToo has been the need to constantly poll Twitter.com for new follows and unfollows.  Each check requires an entire snapshot of the user’s friends and followers, and with Twitter’s current structure, pulling a full snapshot of a user’s list of friends can take anywhere from minutes to a half hour or more.  This takes bandwidth, takes time, and costs money on both the developer’s servers and on Twitter’s end.
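To make the cost of polling concrete, here is a minimal sketch of the snapshot comparison it implies: two full follower lists go in, and the new follows and unfollows come out. The function name and the follower IDs are hypothetical and are not taken from SocialToo's code.

```perl
use strict;
use warnings;

# Compare two follower snapshots (array refs of user IDs) and report changes.
# Returns a hash ref with numerically sorted lists of follows and unfollows.
sub diff_snapshots {
    my ($old, $new) = @_;
    my %was = map { $_ => 1 } @$old;
    my %now = map { $_ => 1 } @$new;
    return {
        followed   => [ sort { $a <=> $b } grep { !$was{$_} } keys %now ],
        unfollowed => [ sort { $a <=> $b } grep { !$now{$_} } keys %was ],
    };
}

# Hypothetical follower IDs from two consecutive polls.
my $changes = diff_snapshots([ 101, 102, 103 ], [ 102, 103, 104, 105 ]);
print "New followers: @{ $changes->{followed} }\n";
print "Unfollowers:   @{ $changes->{unfollowed} }\n";
```

Both snapshots have to be downloaded in full before a single change can be detected, which is where the bandwidth and time go.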

The new API will enable one request per follow and one request per DM.  The great thing about it is that all of it happens as the user clicks “follow” or sends the DM, so the user benefits from real-time, live updates on new follows and DMs on sites like SocialToo.com.  So, assuming developers are given access soon, you will be able to get real-time updates on new followers and unfollowers, as well as new, filtered DMs, on sites like SocialToo.com (if you haven’t signed up, go sign up today!).

I’m excited about this announcement, and it’s something I’ve been asking the Twitter API team for for a while now.  It’s good to see Twitter finally getting the capacity to work on these requests.  I hope to continue to see work on developers’ needs like this.

Automatic Data Compression With DBIx::Class::CompressColumns

UPDATE: You can now get the DBIx::Class::CompressColumns module on CPAN here or via CPAN command-line shell.

I’m going to get geeky on you for a minute, but you should find this interesting.  One of the challenges I’ve had with SocialToo recently has been the massive amount of Social Graph data we’ve had to store, process, and track.  We cache a lot of the data so we don’t have to hit Twitter’s servers as often, and also to enable us to track new follows and unfollows regularly on behalf of our users.

If you are a SocialToo user you may have noticed that your data hasn’t been as accurate lately as it should be.  The reason is that a) we have 20,000+ users all wanting to auto-follow or have their follower base tracked, and b) each of those users has anywhere from 100 to nearly 1 million followers that we have to store and process.  It’s not an easy task!  And our database, set up in a relational manner of followers to users, just wasn’t cutting it when it came to retrieving and processing so many followers at a time.

So I took a cue from Bret Taylor of FriendFeed, who talks about how they denormalized their database and now reference “bags” of data that they can then process in their code.  I went for a hybrid model: each user entry now has a single column, in BLOB format, which contains all the social graph data for that user.  In Perl, I simply create a hash structure of the data, freeze it, and then store it in the database in our social graph column.  To retrieve it, we pull it from the database, thaw it, and we have an entire social graph we can play around with and do with as we please.
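As a rough illustration of the freeze and thaw steps, here is a minimal sketch using Perl's core Storable module. The hash layout and the IDs are assumptions made for the example, and the database read and write are left out.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical social graph for one user: follower and friend IDs.
my %graph = (
    followers => [ 101, 102, 103 ],
    friends   => [ 102, 200 ],
);

# Serialize the whole structure into one binary string, ready for a BLOB column.
my $blob = freeze(\%graph);

# Later: take the BLOB read back from the database and rebuild the structure.
my $restored = thaw($blob);
print scalar(@{ $restored->{followers} }), " followers restored\n";
```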

The issue I was running into, however, is that plain text stored in a single column, for a user with 1 million followers, gets to be quite a large amount of data to pull through the pipes.  I needed an easy way to compress the data before inserting it into the database, store it in binary format, and decompress it when reading it back.  I also wanted it to be automatic, so no coder would ever have to worry about this extra step – it would just happen magically.

So today I’m releasing DBIx::Class::CompressColumns for all you Perl coders out there.  This module sits on top of Perl’s DBIx::Class database abstraction libraries and lets you monitor a single column.  Any inserts or updates into that column get compressed in Zlib format, and any selects/get_column calls to that data (you must use get_column) get decompressed.  That means you don’t have to worry at all about that extra step, the data has a significantly smaller footprint, and far less data moves in and out of the database, which puts much less load on it.  For one million followers, I measured just 4 megabytes that have to go in and out of MySQL.
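The round trip the module automates can be sketched by hand with the core Storable and Compress::Zlib modules. This is only an illustration of the compress-on-write, decompress-on-read idea, not the module's actual code or API, and the follower IDs are made up.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);
use Compress::Zlib qw(compress uncompress);

# Hypothetical graph: 100,000 sequential follower IDs.
my %graph = ( followers => [ 1 .. 100_000 ] );

# Write path: serialize, then compress before the value goes to the database.
my $frozen     = freeze(\%graph);
my $compressed = compress($frozen);

# Read path: decompress, then rebuild the structure.
my $restored = thaw(uncompress($compressed));

printf "frozen: %d bytes, compressed: %d bytes\n",
    length($frozen), length($compressed);
```

How much space is saved depends heavily on the data; sequential IDs like these compress far better than real follower lists would.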

Approaching Graph optimization in this manner has significantly sped up our processes, and I’m already seeing huge benefits from it.  There is much less load on the database, it’s much faster to retrieve and process the data, and we’re getting through our users’ followers much faster now.

The module namespace is currently being applied for on CPAN, and I’ll post a link there as soon as it is approved, but for now you can download the Makefile-compatible gzipped library here.  I hope some of you find this useful, and please feel free to modify it or send me any updates or bugs you think I missed!

The link for the download is http://socialtoo.com/DBIx-Class-CompressColumns-0.01000.tar.gz

Oh, and TMTOWTDI so please if you have better ways of approaching this I’d love to hear your ideas!

Photo courtesy rp72

Social Coding Series: I’m In Your Social Graph, Hacking Your Life – a Howto

As the first entry to my Social Coding series I’m going to cover Google’s Social Graph API. I saw a demo of this at Google I/O in San Francisco and was so impressed that I immediately started hacking on it when I got home. Little did I know how powerful this API was and how much information it could pull off the web about a single individual!

Google’s Social Graph API takes the rich store of links, information, and URLs cached on Google’s servers and determines which of those contain information about actual people. It combines OpenID for confirming an individual’s identity with XFN (a microformat) and FOAF (an RDF/XML vocabulary) to determine links between those identities. With a simple tag on their website, a user can declare other websites that also identify them. If you link to a URL and mark that location as you, and the linked website links back to you, Google can tell for sure that both of those websites are yours and identify you as a person. Not only that, but you can provide XFN or FOAF information, via similar tags or a separately linked file, identifying who your friends are. If they link back to you via similar metadata, Google can tell for sure that the two of you are friends.
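The link-back check described above can be sketched in a few lines. This is a simplified model rather than Google's actual algorithm, and the URLs and the crawl data are hypothetical: an identity only counts as verified when the claim runs in both directions.

```perl
use strict;
use warnings;

# Hypothetical crawl result: each page maps to the URLs it links to with a
# "me" relationship (for example, XFN's rel="me").
my %me_links = (
    'http://example.com/blog'       => [ 'http://twitter.example/jesse' ],
    'http://twitter.example/jesse'  => [ 'http://example.com/blog' ],
    'http://fan-site.example/jesse' => [ 'http://example.com/blog' ],  # one-way claim
);

# Return the URLs that the given page claims AND that claim the page back.
sub verified_identities {
    my ($links, $url) = @_;
    my @verified;
    for my $target (@{ $links->{$url} || [] }) {
        my @back = @{ $links->{$target} || [] };
        push @verified, $target if grep { $_ eq $url } @back;
    }
    return @verified;
}

print "$_\n" for verified_identities(\%me_links, 'http://example.com/blog');
```

The fan site claims the blog, but the blog never links back, so that claim stays unverified.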

The Social Graph API lives and breathes this data. There are actually quite a few social networks that use these protocols to identify you and your friends. Sites like Digg, Twitter, and FriendFeed all utilize them. The Google Social Graph API scans this data and organizes it in an easy way for you, as a developer, to access.

Let’s try a simple example, and you don’t even have to be a developer to try it. Google has provided a simple playground to see how the Social Graph API works. Go to http://socialgraph-resources.googlecode.com/svn/trunk/samples/exploreapi.html, enter a few URLs of your blogs, social networking profiles, and other identifying locations on the web, leave “Follow ‘me’ Links” and “Pretty Output” checked, and click “Find connections”. For me, just “twitter.com/jessestay” was all I needed to enter in the textarea.

The resulting structure is organized in a format called JSON – if you’re a Perl developer it may look familiar, as it reads much like a Perl hash structure. You’ll see under “nodes” a bunch of URLs with different metadata about each URL – these are URLs that Google thinks, based on the metadata in the URL you provided, are you or contain info about you. I’ve found that only those with a “profile” attribute are actual social network profiles for yourself, so be sure to pay attention to those.
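To show how directly that JSON maps onto Perl data, here is a small sketch using the core JSON::PP module. The sample response is illustrative and trimmed to the keys mentioned above (“nodes”, “attributes”, “profile”); the second node and its “rss” attribute are made up, and real responses carry more metadata.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Illustrative response, trimmed to the keys discussed above.
my $json = <<'END_JSON';
{
  "nodes": {
    "http://twitter.com/jessestay": {
      "attributes": { "profile": "http://twitter.com/jessestay" }
    },
    "http://example.com/feed": {
      "attributes": { "rss": "http://example.com/feed.xml" }
    }
  }
}
END_JSON

# decode_json turns JSON objects into nested Perl hash references.
my $res = decode_json($json);
for my $url (sort keys %{ $res->{nodes} }) {
    my $profile = $res->{nodes}{$url}{attributes}{profile};
    print "$url => ", (defined $profile ? $profile : 'no profile attribute'), "\n";
}
```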

You can also go back and click “show inbound links” and “show outbound links” – this will then return URLs that link to sites you have identified as yourself, as well as sites you own that claim other sites as identifying you. Play around with it – there’s a wealth of information it will give you about people!

Now, if you’re not a developer, you can skip over this next section because I’m going to get technical by showing an example. I’m a Perl developer so I’ll show one in Perl.

In Perl it’s simple – you need to install Net::SocialGraph with a command similar to this:

perl -MCPAN -e "install Net::SocialGraph"

Then, a bit of code like this will give you the data you need:

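use strict;
use warnings;
use Net::SocialGraph;

# Ask the API to follow "me" links between the URLs it knows about
my $sg = Net::SocialGraph->new('fme' => 1);

# URLs that identify the person we want profile information for
my @urls = ();
push (@urls, 'http://twitter.com/jessestay');
push (@urls, 'http://facebook.com/profile.php?id=683545112');

# Look the URLs up, then collect every node that carries a "profile" attribute
my $res = $sg->get(@urls);
my @profiles = ();
foreach my $node (keys %{$res->{'nodes'}}) {
  if ($res->{'nodes'}->{$node}->{'attributes'}->{'profile'}) {
    push (@profiles, $res->{'nodes'}->{$node}->{'attributes'}->{'profile'});
  }
}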

In the above example I instantiate my $sg object, telling it to follow “me” attributes in the response. I add a couple of URLs to identify the individual I want profile information for (in this case, me), and then call the Social Graph API through the module’s “get” method to fetch my info based on those URLs. Then I just traverse the response and can do whatever I want with it. After this, I could take the response information and list all of the user’s profiles as links, or perhaps scan those profiles for more information about each identified profile. Note that the results are not always correct, so you’ll want to let the user intervene. Also note that I’m looking only for nodes with a “profile” attribute – I’ve found these to be the most accurate.

Beyond that, that’s it. You could also take the Playground example above and look at the resulting URL. The basics of the Social Graph API are just that URL – plug in whatever you want and you’ll get back whatever information you need. You could then parse it with JavaScript, Perl, PHP, or just leave it in the “pretty” format the Playground provides you by default.
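If you would rather assemble that URL yourself, a sketch might look like the following. The base URL here is a placeholder, and the parameter names (q, fme, pretty) are assumptions, so compare both against the URL the Playground actually generates before relying on them.

```perl
use strict;
use warnings;

# Percent-encode everything except unreserved URL characters.
sub uri_escape_basic {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Build a lookup URL from a base, a list of identifying URLs, and options.
# The parameter names (q, fme, pretty) are assumptions, not confirmed here.
sub build_lookup_url {
    my ($base, $urls, %opts) = @_;
    my @pairs = ( 'q=' . uri_escape_basic(join ',', @$urls) );
    for my $name (sort keys %opts) {
        push @pairs, $name . '=' . uri_escape_basic($opts{$name});
    }
    return $base . '?' . join('&', @pairs);
}

print build_lookup_url(
    'http://socialgraph.example/lookup',    # placeholder base URL
    [ 'http://twitter.com/jessestay' ],
    fme => 1, pretty => 1,
), "\n";
```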

Now, imagine taking that data and combining it with, say the Twitter API to pull out all of an individual’s friends on Twitter, then applying the Social Graph API to each of those individuals. Soon, you have a tool which can identify which of a user’s friends are on which networks, and if there are any of your friends you have not yet added on those networks. This API is powerful!
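A sketch of the last step of that idea: once you know which profiles each friend has on each network, finding the ones the user has not added yet is a simple lookup. All names, URLs, and data structures here are hypothetical, and the Twitter and Social Graph API calls that would fill them in are left out.

```perl
use strict;
use warnings;

# Hypothetical inputs: profile URLs discovered per friend, keyed by network,
# and the accounts the user has already added on each network.
my %friend_profiles = (
    alice => {
        twitter    => 'http://twitter.example/alice',
        friendfeed => 'http://friendfeed.example/alice',
    },
    bob => { twitter => 'http://twitter.example/bob' },
);
my %already_added = (
    twitter    => { 'http://twitter.example/alice' => 1, 'http://twitter.example/bob' => 1 },
    friendfeed => {},
);

# For each network, list the friends' profiles the user has not added yet.
sub missing_connections {
    my ($profiles, $added) = @_;
    my %missing;
    for my $friend (sort keys %$profiles) {
        for my $network (sort keys %{ $profiles->{$friend} }) {
            my $url = $profiles->{$friend}{$network};
            push @{ $missing{$network} }, $url unless $added->{$network}{$url};
        }
    }
    return \%missing;
}

my $missing = missing_connections(\%friend_profiles, \%already_added);
for my $network (sort keys %$missing) {
    print "$network: @{ $missing->{$network} }\n";
}
```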

The Social Graph API can be an excellent utility to find out more information about any individual using your applications. No longer do you have to ask the individual for that information – so long as they are active on Web 2.0 that information can be provided for them to choose from!

You can learn more about the Social Graph API here.

Please note I too am new to this API – please let me know in the comments of any inaccuracies in this document and I will correct them for others to benefit.

[youtube https://www.youtube.com/watch?v=LabCylbapuM&hl=en]