
How I Learned to Stop Worrying and Love the Shebang

Wednesday, November 11, 2010

Very recently, Facebook and Twitter have changed to (or added) new URI structures that contain a shebang. A shebang looks like this: #!
(I know technically it’s not a shebang. It’s just easier to say than “a hash with an exclamation point after it.”)

Naturally, this threw many SEOs like myself into a raging panic. “Search engines ignore everything after a hash,” we proclaimed. “All of our SEO work in social media has been for naught! Oh, the humanity!”

Fortunately, this is not the case. These major social media outlets are using Google’s spec to make AJAX crawlable.

That’s right, web apps can be as crawlable and optimized as static content. Well… in theory.

If we take a look at what Google has indexed for these Twitter profiles with the new URI structure, we get a single result in the main index, and about 100 in the supplemental index. For a high-PageRank site like Twitter, that doesn’t look too promising. Google even says in the FAQ, “Just as with static web pages, Google makes no guarantee about search rankings.” A bit of an understatement, no? Granted, the change on Twitter is still rolling out and is still opt-in at the time of this writing.

In any case, I think these results will improve vastly going forward. More and more web apps using asynchronous JavaScript are popping up on the web every day; combine that with Facebook’s adoption of Google’s spec, and you have something that will go the distance.

So all that is the cat’s meow, but how is crawlable AJAX accomplished? Well, it turns out that crawling through AJAX ain’t like dusting crops, boy.

Here are the basics of how it works. You want your regular AJAX app URIs to look like this: www.example.com/ajax.html#!key=value, and then you want a canonical version of each of those pages that is a static snapshot.

When users hit your site, nothing changes; browsers never send the fragment to the server, so your app behaves exactly as before. Googlebot, however, automatically knows (it is their spec, after all) to modify each AJAX URI so it can fetch a static snapshot of that page.

The static HTML snapshot must be generated when this URI is accessed: www.example.com/ajax.html?_escaped_fragment_=key=value.
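One detail worth knowing (this is my reading of Google’s spec, so double-check it against the docs): special characters in the fragment get percent-encoded on the way over. So www.example.com/ajax.html#!key=value&page=2 would be fetched as www.example.com/ajax.html?_escaped_fragment_=key=value%26page=2, and your server should URL-decode the value before using it.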

So, you grab it:
$escapedfragment = $_GET['_escaped_fragment_'];
or you grab it:

Dim escapedfragment As String = Request.QueryString("_escaped_fragment_")
depending on your favorite flavor, and if it’s not null, spit out an HTML snapshot.
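
If your favorite flavor happens to be Java, the whole branch might look something like this. This is purely a sketch; the servlet class and the renderSnapshot() helper are names I made up for illustration:

import java.io.IOException;
import java.net.URLDecoder;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class AjaxCrawlServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String fragment = request.getParameter("_escaped_fragment_");
        if (fragment == null) {
            // A regular visitor: serve the normal AJAX app shell.
            request.getRequestDispatcher("/ajax.html").forward(request, response);
            return;
        }
        // Googlebot is asking for a snapshot: decode the state it wants...
        String state = URLDecoder.decode(fragment, "UTF-8");
        response.setContentType("text/html");
        // ...and return the pre-rendered HTML for that state. renderSnapshot()
        // is a made-up name; one way to implement it is sketched below.
        response.getWriter().write(renderSnapshot(state));
    }

    private String renderSnapshot(String state) throws IOException {
        // Stub so the sketch compiles; see the headless browser example below.
        return "<html><body><!-- snapshot of state " + state + " --></body></html>";
    }
}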

How do you create an HTML snapshot? Just use some type of headless browser. It’s not as scary as it sounds: use something like HtmlUnit or crawljax.com to load the page, let the JavaScript run, and grab the resulting HTML.
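
For example, here’s roughly what fetching a snapshot with HtmlUnit might look like. Again, just a sketch; the class name is invented, and HtmlUnit’s API details may vary between versions:

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Snapshotter {
    // Load the AJAX page in a headless browser, give its JavaScript time
    // to run, and return the resulting DOM serialized as markup.
    public static String snapshot(String url) throws Exception {
        WebClient webClient = new WebClient();
        try {
            HtmlPage page = webClient.getPage(url);
            // Wait up to five seconds for asynchronous JavaScript to finish.
            webClient.waitForBackgroundJavaScript(5000);
            return page.asXml();
        } finally {
            webClient.closeAllWindows();
        }
    }
}

Feed it the #! URI for the state Googlebot asked about, and the string it hands back is your snapshot.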

If you’ll forgive the personification, Google hates it when Googlebot sees something different than what users see. I mean hate, with a capital H. So be very careful to match your HTML snapshot with your AJAX app, or you could find yourself filling out re-inclusion requests all night long. Also, I believe only Google has a spec for this type of thing, so if your SEO strategy requires Yahoo! and Bing, you might want to consider an alternate implementation.

For more info, check out http://code.google.com/web/ajaxcrawling/ and happy SEOing!