Google: Caught blue-handed

Facebook data is supposed to be proprietary – indeed, the company has referred to your personal data as its intellectual property, a trade secret. This information is not, itself, a secret. There have been numerous articles on the subject. Here’s one about how Facebook’s exclusive access to what you post is extremely lucrative:

Last week, Facebook filed documents with the government that will allow it to sell shares of stock to the public. It is estimated to be worth at least $75 billion. But unlike other big-ticket corporations, it doesn’t have an inventory of widgets or gadgets, cars or phones. Facebook’s inventory consists of personal data — yours and mine.

Facebook makes money by selling ad space to companies that want to reach us. Advertisers choose key words or details — like relationship status, location, activities, favorite books and employment — and then Facebook runs the ads for the targeted subset of its 845 million users.

– Facebook is using you, Lori Andrews for The New York Times, February 4, 2012

Here’s another about how vast troves of proprietary data – like the one found on Facebook’s servers – are proving a thorny issue for academic researchers:

When scientists publish their research, they also make the underlying data available so the results can be verified by other scientists.

At least that is how the system is supposed to work. But lately social scientists have come up against an exception that is, true to its name, huge.

It is “big data,” the vast sets of information gathered by researchers at companies like Facebook, Google and Microsoft from patterns of cellphone calls, text messages and Internet clicks by millions of users around the world. Companies often refuse to make such information public, sometimes for competitive reasons and sometimes to protect customers’ privacy. But to many scientists, the practice is an invitation to bad science, secrecy and even potential fraud.

– Troves of personal data, forbidden to researchers, John Markoff for The New York Times, May 21, 2012

And even though the article mentions Facebook, Google and Microsoft in the same breath, proprietary means proprietary; the companies don’t share data with one another. Here’s an article about the possibility of an antitrust lawsuit that repeats the assertion that Google can’t see into Facebook’s data:

In fact, this investigation and the possible litigation could not come at a worse time. Google right now is facing huge challenges to its very existence, and can ill afford any kind of distraction.

Google’s overarching problem is that its core business—Internet search—was created to meet the needs of the Internet as it existed in 1996. In those days the Web was open, and Google’s algorithms could crawl over everything.

Today, the Internet is being carved up into walled gardens. Facebook has 900 million members, and Google can’t crawl their stuff.

– Antitrust suit could bring down Google, Dan Lyons for The Daily Beast, April 27, 2012

Anecdotally, I imagine the idea of Facebook as a walled garden conforms to your personal experience. Google searches don’t turn up Facebook content (which partially explains why Google created Google+). They might turn up a public profile, but you won’t see much more than a name, a profile picture, and a non-representative handful of friends.

Or so the narrative goes.

I’m not convinced. As you may have gathered, I have some experience setting up blogs, using both Tumblr and WordPress (see under Blogroll, at right). The WordPress dashboard tracks stats for you – number of views, which posts were viewed, etc. – but to extract comparable information from Tumblr, you need to set up a third-party plugin. So I installed one called, appropriately, Statcounter on all my Tumblr accounts, including Name Baby Kavana.

Name Baby Kavana was already nearly two years old when I asked JJ – out of the blue – on February 18 of this year, “any chance we get to reopen this anytime soon?” His response came the next day, “dude. my sister is actually pregnant :). make it happen. not sure if boy or girl. take it away haha.” I wasted little time claiming the url for Name Baby Kavana’s Brother on WordPress and putting up the first few posts (the earliest dates from March 22), but it was never shared publicly until I posted it on JJ’s Facebook wall just before midnight on June 25.

Because I viewed this new site as a continuation of the original blog, I installed the same Statcounter code I’d once used for Name Baby Kavana so the stats would all be counted together. As this was the first time the url had been publicly shared on Facebook, I wasn’t surprised to note that Facebook had wasted no time visiting the site, presumably to check for spam, or simply to index it. Who knows why Facebook does what it does:

That spot where you see ‘Facebook’ is usually reserved for the internet service provider, e.g. Qwest or Road Runner. Once in a very long while, it gets slightly more specific, as you can see above. But I digress. You might be surprised to note who beat Facebook to the site by nearly 15 seconds:

Keep in mind, the site had been in existence for over three months, and had never been shared until I posted it on Facebook. I think it would be hard to argue that the timing of Google’s visit was purely coincidental. It certainly looks to me like Facebook’s ‘walled garden’ is in danger of being overrun by spiders.
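If you want to run the same kind of check on your own visitor log, here is a minimal sketch in Python. It assumes you have copied the log out by hand as (timestamp, organization) pairs; the rows, the organization names, and the helper names `first_visits` and `gap_seconds` are all invented for illustration and are not my actual Statcounter data or any Statcounter API.

```python
from datetime import datetime

# Hypothetical visitor-log rows: (timestamp, organization shown by the stats tool).
# These values are invented for illustration only.
SAMPLE_LOG = [
    ("2000-01-01 12:00:00", "Example Crawler A"),
    ("2000-01-01 12:00:15", "Example Crawler B"),
    ("2000-01-01 12:05:00", "Example ISP"),
    ("2000-01-01 12:07:30", "Example Crawler A"),
]


def first_visits(log):
    """Return the earliest visit time seen for each organization."""
    earliest = {}
    for stamp, org in log:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if org not in earliest or when < earliest[org]:
            earliest[org] = when
    return earliest


def gap_seconds(log, org_a, org_b):
    """Seconds by which org_a's first visit preceded org_b's (negative if it came later)."""
    firsts = first_visits(log)
    return (firsts[org_b] - firsts[org_a]).total_seconds()


if __name__ == "__main__":
    # Compare the first visit of the two hypothetical crawlers.
    print(gap_seconds(SAMPLE_LOG, "Example Crawler A", "Example Crawler B"))
```

A small gap between two crawlers' first visits, on a URL that had never been shared before, is the pattern I am pointing at above. It is suggestive rather than conclusive, since a script like this only measures timing and says nothing about how either visitor learned of the link.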

