Mar 12 2009
New Guardian API – a quibble
I generally applaud efforts by news organizations to get seriously jiggy with the web. In particular, when a news company makes an API available for its content, it shows a real willingness to make the ultimate sacrifice: to give up control of its content and its data and let others see what they can make of it.
Take a look at the cool work of Vancouver artist Jer Thorp. He plays with visualizations based on the word frequency in the huge data sets made available through these APIs.
And visualizations are great, but they don’t reflect the best thing about APIs: as people use a news company’s data in various mashups, it gets more people looking at its content. The company gets more of a chance to engage a community of interested news consumers. They get more readership. We get a better web.
So what’s my beef?
The Guardian API is different from the other API offerings from the NYT and the BBC in that it provides the full content of stories. And the proposition is that you can use the full text of stories on other websites, as long as you display Guardian advertising alongside it.
So why is that bad? It’s not, per se. Aggregators can do a better job of aggregating, and semantic taggers can do a better job of semantic tagging, if they have the full dataset. And that’s the point of the exercise, after all: to increase the overall utility of the web for everyone’s benefit.
What I specifically don’t like is that the Guardian is explicitly saying it’s OK to duplicate their content on other sites. How then do aggregators decide which are duplicates? How does the semantic web figure out what the “canonical” address for the story is? It is specifically disutilitarian to have identical (or even similar, for that matter) content strewn around the web. Or worse, woven “into the fabric of the Internet,” as the Guardian announcement put it.
So they’re right to provide the full data: that’s a plus for web utility. But they’re wrong to condone outright duplication of content, which is of negative utility.
Shafqat Islam of Newscred says he hopes people will recognize this problem and not duplicate full content feeds. But the Guardian could fix it easily: keep providing the full dataset, and add a clause to the otherwise rather strict Terms and Conditions stating that outright duplication of the material is forbidden.
The Terms and Conditions already stipulate that a link must be provided back to the original article on the Guardian site. Which raises the question: why not just link back to the Guardian site? Why does anyone need to publish the full story, and duplicate that content, on their own site?