Archive for the ‘Atom’ Category

A Unique Problem

Friday, August 18th, 2006

James Holderness writes on how various feed aggregators attempt to judge the uniqueness of items in feeds:

Detecting duplicate items in an RSS feed is something of a black art. How does one uniquely identify an item in a feed while still allowing for that item to be updated? …

I can’t say for sure what algorithms applications are using, but after running 150 tests on more than 20 different aggregators, I think have a fair idea how many of them work.

He summarises some reasonable ways of judging uniqueness and brings up good arguments for and against each.

He goes on to say:

I would recommend you also include a unique link element for each item in your feed, to allow for aggregators that don’t handle guids very well. No two items should ever have the same link element,

Unfortunately, the link element is as abused as dates and GUIDs, and while this principle is ideal, it isn’t how many feeds are constructed. The link element is meant to be a permalink to the item itself, not to whatever the item is talking about or to any other link. But many linkblogs put the link they are talking about in the link element. In fact a big source of RSS, del.icio.us, gets it wrong and links to the link being mentioned. Now consider that many people on del.icio.us bookmark the same link, and you suddenly have separate feed items that some aggregators may treat as duplicates.

So ideally, yes, GUID and link are good identifiers, but in practice, sadly, they are not. This is the way of much feed parsing, as we have found out in the FeedHenry.com project: feeds need fuzzy logic to make much sense of them.
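
To make the "fuzzy logic" point concrete, here is a minimal sketch of one way an aggregator might build an identity key for an item. It is not what FeedHenry or any of the aggregators James tested actually does. The function names `item_key` and `dedupe`, the dict-shaped items, and the choice to fall back to a hash of link plus title are all assumptions made for illustration.

```python
import hashlib


def item_key(item, feed_url):
    """Return a best-effort identity key for a feed item (a plain dict).

    Hypothetical helper: the field names "guid", "link" and "title"
    are assumed, not taken from any particular parser.
    """
    guid = (item.get("guid") or "").strip()
    if guid:
        # Trust the GUID when present, but scope it to the feed as a
        # precaution against feeds that reuse each other's GUIDs.
        return ("guid", feed_url, guid)

    # No usable GUID: combine link and title, because the link alone
    # may be shared by unrelated items (as on linkblogs).
    link = (item.get("link") or "").strip()
    title = " ".join((item.get("title") or "").lower().split())
    digest = hashlib.sha1(f"{link}\n{title}".encode("utf-8")).hexdigest()
    return ("hash", feed_url, digest)


def dedupe(items, feed_url):
    """Keep the first item seen for each identity key, preserving order."""
    seen = set()
    unique = []
    for item in items:
        key = item_key(item, feed_url)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```

With this key, two GUID-less items that point at the same link but carry different titles stay separate, while an item with a GUID is still recognised after its title or body is updated. The trade-off is that a GUID-less item whose title is edited will look new, which is one reason this remains a heuristic rather than a fix.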

(James’ article is worth reading.)

GData explained

Wednesday, July 19th, 2006

Nat on O’Reilly Radar does it again, this time explaining GData from Google in terms I get:

GData is just Atom/RSS for reading, Atom Publishing for writing, and A9 stored queries for searching.

Interestingly:

There’s a huge move within Google away from SOAP and even REST-style ad hoc APIs and towards GData instead.

and

They’re building APIs to your Google-stored data via GData

Lastly, the architect of GData is the same chap who did Microsoft’s HailStorm.