Personalised clustering

Date: 16 February 2006 11:30:17

Big words scare me, but occasionally I have to use them. In this case "personalised clustering" means that all the RSS feeds you subscribe to are - somehow, by some kind of magic - condensed into a more personalised list of articles. So if Mr X decides to talk about his holiday, rather than his business (which is why you subscribed to his feed), you won't see those entries.

It's a good idea. Rabid RSS readers like me who have dozens if not hundreds of feeds can easily get lost in the mire. It's called information overload, and is as much a problem for the RSS generation as it is for anyone else. Too much stuff to read, not enough time to read it.

But there aren't many RSS readers that do filtering of RSS automatically. There's a review of one service here, which has led to this article, which proffers the following statement:

If 2005 was about Aggregation, then 2006 is all about Filtering.

Which may or may not be true. Personally I don't think that RSS has got far enough into the mainstream public's eye yet - despite some coverage from the media - to say that 2005 was about aggregation. Aggregation is still a 'now' thing and a 'future' thing for most websites and people.

But filtering will become increasingly important as the number of RSS feeds multiplies. Because of the way the technology works (read-only, end-to-end, user controls the feed method) we won't see the influx of spam that led to the dirtying of email. But still, what are we to do with all these articles?

A few thoughts:

- Allow subscription to feeds based on tags, categories or keywords. The new Wiblog system has RSS feeds for each (individual) tag as well as a general RSS feed. That, of course, depends on the author assigning the right tags, putting the article in the right category, or including the right keywords. But the idea of customisable feeds appeals to me greatly - partly because it saves bandwidth if subscribers only download the content they want and not extraneous stuff.

- Filtering based on keywords. This does the filtering at the feed reader end, so it still downloads all articles. I guess this would be easier to implement, but it's definitely not a cast-iron way of getting exactly what you want.

- Subscriber-enabled ranking of articles. This would allow the subscriber to rank articles based on how relevant they are to what they want, or how much they enjoyed them. The feed reader would then scour the higher-ranked articles for keywords and use them to automatically build a list of keywords for that subscriber, which could then be used to filter future articles. This, of course, depends on subscribers rating articles - the more articles that are rated, the more exact the filtering will become. Think of it like an artificial intelligence RSS spam filter.

- Time-based keyword filtering. This is a little harder to explain. Imagine that the feed reader could tell how long the subscriber spent reading each article. Depending on the time spent reading and the length of the article, the system would figure out which articles the subscriber enjoyed or found more useful. Actually, that's a stupid idea: I leave my feed reader open all day.
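
To make the second and third ideas a bit more concrete, here is a rough Python sketch. It is only an illustration, not how any real feed reader works: the article format (a dict with "title" and "body"), the tiny stopword list, the 1-5 rating scale and the function names are all assumptions of mine. The first function filters at the reader end by keyword; the second builds a keyword list from the articles a subscriber rated highly.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real reader would need more.
STOPWORDS = {"the", "and", "our", "your", "with", "for", "that", "this"}


def words(text):
    """Split text into lowercase words, dropping stopwords and very short words."""
    found = re.findall(r"[a-z']+", text.lower())
    return [w for w in found if len(w) > 2 and w not in STOPWORDS]


def article_words(article):
    """All distinct words in an article's title and body (assumed dict keys)."""
    return set(words(article["title"] + " " + article["body"]))


def keyword_filter(articles, keywords):
    """Keep only articles that contain at least one of the keywords."""
    wanted = {k.lower() for k in keywords}
    return [a for a in articles if wanted & article_words(a)]


def learn_keywords(rated, min_rating=4, top_n=5):
    """Build a keyword list from (article, rating) pairs.

    Counts how many highly rated articles each word appears in and
    returns the top_n words, most common first (ties broken alphabetically).
    """
    counts = Counter()
    for article, rating in rated:
        if rating >= min_rating:
            counts.update(article_words(article))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]


if __name__ == "__main__":
    business = {"title": "Quarterly sales report",
                "body": "Our business grew and sales doubled."}
    holiday = {"title": "My holiday in Spain",
               "body": "Sun, beach and sangria."}
    tips = {"title": "Sales tips",
            "body": "Grow your business with better sales calls."}

    # The subscriber rates what they have read, on an assumed 1-5 scale.
    learned = learn_keywords([(business, 5), (holiday, 1), (tips, 4)], top_n=2)

    # Future articles are then filtered against the learned keywords.
    for article in keyword_filter([business, holiday, tips], learned):
        print(article["title"])
```

The obvious weakness is the one already mentioned: plain word matching is not cast-iron, and the learned list is only as good as the ratings the subscriber bothers to give.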

Hmm. It's an interesting topic. Do you have any more ideas?