SQL - Inner joins and limit

Categories: uncategorized

Date: 11 August 2006 21:42:15

I've been hard at work at the rockface of the new Wiblog system recently, and it seems that the more I get done the more there is to do. I met a bit of a problem tonight regarding tags (those lovely bits of folksonomy) which I think I've fixed. This one is quite hard to explain, and is only suitable for people who would wear a "geek" badge with pride, but I'll do my best.

I have a MySQL database containing table with blog entries in it. That table is called 'entries'. And I have another table with tags in it called 'tags'. Each entry in the entries table has a unique ID, and each tag in the tag table is linked to the entry it belongs to using a field called 'entryid'. So far so good.

However, I want to paginate (that is, split into multiple pages) my blog entries, showing 10 on a page then a link to the next page of 10. That's great as long as there are zero or one tags linked to each entry. The code would look something like this:

select e.id, e.title, e.body, t.tag
from entries e
left outer join tags t on t.entryid = e.id
order by e.id desc
limit 0, 10

Those of you with some SQL knowledge will see my problem. When any particular entry has more than one tag linked to it, the 'limit' clause will not work correctly. Rather than bring back the top 10 entries, it will bring back the top 9, with one of thm (the one with two tags) twice. Once for each tag. Let me illustrate:

Output for the above code with one tag per entry:

idtitlebodytag
100100 title100 body100 tag
9999 title99 body99 tag
9898 title98 body98 tag
9797 title97 body97 tag
9696 title96 body96 tag
9595 title95 body95 tag
9494 title94 body94 tag
9393 title93 body93 tag
9292 title92 body92 tag
9191 title91 body91 tag

Output for the above code where entry ID 100 has two tags:

idtitlebodytag
100100 title100 body100 tag
100100 title100 body100 another tag
9999 title99 body99 tag
9898 title98 body98 tag
9797 title97 body97 tag
9696 title96 body96 tag
9595 title95 body95 tag
9494 title94 body94 tag
9393 title93 body93 tag
9292 title92 body92 tag

I've made the second tag bold to make it obvious. So, where has ID 91 gone? Dropped off the end because of the second tag linked to ID 10, that's where. No good, in fact it's downright bad.

I've had an idea for how to fix this for a while, but tonight has been my first chance to get it working. I'm sure this isn't unique, but I had a quick search and couldn't find anything similar. So, here we go.

The important thing we need to know is what the highest and lowest IDs of entries are that we want to bring back. Once we know that it doesn't matter how many tags we have linked to any entries, as we always know our list will be bound by the upper and lower limits.

Here's the code, with (hopefully) clear instructions:

# select the fields we want
select e.id, e.title, e.body, t.tag
# fom the entries table
from entries e
# join the tags table
left outer join tags t on t.entryid = e.id
# here's where it gets interesting
# firstly, get the minimum entry ID and check we only get ID equal or greater
where (e.id>=(select min(l.id) from (select id
from entries
# make sure we order correctly
order by id desc
# set our limit (used for pagination)
limit 0, 10) as l))
# then get the maximum entry ID and check we only get ID equal or less
and (e.id<=(select max(l.id) from (select id
from entries
# make sure we order correctly
order by id desc
# set our limit (used for pagination)
limit 0, 10) as l))
# make sure we order correctly
order by e.id desc

So, the above code should give you:

idtitlebodytag
100100 title100 body100 tag
100100 title100 body100 another tag
9999 title99 body99 tag
9898 title98 body98 tag
9797 title97 body97 tag
9696 title96 body96 tag
9595 title95 body95 tag
9494 title94 body94 tag
9393 title93 body93 tag
9292 title92 body92 tag
9191 title91 body91 tag

ID 91 is back! And while we actually have 11 rows returned by the query, it's pretty easy to parse it using PHP to make sure we don't show the same entry details twice. The only thing to watch out for is that you get the ordering and limits the same in both the min() and max() statements. I hope that's useful to you.