[Xapian-discuss] Grouping Results (again)

Ben Campbell ben at scumways.com
Mon Nov 2 10:25:27 GMT 2009


I've got a bunch of indexed documents (newspaper articles).
Each document has 0 or more authors.

I want to show search results grouped by author.

(It's a similar situation to the one Torsten Bronger posted about a
couple of weeks ago.)

Here are the solutions I can think of so far:

1) pick a single author for each article, store it in a value slot
(valueno), then use set_collapse_key() to do the grouping.
cons: doesn't handle articles with more than one credited author very well.

2) slurp the top N results out into the calling code (I'm using PHP in
this case) and do the grouping there. This needs some metric to rank
authors: either by each author's most relevant document (as
set_collapse_key() does), or perhaps by summing the relevance scores of
all their documents, since several matching documents probably means an
author is more relevant.

cons: doesn't scale well to large result sets, since all N matches have
to be pulled into PHP and grouped there.

3) maintain a separate Xapian database which has a single uberdocument
for each author (made by concatenating all their articles).
I've got nearly 2 million documents, but only about 20000 authors, so
maybe a second database would be quite small...
cons: _another_ database to maintain, competing for RAM.
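A minimal sketch of the grouping step in option 2, written in Python
rather than PHP for illustration. It assumes the top N hits have already
been fetched (for example from the MSet returned by Enquire.get_mset(0, N))
and reduced to hypothetical (docid, weight, authors) tuples; the function
name group_by_author and the "max"/"sum" metric names are made up here.
Articles with several authors count towards each of them, and articles
with no author are simply dropped.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical hit shape: (document id, relevance weight, author names).
Hit = Tuple[int, float, Sequence[str]]


def group_by_author(hits: Iterable[Hit],
                    metric: str = "max") -> List[Tuple[str, float, List[int]]]:
    """Group hits by author and rank the groups.

    metric="max": author score = weight of their best document
                  (similar in spirit to collapsing).
    metric="sum": author score = total weight of all their documents,
                  so several matching articles push an author up.
    """
    if metric not in ("max", "sum"):
        raise ValueError("metric must be 'max' or 'sum'")

    docs: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for docid, weight, authors in hits:
        # A multi-author article is credited to every author once;
        # an article with no authors contributes nothing.
        for author in set(authors):
            docs[author].append((weight, docid))

    groups = []
    for author, entries in docs.items():
        weights = [w for w, _ in entries]
        score = max(weights) if metric == "max" else sum(weights)
        # Documents inside a group are listed best-first.
        ordered = [d for _, d in sorted(entries, key=lambda e: (-e[0], e[1]))]
        groups.append((author, score, ordered))

    # Highest-scoring author first; ties broken by name for stable output.
    groups.sort(key=lambda g: (-g[1], g[0]))
    return groups


hits = [(1, 9.0, ["Smith"]), (2, 5.0, ["Jones", "Smith"]),
        (3, 4.0, ["Jones"]), (4, 6.0, ["Jones"]), (5, 2.0, [])]
print(group_by_author(hits, "max"))
print(group_by_author(hits, "sum"))
```

With this sample data the two metrics disagree: "max" puts Smith first
(one very strong article), while "sum" puts Jones first (three decent
ones). The cost grows with N, which is the scaling concern noted above.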

Any other suggestions or advice?

At the moment, I'm leaning toward option 2, although I might do a quick 
test of option 3 and see if the extra database is small enough to be 
manageable...

Thanks,
Ben.