[Xapian-discuss] Returning "fresh" results only from multiple DBs

Henry henka at cityweb.co.za
Mon Jan 12 08:26:36 GMT 2009


Let's say you have the following scenario:

DB1:  large corpus with rarely changing data (typically split across a cluster).

DB2:  small corpus with frequently changing data (to update pages in DB1).

DBn:  ditto.

Since DB1 is so large and heavily accessed, we want to keep things simple and foolproof, so its contents are rarely changed; newer, fresher versions of DB1 pages go into DB2..n instead.  Each duplicate page (fresher, so preferred) has a numeric field which increments on each refresh (1, 2, 3...) and identifies the most up-to-date page across all DBs.

How can I perform an enquiry, collapsing on a key (as currently done) to remove duplicate pages, but yielding the freshest of those duplicate pages?

Similar to SQL:    SELECT MAX(freshness_num),*  FROM  table...
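
To make the desired behaviour concrete, here is a minimal sketch in plain Python of the result I am after. It does not use the Xapian API at all; the Hit record, its field names, and freshest_per_key are hypothetical names made up for illustration, and it assumes the hits from all DBs have already been merged into one list.

```python
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical stand-in for one search result (not a Xapian type)."""
    collapse_key: str  # the key duplicates are currently collapsed on
    freshness: int     # refresh counter: 1, 2, 3, ...
    source_db: str     # which DB the hit came from
    weight: float      # relevance score


def freshest_per_key(hits: Iterable[Hit]) -> List[Hit]:
    """Keep only the hit with the highest freshness for each collapse key."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.collapse_key)
        if current is None or hit.freshness > current.freshness:
            best[hit.collapse_key] = hit
    # Order the surviving hits by relevance, best first.
    return sorted(best.values(), key=lambda h: h.weight, reverse=True)


hits = [
    Hit("page-a", 1, "DB1", 0.90),
    Hit("page-a", 3, "DB2", 0.70),
    Hit("page-b", 1, "DB1", 0.80),
    Hit("page-a", 2, "DB3", 0.95),
]
result = freshest_per_key(hits)
```

Note that in this sketch the stale copy of page-a in DB3 has the highest weight but is still dropped, because freshness decides which duplicate survives and weight only orders the survivors.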

I know we can perform updates on DB1, but I don't want to go down that path because of the volumes/sizes involved.

Any ideas?

