[Xapian-discuss] 128 bit Document IDs (Please don't hurt me)

Kevin Duraj kevinduraj at gmail.com
Sun Mar 11 04:59:13 GMT 2012


We need Xapian document IDs to be sequential, because then we can walk through the index (e.g., 1 ... 10000) and retrieve every document by its ID. If we switched to UUIDs, we would lose this ability to retrieve each document from the Xapian index. This way we can use Xapian both for searching and as a backup. Please do not change anything in the Xapian index data structure, thank you.
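
For illustration, the sequential walk described above might look like the sketch below. It assumes the Python bindings' Database.get_lastdocid() and Database.get_document(), with xapian.DocNotFoundError passed in as not_found. The helper name iter_documents and the in-memory FakeDatabase stand-in are hypothetical, included only so the sketch runs without a real index.

```python
def iter_documents(db, not_found=LookupError):
    """Yield (docid, document) for every docid from 1 up to the last one used."""
    for docid in range(1, db.get_lastdocid() + 1):
        try:
            yield docid, db.get_document(docid)
        except not_found:
            # Deleted documents leave gaps in the sequence; skip them.
            continue


class FakeDatabase:
    """Hypothetical in-memory stand-in for a Xapian database (illustration only)."""

    def __init__(self, docs):
        self._docs = dict(docs)

    def get_lastdocid(self):
        return max(self._docs, default=0)

    def get_document(self, docid):
        # Raises KeyError (a LookupError subclass) when docid is a gap.
        return self._docs[docid]


db = FakeDatabase({1: "first", 2: "second", 4: "fourth"})
for docid, doc in iter_documents(db):
    print(docid, doc)
```

Against a real index this would presumably be called as iter_documents(xapian.Database(path), not_found=xapian.DocNotFoundError).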

- Kevin Duraj
http://MyHealthcare.com

On Mar 9, 2012, at 3:39 PM, Shane Spencer <shane at bogomip.com> wrote:

> I apologize for what may be a sore subject.  4 billion documents is a
> heck of a lot.  A database that needed 64-bit rather than 32-bit IDs
> would be incredibly large at an average document and term size.  So
> why 128 bit?  Simply for address space.
> 
> Mapping a UUID (128 bit) or MongoDB ObjectID (96 bit) directly into
> the Xapian document ID space removes the need for either store to
> keep a reference to the other's ID.  I see a common tendency to write
> a document to Xapian, get the document ID back, and then write that
> ID to the database backing the document in some way.
> 
> This is nothing new, but I really would like to remove that extra
> write and optionally throw away the Xapian response by specifying the
> UUID associated with the document as its document ID.  This is
> starting to become much more important as people are walking away
> from auto-increment fields and aiming more toward universal
> identification which, from a sparseness standpoint, is amazingly
> wasteful but incredibly useful.
> 
> Thanks for your consideration.  I have no idea how complicated it
> would be to make this change to Xapian; however, I'd imagine that
> turning the document ID into a binary-like value rather than an
> integer value would allow for very large document ID widths.  This
> probably means adding a 16-bit length to every document ID, which is
> pretty wasteful.
> 
> For now I'm just storing the UUID as a serialized large integer
> through python-xapian and then writing the Xapian document ID to my
> database documents as they are indexed.
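
As a rough sketch of the workaround described above: a UUID can be viewed as a 128-bit integer or as 16 raw bytes, and either form round-trips cleanly. The 32-bit docid width below is an assumption taken from the "4 billion documents" figure in this thread, the helper names are hypothetical, and where the serialized bytes end up inside a Xapian document (document data, a value slot, or a term) is left open because the original message does not say.

```python
import uuid

DOCID_BITS = 32  # assumed docid width, per the "4 billion documents" figure


def uuid_to_bytes(u: uuid.UUID) -> bytes:
    """Serialize a UUID as 16 big-endian bytes."""
    return u.bytes


def bytes_to_uuid(b: bytes) -> uuid.UUID:
    """Rebuild the UUID from its 16-byte serialization."""
    return uuid.UUID(bytes=b)


def fits_in_docid(u: uuid.UUID, bits: int = DOCID_BITS) -> bool:
    """True if the UUID's integer value fits in a docid of the given width."""
    return u.int < (1 << bits)


u = uuid.UUID("12345678-1234-5678-1234-567812345678")
assert bytes_to_uuid(uuid_to_bytes(u)) == u   # lossless round trip
assert not fits_in_docid(u)                   # 128 bits do not fit in 32
```

The failed fit is why the UUID currently has to live next to the docid rather than replace it.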
> 
> Thanks for your consideration,
> 
> Shane Spencer
> 


