[Xapian-tickets] [Xapian] #198: Add support for multiple values in each value slot in a Document.

Xapian nobody at xapian.org
Mon Jul 20 17:16:14 BST 2009


#198: Add support for multiple values in each value slot in a Document.
---------------------------+------------------------------------------------
 Reporter:  richard        |        Owner:  richard  
     Type:  defect         |       Status:  assigned 
 Priority:  high           |    Milestone:  1.1.4    
Component:  Backend-Chert  |      Version:  SVN trunk
 Severity:  normal         |   Resolution:           
 Keywords:                 |    Blockedby:           
 Platform:  All            |     Blocking:  199      
---------------------------+------------------------------------------------

Comment(by olly):

 Um, I think you mean StringListSerialiser...

 While you could use a Sorter to get out the first value for sorting (and
 probably for collapsing too eventually), that's a lot of extra virtual
 method calls and, worse, a lot of extra data to fetch (every value in the
 slot, not just the first) that you don't want.

 I'm not completely sold on the space overhead argument.  It's misleading
 to claim that encode_length() doesn't store the length: that's what the
 top bit of each byte is used for, so it really incurs an overhead of about
 1 bit in 7 (on average and roughly, since in each case it rounds the
 value's length up to a multiple of 7 or 8 bits).  And we could store
 multi-values by just concatenating the non-primary values and storing the
 split points using interpolative coding, which would probably cost
 something similar to 1 bit in 7.  Interpolative coding can certainly do
 better in some cases: e.g. if you're storing evenly distributed values
 from 0 to 255, encode_length() would need 1.5 * N bytes, but the
 interpolative approach would need only N bytes (no overhead, since
 interpolative coding a full range takes no space!)
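 To make the arithmetic above concrete, here is a small Python sketch.  Both
 helpers are hypothetical names written for illustration: varint_len()
 models a generic encoding that stores 7 data bits per byte and uses the top
 bit as a continuation flag (it is not Xapian's actual encode_length()
 code), and interpolative_bits() counts the bits a textbook binary
 interpolative coder would spend, using plain fixed-width codes for each
 middle element.  It assumes the decoder already knows how many values there
 are and the total length of the concatenated data.

```python
def varint_len(n: int) -> int:
    """Bytes needed to store n with 7 data bits per byte, the top bit of
    each byte acting as a continuation flag (LEB128-style)."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def interpolative_bits(values, lo, hi):
    """Bits needed to binary-interpolative-code the strictly increasing
    list `values`, all of which lie in [lo, hi]."""
    if not values:
        return 0
    mid = len(values) // 2
    # The middle value must leave room for `mid` values below it and
    # `len(values) - mid - 1` values above it.
    low = lo + mid
    high = hi - (len(values) - mid - 1)
    span = high - low + 1
    # ceil(log2(span)) bits for a fixed-width code; 0 when span == 1.
    bits = (span - 1).bit_length()
    v = values[mid]
    return (bits
            + interpolative_bits(values[:mid], lo, v - 1)
            + interpolative_bits(values[mid + 1:], v + 1, hi))


N = 256
values = range(N)  # evenly distributed values 0..255, each once

# 0..127 take 1 byte, 128..255 take 2 bytes.
varint_bytes = sum(varint_len(v) for v in values)
print(varint_bytes, varint_bytes / N)  # 384 1.5

# If each value is a single byte, concatenating them costs N bytes and the
# split points 1..N-1 fill their whole range, so coding them costs nothing.
split_points = list(range(1, N))
print(interpolative_bits(split_points, 1, N - 1))  # 0
```

 A real coder would probably use truncated binary codes rather than
 fixed-width ones, which changes the counts for partially filled ranges but
 not the full-range case.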

 And if you have to store the primary value in another slot to avoid the
 I/O overhead when you want to sort on it, that will cancel out any saving
 you made here and then some.

 Also you don't have to use multi-values if you think you can encode your
 stuff into a single value more efficiently than the multi-value
 implementation.

 So I still prefer multi-values to supplying serialisation helpers.

 I think the conclusion when we discussed this before was that the
 non-primary values wouldn't go in the stream, so you'd need to store the
 secondary values separately.  That certainly seems to make most sense to
 me.  Sure, that's extra code, but then StringListSerialiser is extra code
 too.  It might be a bit more complex to implement this internally, but
 multi-values seem to give a cleaner external API.
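 The layout described above, where only the primary value goes into the
 slot's value stream and the secondary values are stored elsewhere, can be
 modelled in a few lines of Python.  This is a toy sketch under stated
 assumptions: MultiValueStore and its methods are hypothetical names, not
 Xapian API, and "primary" is assumed to mean the first value set in a
 slot.  What it illustrates is that sorting only ever reads the stream of
 primary values.

```python
from collections import defaultdict


class MultiValueStore:
    """Hypothetical layout: only the primary (first) value of each slot goes
    into the per-slot value stream; any further values live in a separate
    table that sorting never has to read."""

    def __init__(self):
        self.streams = defaultdict(dict)  # slot -> {docid: primary value}
        self.secondary = {}               # (docid, slot) -> [other values]

    def set_values(self, docid, slot, values):
        if not values:
            raise ValueError("need at least one value")
        self.streams[slot][docid] = values[0]
        if len(values) > 1:
            self.secondary[(docid, slot)] = list(values[1:])
        else:
            self.secondary.pop((docid, slot), None)

    def get_values(self, docid, slot):
        # Returning every value needs a second lookup for the secondaries.
        primary = self.streams[slot].get(docid)
        if primary is None:
            return []
        return [primary] + self.secondary.get((docid, slot), [])

    def sort_docids(self, slot):
        # Sorting touches only the value stream, i.e. primary values.
        stream = self.streams[slot]
        return sorted(stream, key=lambda docid: (stream[docid], docid))


store = MultiValueStore()
store.set_values(1, 0, ["pear", "apple"])
store.set_values(2, 0, ["banana"])
store.set_values(3, 0, ["cherry", "aardvark", "zebra"])
print(store.sort_docids(0))    # [2, 3, 1]
print(store.get_values(3, 0))  # ['cherry', 'aardvark', 'zebra']
```

 Keeping secondary values out of the stream means a sort (or, eventually, a
 collapse) never pays to fetch them; the cost is an extra lookup when a
 caller asks for all of a document's values in a slot.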

-- 
Ticket URL: <http://trac.xapian.org/ticket/198#comment:17>
Xapian <http://xapian.org/>


