[Xapian-devel] Some Questions From the beginner of Xapian

liminghit at 126.com liminghit at 126.com
Thu Sep 18 03:02:01 BST 2008


 
 Thanks, guys, for your enthusiastic replies; really helpful!
 
Cheers,
Ming
 
 
 


On 2008-09-17, "Olly Betts" <olly at survex.com> wrote:
>On Wed, Sep 17, 2008 at 06:13:40AM +0000, Dave Spencer wrote:
>> It would be nice if there was some page on "concepts" that covered this
>
>http://xapian.org/docs/glossary
>
>> I've wondered what the intent of get_data and set_data was, esp why have
>> the indexed values (the index being the first arg to get/add value) whereas
>> with data it's just a single value -- why not have multiple "data" values,
>> or why not get rid of "data" and just let the get/add value calls cover it?
>
>Use values if you need fast access during the match process itself (e.g.
>for sorting, collapsing, etc).  Then Xapian knows to store the data such
>that this can be done efficiently.  If you're sorting by date, Xapian
>only needs date information and doesn't want to have to fetch extraneous
>data to get it - this is why there are multiple value slots (the current
>implementation doesn't make best use of this but I'm working on that at
>the moment as it happens!)
>
>Optimising the storage scheme for this use case will hurt other access
>patterns, so we advise against storing arbitrary "data fields" in value
>slots.  If you need to store other data which isn't needed in this way
>(e.g. you want it for displaying results), serialise it into the
>document data instead.
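
To illustrate the split described above, here is a minimal sketch in plain Python. The field names, the slot numbers and the helper split_record are hypothetical and not part of Xapian. With the Python bindings, the results would presumably be handed to Document.add_value(slot, value) and, once serialised, Document.set_data(data).

```python
# Hypothetical mapping: fields needed during the match itself (e.g. for
# sorting or collapsing) and the value slot each one would be stored in.
MATCH_TIME_SLOTS = {"date": 0, "site": 1}


def split_record(record):
    """Split a record into (values, display_fields).

    values         -- {slot number: string} for fields used while matching
    display_fields -- everything else, destined for the document data
    """
    values = {}
    display_fields = {}
    for name, content in record.items():
        if name in MATCH_TIME_SLOTS:
            # Value slots hold strings, so convert explicitly.
            values[MATCH_TIME_SLOTS[name]] = str(content)
        else:
            display_fields[name] = content
    return values, display_fields


# Assumed example record; only "date" and "site" are needed at match time.
record = {
    "date": "20080917",
    "site": "xapian.org",
    "url": "http://xapian.org/docs/glossary",
    "title": "Glossary",
}
values, display_fields = split_record(record)
```

Only the fields needed for sorting or collapsing end up in value slots; the URL and title, which are only wanted for displaying results, stay out of them.
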
>
>There are already plenty of existing ways to serialise structured data
>into a single string, so when we were originally building Xapian we just
>chose a simple approach which allows you to pick an existing solution
>you like (some examples: XML, Python's pickle, JSON, Omega's
>"name=value" scheme) and allowed us to get on with the rest of the job.
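
As an illustration of packing several fields into the single document data string, here is a rough sketch of two of the options mentioned: JSON, and a simplified line-per-field "name=value" layout in the spirit of Omega's scheme. The function names and the example fields are assumptions made for this sketch, and the name=value code is an approximation rather than Omega's actual implementation. The resulting string is what would be passed to set_data and later read back with get_data.

```python
import json


def to_json_data(fields):
    """Serialise a dict of fields into one JSON string."""
    return json.dumps(fields, sort_keys=True)


def from_json_data(data):
    """Recover the dict of fields from a JSON string."""
    return json.loads(data)


def to_name_value_data(fields):
    """Serialise fields as one "name=value" line per field.

    Simplification: newlines in names or values, and "=" in names,
    are rejected rather than escaped.
    """
    lines = []
    for name, value in fields.items():
        value = str(value)
        if "\n" in name or "=" in name or "\n" in value:
            raise ValueError("field cannot be stored as a name=value line")
        lines.append(f"{name}={value}")
    return "\n".join(lines)


def from_name_value_data(data):
    """Parse "name=value" lines back into a dict of strings."""
    fields = {}
    for line in data.split("\n"):
        # Split on the first "=" only, so values may contain "=".
        name, sep, value = line.partition("=")
        if sep:
            fields[name] = value
    return fields


# Assumed example fields, wanted only for displaying results.
fields = {"url": "http://xapian.org/docs/glossary?a=b", "title": "Glossary"}
json_data = to_json_data(fields)
name_value_data = to_name_value_data(fields)
```

JSON keeps nested structure and non-string types, while the name=value layout is easy to read but only carries flat string fields.
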
>
>At some point I think we probably will add support for some sort of
>document fields.  Verbosity is more of an issue here than in most
>situations, so it's not just a case of reinventing the wheel, and
>we may be able to reuse an existing solution anyway.
>
>A numerically subscripted array of strings doesn't add much generality
>though - if you want to store any other sort of structure or any
>non-string data, you're still going to have to serialise it to one or
>more strings.  I think we probably should aim higher.
>
>There's a ticket tracking this issue:
>
>http://trac.xapian.org/ticket/53
>
>> I'm guessing the intent of 'data' is to store some key piece of info
>> about a document such as the URL of a doc that represents a web page.
>
>One *or more* pieces of information, but otherwise yes.
>
>Cheers,
>    Olly