[Xapian-devel] Some Questions From the beginner of Xapian
liminghit at 126.com
Thu Sep 18 03:02:01 BST 2008
Thanks for your enthusiastic replies, guys, really helpful!
On 2008-09-17, "Olly Betts" <olly at survex.com> wrote:
>On Wed, Sep 17, 2008 at 06:13:40AM +0000, Dave Spencer wrote:
>> It would be nice if there was some page on "concepts" that covered this.
>> I've wondered what the intent of get_data and set_data was, esp why have
>> the indexed values (the index being the first arg to get/add value) whereas
>> with data it's just a single value -- why not have multiple "data" values,
>> or why not get rid of "data" and just let the get/add value calls cover it?
>Use values if you need fast access during the match process itself (e.g.
>for sorting, collapsing, etc). Then Xapian knows to store the data such
>that this can be done efficiently. If you're sorting by date, Xapian
>only needs date information and doesn't want to have to fetch extraneous
>data to get it - this is why there are multiple value slots (the current
>implementation doesn't make best use of this but I'm working on that at
>the moment as it happens!)
>Optimising the storage scheme for this use case will hurt other access
>patterns, so we advise against storing arbitrary "data fields" in value
>slots. If you need to store other data which isn't needed in this way
>(e.g. you want it for displaying results), serialise it into the
>document data instead.
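The split described above can be sketched roughly as follows. This is only an illustration, not Xapian's own code: the record layout, the slot number SLOT_DATE and the helper split_record are all hypothetical, and the YYYYMMDD encoding is just one common way to get a string whose byte order matches date order. The Xapian calls are mentioned only in comments, so the snippet runs without the bindings installed.

```python
import json
from datetime import date

# Hypothetical slot number picked for this example.
SLOT_DATE = 0


def split_record(record):
    """Split a record into (values, data).

    values: slot number -> string needed during the match (here a sort key).
    data:   one string holding everything only needed for display.
    """
    # Fixed-width YYYYMMDD, so comparing the strings orders them by date.
    values = {SLOT_DATE: record["date"].strftime("%Y%m%d")}
    # Display-only fields are serialised together into a single string.
    display = {"title": record["title"], "url": record["url"]}
    data = json.dumps(display, sort_keys=True)
    return values, data


record = {
    "title": "Xapian concepts",
    "url": "http://example.org/concepts",
    "date": date(2008, 9, 17),
}
values, data = split_record(record)

# With the Xapian bindings, the two parts would be stored roughly as:
#   doc.add_value(SLOT_DATE, values[SLOT_DATE])
#   doc.set_data(data)
print(values[SLOT_DATE])
print(data)
```

Only the date ends up in a value slot. The title and URL stay in the document data, so they are not read while sorting.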
>There are already plenty of existing ways to serialise structured data
>into a single string, so when we were originally building Xapian we just
>chose a simple approach which allows you to pick an existing solution
>you like (some examples: XML, Python's pickle, JSON, Omega's
>"name=value" scheme) and allowed us to get on with the rest of the job.
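As a minimal sketch of one such scheme, the snippet below packs several fields into a single string with one "name=value" pair per line. It is loosely modelled on the Omega convention mentioned above, and Omega's actual format may differ in details. The helper names serialise_fields and parse_fields are made up for this example, and it assumes that names contain no "=" and that neither names nor values contain newlines.

```python
def serialise_fields(fields):
    """Pack a dict of strings into one string, one 'name=value' per line."""
    lines = []
    for name, value in fields.items():
        # Assumption of this sketch: these characters would break parsing.
        if "=" in name or "\n" in name or "\n" in value:
            raise ValueError("unsupported character in field %r" % name)
        lines.append(f"{name}={value}")
    return "\n".join(lines)


def parse_fields(data):
    """Inverse of serialise_fields: split each line at the first '='."""
    fields = {}
    for line in data.split("\n"):
        if not line:
            continue
        name, _, value = line.partition("=")
        fields[name] = value
    return fields


fields = {"url": "http://example.org/doc?id=1", "title": "Some Questions"}
data = serialise_fields(fields)   # the single string to store as document data
restored = parse_fields(data)     # what a results page would read back
print(data)
```

Splitting at the first "=" only means a value may itself contain "=", as the URL here does. JSON or XML would be a better fit once values need newlines or nesting.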
>At some point I think we probably will add support for some sort of
>document fields. Verbosity is more of an issue here than in most
>situations, so it's not just a case of reinventing the wheel, and
>we may be able to reuse an existing solution anyway.
>A numerically subscripted array of strings doesn't add much generality
>though - if you want to store any other sort of structure or any
>non-string data, you're still going to have to serialise it to one or
>more strings. I think we probably should aim higher.
>There's a ticket tracking this issue:
>> I'm guessing the intent of 'data' is to store some key piece of info
>> about a document such as the URL of a doc that represents a web page.
>One *or more* pieces of information, but otherwise yes.