[Xapian-discuss] some questions with scriptindex
Sabrina Shen
hm2shen at yahoo.com
Mon Mar 28 01:07:21 BST 2005
Thanks a lot! Now I have a much better understanding.
--- Olly Betts <olly at survex.com> wrote:
> On Sun, Mar 27, 2005 at 07:45:21PM +0000, Sabrina Shen wrote:
> > UID: field=UID boolean=XUID unique=Q
>
> Unless you're trying to do something clever, you want the same prefix for
> boolean and unique.
>
Yes, you're right. I'll change it.
> > (1) Do I really need "field=" for each?
>
> You do if you want Xapian to store them.
>
> > Isn't "field=" just for displaying web search results (as Sam
> > described in earlier messages: "Fields are used to retrieve per-record
> > text for summaries and things like for Omega.")?
>
> There's nothing special about web search results.  Sam meant "for Omega"
> simply as an example of a program which might use them.
>
> > Can't I get these values with "get_document().get_data()" using
> > MSetIterator in my local system even without "field="? Say, to output
> > search results into a text file?
>
> The document data is built from the values processed with "field=".  So
> if you don't have a field action, the value won't be stored in the
> document data.  Sometimes that's what you want...
>
Oh, I see. I have to keep the "field=" action.
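
To make the point about "field=" concrete, here is a minimal sketch of
reading the stored fields back out of the document data. It assumes the
data is laid out as one "name=value" line per field action, which is the
convention Omega uses; the function name parse_document_data and the
sample record are made up for illustration. In a real program the string
would come from get_document().get_data() as mentioned above.

```python
def parse_document_data(data):
    """Split 'name=value' lines into a dict mapping name -> list of values.

    Assumption: each "field=" action contributed one name=value line.
    """
    fields = {}
    for line in data.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not in name=value form, so skip it
        # A field such as AU may occur several times in one record.
        fields.setdefault(name, []).append(value)
    return fields


if __name__ == "__main__":
    # Invented sample; a field with no "field=" action would simply be absent.
    sample = "UID=12345\nJN=Nature\nAU=Smith J\nAU=Jones K"
    for name, values in parse_document_data(sample).items():
        print(name, values)
```

A field that was indexed without a "field=" action never shows up in this
dictionary, although its terms are still searchable.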
> > (2) How does "truncate=" work?
>
> The "input field" from the dump file is fed through each action in turn.
> The "truncate" action simply truncates the value to the given length, so
> actions on the same line after the "truncate" see the truncated text.
>
> > Does it work for both probabilistic and boolean fields?
>
> For *ANY* action after it.
>
> > Does it truncate each word while indexing, e.g. truncate a term
> > if it's longer than 200 characters while indexing?
>
> No - "index" after "truncate" means the text will be truncated before
> word splitting.  But "index" will discard any word of more than 64
> characters anyway.
I got it. That's also why the "Key too long" error probably
doesn't come from the "index" action.
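
The left-to-right behaviour described above can be mimicked with a toy
model. This is only a sketch of the idea, not scriptindex itself: the
function run_actions is hypothetical, "truncate" is modelled as a plain
slice, "index" as lowercase word splitting with no stemming, and "boolean"
as prefix plus the whole value. The 64-character cutoff is the figure
given in the reply.

```python
import re

MAX_WORD_LEN = 64  # per the reply: "index" discards longer words


def run_actions(value, actions):
    """Feed one input value through (name, argument) actions in order.

    Returns the final value and the list of terms generated on the way.
    """
    terms = []
    for name, arg in actions:
        if name == "truncate":
            # Every later action on the line sees the shortened text.
            value = value[:int(arg)]
        elif name == "boolean":
            # The whole (possibly truncated) value becomes a single term.
            terms.append(arg + value)
        elif name == "index":
            # Word splitting happens after any earlier truncation.
            for word in re.findall(r"\w+", value.lower()):
                if len(word) <= MAX_WORD_LEN:
                    terms.append(arg + word)
    return value, terms


if __name__ == "__main__":
    text = "a" * 70 + " hello world"
    print(run_actions(text, [("truncate", "75"), ("index", "")]))
    print(run_actions("x" * 300, [("truncate", "10"), ("boolean", "XJN")]))
```

In this model a "boolean" action with no "truncate" before it turns the
entire value into one term, which is how an over-long term can arise even
though "index" drops long words.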
> > (3) In the indexing process, I got an error message as follows:
> > "Exception: Key too long: length was 264 bytes, maximum length of a key is
> > Btree::max_key_len bytes". I understand it means a single term is too
> > long. But a term in which field: the primary field UID? or any field
> > such as JN, CA, and AB?
>
> It'll be in one of the boolean fields (unless you passed "index" a prefix
> of 200 or so characters!)
This is somewhat unexpected. It seems to me that
there shouldn't be a single term longer than 200 in
the boolean fields. JN (journal name) is separated by
spaces. Publication Year is a 4-digit number.
Classification is a code with two chars. I assigned
multiple values for articles with multiple authors
(AU). Anyway, I'll check whether there is such a long
term in a single value.
> This should be reported better.  We need to check term length explicitly
> up front (at present this exception comes from a lower level which is
> handling keys built from terms and document ids).
>
> Cheers,
> Olly
Is there a way that I can check exactly where this
error happened, say, with which term and which
document?
Thanks!
Sabrina
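
One way to narrow down the offending record without help from the indexer
is to scan the dump file before indexing. The sketch below assumes the
usual scriptindex input layout (one "name=value" line per field, records
separated by blank lines, continuation lines starting with "="). The
function find_long_values and the 200-byte cutoff are illustrative
choices, not Xapian's actual limit, and continuation lines are ignored
for simplicity.

```python
def find_long_values(lines, boolean_fields, limit=200):
    """Yield (record_number, field_name, byte_length) for suspicious values.

    `limit` is an assumed cutoff chosen for this sketch.
    """
    record = 1
    in_record = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line ends the current record.
            if in_record:
                record += 1
                in_record = False
            continue
        in_record = True
        if line.startswith("="):
            continue  # continuation of the previous value; skipped here
        name, sep, value = line.partition("=")
        if sep and name in boolean_fields:
            size = len(value.encode("utf-8"))
            if size > limit:
                yield record, name, size


if __name__ == "__main__":
    # Invented sample; in practice pass an open file object instead.
    sample = ["UID=1\n", "JN=Some Journal\n", "\n",
              "UID=2\n", "AU=" + "x" * 250 + "\n"]
    for hit in find_long_values(sample, {"JN", "AU"}):
        print(hit)
```

Any record this reports is a candidate for the "Key too long" exception,
for instance an AU line where several authors were run together into one
value instead of being split.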