[Xapian-discuss] some questions with scriptindex

Sabrina Shen hm2shen at yahoo.com
Mon Mar 28 01:07:21 BST 2005


Thanks a lot! Now I have a much better understanding.

--- Olly Betts <olly at survex.com> wrote:
> On Sun, Mar 27, 2005 at 07:45:21PM +0000, Sabrina Shen wrote:
> > UID: field=UID boolean=XUID unique=Q 
> 
> Unless you're trying to do something clever, you want the same prefix for
> boolean and unique.
> 

Yes, you're right. I'll change it.

> > (1) Do I really need "field=" for each?
> 
> You do if you want Xapian to store them.
> 
> > Isn't "field=" just for displaying web search results (As Sam
> > described in earlier messages: "Fields are used to retrieve per-record
> > text for summaries and things like for Omega.")?
> 
> There's nothing special about web search results.  Sam meant "for Omega"
> simply as an example of a program which might use them.
> 
> > Can't I get these values with "get_document().get_data()" using
> > MSetIterator in my local system even without "field="? Say, to output
> > search results into a text file?
> 
> The document data is built from the values processed with "field=".  So
> if you don't have a field action, the value won't be stored in the
> document data.  Sometimes that's what you want...
> 

Oh, I see. I have to keep the "field=" action.
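
A minimal sketch of reading those stored fields back on the search side.
It assumes the document data is laid out as one "name=value" line per
"field=" action, which is how scriptindex is understood to build it; the
function name and the sample record are hypothetical. In practice the
string would come from the get_document().get_data() call mentioned above.

```python
def parse_document_data(data):
    """Split scriptindex-style document data into {field: [values]}.

    Assumes one "name=value" line per stored field; a field that was
    indexed without a "field=" action simply never appears here.
    """
    fields = {}
    for line in data.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            # Not a name=value line; skip it rather than guess.
            continue
        fields.setdefault(name, []).append(value)
    return fields


# Hypothetical record: UID and JN had "field=", AB did not.
sample = "UID=12345\nJN=Journal of Examples\nAU=Smith J\nAU=Jones K"
record = parse_document_data(sample)
print(record.get("AU"))   # all stored author values
print(record.get("AB"))   # None: nothing was stored for AB
```

Repeated fields such as AU are kept as a list so that multiple values
per record survive the round trip.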

> > (2) How does "truncate=" work?
> 
> The "input field" from the dump file is fed through each action in turn.
> The "truncate" action simply truncates the value to the given length, so
> actions on the same line after the "truncate" see the truncated text.
> 
> > Does it work for both probabilistic and boolean fields?
> 
> For *ANY* action after it.
> 
> > Does it truncate each word while indexing, e.g. truncate a term
> > if it's longer than 200 characters while indexing?
> 
> No - "index" after "truncate" means the text will be truncated before
> word splitting.  But "index" will discard any word of more than 64
> characters anyway.

I got it. That also explains why the "Key too long" error
probably doesn't come from a field with the "index" action.
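
The ordering described in the reply above can be pictured with a small
model. This is only a sketch, not scriptindex's actual code: truncate()
and index_words() are hypothetical stand-ins, a plain slice stands in for
the real truncation, and whitespace splitting stands in for the real word
splitting. The 64-character limit is the one quoted in the reply.

```python
MAX_WORD_LEN = 64  # limit quoted in the reply; words longer than this are dropped


def truncate(text, length):
    """Stand-in for the "truncate" action: cut the whole value first."""
    return text[:length]


def index_words(text, max_word_len=MAX_WORD_LEN):
    """Stand-in for the "index" action: split into words, drop over-long ones."""
    return [word for word in text.split() if len(word) <= max_word_len]


value = "short words then " + "x" * 70 + " tail"

# truncate=200 leaves this value untouched; the 70-character word is
# discarded by the indexing step, not by truncate.
terms_long = index_words(truncate(value, 200))

# truncate=10 cuts the value before word splitting, so later words never
# reach the indexing step at all.
terms_short = index_words(truncate(value, 10))

print(terms_long)
print(terms_short)
```

Either way no term longer than 64 characters comes out of the indexing
step in this model, which is consistent with the over-long key coming
from somewhere else.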

> > (3) In the indexing process, I got the following error message:
> > "Exception: Key too long: length was 264 bytes, maximum length of a key is
> > Btree::max_key_len bytes". I understand it means a single term is too
> > long. But a term in which field: the primary field UID? or any field
> > such as JN, CA, and AB?
> 
> It'll be in one of the boolean fields (unless you passed "index" a prefix
> of 200 or so characters!)

This is somewhat unexpected. It seems to me that
there shouldn't be a single term longer than 200
characters in the boolean fields. JN (journal name) is
separated by spaces. Publication year is a 4-digit
number. Classification is a two-character code. I
assigned multiple values for articles with multiple
authors (AU). Anyway, I'll check whether any single
value contains such a long term.

> This should be reported better.  We need to check term length explicitly
> up front (at present this exception comes from a lower level which is
> handling keys built from terms and document ids).
> 
> Cheers,
>     Olly

Is there a way that I can check exactly where this
error happened, say, with which term and which
document?
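
One way to narrow this down without touching Xapian is to scan the dump
file itself before indexing. The sketch below rests on several
assumptions: the dump file has one "name=value" line per field with blank
lines between records; a boolean action turns the whole field value into
a single term (prefix plus value, not split on spaces); and 200 bytes is
only a guessed threshold, since the thread does not state the real limit.
The function name, the prefixes and the sample data are all hypothetical.

```python
import io


def find_long_values(lines, boolean_prefixes, max_term_len=200):
    """Return (record_no, line_no, field, term_len) for over-long boolean terms.

    boolean_prefixes maps a dump-file field name to the prefix its
    boolean action uses. Lines without a known field name (including
    any continuation lines) are skipped.
    """
    hits = []
    record_no = 1
    in_record = False
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current record.
            if in_record:
                record_no += 1
                in_record = False
            continue
        in_record = True
        name, sep, value = line.partition("=")
        if not sep or name not in boolean_prefixes:
            continue
        # Measure in bytes, since the error message reports bytes.
        term_len = len((boolean_prefixes[name] + value).encode("utf-8"))
        if term_len > max_term_len:
            hits.append((record_no, line_no, name, term_len))
    return hits


# Hypothetical two-record dump; the second JN value is 300 characters long.
sample = io.StringIO(
    "UID=1\nJN=Short Journal\n\n"
    "UID=2\nJN=" + "A very long journal name " * 12 + "\n"
)
prefixes = {"UID": "Q", "JN": "XJN"}  # hypothetical prefixes
hits = find_long_values(sample, prefixes)
for record_no, line_no, field, term_len in hits:
    print(f"record {record_no}, line {line_no}: {field} term is {term_len} bytes")
```

For a real dump file, pass an open file object instead of the StringIO
sample. Any hit points at the record and field to inspect by hand.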

Thanks!

Sabrina


		


