[Xapian-discuss] Need a suggestion on implementation.

Olly Betts olly at survex.com
Wed Mar 1 19:54:59 GMT 2006


On Wed, Mar 01, 2006 at 01:58:50PM -0500, jarrod roberson wrote:
> On to my dilemma: I want to index arbitrarily long logical paths, and I have
> run up against the ~240 character term limit more than once so far.
> So I am trying to decide the best way to index path information.
> 
> My ideas are as follows:
> 
> /usr/jarrod/very/long/path/to/a/file.txt
> 
> use prefixes like P000:usr, P001:jarrod, P002:very P003:long . . . you get
> the idea

There's no need for each term to correspond to a directory level - you
could make them a fixed number of characters long, which would reduce
the number needed, which should make finding a particular existing entry
more efficient - if you make the length 240 characters then many files
will only need a single term.  Also, this'll work even if you have a
directory name which is 300 characters long...
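
A rough sketch of the fixed-length idea, in plain Python with no Xapian calls.
The function name, the "P" prefix, the three-digit chunk index and the default
chunk length are all assumptions made for illustration; pick a chunk length
that keeps prefix plus chunk under whatever term limit your backend enforces.

```python
def path_chunk_terms(path, chunk_len=200, prefix=b"P"):
    """Split a path into fixed-length chunks, one term per chunk.

    Each term is prefix + zero-padded chunk index + ":" + chunk bytes,
    so the chunk's position is encoded in the term itself.
    """
    data = path.encode("utf-8")
    terms = []
    for start in range(0, len(data), chunk_len):
        index = start // chunk_len
        terms.append(prefix + b"%03d:" % index + data[start:start + chunk_len])
    return terms


if __name__ == "__main__":
    # A tiny chunk length just to make the splitting visible.
    for term in path_chunk_terms("/usr/jarrod/very/long/path/to/a/file.txt", 16):
        print(term)
```

Each resulting term could then be added to the document as a boolean term.
Chunking is done on bytes, so a multi-byte UTF-8 character may be split across
two terms; that is harmless as long as the terms are only used for exact
matching.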

> the other idea is to use positional information using add_posting( usr, 0 ),
> add_posting( jarrod, 1 ), add_posting( very, 2 ), add_posting( long, 3 )

That'll be less efficient than encoding the position into the term.

You could hash the overlong part of the path like omindex does, but
that carries a small chance that two paths may collide and you'll only
index one, which you may find unacceptable.
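
For illustration, here is a minimal sketch of the hashing approach. It is not
omindex's actual code: the choice of MD5, the 240-byte limit, the "P" prefix
and the function name are assumptions for the example.

```python
import hashlib

MAX_TERM_LEN = 240  # assumed limit, taken from the figure quoted above


def hashed_path_term(path, prefix=b"P", max_len=MAX_TERM_LEN):
    """Return a single term for path, hashing the part that does not fit."""
    data = prefix + path.encode("utf-8")
    if len(data) <= max_len:
        return data  # short enough, use as-is
    digest_len = 32  # length of an MD5 hex digest
    keep = max_len - digest_len
    # Keep the readable head and replace the overlong tail with its hash.
    digest = hashlib.md5(data[keep:]).hexdigest().encode("ascii")
    return data[:keep] + digest
```

Two long paths which share the same head and whose tails hash to the same
value would get the same term, which is the collision risk mentioned above.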

Or you could use an external database of some sort to track the
pathname -> Xapian docid mapping.
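
As one possible shape for that, a small sketch using SQLite from Python's
standard library. The table name, column names and helper functions are
made up for the example, and the docid is assumed to be whatever integer
Xapian returned when the document was added.

```python
import sqlite3


def open_path_map(db_file=":memory:"):
    """Open (or create) the path -> docid table."""
    conn = sqlite3.connect(db_file)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS path_docid ("
        "path TEXT PRIMARY KEY, docid INTEGER NOT NULL)"
    )
    return conn


def remember(conn, path, docid):
    """Record (or update) the docid for a path."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO path_docid (path, docid) VALUES (?, ?)",
            (path, docid),
        )


def lookup(conn, path):
    """Return the docid stored for path, or None if it is unknown."""
    row = conn.execute(
        "SELECT docid FROM path_docid WHERE path = ?", (path,)
    ).fetchone()
    return row[0] if row else None
```

This sidesteps the term length limit entirely, at the cost of keeping a second
store in sync with the Xapian database.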

Cheers,
    Olly


