[Xapian-discuss] Need a suggestion on implementation.
jarrod roberson
jarrod.roberson at gmail.com
Thu Mar 2 06:37:22 GMT 2006
On 3/1/06, Olly Betts <olly at survex.com> wrote:
>
> On Wed, Mar 01, 2006 at 01:58:50PM -0500, jarrod roberson wrote:
> > On to my dilemma: I want to index arbitrarily long logical paths, and I
> > have run up against the ~240 character term limit more than once so far.
> > So I am trying to decide the best way to index path information.
> >
> > My ideas are as follows:
> >
> > /usr/jarrod/very/long/path/to/a/file.txt
> >
> > use prefixes like P000:usr, P001:jarrod, P002:very, P003:long . . . you
> > get the idea
>
> There's no need for each term to correspond to a directory level - you
> could make them a fixed number of characters long, which would reduce
> the number needed, which should make finding a particular existing entry
> more efficient - if you make the length 240 characters then many files
> will only need a single term. Also, this'll work even if you have a
> directory name which is 300 characters long...
Thanks for the suggestion. I can't afford even the minuscule chance that
a hash collision might happen.
So what you are suggesting is that terms longer than 240 characters get a
term entry for each 240 character piece, each prefixed with its position?
This will probably be a single term per directory in most cases, as you
suggest.
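
To check that I have understood the idea, here is a minimal sketch in Python.
It only builds the term strings and does not touch Xapian itself. The
function name path_to_terms, the "P000:" prefix format, and the choice to
count the prefix against the 240 character budget are my own assumptions for
illustration, not anything Xapian defines. It also counts characters, which
only matches the byte length for plain ASCII paths.

```python
# Sketch only: split a long path into fixed-size, position-prefixed terms.
# MAX_TERM_LEN is the approximate limit mentioned in this thread; the
# prefix format is a hypothetical convention, not a Xapian requirement.
MAX_TERM_LEN = 240
PREFIX_FMT = "P{:03d}:"  # P000:, P001:, ... (supports up to 1000 pieces)


def path_to_terms(path, max_term_len=MAX_TERM_LEN):
    """Return a list of terms, each at most max_term_len characters long.

    Each term is a position prefix followed by one consecutive piece of
    the path. The prefix is counted as part of the term length (an
    assumption made here to stay safely under the limit).
    """
    prefix_len = len(PREFIX_FMT.format(0))
    chunk_len = max_term_len - prefix_len
    if chunk_len <= 0:
        raise ValueError("max_term_len is too small to hold the prefix")
    return [
        PREFIX_FMT.format(index) + path[start:start + chunk_len]
        for index, start in enumerate(range(0, len(path), chunk_len))
    ]


def terms_to_path(terms):
    """Rebuild the original path from terms produced by path_to_terms."""
    prefix_len = len(PREFIX_FMT.format(0))
    return "".join(term[prefix_len:] for term in sorted(terms))


if __name__ == "__main__":
    for term in path_to_terms("/usr/jarrod/very/long/path/to/a/file.txt"):
        print(term)
```

A short path like the one above fits in a single term, and only longer paths
spill into P001:, P002:, and so on. Each resulting string would then be added
to the document as an ordinary term. Because the pieces are the literal path
text, there is no hashing involved and so no collision to worry about.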