[Xapian-discuss] scriptindex on an internet crawl

Arjen van der Meijden acmmailing at tweakers.net
Thu Jun 23 07:14:05 BST 2005


On 23-6-2005 0:34, Olly Betts wrote:
> On Wed, Jun 22, 2005 at 09:14:12PM +0100, Olly Betts wrote:
> 
>>On Wed, Jun 22, 2005 at 03:21:32PM -0400, Georges Dupret wrote:
>>
>>>In a first try, I inserted in the command file url : field=url boolean=XURL
>>>unique=XURL and in the input file: url=www.dcc.uchile.cl/~gdupret for
>>>example, but scriptindex starts using 100% of the CPU and never finishes.
>>
>>You probably don't want to specify both uid and url as unique fields,
>>but this shouldn't cause a hang - I'll see if I can reproduce this.
> 
> 
> I can't seem to reproduce this.  Can you run scriptindex under gdb (just
> add "gdb --args " in front of the scriptindex command, then "run" at the
> "(gdb)" prompt), and hit Ctrl-C when it's hung?  Then "bt" should show a
> backtrace of where execution is.

Couldn't this simply be scriptindex being very slow rather than hanging? I
can imagine that a unique check on relatively long identifiers that all
share a similar beginning is very time-consuming and/or results in quite a
bit more B-tree work, at least compared to more evenly distributed
identifiers.
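
To make "a relatively similar beginning" concrete, here is a rough Python sketch. It only counts how many leading characters neighbouring keys share once sorted; it does not model Xapian's B-tree, and whether this accounts for the behaviour reported is a guess. The URLs and the short "Q" identifiers are made up for illustration; only the XURL prefix comes from the command file quoted above.

```python
from os.path import commonprefix


def shared_prefix_len(a, b):
    """Number of leading characters two keys have in common."""
    return len(commonprefix([a, b]))


def mean_adjacent_prefix(keys):
    """Average shared-prefix length between neighbours in sorted order."""
    ordered = sorted(keys)
    pairs = list(zip(ordered, ordered[1:]))
    return sum(shared_prefix_len(a, b) for a, b in pairs) / len(pairs)


# Hypothetical unique terms: the XURL prefix plus invented URLs on one host.
url_terms = ["XURLwww.dcc.uchile.cl/~user%d" % i for i in range(1, 6)]

# Hypothetical short identifiers that differ early, for comparison.
id_terms = ["Q17", "Qa2", "Qk9", "Qt4", "Qz8"]

print("URL-style terms:", mean_adjacent_prefix(url_terms))
print("short id terms: ", mean_adjacent_prefix(id_terms))
```

The URL-style terms share far more leading characters with their neighbours than the short identifiers do, so each key comparison has to look further before it finds a difference.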

Best regards,

Arjen


