[Xapian-discuss] htdig with omega for multiple URLs (websites)

Olly Betts olly at survex.com
Thu Mar 30 14:28:10 BST 2006


On Wed, Mar 29, 2006 at 12:41:31PM -0500, Peter Masiar wrote:
> If you still have the script you said you wrote to use htdig as a
> crawler front-end for omega, I would be really interested to see it.

The script is htdig2omega, which is distributed with omega.

> My htdig crawls a single site. I need to learn how to crawl multiple sites 
> and merge the results. Do you recall whether your htdig2omega script handled
> this merging? Or did you search one htdig-crawled database? Or can I merge 
> using htdig and then search using omega?

I don't know about merging with htdig.

You could convert each htdig database to its own Xapian database, and
then search those Xapian databases together.
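
A rough sketch of that first approach, with several assumptions: the
per-site database names under /var/xapian and the stub file name "all"
are made up for illustration, and the helper functions convert_one and
write_stub are hypothetical, not part of omega.  It relies on Xapian's
stub database files, which list one database per line as "auto PATH".
Omega's DB parameter should also accept several database names
separated by "/", which would avoid the stub file altogether.

```shell
#!/bin/sh
# Sketch only: directory layout and helper names are assumptions.

# Convert one htdig index ($1) into its own Xapian database under $2,
# named after the last path component of the htdig index (e.g. index.1).
convert_one() {
  htdigdir=$1
  xapiandir=$2
  name=${htdigdir##*/}
  htdig2omega "$htdigdir" | scriptindex "$xapiandir/$name" htdig2omega.script
}

# Write a stub database file ($1) listing every database path given after
# it, one "auto PATH" line each, so they can be opened as one database.
write_stub() {
  stub=$1
  shift
  : > "$stub"
  for db in "$@" ; do
    printf 'auto %s\n' "$db" >> "$stub"
  done
}

# Example use (left commented out so nothing runs until you adapt it):
#   for d in /var/htdig/index.* ; do convert_one "$d" /var/xapian ; done
#   write_stub /var/xapian/all /var/xapian/index.*
```

Keeping one database per site means a single site can be recrawled and
reindexed without touching the others.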

Or you could combine them at index time.  Assuming your htdig indexes
are /var/htdig/index.N (one per site) and you want the Xapian index to be
/var/xapian/default, then this Bourne shell snippet should do the job:

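  # scriptindex updates an existing database by default, so each pass adds
  # another site's documents to the same index.
  for htdigdir in /var/htdig/index.* ; do
    htdig2omega "$htdigdir" | scriptindex /var/xapian/default htdig2omega.script
  done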

Cheers,
    Olly


