[Xapian-discuss] htdig with omega for multiple URLs (websites)
Olly Betts
olly at survex.com
Thu Mar 30 14:28:10 BST 2006
On Wed, Mar 29, 2006 at 12:41:31PM -0500, Peter Masiar wrote:
> If you still have around the script you said you wrote to use htdig as
> crawler front-end for omega, I would be really interested to see it.
The script is htdig2omega, which is distributed with omega.
> My htdig crawls a single site. I need to learn how to crawl multiple sites
> and merge the results. Do you recall your htdig2omega script handling this
> merging? Or did you search one htdig-crawled database? Or can I merge
> using htdig and then search using omega?
I don't know about merging with htdig.
You could convert each htdig database to its own Xapian database, and
then search them together.
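A minimal sketch of that first approach, under some assumptions: the
site.N database names, both function names and the stub file are made up
for illustration. Xapian stub database files list one database per line as
a type and a path ("auto /path"), but whether your Xapian and omega
versions accept a stub in this form is worth checking against their
documentation.

```shell
#!/bin/sh
# Sketch: one Xapian database per htdig index, plus a stub file naming them.
# The "site.N" naming and both function names are hypothetical.

# convert_each HTDIG_ROOT XAPIAN_ROOT SCRIPT
# Converts HTDIG_ROOT/index.N into XAPIAN_ROOT/site.N for every N.
convert_each() {
    htdig_root=$1
    xapian_root=$2
    script=$3
    for htdigdir in "$htdig_root"/index.*; do
        [ -e "$htdigdir" ] || continue   # glob matched nothing
        n=${htdigdir##*.}                # "index.3" -> "3"
        htdig2omega "$htdigdir" |
            scriptindex "$xapian_root/site.$n" "$script" || return 1
    done
}

# write_stub XAPIAN_ROOT STUB_FILE
# Writes one "auto <path>" line for each per-site database.
write_stub() {
    xapian_root=$1
    stub=$2
    : > "$stub" || return 1              # start from an empty stub file
    for db in "$xapian_root"/site.*; do
        [ -d "$db" ] || continue
        printf 'auto %s\n' "$db" >> "$stub"
    done
}

# Example invocation (paths are assumptions):
#   convert_each /var/htdig /var/xapian htdig2omega.script &&
#       write_stub /var/xapian /var/xapian/all.stub
```

Keeping one database per site means a single site can be re-crawled and
rebuilt without touching the others.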
Or you could combine them at index time. Assuming your htdig indexes
are /var/htdig/index.N and you want the Xapian index to be
/var/xapian/default, then this Bourne shell snippet should do the job:
for htdigdir in /var/htdig/index.* ; do
    htdig2omega "$htdigdir" | scriptindex /var/xapian/default htdig2omega.script
done
Cheers,
Olly