[Xapian-discuss] omega crawler: ht://dig or wget?

Olly Betts olly at survex.com
Fri Mar 17 06:33:02 GMT 2006


On Fri, Mar 17, 2006 at 12:55:47AM -0500, Peter Masiar wrote:
> > http://htdig.blogspot.com/
> 
> Blogs looks like dead link to me, too :-(

Odd, it was there when I pasted the link in.  I guess it's a transient
problem at blogspot.

> That's exactly what I wanted to know - if anybody here uses it for something.

I originally wrote the htdig2omega script (for someone else) as a quick
way to put together a prototype.  I'm not using it myself right now though.

The htdig2omega script also provides a way for an existing htdig user to
easily try out Xapian and Omega, and serves as an example of how you
might use scriptindex to index data from arbitrary sources.
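
To illustrate the "arbitrary sources" point, here is a minimal sketch (not
the htdig2omega script itself) of turning records from some other source
into the kind of text dump scriptindex reads.  It assumes the input format
as I understand it from the Omega documentation: one name=value field per
line, a blank line between records, and a newline inside a value continued
by starting the next line with "=".  The field names (url, title, text) and
the sample pages are hypothetical; they would need to match whatever your
own index script expects.

```python
def format_record(fields):
    """Render one record as name=value lines (assumed scriptindex input format)."""
    lines = []
    for name, value in fields.items():
        # A newline inside a value is continued by starting the next line with '='.
        escaped = str(value).replace("\n", "\n=")
        lines.append(f"{name}={escaped}")
    return "\n".join(lines)


def format_dump(records):
    """Join records with a blank line, which acts as the record separator."""
    return "\n\n".join(format_record(r) for r in records) + "\n"


if __name__ == "__main__":
    # Hypothetical sample data standing in for whatever a crawler fetched.
    pages = [
        {"url": "http://example.org/", "title": "Home",
         "text": "Welcome\nto the site"},
        {"url": "http://example.org/about", "title": "About",
         "text": "Who we are"},
    ]
    print(format_dump(pages), end="")
```

The resulting dump would then be passed to scriptindex along with the
database and an index script saying what to do with each field.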

> Would be nice if fetcher can log in into some sites with password or cookie.
> Currently I do not see any more features. Can wget or htdig do it?

I suspect either can.  If wget seems more suitable for your needs, go
ahead and use it.  If you want to write a HOWTO to help others, or
perhaps some sort of wrapper script, please do.

I'm really just trying to make clear that htdig is still a viable
alternative approach.  And that there doesn't have to be just one way to
index data on a website into Xapian.

I believe the next major revision of swish-e will feature Xapian as a
backend, so that will provide another way to do this.

Incidentally, the latest htdig release was 2004-06-16, not 2002.  It
has a nominal beta tag, but I wouldn't read too much into that:

    Calling this release a "beta" simply means that exhaustive testing,
    especially on non-Linux platforms, is not yet complete. However, we
    consider it stable enough for most production use.

Cheers,
    Olly


