[Xapian-discuss] Offline documentation.

Olly Betts olly at survex.com
Fri Sep 18 14:03:12 BST 2009

On Thu, Sep 17, 2009 at 04:24:56PM +0100, James Aylett wrote:
> On Thu, Sep 17, 2009 at 04:10:11PM +0100, Olly Betts wrote:
> > Perhaps I'm old fashioned, but at least a minute between requests used
> > to be a good rule of thumb for polite spidering.
> No one is polite. I have websites that get hammered by MSNBot, for
> instance; no delay at all between requests (because for some reason
> they haven't bothered to sort out delays across their spider farm) :-(

Rudeness by others isn't much of an excuse!

> > > It would, but we probably want to frob the output. In particular,
> > > renaming the files so they're .html would possibly make life easier
> > > for some people; including support files (stylesheet, for instance)
> > > might help as well.
> > 
> > The same goes for anyone downloading for themselves.  We could just
> > provide the files as they come out of wget for now, and it would
> > save people having to perform the spidering step, which would reduce
> > the amount of traffic through trac if multiple people wanted them.
> That's true. The incantation I gave (plus a -w <something>) should do
> for that as a starting point.

wget has options to rename to .html and rewrite links, so here's a dump
that should be browsable:


That should get updated weekly, early Friday morning UTC.

I told it to include stylesheets and images too, but that didn't seem to
work despite what the man page says.  I don't have time to investigate
why not at the moment.
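
For anyone who wants to reproduce a similar dump locally, here is a minimal sketch of how such a wget invocation might be assembled. It is not the exact command used for the dump above, which this message doesn't record. The start URL is a placeholder, the build_wget_command helper is hypothetical, and the 60 second wait simply follows the "at least a minute between requests" rule of thumb. The options are standard wget ones; older wget releases spell --adjust-extension as --html-extension.

```python
import shlex


def build_wget_command(start_url, wait_seconds=60):
    """Return a wget argument list for a polite, browsable mirror.

    Hypothetical helper: start_url and wait_seconds are illustrative
    assumptions, not values taken from the original thread.
    """
    return [
        "wget",
        "--recursive",            # follow links from the start page
        "--no-parent",            # stay below the starting directory
        f"--wait={wait_seconds}",  # pause between requests
        "--adjust-extension",     # save HTML pages with a .html suffix
        "--convert-links",        # rewrite links so the copy browses offline
        "--page-requisites",      # also request stylesheets and images
        start_url,
    ]


if __name__ == "__main__":
    # Placeholder URL; substitute the site you actually want to mirror.
    cmd = build_wget_command("http://example.org/wiki/")
    # Print the command rather than running it, so it can be reviewed first.
    print(shlex.join(cmd))
```

Printing the command instead of running it keeps the sketch free of side effects; the output can be pasted into a shell once the URL and delay look right.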

