[Xapian-discuss] Offline documentation.

Olly Betts olly at survex.com
Thu Sep 17 16:10:11 BST 2009


On Thu, Sep 17, 2009 at 03:42:22PM +0100, James Aylett wrote:
> On Thu, Sep 17, 2009 at 03:40:23PM +0100, Olly Betts wrote:
> 
> > > wget -r -l 1 -I wiki http://trac.xapian.org/wiki/TitleIndex
> > 
> > I think it would be prudent to add --wait=60 or similar - it seems trac
> > can get a bit sulky if it gets hit with a lot of overlapping requests,
> > as is likely if wget is banging them in back-to-back (since any other
> > requests it gets will then be overlapping...)
> 
> I was originally going to do this, but I ran it without wait and it
> didn't seem to do much to our load. -w 1 would probably be more than
> enough if that's a significant concern, though.

It's not the load that worries me; it's trac getting unhappy and
failing to serve other requests, or even requiring manual intervention,
which I want to avoid.  Remember 3am GMT is the middle of the day here.

Perhaps I'm old-fashioned, but at least a minute between requests used
to be a good rule of thumb for polite spidering.
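
As a rough sketch, that is just the command quoted above with a delay
added.  The function name polite_mirror is made up for illustration, and
the default of 60 seconds is the rule of thumb above rather than a
measured requirement:

```shell
# Sketch only: the wget command from this thread plus a delay between
# requests.  polite_mirror is a hypothetical helper name.
polite_mirror() {
    # $1: seconds to wait between requests (defaults to 60)
    wget -r -l 1 -I wiki --wait="${1:-60}" http://trac.xapian.org/wiki/TitleIndex
}
```

Running "polite_mirror" waits a minute between requests; "polite_mirror 1"
is the equivalent of the -w 1 suggested above.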

> > Perhaps we should run such a job periodically and provide a tarball if
> > this is likely to be of wider interest.
> 
> It would, but we probably want to frob the output. In particular,
> renaming the files so they're .html would possibly make life easier
> for some people; including support files (stylesheet, for instance)
> might help as well.

The same goes for anyone downloading for themselves.  We could just
provide the files as they come out of wget for now, and it would
save people having to perform the spidering step, which would reduce
the amount of traffic through trac if multiple people wanted them.
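
For anyone who does want the .html names, something along these lines
might do as a starting point.  It is an untested-on-the-real-wiki sketch:
add_html_ext is a hypothetical helper, it assumes every file without a
dot in its name is an HTML page, and it does not rewrite links inside
the pages:

```shell
# Hypothetical helper: append .html to every regular file under the
# directory $1 whose name contains no dot.  File contents are not
# inspected, so this assumes all such files are HTML pages.
add_html_ext() {
    find "$1" -type f ! -name '*.*' -exec sh -c '
        for f do
            mv -- "$f" "$f.html"
        done' sh {} +
}
```

Pages whose names already contain a dot are left alone, so those would
still need renaming by hand.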

Cheers,
    Olly


