[Xapian-discuss] Offline documentation.

James Aylett james-xapian at tartarus.org
Thu Sep 17 16:24:56 BST 2009


On Thu, Sep 17, 2009 at 04:10:11PM +0100, Olly Betts wrote:

> It's not the load that worries me, it's trac getting unhappy and
> failing to serve other requests, or even requiring manual intervention,
> which I want to avoid.  Remember 3am GMT is the middle of the day here.

True.

> Perhaps I'm old fashioned, but at least a minute between requests used
> to be a good rule of thumb for polite spidering.

No one is polite. I have websites that get hammered by MSNBot, for
instance; no delay at all between requests (because for some reason
they haven't bothered to sort out delays across their spider farm) :-(

> > It would, but we probably want to frob the output. In particular,
> > renaming the files so they're .html would possibly make life easier
> > for some people; including support files (stylesheet, for instance)
> > might help as well.
> 
> The same goes for anyone downloading for themselves.  We could just
> provide the files as they come out of wget for now, and it would
> save people having to perform the spidering step, which would reduce
> the amount of traffic through trac if multiple people wanted them.

That's true. The incantation I gave (plus a -w <something>) should do
for that as a starting point.
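
As a rough sketch of what a polite run might look like (this is not the
incantation from earlier in the thread; the URL is a placeholder and the
option set is an assumption), with -w giving the delay in seconds, -E
saving pages with a .html suffix, and -p pulling in support files such
as stylesheets:

```shell
#!/bin/sh
# Sketch only: START_URL is a placeholder and the options are an assumed
# starting point, not the command given earlier in this thread.
START_URL="http://example.org/wiki/"
WAIT_SECONDS=60   # "at least a minute between requests"

# Print the wget command rather than running it, so it can be checked first.
#   -r   recurse through links        -np  never ascend to the parent directory
#   -w   seconds to wait per request  -p   fetch page requisites (CSS, images)
#   -k   rewrite links for local use  -E   save HTML pages with a .html suffix
spider_cmd() {
    echo "wget -r -np -w $WAIT_SECONDS -p -k -E $START_URL"
}

spider_cmd
```

Once the printed command looks right, it can be pasted into a shell (or run
with eval) against the real start URL.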

J

-- 
  James Aylett

  talktorex.co.uk - xapian.org - uncertaintydivision.org
