[Xapian-discuss] Offline documentation.

James Aylett james-xapian at tartarus.org
Sat Sep 19 17:05:38 BST 2009


On Fri, Sep 18, 2009 at 02:03:12PM +0100, Olly Betts wrote:

> > > Perhaps I'm old fashioned, but at least a minute between requests used
> > > to be a good rule of thumb for polite spidering.
> > 
> > No one is polite. I have websites that get hammered by MSNBot, for
> > instance; no delay at all between requests (because for some reason
> > they haven't bothered to sort out delays across their spider farm) :-(
> 
> Rudeness by others isn't much of an excuse!

I know, but I kind of feel that we're allowed to be rude to our own
site if we have reason to believe others are going to be anyway. But
hey, whatever works.

> wget has options to rename to .html and rewrite links, so here's a dump
> that should be browsable:

Oh, I didn't know that. Awesome.

> I told it to include stylesheets and images too, but that didn't seem to
> work despite what the man page says.  I don't have time to investigate
> why not at the moment.

It's because it's only including stuff under /wiki, so additional
include options will need to be added for the static files. Or they
could just be copied in by hand. I doubt it's critical for getting the
content anyway.
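
Roughly what I mean, as an untested sketch rather than the command
that was actually run: it just assembles a wget command line with the
rename-to-.html, link-rewriting and page-requisites options, plus an
include list that covers more than /wiki. The start URL and the
"/static" directory are made up for illustration (the real static
paths would need checking), and the 60 second wait is the rule of
thumb from earlier in the thread.

```python
import shlex


def build_wget_args(start_url, include_dirs, wait_seconds=60):
    """Assemble a wget command line for a browsable offline copy.

    start_url and include_dirs are supplied by the caller; the values
    used in the example below are hypothetical, not the real site layout.
    """
    return [
        "wget",
        "--recursive",            # follow links from the start page
        "--adjust-extension",     # save pages with a .html suffix
        "--convert-links",        # rewrite links to point at the local copies
        "--page-requisites",      # also fetch stylesheets and images
        "--include-directories=" + ",".join(include_dirs),
        "--wait=" + str(wait_seconds),  # pause between requests
        start_url,
    ]


if __name__ == "__main__":
    # Hypothetical URL and directory names, for illustration only.
    args = build_wget_args("https://example.org/wiki/", ["/wiki", "/static"])
    print(shlex.join(args))
```

The point is only that the include list has to name the static
directories as well as /wiki, otherwise the page requisites get
filtered out along with everything else outside /wiki.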

J

-- 
  James Aylett

  talktorex.co.uk - xapian.org - uncertaintydivision.org


