[Xapian-discuss] Offline documentation.
James Aylett
james-xapian at tartarus.org
Sat Sep 19 17:05:38 BST 2009
On Fri, Sep 18, 2009 at 02:03:12PM +0100, Olly Betts wrote:
> > > Perhaps I'm old fashioned, but at least a minute between requests used
> > > to be a good rule of thumb for polite spidering.
> >
> > No one is polite. I have websites that get hammered by MSNBot, for
> > instance; no delay at all between requests (because for some reason
> > they haven't bothered to sort out delays across their spider farm) :-(
>
> Rudeness by others isn't much of an excuse!
I know, but I kind of feel that we're allowed to be rude to our own
site if we have reason to believe others are going to be anyway. But
hey, whatever works.
> wget has options to rename to .html and rewrite links, so here's a dump
> that should be browsable:
Oh, I didn't know that. Awesome.
> I told it to include stylesheets and images too, but that didn't seem to
> work despite what the man page says. I don't have time to investigate
> why not at the moment.
It's because wget is only including stuff under /wiki, so additional
include options will need to be added for the static stuff. Or we
could just copy those files in by hand. I doubt it's critical for
getting the content anyway.
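For what it's worth, here is a rough sketch of the kind of invocation I
have in mind, written as a small Python helper that only builds the wget
argument list (it doesn't run anything). The URL and the /static
directory are placeholders I've made up for illustration, not the real
layout of the site, and I haven't tested that this actually pulls in the
stylesheets and images. The one-minute wait is just the rule of thumb
mentioned above.

```python
import shlex


def build_wget_command(base_url, include_dirs, wait_seconds=60):
    """Build (but do not run) a wget argument list for a browsable mirror.

    base_url and include_dirs are supplied by the caller; the values used
    in the example below are hypothetical placeholders.
    """
    return [
        "wget",
        "--recursive",                # follow links from the start page
        f"--wait={wait_seconds}",     # pause between requests, in seconds
        "--convert-links",            # rewrite links so the dump works offline
        "--adjust-extension",         # save pages as .html (older wget: --html-extension)
        "--page-requisites",          # ask for stylesheets and images too
        # Only directories listed here are followed, so static content that
        # lives outside /wiki has to be listed explicitly.
        "--include-directories=" + ",".join(include_dirs),
        base_url,
    ]


if __name__ == "__main__":
    # Placeholder URL and directory names, for illustration only.
    cmd = build_wget_command("http://example.org/wiki/", ["/wiki", "/static"])
    print(shlex.join(cmd))
```

Printing the command rather than running it means you can eyeball it
(and tweak the directory list) before letting it loose on the site.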
J
--
James Aylett
talktorex.co.uk - xapian.org - uncertaintydivision.org