[Xapian-discuss] Xapian documentation

Olly Betts olly at survex.com
Wed May 10 01:24:06 BST 2006


On Mon, May 08, 2006 at 06:23:32PM +0100, James Aylett wrote:
> Yes, the internal/non-internal split is done well (although it's not
> entirely clear on the index page - or rather, you have to read all the
> words, where I'd like big section headers :-)

Personally I think the internal documentation should be a little less
visible to stop people reading it by mistake (since perhaps 1 in 100
people will actually want to look at it), but I recall there was
resistance to doing this when I suggested it before.

I noticed people tending to link to the "full source docs" rather than
the "API" documentation on distributed bookmarking sites (del.icio.us
and that ilk) so I renamed the former to "internal classes" which seems
to have helped that at least.

> but I suspect we want
> to split the "how does Xapian think" up a little so people who don't
> want to program to Xapian can read it.

None of the internal documentation really describes it at that level
though (if anything our internal documentation is a bit too low level!)

"Overview" gives a reasonable idea of "how Xapian thinks" from a user's
point of view.  While it could be improved, I think it's along the right
lines.

> > In fact, looking at the side menu, I can't see much which I think would
> > make sense in the manual, so I'm not quite sure what you have in mind...
> 
> Home, Features, History, Docs, Current Users (possibly) all make sense
> to me. (With my view of the home page.)

Obviously the docs would fit in a manual!  I think the history would
work there too.

I think the full list of current users is probably too dynamic and
mostly interesting as a set of links to look at.  A few cherry-picked
examples to illustrate the size and range of uses would work in a manual
though.

I think we need to be careful not to overload the front page.  I agree
the existing prose would benefit from a link or two from each paragraph
(though we don't currently have any Omega or bindings documentation
online, so the lack of links is largely because of the lack of pages to
link to!)  But I think it would be a mistake to add a *lot* more text.

The way I see it, for a new visitor, the front page is perhaps their
first or at least one of their earliest impressions of Xapian.  So
it's good to have a succinct, clean-looking page.  It needs to explain
what Xapian is and what it can do for them (and probably also what it
can't do so that we don't constantly get asked!)

There are far too many project websites which I've visited after seeing
a passing reference to the project name, only to have to click around to
even find out what the hell it does, or what the licence is!  Or equally
unhelpfully, all the information is in a single monster page and it's
impossible to find the parts you actually care about.

It's good to cover the different possible reasons people might be
looking at Xapian, and to some extent we have this already - there's a
paragraph for developers and one for people looking for something like
Omega.  We could certainly cover more roles, though I'm not sure we want
a series of paragraphs all starting "IF YOU" in capitals!  I know you
weren't seriously suggesting that, but as a serious point, it's best if
the prose flows naturally, so we want to try to avoid too much
repetition in the sentence structure.

I've noticed the current front page also provides a readily
cut-and-paste-able "soundbite" which people can use if they mention Xapian
in their blog or post it to del.icio.us or whatever:

http://www.google.com/search?q=%22Xapian+is+an+Open+Source+Probabilistic+Information+Retrieval+library%22

I didn't deliberately write the page that way, but having seen this use
evolve, I think it's good to provide such a short paragraph which sums
up the project.

Then I see the features page as really just there to answer the next
question: "does it support obscure <feature x>?"  To that end, it
perhaps deserves a link from the front page prose.

And to some extent the history page lets people know that Xapian is a
reasonably mature project (though rereading the page, a couple more
dates would help!)

Once you know what Xapian is and you've decided that it is for you, you
don't really need to view either the front page or the features page again.
For returning visitors the front page really just serves as a jumping
off point to whichever page you actually wanted.

The only concession is the "latest stable version is N" link, and even
that's mostly there as a subtle way to convey to new users that this is
an active project, not one which hasn't made a release this century!

If you're wanting to produce a very complex frontpage, I think we need
to avoid presenting it to new visitors.  One way would be to have a
"logged-in user" status which could subsume the need to register for
bugzilla and the wiki separately.  Then logged-in users could get a
more complex frontpage showing summaries of bugs and wiki pages they
might be watching, etc.  But such a page doesn't really need to replace
the front page (which might be confusing), it could just be another page
which you could bookmark instead of the frontpage if you wanted.

> with each _link_ going to a section of the manual. The big advantage
> of this is that you can download *everything* you should know about
> Xapian in one place. I'm beginning to see this as a /sine qua non/ of
> successful free software projects; if I can't download it and read it
> on my laptop on the train, I'm probably not going to use it in
> skunkworks. (So Apache Struts: no hope, JBoss Hibernate: already there.)

Looking at this from the other direction, I dislike projects I have to
download just to read the documentation - it's nice to be able to browse
it online.  Which is another argument for pulling it all together into
a more coherent whole, as that whole can be put on the website as well
as being packaged for downloading.

> The FAQ can obviously link into the manual as well; better, the FAQ
> could be part of the manual (this is what we should have done with
> Zap) so it's distributed in the tarball.

At least while releases are reasonably regular that makes sense.  If
they become less frequent, we can always update the manual between
code releases anyhow.

> > As I've said before (but not made much progress towards sadly) I'd like
> > to be able to house the documentation (or at least a copy of it) on the
> > wiki to allow users (and indeed developers) to easily correct and extend
> > it.  
> > 
> > So using wiki-style markup would make a lot of sense (the main obstacle
> > I found to moving the current documentation to the wiki is the need to
> > convert the markup - it's easy to do a quick conversion, but I found the
> > details hard to get right).  Wiki markup also has the benefit that you
> > don't need to think about it so much.
> 
> I'd rather use wiki-text, but then we need a way to convert that into
> a nice book.

By chance, this cropped up in the latest Debian Weekly News:

http://lists.debian.org/debian-edu/2006/05/msg00017.html

Essentially it's a docbook output filter for moinmoin, which you can
then feed into the usual docbook processing tools.  With a bit of
scripting you should be able to glue together the docbook from lots
of different pages and wrap it in a higher level docbook tag (the filter
seems to output each page as an "article", so you can wrap them all in a
"part" or "book").  
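
As a rough sketch of that gluing step (not tested against the real filter
output): this assumes each page has already been exported to a string
holding a single DocBook "article" element with no undefined entities, and
the function name and the choice of "book" as the wrapper are just
illustrative.

```python
import xml.etree.ElementTree as ET


def wrap_articles_in_book(article_xml_strings, title):
    """Wrap several DocBook <article> documents in a single <book>.

    Hypothetical helper: the input is assumed to be one XML string per
    exported wiki page, each with <article> as its root element.
    """
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    for xml_text in article_xml_strings:
        root = ET.fromstring(xml_text)
        if root.tag != "article":
            raise ValueError("expected <article>, got <%s>" % root.tag)
        # Keep each page as its own article inside the book.
        book.append(root)
    return ET.tostring(book, encoding="unicode")
```

The result could then be fed to the usual docbook tools in place of the
individual articles.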

Also linked was an OpenOffice Writer to moinmoin convertor, and the same
page also has a thing to allow you to use docbook instead of wiki
markup (I think that would be a mistake for us though):

http://ooowiki.de/WikiKonverter (in German only)
http://translate.google.com/translate?hl=en&sl=de&u=ooowiki.de/WikiKonverter
  (comedy English translation)

> If we can do that, let's go for it - if I do a pass over
> our existing HTML docs it won't be difficult to wiki-ise them.

I should have a script somewhere which can do most of the drudge work
for you, though I don't seem to be able to find it right now.  The
current wiki actually contains the results of trying it out ages ago,
e.g. here's an automated conversion of an old version of the "overview"
page:

http://wiki.xapian.org/XapianOverview

I wouldn't be surprised if someone else has written a better convertor.
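
To give a flavour of the "quick conversion" (this is not the script
mentioned above, just a minimal sketch): it assumes the 2006-era moinmoin
markup as I recall it, i.e. "= Heading =", '''bold''', ''italic'' and
"[url text]" links, and it ignores lists, tables, preformatted text and
everything else that makes the details hard to get right.  The class and
function names are made up.

```python
import re
from html.parser import HTMLParser


class HtmlToMoin(HTMLParser):
    """Very rough HTML to moinmoin-style markup conversion (sketch only)."""

    INLINE = {"b": "'''", "strong": "'''", "i": "''", "em": "''"}
    HEADINGS = {"h1": 1, "h2": 2, "h3": 3}

    def __init__(self):
        super().__init__()
        self.out = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.out.append("=" * self.HEADINGS[tag] + " ")
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a":
            self.href = dict(attrs).get("href")
            if self.href:
                self.out.append("[" + self.href + " ")

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self.out.append(" " + "=" * self.HEADINGS[tag] + "\n\n")
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a":
            if self.href:
                self.out.append("]")
            self.href = None
        elif tag == "p":
            self.out.append("\n\n")

    def handle_data(self, data):
        # Collapse runs of whitespace to one space, as a browser would.
        self.out.append(re.sub(r"\s+", " ", data))


def html_to_moin(html):
    """Convert an HTML fragment to wiki markup, dropping unknown tags."""
    parser = HtmlToMoin()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()
```

Anything the parser doesn't recognise is silently dropped, so the output
would still need a manual pass.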

> However we'd need to know how to do indexing before committing to that
> - I abhor the idea of writing even a small book's worth of information
> but not being able to produce a decent index.

My suggestion would be to add a simple macro.  Are you simply
looking for the ability to say "include a link to here for the index
entry 'foo'"?  If so, then a simple custom macro would allow you to
write "[[Index(foo)]]" and expand it to an empty string for now, but
we'd have the information in place to handle it in a more sophisticated
way when exporting to docbook later.

http://moinmoin.wikiwikiweb.de/HelpOnMacros
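
A minimal sketch of such a macro, assuming the moinmoin convention (as I
understand it from the page above) that a macro is a Python module exposing
an "execute(macro, args)" function which returns the text to insert.  The
"index_entries" helper is hypothetical: it shows how an export step could
later pull the marked terms back out of the raw page text.

```python
import re


def execute(macro, args):
    """Expand [[Index(foo)]] to nothing when the page is rendered.

    The index term in `args` is deliberately ignored for now; it only
    needs to be present in the page source for a later export step.
    """
    return ""


# Matches [[Index(term)]] in raw wiki text and captures the term.
INDEX_MACRO = re.compile(r"\[\[Index\(([^)]*)\)\]\]")


def index_entries(page_text):
    """Return the index terms marked in a page, in order of appearance."""
    return [term.strip() for term in INDEX_MACRO.findall(page_text)]
```

This way the pages render exactly as before, but the index information is
already in place for whenever the docbook export learns to use it.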

> Also a wiki doesn't give good disconnected editing operations, whereas
> a flat file approach does. We need to solve this - are there any
> disconnected wikis? I guess we want a wiki that runs on top of
> subversion, then we can just use svk.

At least the backing is often a directory of flat files.

It shouldn't be too hard to slap the subversion "filesystem" under an
existing wiki, but conflict resolution might be harder to sort out in
a sensible way.

Another approach would be to have the wiki working in its own SVN
checkout of the master sources, with a (probably manually assisted)
checkin periodically.

Cheers,
    Olly


