[Xapian-discuss] Tag-based filesystem with xapian, advice?
Olly Betts
olly at survex.com
Tue Mar 3 01:39:33 GMT 2009
On Mon, Mar 02, 2009 at 10:50:34PM +0100, Karel Marissens wrote:
> Olly, thank you for your answers. I haven't had time yet to test it
> all out, but I looked at the API for your answer to my second question,
> and it's not entirely clear to me yet.
>
> First of all, how do I go from an MSet to an RSet? Is there a built-in
> method I'm overlooking?
Just iterate over the MSet and add the docid of each entry to an RSet.
There's no built-in way (it's unusual to want to use the whole MSet in
this way).
> As you guessed, I want to eliminate path terms in my taglists. I see
> the ExpandDecider can accept a term to ignore, so I need to loop over
> all terms, check for a '/' and if found add the term to the decider?
ExpandDecider is passed each term being considered, so you just check
whether it is a path term or not. Conventionally a path is indexed with
a "P" prefix, though you could just start path terms with "/" if you
don't need to interoperate with Omega, etc. (assuming, of course, that
no tag can start with "/").
> I'm also curious how you think the performance of this search for
> available tags compares to an RDBMS solution. In an RDBMS solution
> with 3 tables (files, files_tags and tags), there could be a
> LIKE '/path/%' on the files table to find relevant files (I believe an
> index can be used for the LIKE), a join with the files_tags table, a
> join with the tags table and finally a GROUP BY on the found tags. But
> I have no idea whether that is more or less performant than the Xapian way.
I have no idea either.
> Another requirement that I (probably) have is to be able to add a term
> to the database without actually adding it to a file yet. Is this
> possible? I can of course always use an empty document which has all
> terms (except paths)...
A term only exists in the database if it indexes at least one document,
so you would need an empty document, or to keep an external list and
add its entries to the tags you get from Xapian.
> Lastly, do you have any idea if there's Python documentation similar
> to the API documentation for C++? (see link below) Or can it be
> generated somehow? I did find the python bindings page and everything
> seems to be about the same as the C++ API, but still it would be
> handy...
There are Python docstrings, which you should be able to access in the
usual Python ways, e.g. pydoc xapian.MSet
There's probably a way to generate HTML, but I've never investigated.
Check the Python documentation to find out.
Cheers,
Olly