[Xapian-discuss] Tag-based filesystem with xapian, advice?

Olly Betts olly at survex.com
Tue Mar 3 01:39:33 GMT 2009


On Mon, Mar 02, 2009 at 10:50:34PM +0100, Karel Marissens wrote:
> Olly, Thank you for your answers. I haven't had time yet to test it
> all out but I looked at the API for your answer to my 2nd question and
> it's not entirely clear to me yet.
> 
> First of all, how do I go from an MSet to an RSet? Is there a built-in  
> method I'm overlooking?

Just iterate over the MSet and add the docid for each entry.  There's
no built-in way (it's unusual to want to use the whole MSet in this
way).
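
As a rough sketch of that loop in Python: the helper name mset_to_rset
is made up for illustration, and it assumes that iterating an MSet in
the Python bindings yields items with a "docid" attribute and that
RSet.add_document() accepts a docid.

```python
def mset_to_rset(mset, rset):
    """Mark every document in `mset` as relevant in `rset`."""
    for match in mset:
        # Each match is assumed to expose the document id as `.docid`.
        rset.add_document(match.docid)
    return rset

# Hypothetical usage with the Xapian Python bindings
# (`enquire` and `db` are assumed to exist already):
#   import xapian
#   mset = enquire.get_mset(0, db.get_doccount())
#   rset = mset_to_rset(mset, xapian.RSet())
```

The RSet is passed in rather than created inside the helper so that the
function can be tried with stand-in objects as well.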

> As you guessed, I want to eliminate path terms in my taglists. I see  
> the ExpandDecider can accept a term to ignore, so I need to loop over  
> all terms, check for a '/' and if found add the term to the decider?  

ExpandDecider is passed each term being considered, so you just check
whether it is a path term and return false if it is.  Conventionally a
path is indexed with a "P" prefix, though you could just start path terms
with "/" if you don't need to interoperate with Omega, etc (and assuming,
of course, that no tag can start with a "/").
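
A minimal sketch of such a decider in Python follows.  The names
PATH_PREFIX, is_path_term and TagOnlyDecider are invented for this
example, and it assumes no tag ever begins with the chosen prefix.  The
import is guarded only so the predicate can be tried without Xapian
installed.

```python
try:
    import xapian
except ImportError:  # the predicate below still works without Xapian
    xapian = None

PATH_PREFIX = "P"  # assumption: use "/" if paths are indexed unprefixed


def is_path_term(term, prefix=PATH_PREFIX):
    """Return True if `term` starts with the path prefix."""
    if isinstance(term, bytes):  # the Python 3 bindings may pass bytes
        term = term.decode("utf-8", "replace")
    return term.startswith(prefix)


if xapian is not None:
    class TagOnlyDecider(xapian.ExpandDecider):
        """Accept a candidate term only if it is not a path term."""

        def __call__(self, term):
            return not is_path_term(term)

    # Hypothetical usage, with `enquire` and `rset` set up elsewhere:
    #   eset = enquire.get_eset(100, rset, TagOnlyDecider())
```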

> I'm also curious what you think might be the performance of this  
> search for available tags compared to an RDBMS solution? In an RDBMS  
> solution with 3 tables (files, files_tags and tags), there could be a  
> LIKE '/path/%' in the files table to find relevant files (I believe an  
> index can be used for the like), a join with the files_tags table, a  
> join with the tags table and finally a group by on the found tags. But  
> I have no idea if that is more/less performant than the xapian way.

I have no idea either.

> Another requirement that I (probably) have is to be able to add a term  
> to the database without actually adding it to a file yet. Is this  
> possible? Of course I can always use an empty document which has all
> terms (except paths)...

A term only exists if it indexes a document, so you would need an empty
document, or to keep an external list and add it to the tags you get
from Xapian.
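
For the external-list option, the merge could be as simple as the
following sketch.  The function name and both argument names are
hypothetical; the inputs are assumed to be plain iterables of tag
strings, one obtained from Xapian and one stored elsewhere.

```python
def available_tags(xapian_tags, reserved_tags):
    """Merge tags that index documents with tags kept in an external list."""
    # A set union drops duplicates; sorting gives a stable order.
    return sorted(set(xapian_tags) | set(reserved_tags))
```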

> Lastly, do you have any idea if there's python documentation similar  
> to the API documentation for C++? (see link below) Or can it be  
> generated somehow? I did find the python bindings page and everything  
> seems to be about the same as the C++ API, but still it would be  
> handy...

There are Python docstrings, which you should be able to access in the
usual Python ways, e.g. pydoc xapian.MSet

There's probably a way to generate HTML from them, but I've never
investigated.  Check the Python documentation to find out.
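
One possibility, offered only as a sketch: the standard library's pydoc
module has a writedoc() function that writes an HTML page for a named
module or class into the current directory.  The wrapper name
write_html_docs and its outdir parameter are made up here, and how
useful the output is for the Xapian bindings is untested.

```python
import os
import pydoc


def write_html_docs(name, outdir="."):
    """Ask pydoc to write `<name>.html` into `outdir`; return that path."""
    previous = os.getcwd()
    os.chdir(outdir)  # pydoc.writedoc() writes into the current directory
    try:
        # If `name` cannot be imported, pydoc prints the error and
        # no file is written.
        pydoc.writedoc(name)
    finally:
        os.chdir(previous)
    return os.path.join(outdir, name + ".html")

# Hypothetical usage: write_html_docs("xapian")
```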

Cheers,
    Olly


