[Xapian-discuss] Re: searching and sorting by date

Michel Pelletier michel at dialnetwork.com
Fri Mar 24 20:13:54 GMT 2006


> Querying is more complex than it needs to be. We could easily add
> something into the bindings so you could do:
> 
> ----------------------------------------------------------------------
> for match in database.query("my search terms"):
>     # do something
>     pass
> ----------------------------------------------------------------------

I think that would be very useful.

> 
>>d = Document(TextField("foo bar bang"), Keyword("genre", "punk"))
>>idx.index(d)
>>for result in idx.search("foo genre:punk"):
>>  print result
> 
> 
> Maybe it's just me, but I don't know what that is doing. 

It's a sample of how one uses Xapwrap.  Sorry I wasn't clear about that.

> I'll take a
> guess and say you're adding "foo bar bang" (term generated, stemmed),
> and then a term you're intending to be used in a boolean
> fashion.

Yep

> Looks to me like you don't want to use the omega term style
> here, because you'd have to write more code to set the
> genre->something mapping, and then pass that to the index for when you
> run searches.

Xapwrap manages those mappings for you.  That's one of the really nice 
things it does out of the box.  When a document is indexed, the keys of 
"Keywords" are remembered in a dictionary and the query parser is 
automatically configured with the appropriate prefixes.  You can either 
save and restore the mapping as a dictionary (which is what I do), or use 
Xapwrap's support for storing its metadata in the document with id 1 in 
the Xapian database.
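
A minimal sketch of the idea described above, not Xapwrap's actual code: 
the class name PrefixMap, its methods, and the "X" + upper-cased field 
name prefix scheme are all assumptions made for illustration.  It only 
shows how keyword names seen at index time can be remembered in a 
dictionary and reused to turn "genre:punk" into a prefixed term at 
search time.

```python
class PrefixMap:
    """Hypothetical helper: remembers keyword names and their term prefixes."""

    def __init__(self, mapping=None):
        # field name -> term prefix, e.g. {"genre": "XGENRE"}
        self.mapping = dict(mapping or {})

    def register(self, field):
        # Assumed scheme: "X" + upper-cased field name for user-defined fields.
        return self.mapping.setdefault(field, "X" + field.upper())

    def keyword_term(self, field, value):
        # Used at index time; registering the field is a side effect.
        return self.register(field) + value.lower()

    def query_terms(self, query):
        # Split a query into free-text terms and prefixed boolean terms.
        text, boolean = [], []
        for token in query.split():
            field, sep, value = token.partition(":")
            if sep and field in self.mapping:
                boolean.append(self.mapping[field] + value.lower())
            else:
                text.append(token.lower())
        return text, boolean

    def to_dict(self):
        # A plain dictionary that can be saved and restored later.
        return dict(self.mapping)


# Index time: the "genre" keyword is remembered.
prefixes = PrefixMap()
term = prefixes.keyword_term("genre", "punk")

# Search time: restore the saved mapping and split the query.
saved = prefixes.to_dict()
restored = PrefixMap(saved)
text, boolean = restored.query_terms("foo genre:punk")
```

In a real setup the restored mapping would be handed to the query parser 
rather than applied by hand as it is here.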

As to term generation, the procedure you explained in your previous 
message (very thoroughly, thank you!), generating terms according to the 
same capitalization convention, is already done by Xapwrap, at least as 
far as I can tell (there are probably some differences).  It handles and 
prefixes text and keywords, and has classes for terms and values as well.

> In this case, your Index.search() method won't be able
> to use the QueryParser.
> 
> What do you expect:
> 
>      print result
> 
> to return? You haven't given the underlying Xapian document anything
> to display...

By default in Xapwrap it just prints the score and document id.  You can 
access values of the document from the result, so if title was an 
existing value:

     print result['values']['title']

would print the title of the matching document.

> Note that Xapian doesn't currently include term generators for
> indexing in the library. There has been discussion of this, which
> might take care of the fact that, in the first two lines, you're
> asking for an indexer.

Right.  While the term generation explanation makes sense once it's 
explained, it's a tough concept for a new user to grasp right away.

> Xapwrap is, as I understand, intended to give a simple interface to
> doing the most common types of indexing and searching. Searching is
> something that could be easier, but if you're looking for indexing
> facilities I'd use scriptindex unless I needed to do something fairly
> sophisticated, in which case I'd want to be working at the raw term
> level. If and when we have term generators (and possibly even
> indexers) shipping as part of Xapian or a bundled extension, they
> could easily appear in the bindings as well.

Well, that would be great.  Xapwrap currently indexes terms along the 
lines of the capitalization scheme you described, so I don't think it 
would be difficult to move my concepts from one to the other.

> Again, maybe that's me - I like to know what's going into my database :-)

Xapwrap does not hide the database from you; the index classes it 
provides are just wrappers, and the database objects themselves are 
easily accessed via the 'db' attribute.  It gives you quite a bit of 
flexibility on what terms get generated, and when that fails, hey, it's 
Python. ;)  But I know how you feel: libraries of a very raw nature have 
their own set of risks that are unacceptable for many applications.

> 
> (Btw, there are python libraries in core that make you do pretty much
> as much work; they tend to have simpler access methods for common
> tasks, which is probably almost all of what we're lacking. Just be
> thankful we're not doing XML generation under Java :-)

Every day!

-Michel



