[Xapian-discuss] Re: searching and sorting by date

James Aylett james-xapian at tartarus.org
Fri Mar 24 10:44:48 GMT 2006


On Thu, Mar 23, 2006 at 10:28:23AM -0800, Michel Pelletier wrote:

> I think there are a couple of "pythonic" idioms that would be useful but 
> I don't see their purpose being added to the SWIG'ed API, rather they 
> should be in a separate module distributed with the xapian bindings that 
> did some of the tedious backwork, like mapping names to value slots, 
> providing ways to query the db in one shot instead of constantly 
> retyping the pattern "e=Enquire(d).set_query(Query(QueryParser(...)))" 
> and some simple classes that make it easy to create documents and terms 
> and index them.

Querying is more complex than it needs to be. We could easily add
something into the bindings so you could do:

----------------------------------------------------------------------
for match in database.query("my search terms"):
    # do something
    pass
----------------------------------------------------------------------

I have a prototype of this; it's only fifty or so lines of code.
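
For illustration only, here is a rough guess at the core of such a helper. It is not the prototype mentioned above. It is written as a standalone function rather than a method on the database, and the names query, offset and limit are made up for this sketch. It assumes the Xapian Python bindings are installed and that the database was opened with xapian.Database.

```python
def query(database, text, offset=0, limit=10):
    """Hypothetical one-shot search helper: yield matches for a text query.

    `database` is assumed to be an open xapian.Database; `offset` and
    `limit` are names invented for this sketch.
    """
    # Imported lazily so that merely defining the helper does not require
    # the bindings to be present.
    import xapian

    # Parse the user's text into a Query, using the database so the
    # parser can see which terms actually exist.
    parser = xapian.QueryParser()
    parser.set_database(database)
    parsed = parser.parse_query(text)

    # Run the query and hand back each match in rank order.
    enquire = xapian.Enquire(database)
    enquire.set_query(parsed)
    for match in enquire.get_mset(offset, limit):
        yield match
```

With the real bindings, each yielded match should expose attributes such as match.percent and match.document, so the loop body can call match.document.get_data() to get at whatever was stored for display.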

> d = Document(TextField("foo bar bang"), Keyword("genre", "punk"))
> idx.index(d)
> for result in idx.search("foo genre:punk"):
>   print result

Maybe it's just me, but I don't know what that is doing. I'll take a
guess and say you're adding "foo bar bang" (term generated, stemmed),
and then a term you're intending to be used in a boolean
fashion. Looks to me like you don't want to use the omega term style
here, because you'd have to write more code to set the
genre->something mapping, and then pass that to the index for when you
run searches. In this case, your Index.search() method won't be able
to use the QueryParser.
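
To make the "more code" concrete, this is roughly what sharing a genre->something mapping between indexing and searching might look like. It is a sketch under assumptions: the XGENRE prefix, the PREFIXES table and both function names are invented here, and the parser argument is assumed to be a xapian.QueryParser (anything with an add_boolean_prefix(field, prefix) method will do).

```python
# Hypothetical mapping from user-visible field names to term prefixes.
# "XGENRE" is an invented prefix in the omega style of X-prefixed terms.
PREFIXES = {"genre": "XGENRE"}


def keyword_term(field, value, prefixes=PREFIXES):
    """Build the boolean term to add to a document at index time."""
    # Lower-casing the value is a choice made for this sketch, so that
    # "Punk" and "punk" end up as the same term.
    return prefixes[field] + value.lower()


def configure_parser(parser, prefixes=PREFIXES):
    """Teach a query parser the same mapping, so genre:punk finds the term."""
    for field, prefix in prefixes.items():
        parser.add_boolean_prefix(field, prefix)
    return parser
```

At index time the document would get keyword_term("genre", "punk") added via Document.add_term, and at search time the parser would be passed through configure_parser before parse_query is called. The point is that both sides have to agree on the table.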

What do you expect:

     print result

to output? You haven't given the underlying Xapian document anything
to display...

Note that Xapian doesn't currently include term generators for
indexing in the library. There has been discussion of this, which
might take care of the fact that, in the first two lines, you're
asking for an indexer.

The last two lines would be catered for by what I propose above, I
think. (Providing the Document is given some sort of data to display :-)

> I don't see how I can get that much bang per line out of the swig 
> wrappers without essentially writing my own Xapwrap-like library, and I 
> don't see how anyone else can do it without them all writing their own 
> slightly similar/different versions of their own little library.  This 
> makes me think the need for Xapwrap is genuine and that it is bringing 
> me and others benefits that the swig API doesn't provide.
> 
> Of course I'm not trying to insult the SWIG API or offend anyone's 
> sensibilities.  The swig API is perfectly suited for low-level, 
> cross-language, direct use of xapian, I just think there exist high 
> level use cases that they don't provide that a high level wrapper module 
> should.

Xapwrap is, as I understand, intended to give a simple interface to
doing the most common types of indexing and searching. Searching is
something that could be easier, but if you're looking for indexing
facilities I'd use scriptindex unless I needed to do something fairly
sophisticated, in which case I'd want to be working at the raw term
level. If and when we have term generators (and possibly even
indexers) shipping as part of Xapian or a bundled extension, they
could easily appear in the bindings as well.

Again, maybe that's me - I like to know what's going into my database :-)

(Btw, there are python libraries in core that make you do pretty much
as much work; they tend to have simpler access methods for common
tasks, which is probably almost all of what we're lacking. Just be
thankful we're not doing XML generation under Java :-)

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


