Formulating Advanced Queries with Xapian-Omega

Giulio Teslano giulio.teslano at virgilio.it
Thu Dec 29 16:44:50 GMT 2016


To Olly Betts:

Thank you very much for your feedback.

I apologise for this belated reply, and also for the fact that the text
of the previous posting appeared fragmented, due to its fixed
characters-per-line format.

With reference to:

 

> Can, or could, one construct a query so that Omega (Xapian) can handle
> this ?
>
> ... perhaps with some type of Regex ?
>
> It would seem that Wild Cards fall short here.
> If it is possible but not immediately available what would one have to do
> to enable this option ? Are there any working examples, HowTos, Faqs ?

 

and

........................................................................
I have a branch which adds support for arbitrary glob-style wildcard
patterns (where * matches 0 or more characters and ? a single character):

https://github.com/ojwb/xapian/tree/extended-wildcards

The code there works, but is waiting for some benchmarking and profiling
before being merged.
........................................................................
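
To make the quoted glob semantics concrete, here is a minimal Python
sketch of matching where * stands for 0 or more characters and ? for
exactly one. It only illustrates the matching rule as described above;
it is not the code from the extended-wildcards branch, and the function
names glob_to_regex and glob_match are hypothetical.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob that uses only * and ? into a compiled regex.

    Illustrative only: this mirrors the rule quoted above, not Xapian's
    actual implementation.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # zero or more characters
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def glob_match(pattern, term):
    """Return True if the whole term matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(term) is not None


# Made-up sample terms, standing in for terms in an index.
terms = ["colour", "color", "colours", "cooler"]
matches = [t for t in terms if glob_match("col*r", t)]
print(matches)
```

Because the wildcard may sit anywhere in the pattern, "col*r" accepts
both "color" and "colour", which a trailing-only wildcard cannot express.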

 

Regarding the above questions and comments:

I looked with interest at the link you suggested. Unfortunately, I
could not find detailed information on what your extended wildcards
add compared with the basic, well-known options, nor on how one can
make use of the proposed support for arbitrary glob-style wildcard
patterns.

 

a. What other types of extended wildcard options are there? Or is this
   currently limited to the two characters '*' and '?'

b. Apart from the "0 or more" and "single character" options, are there
   any others?

Either via Omega (formulating an appropriate query for CGI) or via
Xapian.

 

Regarding the question in relation to text patterns:

The reference to ISBNs was of course merely a simple example, but it
could be any other typical pattern of letters, numbers and separator
characters.

 

Were you suggesting that one possibility would be to try something
like:

isbn:?-???-?????-?

as a very loose general query for ISBNs (so long as the option is
enabled)?

 

1. Could you explain how one enables and takes advantage of your
   extended option in Omega and/or Xapian? (A working example?)

2. The ? wildcard matches any character, does it not? That is, it
   cannot distinguish between digits and letters, and thus cannot act
   as a regex \d or [0-9]?
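
The distinction raised in question 2 can be illustrated with a short
Python sketch. It assumes, as in the quoted description, that ? accepts
any single character. The two compiled patterns and the candidate
strings are made up for illustration and have no connection to Xapian's
own matching code.

```python
import re

# The glob ?-???-?????-? rewritten as a regex: every slot takes ANY character.
GLOB_AS_REGEX = re.compile(r".-...-.....-.")

# A stricter regex that accepts only ASCII digits in every slot.
DIGITS_ONLY = re.compile(r"[0-9]-[0-9]{3}-[0-9]{5}-[0-9]")

candidates = ["0-306-40615-2", "a-bcd-efghi-j"]

# Both candidates have the right shape, so the glob-style pattern keeps both.
glob_hits = [c for c in candidates if GLOB_AS_REGEX.fullmatch(c)]

# Only the all-digit candidate survives the character-class pattern.
digit_hits = [c for c in candidates if DIGITS_ONLY.fullmatch(c)]

print(glob_hits)
print(digit_hits)
```

In other words, a ?-only pattern constrains the length and the position
of the separators, but not the kind of character in each slot.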

 

>> $match{REGEX,STRING[,OPTIONS]},
>> $transform{REGEXP,SUBST,STRING[,OPTIONS]}

> These are for use in the templating language - they're not search options.

 

Yes, I mentioned that, from my reading, these seemed to be post-query
options acting on the result set.

 

........................................................................
>> If none of the above are possible for Omega, can one manage this with
>> Xapian, or do something similar ?
>> and
>> Again any links to working examples etc. would be most appreciated.

> If you have particular "code" patterns which are important in your domain,
> I'd consider pulling them out at index time and adding them as a filter term
........................................................................

 

No doubt this is due to my lack of understanding, but how would this
interesting option of 'pulling them out at index time ...' be
implemented?
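
As a rough illustration of what "pulling them out at index time" might
look like, here is a minimal Python sketch. It is one possible reading
of the suggestion, not a confirmed recipe: the regular expression, the
XISBN prefix and the function name extract_filter_terms are all
hypothetical. The sketch only builds the term strings; with the Xapian
bindings each term could then be attached to the document being indexed
(for example via Document.add_boolean_term), a step left out here so
that the code runs without Xapian installed.

```python
import re

# Hypothetical pattern for codes shaped like 0-306-40615-2.
# A real domain would need its own, carefully chosen pattern.
CODE_RE = re.compile(r"\b[0-9]-[0-9]{3}-[0-9]{5}-[0-9Xx]\b")

# Hypothetical term prefix, chosen only for this illustration.
PREFIX = "XISBN"


def extract_filter_terms(text):
    """Return sorted, de-duplicated filter terms for codes found in text."""
    terms = set()
    for match in CODE_RE.finditer(text):
        # Normalise: drop the separators and lower-case a trailing X.
        normalised = match.group(0).replace("-", "").lower()
        terms.add(PREFIX + normalised)
    return sorted(terms)


sample = "See ISBN 0-306-40615-2 (also cited as 0-306-40615-2)."
print(extract_filter_terms(sample))
# With Xapian available, one would then do something like:
#   for term in extract_filter_terms(text):
#       doc.add_boolean_term(term)   # doc being a xapian.Document
```

The regular expression does its work once, while indexing; at search
time the code is then looked up as an ordinary exact term, so no
pattern matching over the index is needed.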

 

It would be very useful to have some working examples on these themes,
at least for those less expert than the Xapian developers. Xapian-Omega
appears to be a very interesting solution, and with a regex option it
would be one of the most flexible and versatile search engines
currently available on the net.

 

Thank you again for your follow-up.

Best wishes,

Giulio

 


