[Xapian-devel] Adding Bi-gram in the QueryParser and Object.

Gaurav Arora gauravarora.daiict at gmail.com
Fri Jun 29 01:35:40 BST 2012


Hi all,

I have jotted down a plan for adding bigrams to the Query object through the
QueryParser.

Please find attached a sequence diagram showing what I have understood about
how the parser works and how the query is built from the tokens provided by
the lexer. I have highlighted in blue the areas where I think bigrams are
possible. When integrating bigrams into the parser and Query, our aim is to
generate bigrams for all consecutive terms and add them to the query.

The following categories are sent from the lexer to the parser to form the
Query object:

*Near: *Two or more terms with NEAR between them. This query type requires
the terms to occur within a window of 10 words. Since we are already looking
for these words close to each other, it won't hurt to add them as bigrams:
finding them next to each other is even better. *(Bigram can be added)*

*Example:*

Failed NEAR Assertion

*Current parser output:*

Query((failed@1 NEAR 11 assertion@2))

*Output with bigram:*

Query((failed@1 NEAR 11 assertion@2) OR failed assertion@3)

*Implementation:*

All terms detected as NEAR are added to the class *Terms*, so when we ask
*Terms* for a query using as_near_query, as_adj_query or as_opwindow_query,
we can add the bigrams while iterating over the list of terms.
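
A minimal Python sketch of the idea above, not Xapian code: Term,
bigram_terms and near_with_bigrams are hypothetical names, and the real
change would live in the C++ *Terms* class. It assumes that a bigram is a
single term made of two consecutive words, placed at the next free position,
and that the number shown after NEAR is the window plus the number of terms
minus one (which gives 11 for two terms and a window of 10). Positions are
written in the term@position form of a query description.

```python
from dataclasses import dataclass


@dataclass
class Term:
    """Hypothetical stand-in for one parsed term: its text and position."""
    name: str
    pos: int


def bigram_terms(terms):
    """Pair each term with its successor; positions continue after the last term."""
    next_pos = max(t.pos for t in terms) + 1 if terms else 1
    bigrams = []
    for left, right in zip(terms, terms[1:]):
        bigrams.append(Term(f"{left.name} {right.name}", next_pos))
        next_pos += 1
    return bigrams


def near_with_bigrams(terms, window=10):
    """Describe a NEAR query over `terms`, OR-ed with their bigrams."""
    # Assumption: the displayed span is window + number of terms - 1.
    span = window + len(terms) - 1
    near = f" NEAR {span} ".join(f"{t.name}@{t.pos}" for t in terms)
    parts = [f"({near})"] + [f"{b.name}@{b.pos}" for b in bigram_terms(terms)]
    return "Query(" + " OR ".join(parts) + ")"


terms = [Term("failed", 1), Term("assertion", 2)]
print(near_with_bigrams(terms))
```

For the two terms of the example this prints
Query((failed@1 NEAR 11 assertion@2) OR failed assertion@3).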


*Adj: *Handled exactly like *Near*. *(Bigram can be added)*

*Phrase: *Terms given in quotes. These are terms the user wants to occur
together. *(Bigram can be added)*
Implementation is similar to Near and Adj.

*Phrased: *A single term which is actually two or more terms linked by
punctuation. These can be treated as bigrams, as they are terms which must
occur together. *(Bigram can be added)*
Implementation is similar to Near and Adj.

*Group: *A group of terms separated only by whitespace. *(Bigram can be
added)*

*Implementation:*

All terms detected as a group are added to the class *TermGroup*, so when we
ask *TermGroup* for a query using as_group_query, we can add the bigrams
while iterating over the list of terms.
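
The same idea for a group can be sketched in Python as below. This is only a
model: the TermGroup class here is a hypothetical stand-in for the C++ class,
queries are represented as plain (operator, subqueries) tuples, and OR-ing
the bigrams with the group query (rather than combining them with the
default operator) is an assumption taken from the NEAR example.

```python
class TermGroup:
    """Hypothetical Python model of a group of whitespace-separated terms."""

    def __init__(self):
        self.terms = []

    def add_term(self, term):
        self.terms.append(term)

    def as_group_query(self, default_op="OR"):
        """Build the group query, then OR it with adjacent-pair bigrams."""
        group = (default_op, list(self.terms))
        bigrams = []
        previous = None
        for term in self.terms:
            # Single pass over the list: remember the previous term and
            # emit a bigram as soon as its successor is seen.
            if previous is not None:
                bigrams.append(previous + " " + term)
            previous = term
        if not bigrams:
            # A single term (or an empty group) has no bigrams to add.
            return group
        return ("OR", [group] + bigrams)


group = TermGroup()
for word in "failed assertion error".split():
    group.add_term(word)
print(group.as_group_query())
```

Keeping the bigrams outside the group query means they can only add matches
and never restrict them, whatever the default operator is.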

*Wild:*
*Partial:*
*Synonym:*
These expand a term into the terms matching its pattern or into its synonyms.
This pulls in a lot of similar terms and forms a query from all of them, so
considering these for bigrams doesn't seem important. Please suggest if you
feel they should be included.

*BRA-KET: *These are bracketed expressions. The grammar currently has the
rule *BRA expr(E) KET*, so any scope for bigrams inside the brackets will
already have been considered while handling the inner expression.

*ValueRange: *No relation to bigrams.

*Love: *
*Hate: *
Since these either exclude a term from the query or apply to a single term,
we can refrain from adding bigrams for them.

*bool_operator:*
https://github.com/sehaj-sk/xapian/blob/mybranch/xapian-core/docs/queryparser_new.rst#boolean-query

Boolean operators are handled by grammar rules of the following form:
bool_arg(E) bool_operator bool_arg(P).

Since each *bool_arg* is an *expr*, i.e. already a Query object, getting
bigrams from it would be difficult. Please suggest something.

Example.

failed OR assertion

*Current parsed query:*

Query((Zfail@1 OR Zassert@2))

*Since the terms have already been converted to Query objects, how can we
make bigrams for such a simple OR of terms?*
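
One possible direction, sketched in Python purely as a starting point for
discussion: keep the source term next to the query for as long as an
expression is still a single term, so the rule that combines two operands
can still build a bigram. Expr, term_expr and bool_or are hypothetical
names, nothing like this exists in the current grammar, and building the
bigram from the stemmed forms is just a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Expr:
    """Hypothetical parser value: a query description plus the term it
    came from, kept only while the expression is still a single term."""
    query: str
    term: Optional[str] = None


def term_expr(term, pos):
    """Wrap a single term as an expression that remembers its source term."""
    return Expr(query=f"{term}@{pos}", term=term)


def bool_or(left, right, next_pos):
    """OR two expressions, adding a bigram when both are single terms."""
    parts = [left.query, right.query]
    if left.term is not None and right.term is not None:
        parts.append(f"{left.term} {right.term}@{next_pos}")
    # The result is compound, so it no longer carries a single source term.
    return Expr(query="(" + " OR ".join(parts) + ")")


combined = bool_or(term_expr("Zfail", 1), term_expr("Zassert", 2), next_pos=3)
print("Query(" + combined.query + ")")
```

Once an operand is itself a compound expression, its term field is empty and
no bigram is added, so this only covers a simple OR of two terms.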

Most of the work of handling bigrams will be done by adding bigrams while
iterating over the terms in the *TermGroup* and *Terms* classes.

Please guide me and give feedback on how to add bigrams to the Query
object.

Thanks,

Gaurav Arora
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Queryparser_view.jpg
Type: image/jpeg
Size: 96877 bytes
Desc: not available
URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20120629/7087a087/attachment-0001.jpg>

