[Xapian-discuss] Enhance synonyms feature of the query parser (patch included)
hightman
hightman at twomice.net
Thu Jan 5 05:28:36 GMT 2012
Very few people seem to be using synonyms in Xapian. I recently ran into some problems when using them.
Normally, I think the synonym table should not contain any prefix info other than the 'Z' stem marker.
For example, I have the following synonyms and prefix info:
db.add_synonym("search", "find");
db.add_synonym("Zsearch", "Zfind");
db.add_synonym("foo bar", "foobar");
qp.add_prefix("title", "T");
I think my expected results of query parser should be like this:
"search something" ==> "(Zsearch:(pos=1) SYNONYM find:(pos=1)) AND Zsometh:(pos=2)"
"title:search" ==> "ZTsearch:(pos=1) SYNONYM Tfind:(pos=1)"
"title:searching" ==> "ZTsearch:(pos=1) SYNONYM ZTfind:(pos=1)"
"title:(foo bar)" ==> "(ZTfoo:(pos=1) AND ZTbar:(pos=2)) SYNONYM Tfoobar:(pos=1)"
...
In general, I would like the query parser to add the prefix to synonym terms automatically, but the current Xapian version does not support this.
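To make the rule concrete, here is a minimal standalone sketch of the prefixing I have in mind. It does not use Xapian at all: with_prefix is a hypothetical helper name chosen for illustration, and the sketch ignores the prefix_needs_colon() case that the real parser handles.

```cpp
#include <string>

// Hypothetical helper (not part of Xapian): apply a field prefix to a term
// read from the synonym table.
// A leading 'Z' marks a stemmed term and must stay in front of the prefix;
// any other term simply gets the prefix prepended.
// Assumption: the prefix never needs a ':' separator.
std::string with_prefix(const std::string& prefix, const std::string& syn) {
    if (prefix.empty()) return syn;  // unprefixed field: keep the term as stored
    if (!syn.empty() && syn[0] == 'Z')
        return "Z" + prefix + syn.substr(1);
    return prefix + syn;
}
```

With the prefix "T" from the example above, with_prefix("T", "find") gives "Tfind" and with_prefix("T", "Zfind") gives "ZTfind", so the synonym table itself only ever needs to store unprefixed terms.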
In addition, I have a question about the prefix_info of the Term object: its prefixes member is a list, but I don't know
when a term would have multiple prefixes. This makes me unsure about the change for multi-word synonyms, because I only
consider the first prefix there.
--- PATCH CONTENT BEGIN 'queryparser/queryparser.lemony' ---
*** queryparser.lemony 2012-01-05 12:28:39.000000000 +0800
--- queryparser.lemony.new 2012-01-05 12:52:56.000000000 +0800
***************
*** 307,316 ****
--- 307,318 ----
for (piter = prefixes.begin(); piter != prefixes.end(); ++piter) {
// First try the unstemmed term:
string term;
+ #ifndef HAVE_SYNONYMS_ENH
if (!piter->empty()) {
term += *piter;
if (prefix_needs_colon(*piter, name[0])) term += ':';
}
+ #endif
term += name;
Xapian::Database db = state->get_database();
***************
*** 319,334 ****
--- 321,347 ----
if (syn == end && stem != QueryParser::STEM_NONE) {
// If that has no synonyms, try the stemmed form:
term = 'Z';
+ #ifndef HAVE_SYNONYMS_ENH
if (!piter->empty()) {
term += *piter;
if (prefix_needs_colon(*piter, name[0])) term += ':';
}
+ #endif
term += state->stem_term(name);
syn = db.synonyms_begin(term);
end = db.synonyms_end(term);
}
while (syn != end) {
+ #ifdef HAVE_SYNONYMS_ENH
+ string sterm = *syn;
+ if (!piter->empty()) {
+ if (sterm[0] == 'Z') sterm = "Z" + *piter + sterm.substr(1);
+ else sterm = *piter + sterm;
+ }
+ q = Query(Query::OP_SYNONYM, q, Query(sterm, 1, pos));
+ #else
q = Query(Query::OP_SYNONYM, q, Query(*syn, 1, pos));
+ #endif
++syn;
}
}
***************
*** 1356,1362 ****
--- 1369,1379 ----
Query::op default_op = state->default_op();
vector<Query> subqs;
subqs.reserve(terms.size());
+ #ifdef HAVE_SYNONYMS_ENH
+ if ((state->flags & QueryParser::FLAG_AUTO_MULTIWORD_SYNONYMS) == QueryParser::FLAG_AUTO_MULTIWORD_SYNONYMS) {
+ #else
if (state->flags & QueryParser::FLAG_AUTO_MULTIWORD_SYNONYMS) {
+ #endif
// Check for multi-word synonyms.
Database db = state->get_database();
***************
*** 1432,1440 ****
--- 1449,1467 ----
// Use the position of the first term for the synonyms.
Xapian::termpos pos = (*begin)->pos;
+ #ifdef HAVE_SYNONYMS_ENH
+ string prefix;
+ const list<string> & prefixes = (*begin)->prefix_info->prefixes;
+ if (prefixes.begin() != prefixes.end())
+ prefix = *(prefixes.begin());
+ #endif
begin = i;
while (syn != end) {
+ #ifdef HAVE_SYNONYMS_ENH
+ subqs2.push_back(Query(prefix + *syn, 1, pos));
+ #else
subqs2.push_back(Query(*syn, 1, pos));
+ #endif
++syn;
}
Query q_synonym_terms(Query::OP_SYNONYM, subqs2.begin(), subqs2.end());
--- PATCH CONTENT END ---