[Xapian-discuss] Enhance synonyms feature of the query parser (patch included)

hightman hightman at twomice.net
Thu Jan 5 05:28:36 GMT 2012


Synonyms do not seem to be widely used in Xapian, and I recently ran into some problems when using them.

I think the synonym table should normally not contain any prefix information other than the 'Z' stem marker.
For example, suppose I have the following synonyms and prefix:

db.add_synonym("search", "find");
db.add_synonym("Zsearch", "Zfind");
db.add_synonym("foo bar", "foobar");
qp.add_prefix("title", "T");

I would expect the query parser to produce results like this:

"search something" ==> "(Zsearch:(pos=1) SYNONYM find:(pos=1)) AND Zsometh:(pos=2)"
"title:search" ==> "ZTsearch:(pos=1) SYNONYM Tfind:(pos=1)"
"title:searching" ==> "ZTsearch:(pos=1) SYNONYM ZTfind:(pos=1)"
"title:(foo bar)" ==> "(ZTfoo:(pos=1) AND ZTbar:(pos=2)) SYNONYM Tfoobar:(pos=1)"
...
In general, I would like the prefix to be added to synonym terms automatically, but the current Xapian version does not support this.
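
The rule I have in mind for a single synonym term can be sketched as below. This is only a minimal illustration: apply_prefix_to_synonym is a hypothetical helper name, not a Xapian function, and it assumes the synonym table stores unprefixed terms with at most a leading 'Z'.

```cpp
#include <string>

// Hypothetical helper (not part of Xapian): apply a field prefix to a
// synonym term taken from the synonym table.
// Assumption: synonyms are stored without any prefix, except that a
// stemmed synonym carries a leading 'Z'.
std::string apply_prefix_to_synonym(const std::string& prefix,
                                    const std::string& synonym) {
    // No field prefix: use the synonym unchanged.
    if (prefix.empty()) return synonym;
    // Stemmed synonym: keep 'Z' first and insert the prefix after it.
    if (!synonym.empty() && synonym[0] == 'Z')
        return "Z" + prefix + synonym.substr(1);
    // Unstemmed synonym: simply prepend the prefix.
    return prefix + synonym;
}
```

With the prefix "T" this turns "find" into "Tfind" and "Zfind" into "ZTfind", which matches the expected results above. The sketch does not add the ':' separator that prefix_needs_colon() decides on in the existing code.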

In addition, I have a question about the prefix_info of the Term object: it holds a list of prefixes, but I don't know
when a term would have more than one prefix. This makes me unsure about the multi-word synonym change in the patch,
because it only considers the first prefix.
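
One possible source of several prefixes, stated here as an assumption and not something I have confirmed in the parser code, is calling add_prefix() more than once with the same field name. If that is the case, a multi-word synonym would need one term per prefix instead of only the first. A minimal sketch of that idea, using a hypothetical helper prefix_synonym_all that is not part of Xapian:

```cpp
#include <list>
#include <string>
#include <vector>

// Hypothetical helper (not part of Xapian): build one prefixed term per
// prefix for a multi-word synonym, instead of using only the first prefix.
// Assumption: multi-word synonyms are stored unstemmed and unprefixed.
std::vector<std::string> prefix_synonym_all(
        const std::list<std::string>& prefixes,
        const std::string& synonym) {
    std::vector<std::string> terms;
    // No prefixes at all: the synonym is used as it is.
    if (prefixes.empty()) {
        terms.push_back(synonym);
        return terms;
    }
    // Otherwise emit the synonym once for every prefix, in list order.
    for (const std::string& prefix : prefixes)
        terms.push_back(prefix + synonym);
    return terms;
}
```

Each returned term could then be pushed into subqs2 in place of the single prefix + *syn term used in the patch.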

--- PATCH CONTENT BEGIN 'queryparser/queryparser.lemony' ---

*** queryparser.lemony  2012-01-05 12:28:39.000000000 +0800
--- queryparser.lemony.new      2012-01-05 12:52:56.000000000 +0800
***************
*** 307,316 ****
--- 307,318 ----
      for (piter = prefixes.begin(); piter != prefixes.end(); ++piter) {
        // First try the unstemmed term:
        string term;
+ #ifndef HAVE_SYNONYMS_ENH
        if (!piter->empty()) {
            term += *piter;
            if (prefix_needs_colon(*piter, name[0])) term += ':';
        }
+ #endif
        term += name;
  
        Xapian::Database db = state->get_database();
***************
*** 319,334 ****
--- 321,347 ----
        if (syn == end && stem != QueryParser::STEM_NONE) {
            // If that has no synonyms, try the stemmed form:
            term = 'Z';
+ #ifndef HAVE_SYNONYMS_ENH
            if (!piter->empty()) {
                term += *piter;
                if (prefix_needs_colon(*piter, name[0])) term += ':';
            }
+ #endif
            term += state->stem_term(name);
            syn = db.synonyms_begin(term);
            end = db.synonyms_end(term);
        }
        while (syn != end) {
+ #ifdef HAVE_SYNONYMS_ENH
+           string sterm = *syn;
+           if (!piter->empty()) {
+               if (sterm[0] == 'Z') sterm = "Z" + *piter + sterm.substr(1);
+               else sterm = *piter + sterm;
+           }
+           q = Query(Query::OP_SYNONYM, q, Query(sterm, 1, pos));
+ #else
            q = Query(Query::OP_SYNONYM, q, Query(*syn, 1, pos));
+ #endif
            ++syn;
        }
      }
***************
*** 1356,1362 ****
--- 1369,1379 ----
      Query::op default_op = state->default_op();
      vector<Query> subqs;
      subqs.reserve(terms.size());
+ #ifdef HAVE_SYNONYMS_ENH
+     if ((state->flags & QueryParser::FLAG_AUTO_MULTIWORD_SYNONYMS) == QueryParser::FLAG_AUTO_MULTIWORD_SYNONYMS) {
+ #else
      if (state->flags & QueryParser::FLAG_AUTO_MULTIWORD_SYNONYMS) {
+ #endif
        // Check for multi-word synonyms.
        Database db = state->get_database();
  
***************
*** 1432,1440 ****
--- 1449,1467 ----
  
            // Use the position of the first term for the synonyms.
            Xapian::termpos pos = (*begin)->pos;
+ #ifdef HAVE_SYNONYMS_ENH
+           string prefix;
+           const list<string> & prefixes = (*begin)->prefix_info->prefixes;
+           if (prefixes.begin() != prefixes.end())
+               prefix = *(prefixes.begin());
+ #endif
            begin = i;
            while (syn != end) {
+ #ifdef HAVE_SYNONYMS_ENH
+               subqs2.push_back(Query(prefix + *syn, 1, pos));
+ #else
                subqs2.push_back(Query(*syn, 1, pos));
+ #endif
                ++syn;
            }
            Query q_synonym_terms(Query::OP_SYNONYM, subqs2.begin(), subqs2.end());

--- PATCH CONTENT END ---

