[Xapian-discuss] C++ parser for doc.get_data() result.

Kevin Duraj kevin.softdev at gmail.com
Fri Oct 9 01:33:36 BST 2009


Yes, this is the parsing code I was looking for.  Thank you for the advice.

    if (!fieldnames.empty()) {
        // Each line is a field, with fieldnames taken from corresponding
        // entries in the tab-separated list specified by $opt{fieldnames}.
        string::size_type n = 0, n2;
        while (true) {
            n2 = fieldnames.find('\t', n);
            string::size_type old_i = i;
            i = text.find('\n', i);
            field[fieldnames.substr(n, n2 - n)] = text.substr(old_i, i - old_i);
            if (n2 == string::npos || i == string::npos) break;
            ++i;
            n = n2 + 1;
        }
    } else {
        // Each line is a field, in the format NAME=VALUE.  We assume the field
        // name doesn't contain an "=".  Lines without an "=" are currently
        // just ignored.
        while (true) {
            string::size_type old_i = i;
            i = text.find('\n', i);
            string line = text.substr(old_i, i - old_i);
            string::size_type j = line.find('=');
            if (j != string::npos) {
                string key = line.substr(0, j);
                string value = field[key];
                if (!value.empty()) value += '\t';
                value += line.substr(j + 1);
                field[key] = value;
            }
            if (i == string::npos) break;
            ++i;
        }
    }
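
For anyone who wants to use this outside omega, here is a minimal
standalone sketch of the same logic wrapped in a function. The excerpt
above relies on variables (i, text, field, fieldnames) that are declared
elsewhere in query.cc, so the name parse_doc_data(), its signature and
the choice of std::map are my own assumptions and not part of Xapian or
omega.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not part of Xapian or omega): split document data
// that follows omega's convention into a field-name -> value map.
// `text` would typically be the string returned by doc.get_data().
// `fieldnames` is an optional tab-separated list of names; when it is
// empty, each line is expected to be in NAME=VALUE form.
std::map<std::string, std::string>
parse_doc_data(const std::string& text,
               const std::string& fieldnames = std::string())
{
    std::map<std::string, std::string> field;
    std::string::size_type i = 0;

    if (!fieldnames.empty()) {
        // Line k of the data is the value for the k-th field name.
        std::string::size_type n = 0;
        while (true) {
            std::string::size_type n2 = fieldnames.find('\t', n);
            std::string::size_type old_i = i;
            i = text.find('\n', i);
            // substr() clamps the count, so npos is safe here.
            field[fieldnames.substr(n, n2 - n)] =
                text.substr(old_i, i - old_i);
            if (n2 == std::string::npos || i == std::string::npos) break;
            ++i;
            n = n2 + 1;
        }
    } else {
        // NAME=VALUE lines; repeated names are joined with a tab and
        // lines without an '=' are skipped.
        while (true) {
            std::string::size_type old_i = i;
            i = text.find('\n', i);
            std::string line = text.substr(old_i, i - old_i);
            std::string::size_type j = line.find('=');
            if (j != std::string::npos) {
                std::string& value = field[line.substr(0, j)];
                if (!value.empty()) value += '\t';
                value += line.substr(j + 1);
            }
            if (i == std::string::npos) break;
            ++i;
        }
    }
    return field;
}
```

Calling parse_doc_data(doc.get_data()) handles the NAME=VALUE layout;
passing a second argument such as "url\tcaption" handles the layout
where each line is a bare value in a fixed order.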


Kevin Duraj
http://find1friend.com/


On Thu, Oct 1, 2009 at 3:48 AM, James Aylett <james-xapian at tartarus.org> wrote:
> On Wed, Sep 30, 2009 at 03:04:44PM -0700, Kevin Duraj wrote:
>
>> Has anybody written, and would be willing to share, routines that parse
>> the result of doc.get_data() into key/value pairs in C++?
>
> There's code in omega's query.cc (around line 2050 on trunk).
>
> However if you're indexing yourself (as well as searching yourself),
> you may prefer to use another serialisation format for which you have
> an off-the-shelf library. There's nothing magical about the omega
> convention for use of document data -- it is purely a convention. (And
> has some limitations if you want to stuff arbitrary data inside it.)
>
> J
>
> --
>  James Aylett
>
>  talktorex.co.uk - xapian.org - uncertaintydivision.org
>
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss
>


