[Xapian-tickets] [Xapian] #552: omindex extracts wrong extension
Xapian
nobody at xapian.org
Tue Jun 28 10:47:50 BST 2011
#552: omindex extracts wrong extension
--------------------+-------------------------------------------------------
Reporter: Ditha | Owner: olly
Type: task | Status: new
Priority: normal | Milestone:
Component: Omega | Version: 1.2.6
Severity: minor | Keywords:
Blockedby: | Platform: All
Blocking: |
--------------------+-------------------------------------------------------
Old description:
> If you try to index with
> "omindex --follow --preserve-nonduplicates --stemmer=german -M:text/html
> --db /data/INDEX /data/QUELLE"
> a directory structure like
> "/data/.../0/118/blog.laukien.com/software/admen"
> the indexer treats ".com/software..." as the extension if the file to
> index has no extension of its own.
> Everything after the last dot in the path is taken as the extension...
>
> If you change the source of omindex.cc to
>
> const char * dot_ptr = strrchr(d.leafname(), '.');
> const char * dot_slash = strrchr(d.leafname(), '/');
>
> if (dot_ptr && dot_slash && dot_ptr > dot_slash)
>
> the extension will be interpreted right. ...I think. ;-)
New description:
If you try to index with
"omindex --follow --preserve-nonduplicates --stemmer=german -M:text/html
--db /data/INDEX /data/QUELLE"
a directory structure like
"/data/.../0/118/blog.laukien.com/software/admen"
the indexer treats ".com/software..." as the extension if the file to
index has no extension of its own.
Everything after the last dot in the path is taken as the extension...
If you change the source of omindex.cc to
{{{
const char * dot_ptr = strrchr(d.leafname(), '.');
const char * dot_slash = strrchr(d.leafname(), '/');
if (dot_ptr && dot_slash && dot_ptr > dot_slash)
}}}
the extension will be interpreted right. ...I think. ;-)
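To make the reported behaviour concrete, here is a minimal sketch of a "last dot wins" lookup applied to a whole path. The function name `naive_extension` and the shortened example path are hypothetical and only illustrate the symptom; this is not the actual omindex.cc code.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper (not taken from omindex.cc): treats everything after
// the last '.' anywhere in the path as the extension.
std::string naive_extension(const char* path) {
    const char* dot_ptr = std::strrchr(path, '.');
    // No dot at all means no extension.
    return dot_ptr ? std::string(dot_ptr + 1) : std::string();
}
```

With a path such as "blog.laukien.com/software/admen", the last dot belongs to a directory name, so this sketch returns "com/software/admen" even though the file "admen" has no extension.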
--
Comment(by james):
It should probably be `slash_ptr` not `dot_slash`. Also, I think the
conditional needs to be:
{{{
if (dot_ptr && dot_ptr > dot_slash)
}}}
since if you're indexing a relative path with no slash in it, "wibble.html"
still needs to be interpreted as having an extension of ".html".
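The check discussed here could be sketched roughly as follows. The helper name `extension_of` is hypothetical, and the explicit `!slash_ptr` test is a choice made in this sketch (it avoids relying on the result of comparing against a null pointer); it is not necessarily how omindex.cc ended up handling it.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper (not omindex's actual code): returns the extension of
// the final path component without the leading dot, or "" if it has none.
std::string extension_of(const char* path) {
    const char* dot_ptr = std::strrchr(path, '.');
    const char* slash_ptr = std::strrchr(path, '/');
    // Accept the dot only if there is no slash at all (relative leaf name
    // such as "wibble.html") or if the dot comes after the last slash.
    if (dot_ptr && (!slash_ptr || dot_ptr > slash_ptr)) {
        return std::string(dot_ptr + 1);
    }
    return std::string();
}
```

Under these assumptions, "wibble.html" yields "html", while "blog.laukien.com/software/admen" yields an empty extension because its only dots come before the last slash.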
--
Ticket URL: <http://trac.xapian.org/ticket/552#comment:1>
Xapian <http://xapian.org/>