[Xapian-tickets] [Xapian] #552: omindex extracts wrong extension

Xapian nobody at xapian.org
Tue Jun 28 10:47:50 BST 2011


#552: omindex extracts wrong extension
--------------------+-------------------------------------------------------
 Reporter:  Ditha   |       Owner:  olly 
     Type:  task    |      Status:  new  
 Priority:  normal  |   Milestone:       
Component:  Omega   |     Version:  1.2.6
 Severity:  minor   |    Keywords:       
Blockedby:          |    Platform:  All  
 Blocking:          |  
--------------------+-------------------------------------------------------

Old description:

> If you try to index with
> "omindex --follow --preserve-nonduplicates --stemmer=german -M:text/html
> --db /data/INDEX /data/QUELLE"
> a directory structure like
> "/data/.../0/118/blog.laukien.com/software/admen"
> omindex treats ".com/software..." as the extension when the file being
> indexed has no extension of its own, because everything after the last
> dot in the path is taken as the extension.
>
> If you change the code in omindex.cc to
>
> const char * dot_ptr = strrchr(d.leafname(), '.');
> const char * dot_slash = strrchr(d.leafname(), '/');
>
> if (dot_ptr && dot_slash && dot_ptr > dot_slash)
>
> the extension is interpreted correctly. ...I think. ;-)

New description:

 If you try to index with
 "omindex --follow --preserve-nonduplicates --stemmer=german -M:text/html
 --db /data/INDEX /data/QUELLE"
 a directory structure like
 "/data/.../0/118/blog.laukien.com/software/admen"
 omindex treats ".com/software..." as the extension when the file being
 indexed has no extension of its own, because everything after the last
 dot in the path is taken as the extension.

 If you change the code in omindex.cc to

 {{{
 const char * dot_ptr = strrchr(d.leafname(), '.');
 const char * dot_slash = strrchr(d.leafname(), '/');

 if (dot_ptr && dot_slash && dot_ptr > dot_slash)
 }}}

 the extension is interpreted correctly. ...I think. ;-)
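
 To illustrate the misparse, here is a minimal standalone sketch. It is not
 the omindex code: `naive_extension` and `demo_naive_extension` are
 hypothetical names, and the helper assumes the extension is taken as
 everything after the last dot in the whole path, as the report describes.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper (not omindex code): returns whatever follows the last
// '.' anywhere in `path`, or "" if the path contains no dot at all.
std::string naive_extension(const char* path) {
    const char* dot_ptr = std::strrchr(path, '.');
    return dot_ptr ? std::string(dot_ptr + 1) : std::string();
}

// The path from the report: the last dot sits in a directory name, so the
// "extension" swallows the rest of the path.
void demo_naive_extension() {
    assert(naive_extension("/data/.../0/118/blog.laukien.com/software/admen")
           == "com/software/admen");
}
```

 A leaf name with no dot only gets an empty extension here when no directory
 above it contains a dot either.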

--

Comment(by james):

 It should probably be `slash_ptr` not `dot_slash`. Also, I think the
 conditional needs to be:

 {{{
 if (dot_ptr && dot_ptr > dot_slash)
 }}}

 since if you're indexing a relative path, "wibble.html" contains no slash
 but must still be interpreted as having the extension ".html".
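
 A standalone sketch of the check discussed above, assuming the test runs on
 a plain C string holding the path. `leaf_extension` and
 `demo_leaf_extension` are hypothetical names, not functions in omindex.cc.
 The sketch tests for a missing slash explicitly instead of comparing
 `dot_ptr` against a possibly null pointer, since that seems the safer way
 to spell the same intent.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper (not omindex code): returns the extension, without the
// dot, of the last path component, or "" if that component has no dot.
std::string leaf_extension(const char* path) {
    const char* dot_ptr = std::strrchr(path, '.');
    const char* slash_ptr = std::strrchr(path, '/');
    // Accept the dot only if there is no '/' at all (relative leaf name)
    // or the dot comes after the last '/'.
    if (dot_ptr && (!slash_ptr || dot_ptr > slash_ptr))
        return std::string(dot_ptr + 1);
    return std::string();
}

// The two cases from the ticket: a dotted directory above an extensionless
// file, and a relative filename with no slash.
void demo_leaf_extension() {
    assert(leaf_extension("/data/.../0/118/blog.laukien.com/software/admen").empty());
    assert(leaf_extension("wibble.html") == "html");
}
```

 With this check a dot in a directory name no longer counts, while a bare
 relative filename still yields its extension.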

-- 
Ticket URL: <http://trac.xapian.org/ticket/552#comment:1>
Xapian <http://xapian.org/>