[Xapian-tickets] [Xapian] #282: Assorted enhancements to omindex

Xapian nobody at xapian.org
Fri May 13 02:20:56 BST 2011


#282: Assorted enhancements to omindex
-------------------------+--------------------------------------------------
 Reporter:  olly         |       Owner:  olly     
     Type:  enhancement  |      Status:  assigned 
 Priority:  normal       |   Milestone:  1.2.x    
Component:  Omega        |     Version:  SVN trunk
 Severity:  normal       |    Keywords:           
Blockedby:               |    Platform:  All      
 Blocking:               |  
-------------------------+--------------------------------------------------

Old description:

> This is a patch from Reini Urban at AVL which was pasted into the wiki a
> while back, but a ticket is a more appropriate way to track it.  We should
> look at folding some of these improvements in, though we probably don't
> want others, at least not in the form they take in this patch.
>
> I've updated the patch to compile with latest Omega SVN HEAD, dropping
> parts which Omega now supports anyway, and splitting out some features
> into separate tickets.  I've not run-tested it at all.
>
> The remaining features in this patch are:
>
>  * Unpacking "container file types" (e.g. archives like .zip, email
> folders like .mbox, email messages with attachments) so we can index the
> sub-parts
>  * Logging stderr from filters to a file
>  * The seemingly arbitrary addition of more words all starting with "a"
> to the stopword list; stopping some of these seems a bit aggressive to
> me
>  * Defaulting to adding the size and lastmod time of the dump file in
> scriptindex. In general, the size of the dump file seems misleading
> (though less so if you put one document per dump). The lastmod isn't
> particularly helpful in many cases either.
>  * Some tweaks to installing docs in the .spec file, which I don't know
> the reasons for

New description:

 This is a patch from Reini Urban at AVL which was pasted into the wiki a
 while back, but a ticket is a more appropriate way to track it.  We should
 look at folding some of these improvements in, though we probably don't
 want others, at least not in the form they take in this patch.

 I've updated the patch to compile with latest Omega SVN HEAD, dropping
 parts which Omega now supports anyway, and splitting out some features
 into separate tickets.  I've not run-tested it at all.

 The remaining features in this patch are:

  * Unpacking "container file types" (e.g. archives like .zip, email
 folders like .mbox, email messages with attachments) so we can index the
 sub-parts
  * Logging stderr from filters to a file
  * Defaulting to adding the size and lastmod time of the dump file in
 scriptindex. In general, the size of the dump file seems misleading
 (though less so if you put one document per dump). The lastmod isn't
 particularly helpful in many cases either.
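 The ticket doesn't say how the patch implements "logging stderr from
 filters to a file". One common POSIX approach is to fork the filter, point
 its stdout at a pipe the indexer reads, and dup2() its stderr onto a log
 file opened in append mode. A minimal sketch, assuming a POSIX system;
 run_filter_logging_stderr is a hypothetical helper for illustration, not
 Omega's actual API:

```cpp
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper (not Omega's API): run a filter program, capture its
// stdout into `out`, and append its stderr to the file at `log_path`.
// Returns the filter's exit status, or -1 if setup or waiting fails.
int run_filter_logging_stderr(const std::vector<std::string>& argv,
                              const std::string& log_path,
                              std::string& out) {
    if (argv.empty()) return -1;
    // Build the argv array before fork() so the child only calls exec/_exit.
    std::vector<char*> args;
    for (const auto& a : argv) args.push_back(const_cast<char*>(a.c_str()));
    args.push_back(nullptr);

    int log_fd = open(log_path.c_str(), O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (log_fd < 0) return -1;
    int pipefd[2];
    if (pipe(pipefd) < 0) { close(log_fd); return -1; }

    pid_t pid = fork();
    if (pid < 0) {
        close(log_fd); close(pipefd[0]); close(pipefd[1]);
        return -1;
    }
    if (pid == 0) {
        // Child: stdout goes to the pipe, stderr goes to the log file.
        dup2(pipefd[1], STDOUT_FILENO);
        dup2(log_fd, STDERR_FILENO);
        close(pipefd[0]); close(pipefd[1]); close(log_fd);
        execvp(args[0], args.data());
        _exit(127);  // exec failed
    }

    // Parent: read everything the filter writes to stdout.
    close(pipefd[1]);
    close(log_fd);
    out.clear();
    char buf[4096];
    for (;;) {
        ssize_t n = read(pipefd[0], buf, sizeof buf);
        if (n == 0) break;                 // EOF
        if (n < 0) {
            if (errno == EINTR) continue;  // retry interrupted reads
            break;
        }
        out.append(buf, static_cast<size_t>(n));
    }
    close(pipefd[0]);

    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

 Opening the log with O_APPEND means output from successive filter runs
 accumulates in one file rather than each run truncating the previous one.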

--

Comment(by olly):

 Updated the description to reflect the changes in the latest patch too
 (dropped the random extra stopwords and the doc-related changes to the
 spec file).

 The latest patch builds, but its functionality is untested and probably
 isn't right.

-- 
Ticket URL: <http://trac.xapian.org/ticket/282#comment:9>
Xapian <http://xapian.org/>
Xapian