[Xapian-discuss] problems with indexing xlsx files

Olly Betts olly at survex.com
Mon Apr 15 07:35:47 BST 2013


On Fri, Apr 05, 2013 at 03:47:11PM -0300, Chris Purves wrote:
> I have a number of Excel .xlsx files that aren't indexed properly.  To illustrate, I have a file called "this is a test.xlsx".  It consists of four cells:
> 
> | this |
> | is   |
> | a    |
> | test |
> 
[...]
> 
> You can see that the words are all concatenated together as if they
> are a single word.  If I search for "thisisatest" it comes up, but not
> otherwise.
> 
> I'm using version 1.2.3 on Debian.

The xlsx extraction code changed significantly in 1.2.11, so I think
this is quite likely to already be fixed.
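
For illustration only, here is one way the symptom you describe can arise.
This is a rough sketch, not omindex's actual extraction code, and I'm
assuming the cell text lives in xl/sharedStrings.xml as separate <t>
elements. The XML snippet and the function names are made up for the
example. If the text of each element is joined with nothing in between,
the cells run together into one term:

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal stand-in for xl/sharedStrings.xml from a file
# with four cells: "this", "is", "a", "test".
SHARED_STRINGS = (
    '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<si><t>this</t></si><si><t>is</t></si>'
    '<si><t>a</t></si><si><t>test</t></si>'
    '</sst>'
)

# SpreadsheetML namespace, in ElementTree's {uri}tag notation.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def cell_strings(xml_text):
    """Return the text of every <t> element, in document order."""
    root = ET.fromstring(xml_text)
    return [t.text or "" for t in root.iter(NS + "t")]


def extract_naive(xml_text):
    """Join cell text with no separator (reproduces the symptom)."""
    return "".join(cell_strings(xml_text))


def extract_separated(xml_text):
    """Join cell text with a space so each cell is a separate word."""
    return " ".join(cell_strings(xml_text))


if __name__ == "__main__":
    print(extract_naive(SHARED_STRINGS))
    print(extract_separated(SHARED_STRINGS))
```

Indexing the first result would only match a search for "thisisatest",
whereas the second gives the indexer four separate words.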

Could you try a newer version, or point us at a sample file which
exhibits this problem?

Cheers,
    Olly


