[Xapian-discuss] problems with indexing xlsx files

Chris Purves chris at northfolk.ca
Tue Apr 16 18:03:24 BST 2013


On 2013-04-15 03:35, Olly Betts wrote:
> On Fri, Apr 05, 2013 at 03:47:11PM -0300, Chris Purves wrote:
>> I have a number of Excel .xlsx files that aren't indexed properly.  To illustrate, I have a file called "this is a test.xlsx".  It consists of four cells:
>>
>> | this |
>> | is   |
>> | a    |
>> | test |
>>
> [...]
>>
>> You can see that the words are all concatenated together as if they
>> are a single word.  If I search for "thisisatest" it comes up, but not
>> otherwise.
>>
>> I'm using version 1.2.3 on Debian.
> 
> The xlsx extraction code changed significantly in 1.2.11, so I think
> this is quite likely to already be fixed.
> 
> Could you try a newer version, or point us at a sample file which
> exhibits this problem?

After a bit of wrangling I was able to upgrade to version 1.2.12, and that did indeed solve my problem.  The individual words are no longer joined together, and searches on the contents of xlsx files now work properly.

Thanks!
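
For anyone who finds this thread with the same symptom, here is a minimal Python sketch of why the words ended up as one term.  It is not Xapian's extraction code, just an illustration under some assumptions: the cell text is stored as shared strings in xl/sharedStrings.xml inside the .xlsx zip (inline strings are not handled), and the helper names xlsx_shared_strings and make_sample_xlsx are made up for this example.  The sample archive contains only that one member, so it is not a complete workbook.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace, as used in xl/sharedStrings.xml.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def xlsx_shared_strings(data):
    """Return the shared strings of an .xlsx (given as bytes), one per entry."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
    strings = []
    for si in root.iter(NS + "si"):
        # One <si> may hold several <t> runs (rich text).  They belong to
        # the same string, so they are joined without a separator.
        strings.append("".join(t.text or "" for t in si.iter(NS + "t")))
    return strings


def make_sample_xlsx(words):
    """Build a stripped-down archive holding only the shared strings part."""
    items = "".join("<si><t>%s</t></si>" % w for w in words)
    xml = ('<sst xmlns="http://schemas.openxmlformats.org/'
           'spreadsheetml/2006/main">%s</sst>' % items)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/sharedStrings.xml", xml)
    return buf.getvalue()


cells = xlsx_shared_strings(make_sample_xlsx(["this", "is", "a", "test"]))

# Joining cell text with no separator gives the single "word" I was seeing;
# joining with whitespace gives text an indexer can split into terms.
concatenated = "".join(cells)
separated = " ".join(cells)
print(concatenated)
print(separated)
```

The only point of the sketch is the difference between the two joins at the end: without a separator between cells, an indexer sees one long term, which matches a search for "thisisatest" and nothing else.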



-- 
Chris Purves
Visit my blog: http://chris.northfolk.ca

"Whence come I and whither go I? That is the great unfathomable question, the same for every one of us. Science has no answer to it." - Max Planck


