[Xapian-discuss] problems with indexing xlsx files
Chris Purves
chris at northfolk.ca
Tue Apr 16 18:03:24 BST 2013
On 2013-04-15 03:35, Olly Betts wrote:
> On Fri, Apr 05, 2013 at 03:47:11PM -0300, Chris Purves wrote:
>> I have a number of Excel .xlsx files that aren't indexed properly. To illustrate, I have a file called "this is a test.xlsx". It consists of four cells:
>>
>> | this |
>> | is |
>> | a |
>> | test |
>>
> [...]
>>
>> You can see that the words are all concatenated together as if they
>> are a single word. If I search for "thisisatest" it comes up, but not
>> otherwise.
>>
>> I'm using version 1.2.3 on Debian.
>
> The xlsx extraction code changed significantly in 1.2.11, so I think
> this is quite likely to already be fixed.
>
> Could you try a newer version, or point us at a sample file which
> exhibits this problem?

After a bit of wrangling, I was able to upgrade to version 1.2.12. This indeed solved my problem: the individual words are no longer joined together, and I can do proper searches for xlsx files.
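
For anyone who hits the same symptom, here is a minimal Python sketch of how "thisisatest" can arise. It is not Xapian's actual extraction code. It assumes the cell text lives in xl/sharedStrings.xml as one <si> entry per string, which is how Excel usually stores string cells. The in-memory sample is a stripped-down zip, not a complete .xlsx, and the function names are made up for illustration. Joining every text node with no separator merges the cells into one "word", while emitting one item per <si> and separating them with spaces keeps them apart:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML parts such as xl/sharedStrings.xml.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

# Four shared strings, one per cell, mirroring the example file.
SHARED_STRINGS = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    "<si><t>this</t></si><si><t>is</t></si>"
    "<si><t>a</t></si><si><t>test</t></si>"
    "</sst>"
)


def make_sample_xlsx() -> bytes:
    """Hypothetical helper: build an in-memory zip holding only
    xl/sharedStrings.xml (NOT a complete, valid .xlsx file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/sharedStrings.xml", SHARED_STRINGS)
    return buf.getvalue()


def _load_shared_strings(data: bytes) -> ET.Element:
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return ET.fromstring(zf.read("xl/sharedStrings.xml"))


def extract_naive(data: bytes) -> str:
    # Concatenates every text node with no separator, so cells run together.
    return "".join(_load_shared_strings(data).itertext())


def extract_separated(data: bytes) -> str:
    # One string per <si> element, joined with spaces so words stay distinct.
    root = _load_shared_strings(data)
    items = ["".join(si.itertext()) for si in root.iter(NS + "si")]
    return " ".join(items)


sample = make_sample_xlsx()
print(extract_naive(sample).split())      # ['thisisatest']
print(extract_separated(sample).split())  # ['this', 'is', 'a', 'test']
```

An indexer that tokenizes the first string only ever sees the single term "thisisatest", which matches the search behaviour described above.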
Thanks!
--
Chris Purves
Visit my blog: http://chris.northfolk.ca
"Whence come I and whither go I? That is the great unfathomable question, the same for every one of us. Science has no answer to it." - Max Planck