Python Pandas Cannot Read Old Excel Files With Some Strange Encoding And Split Panes
Solution 1:
The format of the file is Excel 2 (BIFF2).
However, since it wasn't created by Excel, it seems to contain inconsistencies with the Excel BIFF2 spec.
For the file you show, the WINDOW2 record is incorrect. You can work around this by getting the current version of xlrd (0.9.3) and applying the following patch:
diff --git a/xlrd/sheet.py b/xlrd/sheet.py
index 36438a0..6d895c4 100644
--- a/xlrd/sheet.py
+++ b/xlrd/sheet.py
@@ -1455,7 +1455,8 @@ class Sheet(BaseObject):
             (self.first_visible_rowx, self.first_visible_colx,
             self.automatic_grid_line_colour,
             ) = unpack("<HHB", data[5:10])
-            self.gridline_colour_rgb = unpack("<BBB", data[10:13])
+            if data_len > 10:
+                self.gridline_colour_rgb = unpack("<BBB", data[10:13])
             self.gridline_colour_index = nearest_colour_index(
                 self.book.colour_map, self.gridline_colour_rgb, debug=0)
             self.cached_page_break_preview_mag_factor = 0 # default (60%)
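To illustrate the idea behind the patch, here is a minimal standalone sketch of a length guard around struct.unpack. It is not xlrd's code: the function name parse_gridline_rgb and the sample records are hypothetical, and the check used here (at least 13 bytes) is stricter than the data_len > 10 test in the patch.

```python
from struct import unpack


def parse_gridline_rgb(data):
    """Return the (r, g, b) tuple stored at bytes 10..12, or None if absent.

    Hypothetical helper for illustration only; not part of xlrd.
    """
    # Only unpack the colour bytes when the record is long enough to hold them.
    if len(data) >= 13:
        return unpack("<BBB", data[10:13])
    return None


# A truncated record (10 bytes) and a complete one (13 bytes), both made up.
short_record = bytes(10)
full_record = bytes(10) + bytes([255, 0, 128])

print(parse_gridline_rgb(short_record))  # None
print(parse_gridline_rgb(full_record))   # (255, 0, 128)
```

Without the guard, unpack("<BBB", data[10:13]) on the truncated record would raise struct.error, because the slice is empty.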
Then install this version of the module or use it from your PYTHONPATH, since pandas uses xlrd to read Excel files.
This still gives the codepage warning, but that is only a warning, and you can pass encoding_override='ascii' to xlrd.open_workbook (or whatever the correct encoding is, but ascii is probably right).
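As a rough illustration of why the choice of encoding matters, the sketch below decodes the same bytes with two codecs using only the standard library. The sample bytes and the helper try_decode are made up for this example, and cp1252 is used only as an example of a Windows codepage.

```python
def try_decode(data, encoding):
    """Decode bytes with the given codec, returning None if it fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Made-up sample: byte 0xE9 is outside ASCII but is 'é' in cp1252.
raw = b"Caf\xe9"

print(try_decode(raw, "ascii"))   # None
print(try_decode(raw, "cp1252"))  # Café
```

If the files contain only plain ASCII text, both codecs give the same result, which is why ascii is a reasonable first guess.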
Note that there may be other issues in the file format, given that you have 40,000 files, but that was the only one I found in the file you provided.
Update: Based on the second example file it looks like the encoding is Windows CP-1252 so the following should work:
import xlrd
workbook = xlrd.open_workbook('harvest.xls', encoding_override='cp1252')
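Once xlrd can open the workbook, one way to get the data into pandas is to copy the rows out of the sheet yourself. This is only a sketch under assumptions: the helper names sheet_to_rows and load_frame are hypothetical, the data is assumed to be on the first sheet, and no header row is assumed.

```python
def sheet_to_rows(sheet):
    """Collect every row of an xlrd-style sheet as a list of cell values."""
    return [sheet.row_values(rowx) for rowx in range(sheet.nrows)]


def load_frame(path, encoding="cp1252"):
    """Open an old .xls file with xlrd and return its first sheet as a DataFrame."""
    # Imported here so sheet_to_rows can be used without pandas or xlrd installed.
    import pandas as pd
    import xlrd

    workbook = xlrd.open_workbook(path, encoding_override=encoding)
    sheet = workbook.sheet_by_index(0)  # assumption: data is on the first sheet
    return pd.DataFrame(sheet_to_rows(sheet))
```

Calling load_frame('harvest.xls') would then give a DataFrame with default integer column labels, which you can rename afterwards if the first row holds headers.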
Solution 2:
I've previously used this with success to open old Excel files; give it a try: