Properly Getting Blobs From MySQL Database With MySQL Connector In Python
Solution 1:
We ran into the same issue that BLOBs were mistakenly read back as UTF-8 strings with MySQL 8.0.13, mysql-connector-python 8.0.13 and sqlalchemy 1.2.14.
What did the trick for us was enabling the use_pure option of MySQL Connector. The default of use_pure changed in 8.0.11, and the connector now uses the C Extension by default. We therefore set the option back:
create_engine(uri, connect_args={'use_pure': True}, ...)
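Outside of SQLAlchemy, the same option can be passed as a keyword argument to mysql.connector.connect(). A minimal sketch, assuming placeholder credentials; connector_kwargs and open_connection are hypothetical helper names used only for illustration, not part of the connector:

```python
def connector_kwargs(user, password, host, database, use_pure=True):
    """Build keyword arguments for mysql.connector.connect()."""
    return {
        "user": user,
        "password": password,
        "host": host,
        "database": database,
        # True selects the pure-Python implementation instead of the C Extension.
        "use_pure": use_pure,
    }


def open_connection(**kwargs):
    """Open a connection; the driver is imported lazily so the helper
    above can be inspected without the connector installed."""
    import mysql.connector
    return mysql.connector.connect(**kwargs)


if __name__ == "__main__":
    # Placeholder values; replace with real credentials before connecting.
    kwargs = connector_kwargs("user", "password", "localhost", "database")
    print(kwargs["use_pure"])
```

Keeping the option in one place makes it easy to switch back to the C Extension later and check whether a newer connector release still shows the problem.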
Details of our error and stack trace:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c in position 1: invalid start byte
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
....
File "/usr/local/lib/python3.6/site-packages/mysql/connector/cursor_cext.py", line 272, in execute
self._handle_result(result)
File "/usr/local/lib/python3.6/site-packages/mysql/connector/cursor_cext.py", line 163, in _handle_result
self._handle_resultset()
File "/usr/local/lib/python3.6/site-packages/mysql/connector/cursor_cext.py", line 651, in _handle_resultset
self._rows = self._cnx.get_rows()[0]
File "/usr/local/lib/python3.6/site-packages/mysql/connector/connection_cext.py", line 273, in get_rows
row = self._cmysql.fetch_row()
SystemError: <built-in method fetch_row of _mysql_connector.MySQL object at 0x5627dcfdf9f0> returned a result with an error set
Solution 2:
I reproduced the above error:
Traceback (most recent call last):
File "demo.py", line 16, in <module>
cursor.execute(query, ())
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte '0xff ... '
in position 0: invalid start byte
Using versions:
$ python --version
Python 2.7.10
>>> mysql.connector.__version__
'8.0.15'
With this Python code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import mysql.connector

conn = mysql.connector.connect(
    user='asdf',
    password='asdf',
    host='1.2.3.4',
    database='the_db',
    connect_timeout=10)
cursor = conn.cursor(buffered=True)
try:
    query = "SELECT data_blob FROM blog.cmd_table"
    # Per the traceback above, the error surfaces in execute().
    cursor.execute(query, ())
except mysql.connector.Error as err:
    # The error is caught here and printed.
    print(err)
The data came from a Python variable of raw binary bytes, populated by Python's open() like this:
def read_file_as_blob(filename):
    # 'r' stands for read, 'b' stands for binary
    with open(filename, 'rb') as f:
        data = f.read()
    return data
So the problem lies somewhere along this chain: the encoding of the data in the file -> the encoding of the data for the MySQL BLOB -> how MySQL lifts that BLOB and converts it back to UTF-8.
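To see why the last step of that chain fails, note that arbitrary binary data is usually not valid UTF-8. A minimal sketch, assuming the BLOB holds something like the first bytes of a JPEG file; try_decode is a hypothetical helper, not part of any library:

```python
# First bytes of a typical JPEG file; 0xff can never start a UTF-8 sequence.
blob = b"\xff\xd8\xff\xe0"


def try_decode(data):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


print(try_decode(blob))      # prints None: binary data is not text
print(try_decode(b"hello"))  # prints hello: plain ASCII is valid UTF-8
```

Any layer that treats the BLOB as text will hit the same "invalid start byte" error shown in the tracebacks.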
Two solutions:
Solution 1 is exactly as AHalvar said: set the use_pure=True parameter and pass it to mysql.connector.connect( ... ). Then, mysteriously, it just works. But good programmers will note that deferring to a mysterious incantation is a bad code smell. Fixes by Brownian motion incur technical debt.
Solution 2 is to encode your data early and often, and to prevent double re-encoding and double decoding, which are the source of these problems. Lock the data down to a common encoding format as soon as possible.
The gratifying solution for me was forcing UTF-8 encoding earlier in the process, enforcing UTF-8 everywhere:
data.encode('UTF-8')
The unicode pile of poo represents my opinion on such babysitting of character encoding between various devices on different operating systems and encoding schemes.
Solution 3:
Apparently, this is a known issue with the Python 'mysql' module. Try using 'pymysql' instead.
Solution 4:
Another way is to pass the raw=True parameter when initializing the connection:
connection = mysql.connector.connect(
    host="localhost",
    user="user",
    password="password",
    database="database",
    raw=True
)
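With raw=True the connector skips its conversion to Python types, so every column, not only the BLOBs, comes back as a byte string (typically bytes or bytearray) and text columns have to be decoded by hand. A minimal sketch of that post-processing, assuming you know which column positions hold text; convert_raw_row is a hypothetical helper, not part of the connector:

```python
def convert_raw_row(row, text_columns=(), encoding="utf-8"):
    """Decode the text columns of a raw row and leave the rest as bytes.

    row          -- a tuple as returned by a cursor on a raw=True connection
    text_columns -- positions of columns that hold text (an assumption
                    the caller must supply)
    """
    converted = []
    for index, value in enumerate(row):
        if value is None:
            # SQL NULL stays None.
            converted.append(None)
        elif index in text_columns:
            converted.append(bytes(value).decode(encoding))
        else:
            # BLOB columns are kept as immutable bytes, untouched.
            converted.append(bytes(value))
    return tuple(converted)


if __name__ == "__main__":
    raw_row = (bytearray(b"report.jpg"), bytearray(b"\xff\xd8\xff\xe0"), None)
    print(convert_raw_row(raw_row, text_columns={0}))
```

Numeric columns also arrive as byte strings in this mode, so they would need their own conversion, for example int(value) after decoding.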