
Load a JSON with raw_unicode_escape Encoded Strings

I have a JSON file where strings are encoded in raw_unicode_escape (the file itself is UTF-8). How do I parse it so that strings will be UTF-8 in memory? For individual properties,

Solution 1:

Since codecs.open('file.json', 'r', 'raw_unicode_escape') works somehow, I took a look at its source code and came up with a solution.

>>> import json
>>> from codecs import getreader
>>>
>>> with open('file.json', 'rb') as input:  # binary mode, so the reader receives bytes
...     reader = getreader('raw_unicode_escape')(input)
...     j = json.loads(reader.read().encode('raw_unicode_escape'))
...     print(j['name'])
...
è
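To see why the decode/encode round trip helps, here is a minimal step-by-step sketch. It assumes the file stores the UTF-8 bytes of each character as separate \u escapes; the sample payload is made up for illustration and is not taken from the original file.

```python
import json

# Hypothetical sample: the UTF-8 bytes of "è" (0xC3 0xA8) written as \u escapes.
raw = b'{"name": "\\u00c3\\u00a8"}'

# Parsing directly yields two code points, one per UTF-8 byte (mojibake).
assert json.loads(raw)["name"] == "\u00c3\u00a8"

# Step 1: decode the escapes into the code points U+00C3 and U+00A8.
text = raw.decode("raw_unicode_escape")

# Step 2: re-encode; code points below U+0100 become single bytes,
# so the original UTF-8 byte sequence reappears.
fixed_bytes = text.encode("raw_unicode_escape")
assert fixed_bytes == b'{"name": "\xc3\xa8"}'

# Step 3: json.loads decodes the bytes as UTF-8 (bytes input needs Python 3.6+).
data = json.loads(fixed_bytes)
print(data["name"])
```

Code points above U+00FF are written back as \uXXXX escapes in step 2, which the JSON parser then resolves on its own.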

Of course, that will work even if input is another type of file-like object, like a file inside a zip archive in my case.
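As a rough illustration of the zip case, the sketch below builds a throwaway archive in memory and reads a member through the same reader. The helper name load_escaped_json, the member name and the payload are assumptions made for the example.

```python
import io
import json
import zipfile
from codecs import getreader


def load_escaped_json(stream):
    """Parse JSON from a binary stream whose strings hold UTF-8 bytes as \\u escapes."""
    reader = getreader("raw_unicode_escape")(stream)
    return json.loads(reader.read().encode("raw_unicode_escape"))


# Build a small in-memory archive so the example needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("file.json", b'{"name": "\\u00c3\\u00a8"}')

# ZipFile.open() returns a binary file-like object, which is all the reader needs.
with zipfile.ZipFile(buffer) as archive:
    with archive.open("file.json") as member:
        parsed = load_escaped_json(member)

print(parsed["name"])
```

Any object with a read() method that returns bytes should work the same way.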

In the end I dropped the idea of an incremental encoder (it doesn't make sense with JSON), but for those interested I suggest taking a look at this answer as well as codecs.iterencode().
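For the curious, this is roughly what codecs.iterencode() looks like on already-decoded text chunks. The chunks are made up for the example, and since json.loads still needs the complete document, the pieces have to be joined before parsing anyway.

```python
import codecs
import json

# Hypothetical text chunks, as they might come out of the decoding step.
chunks = ['{"name": "', "\u00c3\u00a8", '"}']

# iterencode() lazily yields one encoded bytes object per input chunk.
encoded = b"".join(codecs.iterencode(chunks, "raw_unicode_escape"))

# The parser works on the whole document, so the chunks are joined first.
result = json.loads(encoded)
print(result["name"])
```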
