Load a JSON with raw_unicode_escape encoded strings
I have a JSON file where strings are encoded in raw_unicode_escape (the file itself is UTF-8). How do I parse it so that strings will be UTF-8 in memory? For individual properties,
Solution 1:
Since codecs.open('file.json', 'r', 'raw_unicode_escape') works somehow, I took a look at its source code and came up with a solution.
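>>> import json
>>> from codecs import getreader
>>>
>>> # Open in binary mode: the StreamReader expects bytes, not str
>>> with open('file.json', 'rb') as f:
...     # Decode: \uXXXX escapes become code points, other bytes map 1:1
...     reader = getreader('raw_unicode_escape')(f)
...     # Re-encode with the same codec to get the original UTF-8 bytes back,
...     # then let json.loads (Python 3.6+) decode them as UTF-8
...     j = json.loads(reader.read().encode('raw_unicode_escape'))
...     print(j['name'])
...
è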
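For reuse, the session above can be wrapped in a small function. This is a minimal sketch: the helper name load_raw_unicode_escape_json is hypothetical, and the sample input assumes the file stores each UTF-8 byte as a \u00XX escape (so è appears as \u00c3\u00a8).

```python
import io
import json
from codecs import getreader


def load_raw_unicode_escape_json(binary_stream):
    """Parse JSON whose strings hold UTF-8 bytes written as raw_unicode_escape.

    Hypothetical helper; binary_stream must be opened in binary mode.
    """
    # Decode the raw bytes: \uXXXX escapes become code points, every other
    # byte maps to the code point with the same value (like latin-1).
    text = getreader('raw_unicode_escape')(binary_stream).read()
    # Re-encode with the same codec: code points below 256 turn back into
    # single bytes, so the original UTF-8 byte sequences are reassembled.
    raw = text.encode('raw_unicode_escape')
    # json.loads accepts bytes (Python 3.6+) and detects UTF-8 by itself.
    return json.loads(raw)


# Assumed sample: "è" stored as its two UTF-8 bytes, each escaped.
sample = io.BytesIO(b'{"name": "\\u00c3\\u00a8"}')
print(load_raw_unicode_escape_json(sample)['name'])  # è
```

Plain UTF-8 bytes in the file survive the round trip unchanged, so files that mix escaped and unescaped strings should parse the same way.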
Of course, this works with any other binary file-like object too, such as a file inside a zip archive, as in my case.
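To illustrate the zip case, here is a sketch that builds an archive in memory and reads file.json from it through ZipFile.open(), which returns a binary file-like object. The helper repeats the steps of the session above; its name and the archive content are assumptions for the example.

```python
import io
import json
import zipfile
from codecs import getreader


def load_raw_unicode_escape_json(binary_stream):
    """Hypothetical helper: decode, re-encode, then parse as UTF-8 JSON."""
    text = getreader('raw_unicode_escape')(binary_stream).read()
    return json.loads(text.encode('raw_unicode_escape'))


# Build an assumed archive in memory so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('file.json', b'{"name": "\\u00c3\\u00a8"}')

# ZipFile.open() yields a binary stream, which is what the reader expects.
with zipfile.ZipFile(buffer) as zf:
    with zf.open('file.json') as member:
        data = load_raw_unicode_escape_json(member)

print(data['name'])  # è
```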
In the end, I dropped the idea of an incremental encoder (it doesn't make much sense for JSON, since the standard json module parses a complete document rather than a stream), but for those interested I suggest taking a look at this answer as well as codecs.iterencode().
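For the curious, this is roughly what the incremental route could look like with codecs.iterdecode() and codecs.iterencode(). It is only a sketch: the function name load_in_chunks is hypothetical, and it assumes no \uXXXX escape is split across two chunks. Note that the pieces still have to be joined before json.loads(), which is why streaming buys little here.

```python
import codecs
import json


def load_in_chunks(chunks):
    """Parse JSON from an iterable of byte chunks (hypothetical helper).

    Assumption: no \\uXXXX escape is split across two chunks.
    """
    # Incrementally decode each chunk with the raw_unicode_escape codec...
    decoded = codecs.iterdecode(chunks, 'raw_unicode_escape')
    # ...and incrementally re-encode it, restoring the original UTF-8 bytes.
    reencoded = codecs.iterencode(decoded, 'raw_unicode_escape')
    # json.loads has no streaming mode, so everything is joined before parsing.
    return json.loads(b''.join(reencoded))


# Splitting a multi-byte UTF-8 sequence is fine: non-escape bytes map 1:1.
print(load_in_chunks([b'{"name": "\xc3', b'\xa8"}']))  # {'name': 'è'}
```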