What's The Preferred Way To Include Unicode In Python Source Files?
Solution 1:
The approach I've used most often (in Python 2) is:
# coding: utf-8
text = u'résumé'
- The text is readable. Compare to text = u'r\u00e9sum\u00e9', where I must look up what character that is. Everything else is less readable.
- If you're using Unicode, your variable is most certainly text and not binary data, so there's no point in keeping it in anything other than a unicode object. (Just in case '€' became an option.)
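To make the two points above concrete, here is a minimal sketch, written to run on both Python 2.7 and Python 3.3+. The variable names are only illustrative. It shows that the readable literal and the escaped literal are the same text, and that the text object is distinct from its UTF-8 encoded bytes.

```python
# coding: utf-8
# Illustrative names only; runs on Python 2.7 and Python 3.3+.
readable = u'résumé'
escaped = u'r\u00e9sum\u00e9'

# Both spellings produce the same six-character text object.
assert readable == escaped
assert len(readable) == 6

# Encoding gives binary data: each é takes two bytes in UTF-8,
# so six characters become eight bytes.
encoded = readable.encode('utf-8')
assert len(encoded) == 8
assert encoded.decode('utf-8') == readable
```

The coding line only tells the interpreter how the file is encoded; the u prefix is what makes the literal text rather than bytes on Python 2.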
from __future__ import unicode_literals changes the parsing mode of the program; I think you'd need to be more aware of the difference between text and binary data. (Something that, if you ask me, most programmers are not good at.)
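As a rough illustration of what "changes the parsing mode" means, the sketch below compiles the same one-line source twice, once with the unicode_literals compiler flag and once without. This stands in for having the import at the top of a file. The source string and the variable names are hypothetical, and the snippet is meant to run on both Python 2.7 and Python 3.

```python
import __future__

# Hypothetical one-line "file" containing a single unprefixed string literal.
source = "'abc'"
flag = __future__.unicode_literals.compiler_flag

# The final True is dont_inherit, so any future imports in effect in the
# surrounding file do not leak into either compilation.
default_mode = eval(compile(source, '<demo>', 'eval', 0, True))
literal_mode = eval(compile(source, '<demo>', 'eval', flag, True))

text_type = type(u'')

# With the flag, the unprefixed literal is text on Python 2 and Python 3.
assert isinstance(literal_mode, text_type)

# Without it, the literal is the interpreter's native str:
# bytes on Python 2, text on Python 3 (where the flag is a no-op).
assert isinstance(default_mode, str)
```

The same characters in the source yield a different type depending on the mode, which is why mixing modes across files in one project can be confusing.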
In large projects, it might be confusing to have the parsing mode change for just one file, so it's probably better to apply it to all files or to none; that way you don't need to check each file's header. If you're on Python 2, the default is probably off unless you're also targeting Python 3. If you're targeting Python 2.5 or older¹, then it's not an option.
Most editors these days are Unicode-aware. That said, I have seen editors corrupt non-ASCII characters in files, but exceedingly rarely; if the author of such a commit doesn't review their code adequately, code review should catch it. (The diff will be painfully obvious.) It is not worth supporting these people: Unicode is here to stay; track them down and fix their setup. Of note, vim handles Unicode just fine.
¹You should upgrade.