Skip to content Skip to sidebar Skip to footer

What's The Preferred Way To Include Unicode In Python Source Files?

When using unicode strings in source code, there seems to be many ways to skin a cat. The docs and the relevant PEPs have plenty of information about what's possible, but are scan

Solution 1:

I think the most common way I've used (in Python 2) is:

# coding: utf-8

text = u'résumé'
  • The text is readable. Compare to text = u'r\u00e9sum\u00e9', where I must look up what character that is. Everything else is less readable.
  • If you're using Unicode, your variable is most certainly text and not binary data, so there's no point in keeping it in anything other than a unicode object. (Just in case '€' became an option.)

from __future__ import unicode_literals changes the parsing mode of the program; I think you'd need to be more aware of the difference between text & binary data. (Something that, if you ask me, most programmers are not good at.)

In large projects, it might be confusing to have the parsing mode change for just one file, so it's probably better as an all files or no files, so you don't need to refer to the file header. If you're in Python 2, the default is probably off unless you're also targetting Python 3. If you're targetting Python 2.5 or older¹, then it's not an option.

Most editors these days are Unicode-aware. That said, I have seen editors corrupt non-ASCII characters in files, but exceedingly rarely; if the author of such a commit doesn't review his code adequately, code review should catch this. (The diff will be painfully obvious.) It is not worth supporting these people: Unicode is here to stay; track them down and fix their set up. Of note, vim handles Unicode just fine.

¹You should upgrade.

Post a Comment for "What's The Preferred Way To Include Unicode In Python Source Files?"