Tags: python, automated-tests, static-analysis, enforcement

Python 2.x: how to automate enforcing unicode instead of string?


How can I automate a test to enforce that a body of Python 2.x code contains no string instances (only unicode instances)?

For example:

Can I do it from within the code?

Is there a static analysis tool that has this feature?

Edit:

I wanted this for an application in Python 2.5, but it turns out this is not really possible because:

  1. Python 2.5 doesn't support unicode_literals (the __future__ import was added in 2.6)
  2. kwargs dictionary keys can't be unicode objects, only strings

So I'm accepting the answer that says it's not possible, even though it's for different reasons :)


Solution

  • You can't enforce that all strings are Unicode; even with from __future__ import unicode_literals in a module, byte strings can be written as b'...', as they can in Python 3.

    There used to be a command-line option, -U, that had the same effect as applying unicode_literals globally. However, it was abandoned early in the 2.x series because it broke practically every script.

    What is your purpose for this? It is not desirable to abolish byte strings. They are not “bad” and Unicode strings are not universally “better”; they are two separate animals and you will need both of them. Byte strings will certainly be needed to talk to binary files and network services.

    If you want to be prepared for the transition to Python 3, the best tack is to write b'...' for all the strings you really mean to be bytes, and u'...' for the strings that are inherently Unicode. The default '...' string format can be used for everything else, that is, places where you don't care whether Python 3 changes the default string type.
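
As a rough illustration of that convention, a purely lexical check can list every literal written without an explicit u or b prefix, so each one can be reviewed by hand. This is only a sketch, assuming the check is run under Python 3 against source text; find_unprefixed_strings is a hypothetical helper name, not part of any existing tool. It relies on the standard tokenize module, so it also reports docstrings and raw strings that lack a u/b prefix, and Python 2-only prefixes such as ur'...' are likely to be misreported.

```python
import io
import tokenize


def find_unprefixed_strings(source):
    """Return (line, column, text) for each string literal lacking a u/b prefix.

    Hypothetical helper: a lexical scan only, with no type inference.
    """
    findings = []
    reader = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(reader):
        if tok.type != tokenize.STRING:
            continue
        # Everything before the first quote character is the literal's prefix.
        positions = [tok.string.find("'"), tok.string.find('"')]
        quote_at = min(p for p in positions if p != -1)
        prefix = tok.string[:quote_at].lower()
        if "u" not in prefix and "b" not in prefix:
            findings.append((tok.start[0], tok.start[1], tok.string))
    return findings


# Small made-up sample: two default literals, one unicode, one bytes.
sample = "a = 'plain'\nb = u'text'\nc = b'raw'\nd = \"also plain\"\n"
for line, column, text in find_unprefixed_strings(sample):
    print("line %d, column %d: %s" % (line, column, text))
```

A test could assert that the returned list is empty for modules where every literal is meant to be explicit. Because some default '...' literals are legitimate (keyword-argument names on old interpreters, for instance), the result is better treated as a review list than as a hard failure.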