I have searched the web and stack overflow questions but been unable to find an answer to this question. The observation that I've made is that in Python 2.7.3, if you assign two variables the same single character string, e.g.
>>> a = 'a'
>>> b = 'a'
>>> c = ' '
>>> d = ' '
Then the variables will share the same reference:
>>> a is b
True
>>> c is d
True
This is also true for some longer strings:
>>> a = 'abc'
>>> b = 'abc'
>>> a is b
True
>>> ' ' is ' '
True
>>> ' ' * 1 is ' ' * 1
True
However, there are a lot of cases where the reference is (unexpectantly) not shared:
>>> a = 'a c'
>>> b = 'a c'
>>> a is b
False
>>> c = ' '
>>> d = ' '
>>> c is d
False
>>> ' ' * 2 is ' ' * 2
False
Can someone please explain the reason for this?
I suspect there might be simplifications/substitutions made by the interpreter and/or some caching mechanism that makes use of the fact that python strings are immutable to optimize in some special cases, but what do I know? I tried making deep copies of strings using the str constructor and the copy.deepcopy function but the strings still inconsistently share references.
The reason I'm having problems with this is because I check for inequality of references to strings in some unit tests I'm writing for clone methods of new-style python classes.
The details of when strings are cached and reused are implementation-dependent, can change from Python version to Python version and cannot be relied upon. If you want to check strings for equality, use ==
, not is
.
In CPython (the most commonly-used Python implementation), string literals consisting only of ASCII letters, digits, and underscores are interned, so if such a string literal occurs twice in the source code, they will end up pointing to the same string object. In Python 2.x, you can also call the built-in function intern()
to force that a particular string is interned, but you actually shouldn't do so.
Edit regarding you actual aim of checking whether attributes are improperly shared between instances: This kind of check is only useful for mutable objects. For attributes of immutable type, there is no semantic difference between shared and unshared objects. You could exclude immutable types from your tests by using
Immutable = basestring, tuple, numbers.Number, frozenset
# ...
if not isinstance(x, Immutable): # Exclude types known to be immutable
Note that this would also exclude tuples that contain mutable objects. If you wanted to test those, you would need to recursively descend into tuples.