Confused About `is` Operator With Strings
Solution 1:
I believe it has to do with string interning. In essence, the idea is to store only a single copy of each distinct string, to increase performance on some operations.
Basically, `a is b` works because (as you may have guessed) both names reference a single immutable string object. When a string is large (and most likely depending on some other factors that I don't understand), this isn't done, which is why your second example returns False.
EDIT: In fact, the odd behavior seems to be a side effect of the interactive environment. If you take your same code and place it into a Python script, both `a is b` and `ktr is ptr` return True.
a="poi"
b="poi"
print a is b # Prints 'True'
ktr = "today is a fine day"
ptr = "today is a fine day"
print ktr is ptr # Prints 'True'
This makes sense, since it'd be easy for Python to parse a source file and look for duplicate string literals within it. If you create the strings dynamically, then it behaves differently even in a script.
a="p" + "oi"
b="po" + "i"
print a is b # Oddly enough, prints 'True'
ktr = "today is" + " a fine day"
ptr = "today is a f" + "ine day"
print ktr is ptr # Prints 'False'
As for why `a is b` still results in True, perhaps the allocated string is small enough to warrant a quick search through the interned collection, whereas the other one is not?
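The difference between literals compiled together and strings seen separately can be probed with a small sketch. It is written for Python 3 (unlike the Python 2 snippets above), the helper name `literals_share_object` is made up for this illustration, and the identity result is an implementation detail that may differ between versions and interpreters, so no particular output is promised for it.

```python
def literals_share_object(source):
    """Compile and run `source`, which must bind the names a and b.

    Returns the pair (a == b, a is b).
    """
    namespace = {}
    code = compile(source, "<example>", "exec")
    exec(code, namespace)
    a, b = namespace["a"], namespace["b"]
    return a == b, a is b


# Two identical literals compiled together as one unit, like a script.
same_unit = 'a = "today is a fine day"\nb = "today is a fine day"\n'
equal, identical = literals_share_object(same_unit)
print("equal:", equal)          # value comparison
print("identical:", identical)  # identity; implementation-dependent
```

Only the `equal` result is something code should depend on; `identical` is there to observe what a given interpreter happens to do.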
Solution 2:
`is` is identity testing. It will work on some smaller strings (because of caching) but not on other, bigger strings, since `str` is NOT `ptr`. [thanks erykson]
See this code:
>>> import dis
>>> def fun():
...     str = 'today is a fine day'
...     ptr = 'today is a fine day'
...     return (str is ptr)
...
>>> dis.dis(fun)
  2           0 LOAD_CONST               1 ('today is a fine day')
              3 STORE_FAST               0 (str)
  3           6 LOAD_CONST               1 ('today is a fine day')
              9 STORE_FAST               1 (ptr)
  4          12 LOAD_FAST                0 (str)
             15 LOAD_FAST                1 (ptr)
             18 COMPARE_OP               8 (is)
             21 RETURN_VALUE
>>> id(str)
26652288
>>> id(ptr)
27604736
# hence this comparison returns False: ptr is str
Notice that the IDs of `str` and `ptr` are different.
BUT:
>>> x = "poi"
>>> y = "poi"
>>> id(x)
26650592
>>> id(y)
26650592
# hence this comparison returns True: x is y
The IDs of `x` and `y` are the same. Hence the `is` operator works on "ids" and not on "equalities".
See the link below for a discussion of when and why Python will allocate a different memory location for identical strings (read the question as well).
When does python allocate new memory for identical strings
Also, `sys.intern` on Python 3.x and `intern` on Python 2.x should help you allocate the strings in the same memory location, regardless of the size of the string.
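As a minimal sketch of that suggestion, assuming Python 3 (where the function lives in `sys`), the following builds the same text twice at runtime and then interns both results. The helper `build` is a name invented for this example; the identity of the two uninterned strings is implementation-dependent, so only the interned comparison carries an expected result.

```python
import sys


def build(parts):
    """Assemble a string at runtime so it is not a compile-time literal."""
    return " ".join(parts)


s1 = build(["today", "is", "a", "fine", "day"])
s2 = build(["today", "is", "a", "fine", "day"])

print(s1 == s2)   # True: same characters
print(s1 is s2)   # implementation-dependent; do not rely on it

i1 = sys.intern(s1)
i2 = sys.intern(s2)

print(i1 is i2)   # True: both names refer to the single interned copy
```

Note that `sys.intern` returns the interned string, so the result has to be kept; calling it without using the return value does not change what `s2` refers to.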
Solution 3:
`is` is not the same as `==`.
Basically, `is` checks if the two objects are the same, while `==` compares the values of those objects (strings, like everything in Python, are objects).
So you should use `is` when you really know what objects you're looking at (i.e. you've made the objects, or are comparing with `None` as the question comments point out), and you want to know if two variables are referencing the exact same object in memory.
In your examples, however, you're looking at `str` objects that Python is handling behind the scenes, so without diving deep into how Python works, you don't really know what to expect. You would have the same problem with `int`s or `float`s. Other answers do a good job of explaining the "behind the scenes" stuff (string interning), but you mostly shouldn't have to worry about it in day-to-day programming.
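To make the distinction concrete with objects you create yourself, here is a short Python 3 sketch. Lists are used instead of strings because each list display creates a new object, so the results do not depend on interning; the function `describe` is a made-up helper showing the usual `is None` check.

```python
first = [1, 2, 3]
second = [1, 2, 3]   # equal contents, but a separately created list
alias = first        # a second name for the very same list

print(first == second)  # True: the values match
print(first is second)  # False: two distinct list objects
print(first is alias)   # True: one object, two names


def describe(value=None):
    """Comparing against None is the common, safe use of `is`."""
    if value is None:
        return "no value given"
    return "got %r" % (value,)


print(describe())        # no value given
print(describe("poi"))   # got 'poi'
```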
Solution 4:
Note that this is a CPython-specific optimization. If you want your code to be portable, you should avoid relying on it. For example, in PyPy:
>>>> a = "hi"
>>>> b = "hi"
>>>> a is b
False
It's also worth pointing out that a similar thing happens for small integers:
>>> a = 12
>>> b = 12
>>> a is b
True
which again you should not rely on, because other implementations might not include this optimization.
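A portable way to see this for yourself, sketched here for Python 3 with an illustrative helper named `compare_parsed`, is to parse the same number twice at runtime and report both comparisons. The identity column is deliberately left without an expected value, since it depends on whatever caching the interpreter applies.

```python
def compare_parsed(text):
    """Parse the same text twice and return (equal, identical)."""
    left = int(text)
    right = int(text)
    return left == right, left is right


for text in ("12", "100000"):
    equal, identical = compare_parsed(text)
    # `equal` is always True; `identical` depends on the implementation.
    print(text, equal, identical)
```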