The problem is is. Is it not?

This has been a re­peat­ed dis­cus­sion in the Python Ar­genti­na mail­ing list. Since it has not come up in a while, why not re­cap it, so the next time it hap­pens peo­ple can just link here.

Some peo­ple for some rea­son do this:

>>> a = 2
>>> b = 2
>>> a == b
True
>>> a is b
True

And then, when they do this, they are sur­prised:

>>> a = 1000
>>> b = 1000
>>> a == b
True
>>> a is b
False

They are sur­prised be­cause “2 is 2” makes more in­tu­itive sense than “1000 is not 1000”. This could be at­trib­uted to an in­cli­na­tion to­wards pla­ton­is­m, but re­al­ly, it’s be­cause they don’t know what is is.

The is op­er­a­tor is (on CPython) sim­ply a mem­o­ry ad­dress com­par­i­son. if ob­jects a and b are the same ex­act chunk of mem­o­ry, then they “are” each oth­er. Since python pre-cre­ates a bunch of small in­te­gers, then ev­ery 2 you cre­ate is re­al­ly not a new 2, but the same 2 of last time.

This works be­cause of two things:

  1. In­te­gers are read­-on­ly ob­ject­s. You can have as many vari­ables “hold­ing” the same 2, be­cause they can’t break it.
  2. In python, as­sign­ment is just alias­ing. You are not mak­ing a copy­ of 2 when you do a = 2, you are just say­ing “a is an­oth­er name for this 2 here”.

This is sur­pris­ing for peo­ple com­ing from oth­er lan­guages, like, say, C or C++. In those lan­guages, a vari­able int a will nev­er use the same mem­o­ry space as an­oth­er vari­able int b be­cause a and b are names for spe­cif­ic bytes of mem­o­ry, and you can change the con­tents of those bytes. On C and C++, in­te­gers are a mu­ta­ble type. This 2 is not that 2, un­less you do it in­ten­tion­al­ly us­ing point­er­s.

In fac­t, the way as­sign­ment works on Python al­so leads to oth­er sur­pris­es, more in­ter­est­ing in re­al life. For ex­am­ple, look at this ses­sion:

>>> def f(s=""):
...     s+='x'
...     return s
...
>>> f()
'x'
>>> f()
'x'
>>> f()
'x'

That is re­al­ly not sur­pris­ing. Now, let’s make a very small change:

>>> def f(l=[]):
...     l.append('x')
...     return l
...
>>> f()
['x']
>>> f()
['x', 'x']
>>> f()
['x', 'x', 'x']

And that is, for some­one who has not seen it be­fore, sur­pris­ing. It hap­pens be­cause lists are a mu­ta­ble type. The de­fault ar­gu­ment is de­fined when the func­tion is parsed, and ev­ery time you call f() you are us­ing and re­turn­ing the same l. Be­fore, you were al­so us­ing al­ways the same s but since strings are im­mutable, it nev­er changed, and you were re­turn­ing a new string each time.

You could check that I am telling you the truth, us­ing is, of course. And BTW, this is not a prob­lem just for list­s. It’s a prob­lem for ob­jects of ev­ery class you cre­ate your­self, un­less you both­er mak­ing it im­mutable some­how. So let’s be care­ful with de­fault ar­gu­ments, ok?

But the main prob­lem about find­ing the orig­i­nal 1000 is not 1000 thing sur­pris­ing is that, in truth, it’s un­in­ter­est­ing. In­te­gers are fun­gi­ble. You don’t care if they are the same in­te­ger, you on­ly re­al­ly care that they are equal.

Test­ing for in­te­ger iden­ti­ty is like wor­ry­ing, af­ter you loan me $1, about whether I re­turn you a dif­fer­ent or the same $1 coin. It just does­n’t mat­ter. What you want is just a $1 coin, or a 2, or a 1000.

Al­so, the re­sult of 2 is 2 is im­ple­men­ta­tion de­pen­den­t. There is no rea­son, be­yond an op­ti­miza­tion, for that to be True.

Hop­ing this was clear, let me give you a last snip­pet:

>>> a = float('NaN')
>>> a is a
True
>>> a == a
False

UP­DATE: lots of fun and in­ter­est­ing com­ments about this post at red­dit and a small fol­lowup here

Comments

Comments powered by Disqus
Contents © 2000-2013 Roberto Alsina
Share