Skip to main content

Ralsina.Me — Roberto Alsina's website

The problem is is. Is it not?

This has been a re­peat­ed dis­cus­sion in the Python Ar­genti­na mail­ing list. Since it has not come up in a while, why not re­cap it, so the next time it hap­pens peo­ple can just link here.

Some peo­ple for some rea­son do this:

>>> a = 2
>>> b = 2
>>> a == b
True
>>> a is b
True

And then, when they do this, they are sur­prised:

>>> a = 1000
>>> b = 1000
>>> a == b
True
>>> a is b
False

They are sur­prised be­cause "2 is 2" makes more in­tu­itive sense than "1000 is not 1000". This could be at­trib­uted to an in­cli­na­tion to­wards pla­ton­is­m, but re­al­ly, it's be­cause they don't know what is is.

The is op­er­a­tor is (on CPython) sim­ply a mem­o­ry ad­dress com­par­i­son. if ob­jects a and b are the same ex­act chunk of mem­o­ry, then they "are" each oth­er. Since python pre-cre­ates a bunch of small in­te­gers, then ev­ery 2 you cre­ate is re­al­ly not a new 2, but the same 2 of last time.

This works be­cause of two things:

  1. In­­te­gers are read­­-on­­ly ob­­jec­t­s. You can have as many var­i­ables "hold­ing" the same 2, be­­cause they can't break it.

  2. In python, as­sign­­ment is just alias­ing. You are not mak­ing a copy­­ of 2 when you do a = 2, you are just say­ing "a is an­oth­er name for this 2 here".

This is sur­pris­ing for peo­ple com­ing from oth­er lan­guages, like, say, C or C++. In those lan­guages, a vari­able int a will nev­er use the same mem­o­ry space as an­oth­er vari­able int b be­cause a and b are names for spe­cif­ic bytes of mem­o­ry, and you can change the con­tents of those bytes. On C and C++, in­te­gers are a mu­ta­ble type. This 2 is not that 2, un­less you do it in­ten­tion­al­ly us­ing point­er­s.

In fac­t, the way as­sign­ment works on Python al­so leads to oth­er sur­pris­es, more in­ter­est­ing in re­al life. For ex­am­ple, look at this ses­sion:

>>> def f(s=""):
...     s+='x'
...     return s
...
>>> f()
'x'
>>> f()
'x'
>>> f()
'x'

That is re­al­ly not sur­pris­ing. Now, let's make a very small change:

>>> def f(l=[]):
...     l.append('x')
...     return l
...
>>> f()
['x']
>>> f()
['x', 'x']
>>> f()
['x', 'x', 'x']

And that is, for some­one who has not seen it be­fore, sur­pris­ing. It hap­pens be­cause lists are a mu­ta­ble type. The de­fault ar­gu­ment is de­fined when the func­tion is parsed, and ev­ery time you call f() you are us­ing and re­turn­ing the same l. Be­fore, you were al­so us­ing al­ways the same s but since strings are im­mutable, it nev­er changed, and you were re­turn­ing a new string each time.

You could check that I am telling you the truth, us­ing is, of course. And BTW, this is not a prob­lem just for list­s. It's a prob­lem for ob­jects of ev­ery class you cre­ate your­self, un­less you both­er mak­ing it im­mutable some­how. So let's be care­ful with de­fault ar­gu­ments, ok?

But the main problem about finding the original 1000 is not 1000 thing surprising is that, in truth, it's uninteresting. Integers are fungible. You don't care if they are the same integer, you only really care that they are equal.

Test­ing for in­te­ger iden­ti­ty is like wor­ry­ing, af­ter you loan me $1, about whether I re­turn you a dif­fer­ent or the same $1 coin. It just does­n't mat­ter. What you want is just a $1 coin, or a 2, or a 1000.

Al­so, the re­sult of 2 is 2 is im­ple­men­ta­tion de­pen­den­t. There is no rea­son, be­yond an op­ti­miza­tion, for that to be True.

Hop­ing this was clear, let me give you a last snip­pet:

>>> a = float('NaN')
>>> a is a
True
>>> a == a
False

UP­DATE: lots of fun and in­ter­est­ing com­ments about this post at red­dit and a small fol­lowup here

Comments

Comments powered by Disqus