Skip to main content

Ralsina.Me — Roberto Alsina's website

The problem is is. Is it not?

This has been a re­peat­ed dis­cus­sion in the Python Ar­genti­na mail­ing list. Since it has not come up in a while, why not re­cap it, so the next time it hap­pens peo­ple can just link here.

Some peo­ple for some rea­son do this:

>>> a = 2
>>> b = 2
>>> a == b
>>> a is b

And then, when they do this, they are sur­prised:

>>> a = 1000
>>> b = 1000
>>> a == b
>>> a is b

They are sur­prised be­cause "2 is 2" makes more in­tu­itive sense than "1000 is not 1000". This could be at­trib­uted to an in­cli­na­tion to­wards pla­ton­is­m, but re­al­ly, it's be­cause they don't know what is is.

The is op­er­a­tor is (on CPython) sim­ply a mem­o­ry ad­dress com­par­i­son. if ob­jects a and b are the same ex­act chunk of mem­o­ry, then they "are" each oth­er. Since python pre-cre­ates a bunch of small in­te­gers, then ev­ery 2 you cre­ate is re­al­ly not a new 2, but the same 2 of last time.

This works be­cause of two things:

  1. In­­te­gers are read­­-on­­ly ob­­jec­t­s. You can have as many var­i­ables "hold­ing" the same 2, be­­cause they can't break it.

  2. In python, as­sign­­ment is just alias­ing. You are not mak­ing a copy of 2 when you do a = 2, you are just say­ing "a is an­oth­er name for this 2 here".

This is sur­pris­ing for peo­ple com­ing from oth­er lan­guages, like, say, C or C++. In those lan­guages, a vari­able int a will nev­er use the same mem­o­ry space as an­oth­er vari­able int b be­cause a and b are names for spe­cif­ic bytes of mem­o­ry, and you can change the con­tents of those bytes. On C and C++, in­te­gers are a mu­ta­ble type. This 2 is not that 2, un­less you do it in­ten­tion­al­ly us­ing point­er­s.

In fac­t, the way as­sign­ment works on Python al­so leads to oth­er sur­pris­es, more in­ter­est­ing in re­al life. For ex­am­ple, look at this ses­sion:

>>> def f(s=""):
...     s+='x'
...     return s
>>> f()
>>> f()
>>> f()

That is re­al­ly not sur­pris­ing. Now, let's make a very small change:

>>> def f(l=[]):
...     l.append('x')
...     return l
>>> f()
>>> f()
['x', 'x']
>>> f()
['x', 'x', 'x']

And that is, for some­one who has not seen it be­fore, sur­pris­ing. It hap­pens be­cause lists are a mu­ta­ble type. The de­fault ar­gu­ment is de­fined when the func­tion is parsed, and ev­ery time you call f() you are us­ing and re­turn­ing the same l. Be­fore, you were al­so us­ing al­ways the same s but since strings are im­mutable, it nev­er changed, and you were re­turn­ing a new string each time.

You could check that I am telling you the truth, us­ing is, of course. And BTW, this is not a prob­lem just for list­s. It's a prob­lem for ob­jects of ev­ery class you cre­ate your­self, un­less you both­er mak­ing it im­mutable some­how. So let's be care­ful with de­fault ar­gu­ments, ok?

But the main problem about finding the original 1000 is not 1000 thing surprising is that, in truth, it's uninteresting. Integers are fungible. You don't care if they are the same integer, you only really care that they are equal.

Test­ing for in­te­ger iden­ti­ty is like wor­ry­ing, af­ter you loan me $1, about whether I re­turn you a dif­fer­ent or the same $1 coin. It just does­n't mat­ter. What you want is just a $1 coin, or a 2, or a 1000.

Al­so, the re­sult of 2 is 2 is im­ple­men­ta­tion de­pen­den­t. There is no rea­son, be­yond an op­ti­miza­tion, for that to be True.

Hop­ing this was clear, let me give you a last snip­pet:

>>> a = float('NaN')
>>> a is a
>>> a == a

UP­DATE: lots of fun and in­ter­est­ing com­ments about this post at red­dit and a small fol­lowup here

toyg / 2012-01-30 11:00:

This is actually an issue in PyCharm, the JetBrains IDE for Python, which (from what I can see) will always nudge you to use "is" for any integer comparison, as it's "more readable" (which it is, but correctness will always trump readability).

BTW, the "bunch of small integers" is exactly 256, at least on Windows.

Benjamin Kloster / 2012-01-31 07:12:

I use PyCharm extensively, and I've never seen it suggest the "is" operator for integer comparison.

toyg / 2012-01-31 09:30:

maybe you turned off some option? i'm pretty sure it does suggest it. EDIT: see http://lateral.netmanagers....

Guest / 2012-01-31 16:58:

I think you're talking about the use of "and" or "or" instead of "&&" and "||".

toyg / 2012-01-31 23:14:

Actually, I went back to check and I think what I meant is the warning "boolean variable check can be simplified" (under Settings -> Inspectors -> Python), i.e. if int(someVar) == 0
In those cases, PyCharm suggests to switch from == to "is".

John Lenton / 2012-01-30 14:19:

Remind me not to lend you $1... :-)

jjconti / 2012-01-30 20:45:

Jaja, el último snippet es como un remate de standup.

vegai / 2012-01-31 08:40:

What's the benefit of pre-creating all 8-bit integers?

Roberto Alsina / 2012-01-31 17:48:

You use them often, and creating objects is kinda expensive.

shabbyrobe / 2012-01-31 12:21:

This is an excellent illustration of two of the most eye-popping WTFs Python has to offer. For a language that potentially has so much to offer beginners, total catastrophe gotchas like these simply shouldn't exist. A lot of us may well have a computer science background, but a language that comes so close to being incredibly useful without one shouldn't make it a requirement.

Roberto Alsina / 2012-01-31 17:49:

While I would not have called is "is", there is really little to be done about the mutable default argument issue without making it less useful, AFAICS