Ralsina.Me — Roberto Alsina's website

Posts about python (old posts, page 71)

To write, and to write what.

Some of you may know I have written about 30% of a book called "Python No Muerde", available at http://nomuerde.netmanagers.com.ar (in Spanish only). That book has been stagnant for a long time.

On the other hand, I wrote a very popular series of posts called PyQt by Example, which has (you guessed it) also been stagnant for a long time.

The main problem with the book was that I tried to cover way too much ground. When complete, it would be a 500-page book, and that would involve writing half a dozen example apps, some of them in areas where I am no expert.

The main problem with the post series is that the example is lame (a TODO app!) and expanding it is boring.

So, what better way to fix both things at once than to merge them?

I will leave Python No Muerde as it is, and will do a new book, called PyQt No Muerde. It will keep the tone and language of Python No Muerde, and will even share some chapters, but will focus on developing a PyQt app or two, instead of the much more ambitious goals of Python No Muerde. It will be about 200 pages.

I have acquired permission from my superiors (my wife) to work on this project a couple of hours a day, in the early morning. So, it may move forward, or it may not. This is, as usual, an experiment, not a promise.

PyQt Quickie: Don't Get Garbage Collected

There is one area where Qt and Python (and in consequence PyQt) have major disagreements. That area is memory management.

While Qt has its own mechanisms to handle object allocation and disposal (the hierarchical QObject trees, smart pointers, etc.), PyQt runs on Python, so it has garbage collection.

Let's consider a simple example:

from PyQt4 import QtCore

def finished():
    print("The process is done!")
    # Quit the app
    QtCore.QCoreApplication.instance().quit()

def launch_process():
    # Do something asynchronously
    proc = QtCore.QProcess()
    proc.start("/bin/sleep 3")
    # After it finishes, call finished
    proc.finished.connect(finished)

def main():
    app = QtCore.QCoreApplication([])
    # Launch the process
    launch_process()
    app.exec_()

main()

If you run this, this is what will happen:

QProcess: Destroyed while process is still running.
The process is done!

Plus, the script never ends. Fun! The problem is that proc is being deleted at the end of launch_process because there are no more references to it.
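
The same lifetime rule can be seen without Qt. This is only a minimal sketch, assuming CPython's reference counting; Worker and launch are hypothetical names standing in for QProcess and launch_process, and a weak reference is used to watch the object without keeping it alive:

```python
import weakref


class Worker(object):
    """Hypothetical stand-in for QProcess: any object with a lifetime."""


def launch(keep_in=None):
    worker = Worker()
    if keep_in is not None:
        # Storing the object somewhere that outlives the function keeps it alive.
        keep_in.add(worker)
    # A weak reference lets us observe the object without keeping it alive.
    return weakref.ref(worker)


lost = launch()        # only the local name referred to the worker
alive = set()
kept = launch(alive)   # the set still refers to the worker

print(lost() is None)      # on CPython the first worker is already gone
print(kept() is not None)  # the second worker survives because the set holds it
```

On other Python implementations the first object may linger until a garbage collection pass runs, so the first result should be read as a CPython detail.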

Here is a better way to do it:

from PyQt4 import QtCore

processes = set()

def finished():
    print("The process is done!")
    # Quit the app
    QtCore.QCoreApplication.instance().quit()

def launch_process():
    # Do something asynchronously
    proc = QtCore.QProcess()
    processes.add(proc)
    proc.start("/bin/sleep 3")
    # After it finishes, call finished
    proc.finished.connect(finished)

def main():
    app = QtCore.QCoreApplication([])
    # Launch the process
    launch_process()
    app.exec_()

main()

Here, we add a global processes set and add proc there so we always keep a reference to it. Now, the program works as intended. However, it still has an issue: we are leaking QProcess objects.

While in this case the leak is very short-lived, since we are ending the program right after the process ends, in a real program this is not a good idea.

So, we would need to add a way to remove proc from processes in finished. This is not as easy as it may seem. Here is an idea that will not work as you expect:

def launch_process():
    # Do something asynchronously
    proc = QtCore.QProcess()
    processes.add(proc)
    proc.start("/bin/sleep 3")
    # Remove the process from the global set when done
    proc.finished.connect(lambda: processes.remove(proc))
    # After it finishes, call finished
    proc.finished.connect(finished)

In this version, we will still leak proc, even though processes is empty! Why? Because we are keeping a reference to proc in the lambda!

I don't really have a good answer for that that doesn't involve turning everything into members of a QObject and using sender to figure out what process is ending, or using QSignalMapper. That version is left as an exercise.
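
To see why the lambda keeps proc alive, here is a pure-Python analogy. It is only a sketch: FakeProcess is a hypothetical stand-in whose callbacks list plays the role of the signal connection, and in real PyQt the connection is held on the Qt side rather than in a Python list, so the details may differ.

```python
import gc
import weakref


class FakeProcess(object):
    """Hypothetical stand-in for QProcess with a list of callbacks."""

    def __init__(self):
        self.callbacks = []


processes = set()


def launch_process():
    proc = FakeProcess()
    processes.add(proc)
    # The lambda closes over proc, and proc stores the lambda: a reference cycle.
    proc.callbacks.append(lambda: processes.remove(proc))
    return weakref.ref(proc)


gc.disable()                # keep the cycle collector out of the way for now
ref = launch_process()
ref().callbacks[0]()        # simulate "finished": removes proc from the set

set_is_empty = len(processes) == 0
leaked_before = ref() is not None   # still alive, only the lambda refers to it
gc.collect()                        # the cycle collector breaks the cycle
freed_after = ref() is None
gc.enable()

print(set_is_empty, leaked_before, freed_after)
```

In this analogy Python's cycle collector eventually reclaims the object. Whether it can do the same for a real QProcess depends on how the connection is stored, which is why the QObject and sender approach mentioned above avoids the closure altogether.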

Garbage Collection Has Side Effects

Just a quick followup to "The problem is is. Is it not?" This is not mine; I got it from reddit.

This should really not surprise you:

>>> a = [1,2]
>>> b = [3,4]
>>> a is b
False
>>> a == b
False
>>> id(a) == id(b)
False

After all, a and b are completely different things. However:

>>> [1,2] is [3,4]
False
>>> [1,2] == [3,4]
False
>>> id([1,2]) == id([3,4])
True

Turns out that, using literals, one of those things is not like the others.

First, the explanation, so you understand why this happens. When you don't have any more references to a piece of data, it will get garbage collected and its memory will be freed, so it can be reused for other things.

In the first case, I am keeping references to both lists in the variables a and b. That means the lists have to exist at all times, since I can always say print a and Python has to know what's in it.

In the second case, I am using literals, which means there is no reference to the lists after they are used. When Python evaluates id([1,2]) == id([3,4]) it first evaluates the left side of the ==. After that is done, there is no need to keep [1,2] available, so it's deleted. Then, when evaluating the right side, it creates [3,4].

Because that memory has just been freed, CPython is likely to use the exact same place for it as it was using for [1,2]. So id will return the same value. This is just to remind you of a couple of things:

  1. a is b is usually (but not always) the same as id(a) == id(b)

  2. garbage collection can cause side effects you may not be expecting
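
The difference between the two sessions comes down to object lifetimes. As a small sketch (the names first and second are just for illustration): id() is only guaranteed to be unique among objects that are alive at the same time, so two objects with non-overlapping lifetimes may share an id.

```python
first = [1, 2]
second = [3, 4]

# Both lists are alive at the same time, so their ids must differ.
distinct_while_alive = id(first) != id(second)
print(distinct_while_alive)

# Here each list can be freed as soon as id() returns, so the second one
# may be created at the address the first one just released.
# The result is implementation dependent.
print(id([1, 2]) == id([3, 4]))
```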

The problem is is. Is it not?

This has been a repeated discussion in the Python Argentina mailing list. Since it has not come up in a while, why not recap it, so the next time it happens people can just link here.

Some people, for some reason, do this:

>>> a = 2
>>> b = 2
>>> a == b
True
>>> a is b
True

And then, when they do this, they are surprised:

>>> a = 1000
>>> b = 1000
>>> a == b
True
>>> a is b
False

They are surprised because "2 is 2" makes more intuitive sense than "1000 is not 1000". This could be attributed to an inclination towards platonism, but really, it's because they don't know what is is.

The is operator is (on CPython) simply a memory address comparison. If objects a and b are the same exact chunk of memory, then they "are" each other. Since CPython pre-creates a bunch of small integers, every 2 you create is really not a new 2, but the same 2 as last time.

This works because of two things:

  1. Integers are read-only objects. You can have as many variables "holding" the same 2, because they can't break it.

  2. In Python, assignment is just aliasing. You are not making a copy of 2 when you do a = 2, you are just saying "a is another name for this 2 here".
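
Both points can be tried out in a script. This is a sketch, assuming CPython, whose current versions cache the integers from -5 to 256. In a script, unlike in the interactive session above, the compiler may merge equal literals into a single constant, so the integers here are built with int() at run time.

```python
# int() is called at run time, so the compiler cannot merge the results
# into a single shared constant.
small_a, small_b = int("2"), int("2")
big_a, big_b = int("1000"), int("1000")

print(small_a == small_b, big_a == big_b)  # equality: True True
print(small_a is small_b)  # identity: implementation dependent
print(big_a is big_b)      # identity: implementation dependent

# Assignment is aliasing: alias is just another name for the same object.
alias = big_a
print(alias is big_a)      # True
```

Only the equality results and the aliasing result are guaranteed by the language. The two identity checks in the middle depend on which integers the implementation chooses to share.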

This is surprising for people coming from other languages, like, say, C or C++. In those languages, a variable int a will never use the same memory space as another variable int b because a and b are names for specific bytes of memory, and you can change the contents of those bytes. In C and C++, integers are a mutable type. This 2 is not that 2, unless you do it intentionally using pointers.

In fact, the way assignment works in Python also leads to other surprises, more interesting in real life. For example, look at this session:

>>> def f(s=""):
...     s+='x'
...     return s
...
>>> f()
'x'
>>> f()
'x'
>>> f()
'x'

That is really not surprising. Now, let's make a very small change:

>>> def f(l=[]):
...     l.append('x')
...     return l
...
>>> f()
['x']
>>> f()
['x', 'x']
>>> f()
['x', 'x', 'x']

And that is, for someone who has not seen it before, surprising. It happens because lists are a mutable type. The default argument is evaluated once, when the def statement is executed, and every time you call f() you are using and returning the same l. Before, you were also always using the same s, but since strings are immutable, it never changed, and you were returning a new string each time.

You could check that I am telling you the truth using is, of course. And BTW, this is not a problem just for lists. It's a problem for objects of every class you create yourself, unless you bother making it immutable somehow. So let's be careful with default arguments, ok?
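
One common way to be careful is to use None as the default and build the list inside the function. A minimal sketch follows; f_shared and f_fresh are illustrative names, not from the session above, and is is used to check whether two calls hand back the same object.

```python
def f_shared(l=[]):
    # The same list object is reused on every call.
    l.append('x')
    return l


def f_fresh(l=None):
    # None is a sentinel: build a new list on each call instead.
    if l is None:
        l = []
    l.append('x')
    return l


print(f_shared() is f_shared())  # True: both calls return the same list
print(f_fresh() is f_fresh())    # False: each call returns a new list
print(f_shared.__defaults__)     # the stored default list has been mutated
```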

But the main problem with finding the original "1000 is not 1000" thing surprising is that, in truth, it's uninteresting. Integers are fungible. You don't care whether they are the same integer, you only really care that they are equal.

Testing for integer identity is like worrying, after you loan me $1, about whether I give you back the same $1 coin or a different one. It just doesn't matter. What you want is just a $1 coin, or a 2, or a 1000.

Also, the result of 2 is 2 is implementation-dependent. There is no reason, beyond an optimization, for that to be True.

Hoping this was clear, let me give you a last snippet:

>>> a = float('NaN')
>>> a is a
True
>>> a == a
False
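
NaN is the one case where identity and equality disagree in the other direction: an object is always itself, yet NaN never compares equal to anything, itself included. A short sketch of the consequences, using only the standard library:

```python
import math

nan = float('nan')

print(nan is nan)       # True: one object is always itself
print(nan == nan)       # False: NaN compares unequal to everything
print(math.isnan(nan))  # True: the reliable way to test for NaN
print(nan in [nan])     # True: containment checks identity before equality
```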

UPDATE: lots of fun and interesting comments about this post at reddit and a small followup here


Contents © 2000-2020 Roberto Alsina