Garbage collection in Python
A garbage collector reclaims objects that occupy memory but can no longer be used because nothing refers to them anymore.
Garbage collection simplifies programming and dramatically improves the robustness of code. As a high-level programming language, Python has a garbage collector and provides a standard module, gc, that lets you interact with it.
Prelude: psutil and context managers
The psutil module provides tools to monitor available memory on many different systems. I will make basic use of it to monitor the memory usage of some Python code.
Context managers are a great way to encapsulate code and ensure that its setup and teardown are carried out properly. Since the system is not idle, we are not really measuring the space taken by the Python objects alone: we measure the change in free system memory, so the precision is low (roughly 10 MB).
import psutil

class memory_usage():
    def __enter__(self):
        # Free system memory when the block starts
        self.free = psutil.virtual_memory().free
    def __exit__(self, *args):
        # Drop in free system memory while the block ran, rounded down to 10 MB
        used = self.free - psutil.virtual_memory().free
        print(f"Memory usage {10*(used//10**7)} MB")

with memory_usage():
    L = list(range(1000000))
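For a finer-grained reading than the free-memory difference above, the standard library module tracemalloc can count the bytes allocated through Python's own allocator. The sketch below is a swapped-in technique, not the psutil approach used in this article, and the helper name traced_allocation is hypothetical.

```python
import tracemalloc
from contextlib import contextmanager

@contextmanager
def traced_allocation(label="block"):
    """Hypothetical helper: report the net bytes allocated by Python in the block."""
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    result = {}
    try:
        yield result
    finally:
        after, _peak = tracemalloc.get_traced_memory()
        result["bytes"] = after - before
        if not was_tracing:
            tracemalloc.stop()
        print(f"{label}: {result['bytes'] / 1e6:.1f} MB")

with traced_allocation("list of ints") as usage:
    numbers = list(range(1_000_000))
```

Unlike the system-wide measurement, this only sees allocations made by the Python process itself, so it is not disturbed by other programs, but tracing slows the code down noticeably.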
We can play a bit with memory_usage to get some idea of what is costly in Python. Here we compare the cost of storing 1,000,000 empty dicts and 1,000,000 empty lists.
with memory_usage():
    K = [{} for _ in range(1000000)]
with memory_usage():
    R = [[] for _ in range(1000000)]
We can also check the impact of deleting a reference.
with memory_usage():
    del L
We don't really have to delete the list itself. It is deleted automatically as soon as the names that refer to it are rebound to something else.
with memory_usage():
    K = R = 3
We can also check that references held within the scope of a function are dealt with appropriately.
def f():
    L = [{} for _ in range(1000000)]

with memory_usage():
    f()
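The same behaviour can be observed without measuring memory at all, by keeping only a weak reference to an object created inside a function. This is a minimal sketch that assumes CPython, where an object is destroyed as soon as its last reference disappears; the class Payload and the function g are hypothetical names.

```python
import weakref

class Payload:
    """Hypothetical stand-in for a large object."""
    def __init__(self):
        self.data = [{} for _ in range(1000)]

def g():
    local = Payload()
    # A weak reference does not keep `local` alive
    return weakref.ref(local)

ref = g()
# On CPython the object is expected to be gone once g() has returned
print(ref() is None)
```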
Small application of gc
As we have seen, the garbage collector does a substantial amount of work so that programmers do not have to think about memory management. However, this work requires a compromise between automatic memory management and code performance.
Python's primary mechanism counts the explicit references to each object and deletes an object as soon as no reference to it remains. This scheme is simple enough to be implemented efficiently, but it cannot detect reference cycles: objects caught in a cycle are never freed by reference counting alone. Cycles can arise in many ways; let's look at some examples.
with memory_usage():
    L = [[] for _ in range(1000000)]
    for K in L:
        # self reference!
        K.append(K)
We have a list of 1,000,000 lists, each one referring to itself. If we delete the list L, nothing is freed.
with memory_usage():
    del L
As far as reference counting is concerned, the memory used by those lists is lost: every inner list still holds a reference to itself, so its count never drops to zero.
Although such an example is unlikely to occur in real code, cycles can appear without being noticed. Let's say we want to implement a very simple doubly linked list with only an append operation (don't do that, it is a really inefficient implementation).
class DoubleLinkList:
    def __init__(self, v):
        self.pred = None
        self.succ = None
        self.value = v
        self.useless_memory = list(range(10000))
    def append(self, v):
        # Walk to the last node, then link the new node in both directions
        succ = self
        while succ.succ:
            succ = succ.succ
        succ.succ = DoubleLinkList(v)
        succ.succ.pred = succ
with memory_usage():
    D = DoubleLinkList(0)
    for i in range(1, 1000):
        D.append(i)
The memory used by DoubleLinkList is doomed, even if we explicitly delete the list.
with memory_usage():
    del D
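One common way to avoid this kind of cycle is to make the backward link a weak reference, so that only the forward links keep nodes alive. The class below is a hypothetical variant of DoubleLinkList written for illustration, and the check assumes CPython's reference counting. Automatic collection is paused during the check so that only reference counting can free the nodes.

```python
import gc
import weakref

class WeakBackLinkList:
    """Hypothetical variant of DoubleLinkList: `pred` is a weak reference."""
    def __init__(self, v):
        self._pred = None
        self.succ = None
        self.value = v

    @property
    def pred(self):
        # Dereference the weak link; None if there is no predecessor
        return self._pred() if self._pred is not None else None

    def append(self, v):
        node = self
        while node.succ:
            node = node.succ
        node.succ = WeakBackLinkList(v)
        node.succ._pred = weakref.ref(node)

was_enabled = gc.isenabled()
gc.disable()
try:
    head = WeakBackLinkList(0)
    for i in range(1, 100):
        head.append(i)
    probe = weakref.ref(head.succ)   # watch the second node
    del head
    freed_without_gc = probe() is None
finally:
    if was_enabled:
        gc.enable()

print(freed_without_gc)
```

The trade-off is that a node can no longer keep its predecessor alive, so the caller must hold on to the head of the list.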
Here, I used a useless field to make each object consume more memory. Tracking memory like this is rather crude: we can't really see small memory leaks. Here comes the gc module. It can list the objects tracked by the collector and the references between them.
For instance, I can check how many lists are currently in memory.
import gc
print(len(list(filter(lambda e: type(e) is list, gc.get_objects()))))
L = []  # Let's add one more list
print("One more", len(list(filter(lambda e: type(e) is list, gc.get_objects()))))
del L  # And delete it
print("It is deleted", len(list(filter(lambda e: type(e) is list, gc.get_objects()))))
Let's wrap this gc call in a small helper for convenience:
def typeCount(some_type):
    # Number of tracked objects whose exact type is some_type
    return len(list(filter(lambda e: type(e) is some_type, gc.get_objects())))
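The same idea extends to a quick overview of everything the collector tracks. The helper below, most_common_types, is a hypothetical name used for illustration. Note that gc.get_objects only returns objects tracked by the collector (essentially containers), so atomic values such as ints and strings do not appear.

```python
import gc
from collections import Counter

def most_common_types(n=5):
    """Hypothetical helper: the n most frequent types among tracked objects."""
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(n)

for name, count in most_common_types():
    print(f"{name:>20}: {count}")
```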
A stupid class
Let's say we create a really stupid class. The only thing each instance does is point to itself.
class IAmStupid:
    def __init__(self):
        self.self = self

L = [IAmStupid() for _ in range(1000)]
print(f"{typeCount(IAmStupid)} Stupid Class")
del L
print(f"They can't be removed, so they are still: {typeCount(IAmStupid)}")
A more surprising example
So far we have seen how circular references can be harmful to automatic memory management. The circular references we have seen so far were obvious.
They can also arise in surprising ways.
class Closure:
    def __init__(self, v):
        self.v = v
        self.func = lambda e: self.v * e
This class shows a pattern that can actually be used in real-life code: we define a method on the fly at object initialization. It is not good practice, but it can be convenient.
As you have probably guessed, this class behaves like the stupid one above.
C = Closure(0)
print(f"{typeCount(Closure)} Closure Class")
del C
print(f"They can't be removed, so they are still: {typeCount(Closure)}")
Here the catch comes from the lambda closure itself. Its scope encompasses self, so the function points back to the instance that stores it. Many trickier examples could exist. I would be happy to add them, so don't hesitate to mail me.
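If the function only needs the value and not the instance, one way to avoid the cycle is to bind the value as a default argument, so that the lambda no longer captures self. The class ClosureNoCycle is a hypothetical variant written for illustration. The check assumes CPython and pauses automatic collection so that only reference counting can free the instance.

```python
import gc
import weakref

class ClosureNoCycle:
    """Hypothetical variant of Closure whose lambda does not capture self."""
    def __init__(self, v):
        self.v = v
        # v is bound once, as a default argument; self is not referenced
        self.func = lambda e, v=v: v * e

was_enabled = gc.isenabled()
gc.disable()
try:
    c = ClosureNoCycle(3)
    probe = weakref.ref(c)
    result = c.func(2)
    del c
    freed = probe() is None
finally:
    if was_enabled:
        gc.enable()

print(result, freed)
```

The price is a change of behaviour: func keeps using the value given at construction and does not follow later assignments to the v attribute.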
Some investigation of the closure itself
The catch here is to understand what a lambda closure means. When you define a function with a lambda, it can access the whole surrounding environment. By the time you call the function, the environment may have changed, which changes the output of the lambda.
When the lambda is defined at global scope, name lookup is deferred until the call: as expected, the lambda follows variable names, not the values they had at definition time.
For instance:
x = y = 1
f = lambda: x*y
x = 0
print(f())
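A well-known consequence of this lookup by name shows up when lambdas are created in a loop: all of them see the last value of the loop variable. The sketch below also shows the usual workaround of freezing the value in a default argument; the names callbacks and frozen are only illustrative.

```python
callbacks = []
for k in range(3):
    callbacks.append(lambda e: e * k)     # k is looked up at call time

# After the loop k is 2, so every callback multiplies by 2
late = [cb(10) for cb in callbacks]
print(late)      # [20, 20, 20]

frozen = []
for k in range(3):
    frozen.append(lambda e, k=k: e * k)   # k is evaluated at definition time

early = [cb(10) for cb in frozen]
print(early)     # [0, 10, 20]
```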
Within the scope of a function, the environment is no longer global, which makes the analysis a bit trickier. For instance,
def some_func():
    x = 0
    y = 1
    f = lambda: x * y
    return f

other_func = some_func()
print(other_func())
x = 1  # a global x, unrelated to the x captured inside some_func
print(other_func())
What this code shows is that the lambda function somehow has to remember its context. This can be explored further through two attributes of the function, __closure__ and __code__.
print(other_func.__code__.co_freevars, *other_func.__closure__)
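To make that output easier to read, the names in co_freevars can be paired with the cells in __closure__, whose cell_contents attribute holds the captured value. This is a small self-contained sketch; make_product is a hypothetical stand-in for some_func.

```python
def make_product():
    x = 0
    y = 1
    return lambda: x * y

func = make_product()

# co_freevars and __closure__ are in the same order
captured = {
    name: cell.cell_contents
    for name, cell in zip(func.__code__.co_freevars, func.__closure__)
}
print(captured)
```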
If we go back to the Closure class, we can explicitly see the reference to self within the closure.
C = Closure(0)
print(C.func.__closure__, C.func.__code__.co_freevars)
del C
Automatic cleanup
I said earlier that cyclic references waste memory. That is not exactly the case. The garbage collector will eventually reclaim the memory: in addition to reference counting, CPython runs a cycle detector automatically once enough container objects have been allocated.
It is possible to trigger it explicitly with the function gc.collect.
print(f"Instance of Closure {typeCount(Closure)}")
D = Closure(1)
print(f"Instance of Closure {typeCount(Closure)}")
del D
print(f"Instance of Closure {typeCount(Closure)}")
gc.collect()
print(f"Instance of Closure {typeCount(Closure)}")
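gc.collect also returns the number of unreachable objects it found, which gives a direct way to confirm that a cycle was reclaimed. The sketch below uses a hypothetical self-referencing class Node, similar to IAmStupid, and pauses automatic collection so that the object is certain to survive until the explicit call.

```python
import gc
import weakref

class Node:
    """Hypothetical self-referencing class, similar to IAmStupid."""
    def __init__(self):
        self.me = self

was_enabled = gc.isenabled()
gc.disable()
try:
    gc.collect()                      # start from a clean state
    n = Node()
    probe = weakref.ref(n)
    del n
    alive_before = probe() is not None   # the cycle keeps the object alive
    found = gc.collect()                 # number of unreachable objects found
    alive_after = probe() is not None
finally:
    if was_enabled:
        gc.enable()

print(alive_before, found, alive_after)
print(gc.get_threshold())   # allocation thresholds that trigger automatic runs
```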
Compiled on: Sun. 07 Jan. 2024 23:19:29 CET