Reiterability
With Bruno Guillon we developed a nice class we call reiterable within a larger (and hopefully soon to be disclosed) project.
For people not at ease with iterators and generators, you can read an introduction here or the official documentation here
An example of Generator
Lets say that you are into number theory and you want to manipulate the sequences of prime numbers. Of course it is infinite, but nevertheless you can manipulate an iterator over all primes. For instance:
def primes():
= []
Primes = 2
n while True:
if all(n % p for p in Primes):
Primes.append(n)yield n
+= 1
n
= primes()
P # P is an iterator over all primes!
for i in range(10):
print(next(P), end=", ")
2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
The iterator is useful itself since we can map or filter it:
= primes()
P = filter(lambda e:e%4 == 3, P)
P2 # P2 is the infinite sequence of primes
# whose congruent to 3 mod 4.
for i in range(10):
print(next(P2), end=", ")
3, 7, 11, 19, 23, 31, 43, 47, 59, 67,
One classical caveat of iterators (which is actually a feature) is that they are designed to not be reusable. For instance:
= primes()
P = filter(lambda e:e%4 == 3, P)
P2 = filter(lambda e:e%4 == 3, P)
P3 print(next(P3))
print(next(P2))
3
7
In particular, we have an inter-dependencies between the definitions of iterators that is highly prone of mistake. In the case of Primes, you could (and should) regenerate the iterators for each new definition:
= filter(lambda e:e%4 == 3, primes())
P2 = filter(lambda e:e%4 == 3, primes())
P3 print(next(P3))
print(next(P2))
3
3
When building complicated code, iterator regeneration can become a nightmare and make requirement to inspect the object to check if it is iterable or iterator.
Reiterability
In many situations (but certainly not all) where iterators are useful, it is possible to maintain the information of how the iterator has been generated. For prime numbers, it is not hard to provide a way of restarting the computation. It is actually the case with all generators: we can simply regenerate the iterator.
Hence, we have the possibility to have iterators that can be reset. In the reiterable.py we introduce a decorator of generators that simply does that. Instead of generating iterators, it will generate iterators with reset (called reiterator).
from reiterable import reiterable
@reiterable
def primes():
= []
Primes = 2
n while True:
if all(n % p for p in Primes):
Primes.append(n)yield n
+= 1
n
= primes()
P for i in range(10):
print(next(P), end=', ')
P.reset()print()
print("After the reset:", next(P))
2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
After the reset: 2
The reiterator does not store much more information than the initial
iterator. We simply remember the generator and regenerate internally the
iterator. Reiterators are not iterators. Their
__iter__
method actually generates a new reiterator
starting from the beginning.
= 0
i for p in P:
print(p, end=", ")
+= 1
i if i == 10:
break
print()
print("We can do that again")
for p in P:
print(p, end=", ")
+= 1
i if i == 20:
break
3, 5, 7, 11, 13, 17, 19, 23, 29, 31,
We can do that again
3, 5, 7, 11, 13, 17, 19, 23, 29, 31,
Every function that call __iter__
will regenerate an
iterator from the current position. If we redo the previous code, but
after popping the first prime number:
next(P) # We pop 2
= 0
i for p in P:
print(p, end=", ")
+= 1
i if i == 10:
break
print()
print("We can do that again")
for p in P:
print(p, end=", ")
+= 1
i if i == 20:
break
5, 7, 11, 13, 17, 19, 23, 29, 31, 37,
We can do that again
5, 7, 11, 13, 17, 19, 23, 29, 31, 37,
Extra operations on reiterators
On Reiterator, we don’t lose information by iterating. So we can actually move freely within the sequence of elements without really consuming the elements. For that reason, we support the full slices of lists. Of course, reaching for the last element of an infinite reiterator is a really bad idea.
P.reset()print("The 5th Prime", P[5])
The 5th Prime 13
If we pop the first element, then the 5th is the 6th prime.
next(P)
print("The 6th Prime", P[5])
The 6th Prime 17
Slice allow to have a lookahead.
print("The next elements", list(P[:5]))
The next elements [3, 5, 7, 11, 13]
We didn’t pop from the sequence yet, it is just a lookahead.
print(next(P))
print("The next elements", list(P[:10]))
3
The next elements [5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
We can also look behind
print(P.previous())
2
Although these operations are supported, in term of complexity they are not efficient. Each time we copy an iterator, we have to re-execute the generator from the beginning to the current point. So it should be used lightly.
Functional operations
We can of course perform map
and filter
on
reiterators. It will generate new iterators that will be independent
from the initial generators:
P.reset()= filter(lambda e: e%4 == 1, P)
P2 print(next(P2))
print(next(P)) # It didn't get consummed.
5
2
Unfortunately, it is not possible to stack many layers of filter and map by keeping the good behaviour of regeneration.
= map(lambda e:e**2, P2)
P3 print(next(P3))
print(next(P2)) # P2 get consummed :(
169
17
There are no reasonable ways of overloading the action of functional
operators like filter
and map
. Thus, as
alternative, we can call .filter
and .map
as
class methods. The reiterators obtained are independents.
= P.filter(lambda e: e%4 == 1)
P2 = P2.map(lambda e: e**2)
P3 print(list(P2[:5]))
print(next(P3))
print(list(P2[:5])) # Not consumed
[5, 13, 17, 29, 37]
25
[5, 13, 17, 29, 37]
We also overload accumulate
but use the semantic from
the standard version in itertools
starting from python
3.8.
P.reset()= P.accumulate(lambda x,y: (x+(y,))[-2:], ())
AdjacentPrimes print(list(AdjacentPrimes[:5]))
[(2,), (2, 3), (3, 5), (5, 7), (7, 11)]
Now we can even filter that:
= AdjacentPrimes[1:].filter(lambda e:e[1]-e[0] == 2)
TwinPrimes print(list(TwinPrimes[:10]))
[(3, 5), (5, 7), (11, 13), (17, 19), (29, 31), (41, 43), (59, 61), (71, 73), (101, 103), (107, 109)]
We can now look ahead, even rather faraway without any trouble:
print(TwinPrimes[1000]) # It took around 2seconds
(79631, 79633)
We can of course nest even more functional operators.
= TwinPrimes.filter(lambda e:e[0] % 4 == 1)
P4 print(list(P4[:10]))
[(5, 7), (17, 19), (29, 31), (41, 43), (101, 103), (137, 139), (149, 151), (197, 199), (269, 271), (281, 283)]
Practical use-cases
So reiterability is a nice feature with a rather poor complexity hidden behind. It allows to play with infinite sequences nicely without all the technicalities of iterators. But can it be used in practical reasonable situations?
Actually some algorithms requires to be able to go through dataset few times. If the dataset didn’t fit in main memory, then it is become annoying quickly. With our decorator, it is become really easy.
Let us produce a file with some useless data.
from random import sample
with open("/tmp/file", "w") as f:
for i in range(1000):
f"{i}\n") f.write(
Traceback (most recent call last):
File "/var/www/cha/check_py_md", line 77, in repl
exec(code, env)
File "<string>", line 3, in <module>
PermissionError: [Errno 13] Permission denied: '/tmp/file'
Now we can easily construct a reiterator over lines:
@reiterable
def lines():
with open("/tmp/file") as f:
yield from f
= lines().map(lambda e:int(e.strip()))
Lines print(list(Lines[:10]))
print(list(Lines[8:20:2]))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[8, 10, 12, 14, 16, 18]
Actually, we package this code snippet to provides a reiterable through files seen as lines.
from reiterable import ReitLinesFiles
= ReitLinesFiles("/tmp/file")
Lines print(list(Lines[:10]))
print(list(Lines[8:20:2]))
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
['8', '10', '12', '14', '16', '18']
Database access are another situation where reiterability can simplify a lot. We thus create another small function helper that should work for any database connector respecting python db-api.
import sqlite3
from reiterable import ReitDBTable
= sqlite3.connect(":memory:")
db "CREATE TABLE A (x, y)")
db.execute("INSERT INTO A VALUES (?,?)", enumerate(range(1000)))
db.executemany(= ReitDBTable(db, "SELECT * FROM A")
ATable print(list(ATable[:5]))
print(list(ATable.filter(lambda e:e[0] % 3 == 0)[:5]))
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
[(0, 0), (3, 3), (6, 6), (9, 9), (12, 12)]
Compiled the: dim. 07 janv. 2024 23:19:29 CET