Reiterability

With Bruno Guillon we developed a nice class we call reiterable within a larger (and hopefully soon to be disclosed) project.

For people not at ease with iterators and generators, you can read an introduction here or the official documentation here

An example of Generator

Lets say that you are into number theory and you want to manipulate the sequences of prime numbers. Of course it is infinite, but nevertheless you can manipulate an iterator over all primes. For instance:

def primes():
    Primes = []
    n = 2
    while True:
        if all(n % p for p in Primes):
            Primes.append(n)
            yield n
        n += 1

P = primes()
# P is an iterator over all primes!
for i in range(10):
    print(next(P), end=", ")
2, 3, 5, 7, 11, 13, 17, 19, 23, 29,

The iterator is useful itself since we can map or filter it:

P = primes()
P2 = filter(lambda e:e%4 == 3, P)
# P2 is the infinite sequence of primes
# whose congruent to 3 mod 4.
for i in range(10):
    print(next(P2), end=", ")
3, 7, 11, 19, 23, 31, 43, 47, 59, 67,

One classical caveat of iterators (which is actually a feature) is that they are designed to not be reusable. For instance:

P = primes()
P2 = filter(lambda e:e%4 == 3, P)
P3 = filter(lambda e:e%4 == 3, P)
print(next(P3))
print(next(P2))
3
7

In particular, we have an inter-dependencies between the definitions of iterators that is highly prone of mistake. In the case of Primes, you could (and should) regenerate the iterators for each new definition:

P2 = filter(lambda e:e%4 == 3, primes())
P3 = filter(lambda e:e%4 == 3, primes())
print(next(P3))
print(next(P2))
3
3

When building complicated code, iterator regeneration can become a nightmare and make requirement to inspect the object to check if it is iterable or iterator.

Reiterability

In many situations (but certainly not all) where iterators are useful, it is possible to maintain the information of how the iterator has been generated. For prime numbers, it is not hard to provide a way of restarting the computation. It is actually the case with all generators: we can simply regenerate the iterator.

Hence, we have the possibility to have iterators that can be reset. In the reiterable.py we introduce a decorator of generators that simply does that. Instead of generating iterators, it will generate iterators with reset (called reiterator).

from reiterable import reiterable
@reiterable
def primes():
    Primes = []
    n = 2
    while True:
        if all(n % p for p in Primes):
            Primes.append(n)
            yield n
        n += 1

P = primes()
for i in range(10):
    print(next(P), end=', ')
P.reset()
print()
print("After the reset:", next(P))
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 
After the reset: 2

The reiterator does not store much more information than the initial iterator. We simply remember the generator and regenerate internally the iterator. Reiterators are not iterators. Their __iter__ method actually generates a new reiterator starting from the beginning.

i = 0
for p in P:
    print(p, end=", ")
    i += 1
    if i == 10:
        break
print()
print("We can do that again")
for p in P:
    print(p, end=", ")
    i += 1
    if i == 20:
        break
3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 
We can do that again
3, 5, 7, 11, 13, 17, 19, 23, 29, 31,

Every function that call __iter__ will regenerate an iterator from the current position. If we redo the previous code, but after popping the first prime number:

next(P) # We pop 2
i = 0
for p in P:
    print(p, end=", ")
    i += 1
    if i == 10:
        break
print()
print("We can do that again")
for p in P:
    print(p, end=", ")
    i += 1
    if i == 20:
        break
5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 
We can do that again
5, 7, 11, 13, 17, 19, 23, 29, 31, 37,

Extra operations on reiterators

On Reiterator, we don’t lose information by iterating. So we can actually move freely within the sequence of elements without really consuming the elements. For that reason, we support the full slices of lists. Of course, reaching for the last element of an infinite reiterator is a really bad idea.

P.reset()
print("The 5th Prime", P[5])
The 5th Prime 13

If we pop the first element, then the 5th is the 6th prime.

next(P)
print("The 6th Prime", P[5])
The 6th Prime 17

Slice allow to have a lookahead.

print("The next elements", list(P[:5]))
The next elements [3, 5, 7, 11, 13]

We didn’t pop from the sequence yet, it is just a lookahead.

print(next(P))
print("The next elements", list(P[:10]))
3
The next elements [5, 7, 11, 13, 17, 19, 23, 29, 31, 37]

We can also look behind

print(P.previous())
2

Although these operations are supported, in term of complexity they are not efficient. Each time we copy an iterator, we have to re-execute the generator from the beginning to the current point. So it should be used lightly.

Functional operations

We can of course perform map and filter on reiterators. It will generate new iterators that will be independent from the initial generators:

P.reset()
P2 = filter(lambda e: e%4 == 1, P)
print(next(P2))
print(next(P)) # It didn't get consummed.
5
2

Unfortunately, it is not possible to stack many layers of filter and map by keeping the good behaviour of regeneration.

P3 = map(lambda e:e**2, P2)
print(next(P3))
print(next(P2)) # P2 get consummed :(
169
17

There are no reasonable ways of overloading the action of functional operators like filter and map. Thus, as alternative, we can call .filter and .map as class methods. The reiterators obtained are independents.

P2 = P.filter(lambda e: e%4 == 1)
P3 = P2.map(lambda e: e**2)
print(list(P2[:5]))
print(next(P3))
print(list(P2[:5])) # Not consumed
[5, 13, 17, 29, 37]
25
[5, 13, 17, 29, 37]

We also overload accumulate but use the semantic from the standard version in itertools starting from python 3.8.

P.reset()
AdjacentPrimes = P.accumulate(lambda x,y: (x+(y,))[-2:], ())
print(list(AdjacentPrimes[:5]))
[(2,), (2, 3), (3, 5), (5, 7), (7, 11)]

Now we can even filter that:

TwinPrimes = AdjacentPrimes[1:].filter(lambda e:e[1]-e[0] == 2)
print(list(TwinPrimes[:10]))
[(3, 5), (5, 7), (11, 13), (17, 19), (29, 31), (41, 43), (59, 61), (71, 73), (101, 103), (107, 109)]

We can now look ahead, even rather faraway without any trouble:

print(TwinPrimes[1000]) # It took around 2seconds
(79631, 79633)

We can of course nest even more functional operators.

P4 = TwinPrimes.filter(lambda e:e[0] % 4 == 1)
print(list(P4[:10]))
[(5, 7), (17, 19), (29, 31), (41, 43), (101, 103), (137, 139), (149, 151), (197, 199), (269, 271), (281, 283)]

Practical use-cases

So reiterability is a nice feature with a rather poor complexity hidden behind. It allows to play with infinite sequences nicely without all the technicalities of iterators. But can it be used in practical reasonable situations?

Actually some algorithms requires to be able to go through dataset few times. If the dataset didn’t fit in main memory, then it is become annoying quickly. With our decorator, it is become really easy.

Let us produce a file with some useless data.

from random import sample

with open("/tmp/file", "w") as f:
    for i in range(1000):
        f.write(f"{i}\n")

Now we can easily construct a reiterator over lines:

@reiterable
def lines():
    with open("/tmp/file") as f:
        yield from f
Lines = lines().map(lambda e:int(e.strip()))
print(list(Lines[:10]))
print(list(Lines[8:20:2]))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[8, 10, 12, 14, 16, 18]

Actually, we package this code snippet to provides a reiterable through files seen as lines.

from reiterable import ReitLinesFiles
Lines = ReitLinesFiles("/tmp/file")
print(list(Lines[:10]))
print(list(Lines[8:20:2]))
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
['8', '10', '12', '14', '16', '18']

Database access are another situation where reiterability can simplify a lot. We thus create another small function helper that should work for any database connector respecting python db-api.

import sqlite3
from reiterable import ReitDBTable
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE A (x, y)")
db.executemany("INSERT INTO A VALUES (?,?)", enumerate(range(1000)))
ATable = ReitDBTable(db, "SELECT * FROM A")
print(list(ATable[:5]))
print(list(ATable.filter(lambda e:e[0] % 3 == 0)[:5]))
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
[(0, 0), (3, 3), (6, 6), (9, 9), (12, 12)]


Mastodon