ebonnal/streamable

༄ streamable

Pythonic Stream-like manipulation of iterables

  • 🔗 Fluent chainable lazy operations
  • 🔀 Concurrent via threads/processes/asyncio
  • 🇹 Typed, fully annotated, Stream[T] is an Iterable[T]
  • 🛡️ Tested extensively with Python 3.7 to 3.14
  • 🪶 Light, no dependencies

1. install

pip install streamable

or

conda install conda-forge::streamable 

2. import

from streamable import Stream

3. init

Create a Stream[T] decorating an Iterable[T]:

integers: Stream[int] = Stream(range(10))

4. operate

Chain lazy operations (only evaluated during iteration), each returning a new immutable Stream:

inverses: Stream[float] = (
    integers
    .map(lambda n: round(1 / n, 2))
    .catch(ZeroDivisionError)
)

5. iterate

Iterate over a Stream[T] just as you would over any other Iterable[T]; elements are processed on the fly:

  • collect
>>> list(inverses)
[1.0, 0.5, 0.33, 0.25, 0.2, 0.17, 0.14, 0.12, 0.11]
>>> set(inverses)
{0.5, 1.0, 0.2, 0.33, 0.25, 0.17, 0.14, 0.12, 0.11}
  • reduce
>>> sum(inverses)
2.82
>>> from functools import reduce
>>> reduce(..., inverses)
  • loop
>>> for inverse in inverses:
...     ...
  • next
>>> next(iter(inverses))
1.0

📒 Operations

A dozen expressive lazy operations and that’s it!

.map

Applies a transformation on elements:

👀 show example
integer_strings: Stream[str] = integers.map(str)

assert list(integer_strings) == ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

concurrency

Note

Preserves the upstream order by default (FIFO), but you can set ordered=False for First Done First Out.
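
To illustrate the difference between the two orderings, here is a rough standard-library sketch. It does not use streamable, and slow_identity is a hypothetical helper: Executor.map hands results back in submission order, while as_completed hands them back as tasks finish.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List


def slow_identity(delay: float) -> float:
    """Hypothetical task: sleeps for `delay` seconds, then returns it."""
    time.sleep(delay)
    return delay


delays: List[float] = [0.4, 0.0, 0.2]

with ThreadPoolExecutor(max_workers=3) as executor:
    # FIFO: results follow the upstream order, whatever the completion order.
    fifo = list(executor.map(slow_identity, delays))

    # First Done First Out: results are collected as soon as each task finishes.
    futures = [executor.submit(slow_identity, delay) for delay in delays]
    fdfo = [future.result() for future in as_completed(futures)]

print(fifo)  # upstream order
print(fdfo)  # completion order (shortest delay first, timing permitting)
```

The first list keeps the input order even though the first task is the slowest; the second one reflects completion times.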

thread-based concurrency

Applies the transformation using a pool of threads whose size is set by the concurrency parameter:

👀 show example
import requests

pokemon_names: Stream[str] = (
    Stream(range(1, 4))
    .map(lambda i: f"https://pokeapi.co/api/v2/pokemon-species/{i}")
    .map(requests.get, concurrency=3)
    .map(requests.Response.json)
    .map(lambda poke: poke["name"])
)
assert list(pokemon_names) == ['bulbasaur', 'ivysaur', 'venusaur']

Note

concurrency is also the size of the buffer containing not-yet-yielded results. If the buffer is full, the iteration over the upstream is paused until a result is yielded from the buffer.
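
The buffering behavior described in this note can be pictured with a standard-library sketch. This is an assumption about the general technique, not the library's actual implementation; buffered_map and the pulled list are hypothetical names used only for the illustration.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def buffered_map(
    func: Callable[[T], U], upstream: Iterable[T], concurrency: int
) -> Iterator[U]:
    """Hypothetical ordered concurrent map keeping at most `concurrency` pending results."""
    iterator = iter(upstream)
    with ThreadPoolExecutor(max_workers=concurrency) as executor:
        buffer: deque = deque()
        while True:
            # Pull from upstream only while the buffer has room.
            while len(buffer) < concurrency:
                try:
                    elem = next(iterator)
                except StopIteration:
                    break
                buffer.append(executor.submit(func, elem))
            if not buffer:
                return
            # Yielding the oldest result frees one slot in the buffer.
            yield buffer.popleft().result()


pulled: List[int] = []


def tracked_upstream() -> Iterator[int]:
    """Records each element at the moment it is pulled."""
    for i in range(10):
        pulled.append(i)
        yield i


results = buffered_map(lambda n: n * n, tracked_upstream(), concurrency=3)
first = next(results)
pulled_after_first = list(pulled)  # only 3 elements have been pulled so far
remaining = list(results)
```

After the first result is requested, only three upstream elements have been pulled: the upstream is paused until a slot is freed.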

Tip

The performance of thread-based concurrency in a CPU-bound script can be drastically improved by using a Python 3.13+ free-threading build.

process-based concurrency

Set via="process":

👀 show example
if __name__ == "__main__":
    state: List[int] = []
    # integers are mapped
    assert integers.map(state.append, concurrency=4, via="process").count() == 10
    # but the `state` of the main process is not mutated
    assert state == []

asyncio-based concurrency

The sibling operation .amap applies an async function:

👀 show example
import httpx
import asyncio

http_async_client = httpx.AsyncClient()

pokemon_names: Stream[str] = (
    Stream(range(1, 4))
    .map(lambda i: f"https://pokeapi.co/api/v2/pokemon-species/{i}")
    .amap(http_async_client.get, concurrency=3)
    .map(httpx.Response.json)
    .map(lambda poke: poke["name"])
)

assert list(pokemon_names) == ['bulbasaur', 'ivysaur', 'venusaur']
asyncio.get_event_loop().run_until_complete(http_async_client.aclose())

"starmap"

The star function decorator transforms a function that takes several positional arguments into a function that takes a tuple:

👀 show example
from streamable import star

zeros: Stream[int] = (
    Stream(enumerate(integers))
    .map(star(lambda index, integer: index - integer))
)

assert list(zeros) == [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
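
For intuition, a decorator of this kind can be sketched in a few lines of plain Python. star_sketch below is a hypothetical stand-in written for this illustration, not the library's star; it behaves like itertools.starmap applied one element at a time.

```python
import itertools
from typing import Any, Callable, Tuple


def star_sketch(func: Callable[..., Any]) -> Callable[[Tuple[Any, ...]], Any]:
    """Hypothetical decorator: wraps `func` so that it takes one tuple and unpacks it."""

    def wrapped(args: Tuple[Any, ...]) -> Any:
        return func(*args)

    return wrapped


subtract = star_sketch(lambda index, integer: index - integer)

# Mapping the wrapped function over tuples...
zeros = list(map(subtract, enumerate(range(10))))
# ...gives the same result as itertools.starmap with the unwrapped function.
zeros_with_starmap = list(
    itertools.starmap(lambda index, integer: index - integer, enumerate(range(10)))
)
```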

.foreach

Applies a side effect on elements:

👀 show example
state: List[int] = []
appending_integers: Stream[int] = integers.foreach(state.append)

assert list(appending_integers) == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
assert state == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

concurrency

Similar to .map:

  • set the concurrency parameter for thread-based concurrency
  • set via="process" for process-based concurrency
  • use the sibling .aforeach operation for asyncio-based concurrency
  • set ordered=False for First Done First Out

.group

Groups elements into Lists:

👀 show example
integers_by_5: Stream[List[int]] = integers.group(size=5)

assert list(integers_by_5) == [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
👀 show example
integers_by_parity: Stream[List[int]] = integers.group(by=lambda n: n % 2)

assert list(integers_by_parity) == [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
👀 show example
from datetime import timedelta

integers_within_1_sec: Stream[List[int]] = (
    integers
    .throttle(2, per=timedelta(seconds=1))
    .group(interval=timedelta(seconds=0.99))
)

assert list(integers_within_1_sec) == [[0, 1, 2], [3, 4], [5, 6], [7, 8], [9]]

Mix the size/by/interval parameters:

👀 show example
integers_by_parity_by_2: Stream[List[int]] = (
    integers
    .group(by=lambda n: n % 2, size=2)
)

assert list(integers_by_parity_by_2) == [[0, 2], [1, 3], [4, 6], [5, 7], [8], [9]]
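
The ordering of the groups above is consistent with the following standard-library sketch. It is an interpretation of the example's output, not the library's code, and group_by_key_and_size is a hypothetical name: a group is yielded as soon as it reaches size, and the incomplete groups are flushed when the upstream is exhausted.

```python
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def group_by_key_and_size(
    elements: Iterable[T], by: Callable[[T], Hashable], size: int
) -> Iterator[List[T]]:
    """Hypothetical helper: co-groups elements by key, yielding each group once it is full."""
    buckets: Dict[Hashable, List[T]] = {}
    for elem in elements:
        key = by(elem)
        bucket = buckets.setdefault(key, [])
        bucket.append(elem)
        if len(bucket) == size:
            # The group is full: yield it and start a fresh one for this key.
            yield buckets.pop(key)
    # Upstream exhausted: flush incomplete groups in the order they were started.
    yield from buckets.values()


groups = list(group_by_key_and_size(range(10), by=lambda n: n % 2, size=2))
```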

.groupby

Like .group, but groups into (key, elements) tuples:

👀 show example
integers_by_parity: Stream[Tuple[str, List[int]]] = (
    integers
    .groupby(lambda n: "odd" if n % 2 else "even")
)

assert list(integers_by_parity) == [("even", [0, 2, 4, 6, 8]), ("odd", [1, 3, 5, 7, 9])]

Tip

Then "starmap" over the tuples:

👀 show example
from streamable import star

counts_by_parity: Stream[Tuple[str, int]] = (
    integers_by_parity
    .map(star(lambda parity, ints: (parity, len(ints))))
)

assert list(counts_by_parity) == [("even", 5), ("odd", 5)]

.flatten

Ungroups elements assuming that they are Iterables:

👀 show example
even_then_odd_integers: Stream[int] = integers_by_parity.flatten()

assert list(even_then_odd_integers) == [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]

thread-based concurrency

Flattens several iterables concurrently; the concurrency parameter sets how many:

👀 show example
mixed_ones_and_zeros: Stream[int] = (
    Stream([[0] * 4, [1] * 4])
    .flatten(concurrency=2)
)
assert list(mixed_ones_and_zeros) == [0, 1, 0, 1, 0, 1, 0, 1]

.filter

Keeps only the elements that satisfy a condition:

👀 show example
even_integers: Stream[int] = integers.filter(lambda n: n % 2 == 0)

assert list(even_integers) == [0, 2, 4, 6, 8]

.distinct

Removes duplicates:

👀 show example
distinct_chars: Stream[str] = Stream("foobarfooo").distinct()

assert list(distinct_chars) == ["f", "o", "b", "a", "r"]

Optionally specify a deduplication key:

👀 show example
strings_of_distinct_lengths: Stream[str] = (
    Stream(["a", "foo", "bar", "z"])
    .distinct(len)
)

assert list(strings_of_distinct_lengths) == ["a", "foo"]

Warning

During iteration, all distinct elements that are yielded are retained in memory to perform deduplication. However, you can remove only consecutive duplicates without a memory footprint by setting consecutive_only=True:

👀 show example
consecutively_distinct_chars: Stream[str] = (
    Stream("foobarfooo")
    .distinct(consecutive_only=True)
)

assert list(consecutively_distinct_chars) == ["f", "o", "b", "a", "r", "f", "o"]
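
The memory trade-off mentioned in the warning can be seen in a standard-library sketch of the two strategies. Both functions are hypothetical helpers written for this illustration, not the library's code: the first keeps a set of every key seen so far, the second only compares neighbors via itertools.groupby.

```python
import itertools
from typing import Callable, Hashable, Iterable, Iterator, Optional, Set, TypeVar

T = TypeVar("T")


def distinct_sketch(
    elements: Iterable[T], key: Optional[Callable[[T], Hashable]] = None
) -> Iterator[T]:
    """Global deduplication: memory grows with the number of distinct keys."""
    seen: Set[Hashable] = set()
    for elem in elements:
        k = elem if key is None else key(elem)
        if k not in seen:
            seen.add(k)
            yield elem


def consecutive_distinct_sketch(elements: Iterable[T]) -> Iterator[T]:
    """Consecutive-only deduplication: only the current run of equal elements is tracked."""
    for elem, _run in itertools.groupby(elements):
        yield elem


all_distinct = list(distinct_sketch("foobarfooo"))
distinct_lengths = list(distinct_sketch(["a", "foo", "bar", "z"], key=len))
consecutive = list(consecutive_distinct_sketch("foobarfooo"))
```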

.truncate

Ends iteration once a given number of elements have been yielded:

👀 show example
five_first_integers: Stream[int] = integers.truncate(5)

assert list(five_first_integers) == [0, 1, 2, 3, 4]

or when a condition is satisfied:

👀 show example
five_first_integers: Stream[int] = integers.truncate(when=lambda n: n == 5)

assert list(five_first_integers) == [0, 1, 2, 3, 4]

If both count and when are set, truncation occurs as soon as either condition is met.
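
A minimal standard-library sketch of this "whichever comes first" behavior, assuming (as in the example above) that the element satisfying when is not yielded. truncate_sketch is a hypothetical helper, not the library's implementation.

```python
import itertools
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def truncate_sketch(
    elements: Iterable[T],
    count: Optional[int] = None,
    when: Optional[Callable[[T], bool]] = None,
) -> Iterator[T]:
    """Hypothetical helper: stops after `count` elements or once `when` is satisfied."""
    iterator: Iterator[T] = iter(elements)
    if when is not None:
        predicate = when
        # Stop right before the first element satisfying the condition.
        iterator = itertools.takewhile(lambda elem: not predicate(elem), iterator)
    if count is not None:
        # Stop after `count` elements at most.
        iterator = itertools.islice(iterator, count)
    return iterator


by_count = list(truncate_sketch(range(10), count=5))
by_condition = list(truncate_sketch(range(10), when=lambda n: n == 5))
condition_first = list(truncate_sketch(range(10), count=5, when=lambda n: n == 3))
count_first = list(truncate_sketch(range(10), count=2, when=lambda n: n == 5))
```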

.skip

Skips the first specified number of elements:

👀 show example
integers_after_five: Stream[int] = integers.skip(5)

assert list(integers_after_five) == [5, 6, 7, 8, 9]

or skips elements until a predicate is satisfied:

👀 show example
integers_after_five: Stream[int] = integers.skip(until=lambda n: n >= 5)

assert list(integers_after_five) == [5, 6, 7, 8, 9]

If both count and until are set, skipping stops as soon as either condition is met.
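
The same rule can be sketched with a plain generator. skip_sketch is a hypothetical helper that follows the description above, not the library's code: skipping stops once count elements have been skipped or once until is satisfied, whichever happens first.

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def skip_sketch(
    elements: Iterable[T],
    count: Optional[int] = None,
    until: Optional[Callable[[T], bool]] = None,
) -> Iterator[T]:
    """Hypothetical helper: skips leading elements, then yields everything else."""
    skipped = 0
    # With neither parameter set, nothing is skipped.
    skipping = count is not None or until is not None
    for elem in elements:
        if skipping:
            count_reached = count is not None and skipped >= count
            until_met = until is not None and until(elem)
            if count_reached or until_met:
                skipping = False
            else:
                skipped += 1
                continue
        yield elem


by_count = list(skip_sketch(range(10), count=5))
by_predicate = list(skip_sketch(range(10), until=lambda n: n >= 5))
predicate_first = list(skip_sketch(range(10), count=5, until=lambda n: n >= 2))
count_first = list(skip_sketch(range(10), count=2, until=lambda n: n >= 5))
```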

.catch

Catches a given type of exception, and optionally yields a replacement value:

👀 show example
inverses: Stream[float] = (
    integers
    .map(lambda n: round(1 / n, 2))
    .catch(ZeroDivisionError, replacement=float("inf"))
)

assert list(inverses) == [float("inf"), 1.0, 0.5, 0.33, 0.25, 0.2, 0.17, 0.14, 0.12, 0.11]

You can specify an additional when condition for the catch:

👀 show example
import requests
from requests.exceptions import ConnectionError

status_codes_ignoring_resolution_errors: Stream[int] = (
    Stream(["https://github.com", "https://foo.bar", "https://github.com/foo/bar"])
    .map(requests.get, concurrency=2)
    .catch(ConnectionError, when=lambda error: "Max retries exceeded with url" in str(error))
    .map(lambda response: response.status_code)
)

assert list(status_codes_ignoring_resolution_errors) == [200, 404]

It has an optional finally_raise: bool parameter to raise the first exception caught (if any) when the iteration terminates.
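
To make the finally_raise behavior concrete, here is a standard-library sketch of the idea. catch_sketch is a hypothetical helper, not the library's implementation, and it relies on the built-in map, whose iterator can be resumed after raising.

```python
from typing import Iterator, List, Optional, Type, TypeVar

T = TypeVar("T")


def catch_sketch(
    iterator: Iterator[T],
    error_type: Type[Exception],
    finally_raise: bool = False,
) -> Iterator[T]:
    """Hypothetical helper: swallows `error_type` errors, optionally re-raising the first at the end."""
    first_error: Optional[Exception] = None
    while True:
        try:
            elem = next(iterator)
        except StopIteration:
            break
        except error_type as error:
            # Remember only the first error, then move on to the next element.
            if first_error is None:
                first_error = error
            continue
        yield elem
    if finally_raise and first_error is not None:
        raise first_error


# Without finally_raise, the error is silently dropped.
silent = list(catch_sketch(map(int, "01_3"), ValueError))

# With finally_raise, every valid element is still yielded before the error surfaces.
collected: List[int] = []
raised = False
try:
    for value in catch_sketch(map(int, "01_3"), ValueError, finally_raise=True):
        collected.append(value)
except ValueError:
    raised = True
```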

Tip

Apply side effects when catching an exception by integrating them into when:

👀 show example
errors: List[Exception] = []

def store_error(error: Exception) -> bool:
    errors.append(error)  # applies effect
    return True  # signals to catch the error

integers_in_string: Stream[int] = (
    Stream("012345foo6789")
    .map(int)
    .catch(ValueError, when=store_error)
)

assert list(integers_in_string) == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
assert len(errors) == len("foo")

.throttle

Limits the number of yields per time interval:

👀 show example
from datetime import timedelta

three_integers_per_second: Stream[int] = integers.throttle(3, per=timedelta(seconds=1))

# takes 3s: ceil(10 integers / 3 per_second) - 1
assert list(three_integers_per_second) == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

.observe

Logs the progress of iterations:

👀 show example
>>> assert list(integers.throttle(2, per=timedelta(seconds=1)).observe("integers")) == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
INFO: [duration=0:00:00.001793 errors=0] 1 integers yielded
INFO: [duration=0:00:00.004388 errors=0] 2 integers yielded
INFO: [duration=0:00:01.003655 errors=0] 4 integers yielded
INFO: [duration=0:00:03.003196 errors=0] 8 integers yielded
INFO: [duration=0:00:04.003852 errors=0] 10 integers yielded

Note

The amount of logs will never be overwhelming because they are produced logarithmically (base 2): the 11th log will be produced after 1,024 elements have been yielded, the 21st log after 1,048,576 elements, ...
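
The doubling schedule can be checked with a few lines of plain Python. log_points is a hypothetical helper; the final log at exhaustion is an assumption inferred from the sample output above, where a last line reports 10 integers.

```python
from typing import List


def log_points(total: int) -> List[int]:
    """Hypothetical helper: element counts at which a log line would be produced."""
    points: List[int] = []
    threshold = 1
    while threshold <= total:
        points.append(threshold)
        threshold *= 2  # the next log is produced after twice as many elements
    # Assumed from the sample output: a last log reports the final count.
    if points and points[-1] != total:
        points.append(total)
    return points


ten = log_points(10)
eleventh = log_points(1024)[10]  # index 10 is the 11th log
number_of_logs_for_a_million = len(log_points(1_048_576))
```

The n-th log corresponds to 2 ** (n - 1) elements, which is why the number of lines grows so slowly.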

Tip

To mute these logs, set the logging level above INFO:

👀 show example
import logging
logging.getLogger("streamable").setLevel(logging.WARNING)

+

Concatenates streams:

👀 show example
assert list(integers + integers) == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

zip

Tip

Use the standard zip function:

👀 show example
from streamable import star

cubes: Stream[int] = (
    Stream(zip(integers, integers, integers))  # Stream[Tuple[int, int, int]]
    .map(star(lambda a, b, c: a * b * c))  # Stream[int]
)

assert list(cubes) == [0, 1, 8, 27, 64, 125, 216, 343, 512, 729]

Shorthands for consuming the stream

Note

Although consuming the stream is beyond the scope of this library, it provides two basic shorthands to trigger an iteration:

.count

Iterates over the stream until exhaustion and returns the number of elements yielded:

👀 show example
assert integers.count() == 10

()

Calling the stream iterates over it until exhaustion and returns it:

👀 show example
state: List[int] = []
appending_integers: Stream[int] = integers.foreach(state.append)
assert appending_integers() is appending_integers
assert state == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

.pipe

Calls a function, passing the stream as first argument, followed by *args/**kwargs if any:

👀 show example
import pandas as pd

(
    integers
    .observe("ints")
    .pipe(pd.DataFrame, columns=["integer"])
    .to_csv("integers.csv", index=False)
)

Inspired by the .pipe from pandas or polars.

💡💡💡💡💡💡 TIPS 💡💡💡💡💡💡

Exceptions do not terminate the iteration

Tip

If any of the operations raises an exception, you can resume the iteration after handling it:

👀 show example
from contextlib import suppress

casted_ints: Iterator[int] = iter(
    Stream("0123_56789")
    .map(int)
    .group(3)
    .flatten()
)
collected: List[int] = []

with suppress(ValueError):
    collected.extend(casted_ints)
assert collected == [0, 1, 2, 3]

collected.extend(casted_ints)
assert collected == [0, 1, 2, 3, 5, 6, 7, 8, 9]
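
This differs from plain Python generators, which are finished as soon as an exception escapes them. The standard-library sketch below contrasts the two; ResumableMap, generator_map and drain are hypothetical names, and the class-based iterator only illustrates one way resumption can work, not how the library does it.

```python
from contextlib import suppress
from typing import Any, Callable, Iterable, Iterator, List


class ResumableMap:
    """Hypothetical class-based iterator: its state survives an exception raised in __next__."""

    def __init__(self, func: Callable[[Any], Any], iterable: Iterable[Any]) -> None:
        self._func = func
        self._iterator = iter(iterable)

    def __iter__(self) -> "ResumableMap":
        return self

    def __next__(self) -> Any:
        return self._func(next(self._iterator))


def generator_map(func: Callable[[Any], Any], iterable: Iterable[Any]) -> Iterator[Any]:
    """Generator-based equivalent: an exception escaping it ends it for good."""
    for elem in iterable:
        yield func(elem)


def drain(iterator: Iterator[int], into: List[int]) -> None:
    """Collects elements until exhaustion or until a ValueError is raised."""
    with suppress(ValueError):
        for value in iterator:
            into.append(value)


resumed: List[int] = []
class_based = ResumableMap(int, "01_34")
drain(class_based, resumed)  # interrupted by int("_")
drain(class_based, resumed)  # picks up again at "3"

not_resumed: List[int] = []
generator_based = generator_map(int, "01_34")
drain(generator_based, not_resumed)  # interrupted by int("_")
drain(generator_based, not_resumed)  # the generator is already finished
```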

Extract-Transform-Load

Tip

Custom ETL scripts can benefit from the expressiveness of this library. Below is a pipeline that extracts the 67 quadruped Pokémon from the first three generations using PokéAPI and loads them into a CSV:

👀 show example
import csv
from datetime import timedelta
import itertools
import requests
from streamable import Stream

with open("./quadruped_pokemons.csv", mode="w") as file:
    fields = ["id", "name", "is_legendary", "base_happiness", "capture_rate"]
    writer = csv.DictWriter(file, fields, extrasaction='ignore')
    writer.writeheader()

    pipeline: Stream = (
        # Infinite Stream[int] of Pokemon ids starting from Pokémon #1: Bulbasaur
        Stream(itertools.count(1))
        # Limits to 16 requests per second to be friendly to our fellow PokéAPI devs
        .throttle(16, per=timedelta(seconds=1))
        # GETs pokemons concurrently using a pool of 8 threads
        .map(lambda poke_id: f"https://pokeapi.co/api/v2/pokemon-species/{poke_id}")
        .map(requests.get, concurrency=8)
        .foreach(requests.Response.raise_for_status)
        .map(requests.Response.json)
        # Stops the iteration when reaching the 1st pokemon of the 4th generation
        .truncate(when=lambda poke: poke["generation"]["name"] == "generation-iv")
        .observe("pokemons")
        # Keeps only quadruped Pokemons
        .filter(lambda poke: poke["shape"]["name"] == "quadruped")
        .observe("quadruped pokemons")
        # Catches errors due to None "generation" or "shape"
        .catch(
            TypeError,
            when=lambda error: str(error) == "'NoneType' object is not subscriptable"
        )
        # Writes a batch of pokemons every 5 seconds to the CSV file
        .group(interval=timedelta(seconds=5))
        .foreach(writer.writerows)
        .flatten()
        .observe("written pokemons")
        # Catches exceptions and raises the 1st one at the end of the iteration
        .catch(Exception, finally_raise=True)
    )

    pipeline()

Visitor Pattern

Tip

A Stream can be visited via its .accept method: implement a custom visitor by extending the abstract class streamable.visitors.Visitor:

👀 show example
from streamable.visitors import Visitor

class DepthVisitor(Visitor[int]):
    def visit_stream(self, stream: Stream) -> int:
        if not stream.upstream:
            return 1
        return 1 + stream.upstream.accept(self)

def depth(stream: Stream) -> int:
    return stream.accept(DepthVisitor())

assert depth(Stream(range(10)).map(str).foreach(print)) == 3

Functions

Tip

The Stream's methods are also exposed as functions:

👀 show example
from streamable.functions import catch

inverse_integers: Iterator[float] = map(lambda n: 1 / n, range(10))
safe_inverse_integers: Iterator[float] = catch(inverse_integers, ZeroDivisionError)

Contributing

Many thanks to our contributors!

You are very welcome to help us improve streamable via issues and PRs; see CONTRIBUTING.md.

๐Ÿ™ Community Highlights โ€“ Thank You!