
"Python's asyncio.gather function is great for I/O bound parallel processing. There's a simple utility function I like to use that I call gather_in_batches: async def gather_in_batches(tasks, batch_size=100, return_exceptions=False): for i in range(0, len(tasks), batch_size): batch = tasks[i:i+batch_size] for result in await asyncio.gather(*batch, return_exceptions=return_exceptions): yield result"
"And process_async is an async function that would just do whatever processing you need to do on that object. Assuming it is mostly I/O bound, then this is very simple and effective method to process data in parallel, without getting into threads, multi-processing, greenlets, or any other method. You'll need to experiment to figure out what the optimal batch_size is for your use case. And unless you don't care about errors, you should set return_exceptions=True, then check if isinstance(result, Exception) to do proper error handling."
gather_in_batches is an async generator that splits a list of coroutine tasks into fixed-size batches and runs asyncio.gather on each batch in turn. It accepts tasks, batch_size, and return_exceptions parameters and yields individual results from each gathered batch. Typical usage builds a list of tasks from a source such as database records, URLs, or filenames, then consumes the results via an async comprehension or async for iteration. process_async represents the I/O-bound coroutine executed per item. The batch size requires tuning against throughput and resource constraints, and setting return_exceptions=True enables explicit error handling by checking isinstance(result, Exception).
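The async comprehension form mentioned above might look like this (a minimal sketch; collect_all is a hypothetical wrapper, and it assumes the gather_in_batches definition from the quoted code):

```python
async def collect_all(tasks):
    # Async comprehensions are only valid inside async functions.
    return [result async for result in gather_in_batches(tasks, batch_size=50)]
```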
Read at StreamHacker