Parallelism in Python
This post summarizes some useful resources for parallelism in Python.
Multiprocessing and Multithreading
- multiprocessing : CPU-intensive tasks
- multithreading : I/O-intensive tasks
Python packages:
- Built-in modules : multiprocessing, threading
- High-level interface for parallel and/or distributed computing : pathos
- Lightweight embarrassingly parallel pipelines : joblib
- Distributed computing (on servers and clouds) : ray
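As a rough illustration of the rule of thumb above, here is a minimal sketch using the built-in modules (concurrent.futures is used for the thread pool; the worker counts, example functions, and placeholder URL are illustrative, not from the original post):
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Pool
from urllib.request import urlopen

def cpu_bound(n):
    # CPU-intensive work: best served by multiple processes.
    return sum(i * i for i in range(n))

def io_bound(url):
    # I/O-intensive work: best served by multiple threads.
    with urlopen(url) as response:
        return len(response.read())

if __name__ == "__main__":
    # Processes for the CPU-bound task.
    with Pool(processes=4) as pool:
        sums = pool.map(cpu_bound, [10**6] * 4)

    # Threads for the I/O-bound task (placeholder URL).
    urls = ["https://example.com"] * 4
    with ThreadPoolExecutor(max_workers=4) as executor:
        sizes = list(executor.map(io_bound, urls))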
Code Snippets with joblib (Recommended)
Basic Usage
from math import sqrt
from joblib import Parallel, delayed
Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10))
The delayed function wraps each call to sqrt together with its argument(s), so that Parallel can dispatch the calls to the pool of workers.
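The same pattern works for user-defined functions; a minimal sketch, assuming a hypothetical slow_square function and using n_jobs=-1 to take all available cores:
import time
from joblib import Parallel, delayed

def slow_square(x):
    # Hypothetical stand-in for an expensive computation.
    time.sleep(0.1)
    return x * x

# n_jobs=-1 asks joblib to use all available CPU cores.
squares = Parallel(n_jobs=-1)(delayed(slow_square)(i) for i in range(20))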
Reusing a pool of workers
from math import sqrt
from joblib import Parallel, delayed
# Use the context manager to reuse a pool of
# workers across loop iterations and avoid overhead.
with Parallel(n_jobs=2) as parallel:
    accumulator = 0.
    n_iter = 0
    while accumulator < 1000:
        results = parallel(
            delayed(sqrt)(accumulator + i ** 2) for i in range(5)
        )
        accumulator += sum(results)  # synchronization barrier
        n_iter += 1
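For I/O-bound work, joblib can also dispatch to threads instead of processes via the prefer="threads" hint; a sketch, assuming a hypothetical fetch_length task and placeholder URLs:
from urllib.request import urlopen
from joblib import Parallel, delayed

def fetch_length(url):
    # Hypothetical I/O-bound task: download a page and return its size.
    with urlopen(url) as response:
        return len(response.read())

urls = ["https://example.com"] * 8  # placeholder URLs
# prefer="threads" hints joblib to use a thread-based backend,
# which suits I/O-bound tasks that spend most of their time waiting.
lengths = Parallel(n_jobs=4, prefer="threads")(
    delayed(fetch_length)(u) for u in urls
)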
Function with multiple outputs
from math import modf
from joblib import Parallel, delayed
# `modf` returns the fractional and integer parts of
# the input number as a two-item tuple.
r = Parallel(n_jobs=1)(delayed(modf)(i / 2.) for i in range(10))
# Unzip the list of result tuples so that each output
# is gathered separately across all calls.
res, i = zip(*r)
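The same unzipping pattern applies to any function that returns several values; a small sketch with a hypothetical min_max helper:
from joblib import Parallel, delayed

def min_max(values):
    # Hypothetical helper returning two outputs per call.
    return min(values), max(values)

chunks = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]
pairs = Parallel(n_jobs=2)(delayed(min_max)(c) for c in chunks)
mins, maxs = zip(*pairs)  # mins == (1, 1, 2), maxs == (4, 9, 6)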
Result order and input order
According to Stack Overflow, the result order matches the input order with the multiprocessing and loky backends. However, there is a known issue with the threading backend combined with the pre_dispatch option (Issue).
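A quick way to check the ordering guarantee is to compare the parallel output against a plain sequential loop; a minimal sketch with a hypothetical square helper and the loky backend:
from joblib import Parallel, delayed

def square(x):
    return x * x

expected = [square(i) for i in range(100)]
got = Parallel(n_jobs=4, backend="loky")(delayed(square)(i) for i in range(100))
assert got == expected  # results come back in the input order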