Parallelism in Python

This post summarizes some useful resources for parallelism in Python.

Multiprocessing and Multithreading

  • multiprocessing : CPU-intensive tasks
  • multithreading : IO-intensive tasks
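
As a rough illustration of this split, here is a minimal sketch using only the standard library: a process pool for a CPU-bound computation and a thread pool for a simulated IO-bound task (the sleep stands in for something like a network request; the worker functions are placeholders).

import math
import time
from multiprocessing import Pool              # processes for CPU-bound work
from multiprocessing.pool import ThreadPool   # threads for IO-bound work

def cpu_bound(n):
    # Dummy CPU-heavy computation.
    return sum(math.sqrt(i) for i in range(n))

def io_bound(seconds):
    # Stand-in for an IO-bound task such as a network request.
    time.sleep(seconds)
    return seconds

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        print(pool.map(cpu_bound, [10**5] * 4))
    with ThreadPool(processes=4) as pool:
        print(pool.map(io_bound, [0.1] * 4))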

Python packages:

  • Built-in modules : multiprocessing, threading
  • High-level interface for parallel and/or distributed computing : pathos
  • Lightweight embarrassingly parallel pipelines : joblib
  • Distributed computing (on servers and clouds) : ray
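
For distributed workloads, a minimal ray sketch (assuming ray is installed; the square task is just a placeholder) looks like this:

import ray

ray.init()  # start or connect to a local Ray instance

@ray.remote
def square(x):
    # Placeholder task; each call is scheduled as a remote task.
    return x * x

# Launch the tasks asynchronously, then block on the results.
futures = [square.remote(i) for i in range(10)]
print(ray.get(futures))  # [0, 1, 4, ..., 81]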

Code Snippets with joblib (Recommended)

Basic Usage

from math import sqrt
from joblib import Parallel, delayed

# n_jobs sets the number of workers: 1 runs the calls sequentially
# (useful for debugging), while -1 uses all available cores.
Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10))

The delayed function does not run sqrt immediately; it captures the function together with its arguments so that each call can be dispatched to a worker in the pool.
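
A quick way to see this, as I understand joblib's behaviour, is that delayed simply packages the call instead of executing it:

from math import sqrt
from joblib import delayed

# delayed(sqrt)(4) captures the call as a (function, args, kwargs)
# tuple that Parallel can dispatch to a worker later.
print(delayed(sqrt)(4))  # (<built-in function sqrt>, (4,), {})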

Reusing a pool of workers

from math import sqrt
from joblib import Parallel, delayed

# Use Parallel as a context manager to reuse the same pool of
# workers across loop iterations and avoid the overhead of
# starting workers again and again.
with Parallel(n_jobs=2) as parallel:
    accumulator = 0.
    n_iter = 0
    while accumulator < 1000:
        results = parallel(
            delayed(sqrt)(accumulator + i ** 2) for i in range(5)
        )
        accumulator += sum(results)  # synchronization barrier
        n_iter += 1

Function with multiple outputs

from math import modf
from joblib import Parallel, delayed

# `modf` returns the fractional and integer parts of
# the input number as a two-item tuple.
r = Parallel(n_jobs=1)(delayed(modf)(i / 2.) for i in range(10))

# Unzip the list of result tuples to collect each output
# of `modf` across the whole input range.
frac_parts, int_parts = zip(*r)

Result orders and input orders

According to Stack Overflow, the order of the results matches the order of the inputs with the multiprocessing and loky backends. However, there is a known issue with the threading backend when the pre_dispatch option is used (Issue).
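
As a quick sanity check, one can pick a backend explicitly and verify that results come back in input order; the slow_identity helper below is a made-up example whose completion order differs from its submission order.

import time
from joblib import Parallel, delayed

def slow_identity(i):
    # Smaller inputs sleep longer, so tasks finish in an order
    # different from the order in which they were submitted.
    time.sleep(0.1 * (5 - i))
    return i

# With the default loky backend (or multiprocessing), the results
# still come back in the same order as the inputs.
results = Parallel(n_jobs=2, backend="loky")(
    delayed(slow_identity)(i) for i in range(5)
)
print(results)  # expected: [0, 1, 2, 3, 4]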