API Reference

The loky module manages a pool of worker that can be re-used across time. It provides a robust and dynamic implementation os the ProcessPoolExecutor and a function get_reusable_executor() which hide the pool management under the hood.

loky.get_reusable_executor(max_workers=None, context=None, timeout=10, kill_workers=False, reuse='auto', job_reducers=None, result_reducers=None, initializer=None, initargs=(), env=None)[source]

Return the current ReusableExectutor instance.

Start a new instance if it has not been started already or if the previous instance was left in a broken state.

If the previous instance does not have the requested number of workers, the executor is dynamically resized to adjust the number of workers prior to returning.

Reusing a singleton instance spares the overhead of starting new worker processes and importing common python packages each time.

max_workers controls the maximum number of tasks that can be running in parallel in worker processes. By default this is set to the number of CPUs on the host.

Setting timeout (in seconds) makes idle workers automatically shutdown so as to release system resources. New workers are respawn upon submission of new tasks so that max_workers are available to accept the newly submitted tasks. Setting timeout to around 100 times the time required to spawn new processes and import packages in them (on the order of 100ms) ensures that the overhead of spawning workers is negligible.

Setting kill_workers=True makes it possible to forcibly interrupt previously spawned jobs to get a new instance of the reusable executor with new constructor argument values.

The job_reducers and result_reducers are used to customize the pickling of tasks and results send to the executor.

When provided, the initializer is run first in newly spawned processes with argument initargs.

The environment variable in the child process are a copy of the values in the main process. One can provide a dict {ENV: VAL} where ENV and VAL are string literals to overwrite the environment variable ENV in the child processes to value VAL. The environment variables are set in the children before any module is loaded. This only works with with the loky context.

Task & results serialization

To share function definition across multiple python processes, it is necessary to rely on a serialization protocol. The standard protocol in python is pickle but its default implementation in the standard library has several limitations. For instance, it cannot serialize functions which are defined interactively or in the __main__ module.

To avoid this limitation, loky relies on cloudpickle when it is present. cloudpickle is a fork of the pickle protocol which allows the serialization of a greater number of objects and it can be installed using pip install cloudpickle. As this library is slower than the pickle module in the standard library, by default, loky uses it only to serialize objects which are detected to be in the __main__ module.

There are three ways to temper with the serialization in loky:

  • Using the arguments job_reducers and result_reducers, it is possible to register custom reducers for the serialization process.

  • Setting the variable LOKY_PICKLER to an available and valid serialization module. This module must present a valid Pickler object. Setting the environment variable LOKY_PICKLER=cloudpickle will force loky to serialize everything with cloudpickle instead of just serializing the object detected to be in the __main__ module.

  • Finally, it is possible to wrap an unpicklable object using the loky.wrap_non_picklable_objects decorator. In this case, all other objects are serialized as in the default behavior and the wrapped object is pickled through cloudpickle.

The benefits and drawbacks of each method are highlighted in this example.

loky.wrap_non_picklable_objects(obj, keep_wrapper=True)[source]

Wrapper for non-picklable object to use cloudpickle to serialize them.

Note that this wrapper tends to slow down the serialization process as it is done with cloudpickle which is typically slower compared to pickle. The proper way to solve serialization issues is to avoid defining functions and objects in the main scripts and to implement __reduce__ functions for complex classes.

Processes start methods in loky

The API in loky provides a set_start_method() function to set the default start_method, which controls the way Process are started. The available methods are {'loky', 'loky_int_main', 'spawn'}. On unix, the start methods {'fork', 'forkserver'} are also available. Note that loky is not compatible with multiprocessing.set_start_method() function. The default start method needs to be set with the provided function to ensure a proper behavior.

Protection against memory leaks

The memory size of long running worker processes can increase indefinitely if a memory leak is created. This can result in processes being shut down by the OS if those leaks are not resolved. To prevent it, loky provides leak detection, memory cleanups, and workers shutdown.

If psutil is installed, each worker periodically [1] checks its memory usage after it completes its task. If the usage is found to be unusual [2], an additional gc.collect() event is triggered to remove objects with potential cyclic references. If even after that, the memory usage of a process worker remains too high, it will shut down safely, and a fresh process will be automatically spawned by the executor.

If psutil is not installed, there is no easy way to monitor worker processes memory usage. gc.collect() events will still be called periodially [1] inside each workers, but there is no guarantee that a leak is not happening.

Footnotes