Question:
I am trying to use multiprocessing in my code, so I thought I would start by working through some examples. I used the first example from the multiprocessing documentation.
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
When I run the above code, I get AttributeError: can't get attribute 'f' on <module '__main__' (built-in)>. I do not know why I am getting this error. I am using Python 3.5, if that helps.
Answer #1:
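This problem seems to be a design feature of multiprocessing.Pool. See https://bugs.python.org/issue25053. Pool sends the function to its worker processes by pickling it, and a function is pickled as a reference to its module and name, so the workers must be able to import it. As a result, Pool does not always work with objects that are not defined in an imported module. So you have to put your function in a different file and import that module.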
File: defs.py
def f(x):
    return x*x
File: run.py
from multiprocessing import Pool
import defs

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(defs.f, [1, 2, 3]))
If you pass print or another built-in function to map instead of f, the example should work. If this is not a bug (according to the link), then the example given in the documentation is badly chosen.
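To see why built-ins get through, here is a minimal sketch (my own illustration, not taken from the linked issue) of how pickle, which Pool relies on to ship work to its workers, stores a function. The variable names payload and restored are just for this example.

```python
import pickle

# Pickling a function does not copy its code. The payload only records
# where the function lives: its module name and its name in that module.
payload = pickle.dumps(print)
print(b"builtins" in payload, b"print" in payload)  # True True

# Unpickling imports that module and looks the name up again, so the
# receiving side gets back the very same function object.
restored = pickle.loads(payload)
print(restored is print)  # True
```

A worker process can always import builtins, so print survives the round trip. A function that only exists in an interactive __main__ cannot be looked up this way by a freshly started worker, which matches the AttributeError in the question.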
Answer #2:
The multiprocessing module has a major limitation when it comes to IPython use:
Functionality within this package requires that the __main__ module be importable by the children. […] This means that some examples, such as the multiprocessing.pool.Pool examples will not work in the interactive interpreter. [from the documentation]
Fortunately, there is a fork of the multiprocessing module called multiprocess, which uses dill instead of pickle for serialization and conveniently overcomes this issue.
Just install multiprocess and replace multiprocessing with multiprocess in your imports:
import multiprocess as mp

def f(x):
    return x*x

with mp.Pool(5) as pool:
    print(pool.map(f, [1, 2, 3, 4, 5]))
Of course, externalizing the code as suggested in this answer works as well, but I find it very inconvenient: that is neither why nor how I use IPython environments.
tl;dr: multiprocessing does not work in IPython environments right away; use its fork multiprocess instead.
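If you are not sure whether your session falls under the limitation quoted in this answer, a rough check is to see whether __main__ was loaded from a file. This is only a heuristic sketch; the helper name main_has_source_file is made up for this example, and whether Pool actually fails also depends on how your platform starts the worker processes.

```python
import sys

def main_has_source_file():
    """Heuristic: True if __main__ was loaded from a file on disk."""
    main_module = sys.modules.get("__main__")
    # A script run as "python run.py" has __main__.__file__ set.
    # The plain interactive interpreter and IPython sessions typically do not.
    return getattr(main_module, "__file__", None) is not None

if main_has_source_file():
    print("__main__ comes from a file, so workers can usually import it.")
else:
    print("__main__ has no file; functions defined here may not reach the workers.")
```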
Answer #3:
If you’re using Jupyter notebook (like the OP), then defining the function in a separate cell and executing that cell first fixes the problem. The accepted answer works too, but it’s more work. Defining the function earlier in the same cell, i.e. above the pool, isn’t enough. It has to be in a completely different notebook cell which is executed first.