Using scikit-learn in Celery

Celery is great if you want to distribute your tasks across multiple processes or machines.

However, if you use Python’s multiprocessing inside a Celery task, you will probably run into an error like this:

raised unexpected: AttributeError("'Worker' object has no attribute '_config'",)
Traceback (most recent call last):
  File "/Users/.../.virtualenvs/mdt-dev/lib/python3.5/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/Users/.../.virtualenvs/mdt-dev/lib/python3.5/site-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/Users/.../pyproj/modeling/tasks/miscellaneous.py", line 29, in predict_house_type
    clf = pickle.loads(f.read())
  File "/Users/.../.virtualenvs/mdt-dev/lib/python3.5/site-packages/sklearn/__init__.py", line 57, in <module>
    from .base import clone
  File "/Users/.../.virtualenvs/mdt-dev/lib/python3.5/site-packages/sklearn/base.py", line 11, in <module>
    from .utils.fixes import signature
  File "/Users/.../.virtualenvs/mdt-dev/lib/python3.5/site-packages/sklearn/utils/__init__.py", line 17, in <module>
    from ..externals.joblib import cpu_count
  File "/Users/.../.virtualenvs/mdt-dev/lib/python3.5/site-packages/sklearn/externals/joblib/__init__.py", line 127, in <module>
    from .parallel import Parallel
  File "/Users/.../.virtualenvs/mdt-dev/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 25, in <module>
    from ._multiprocessing_helpers import mp
  File "/Users/liangdeo/.virtualenvs/mdt-dev/lib/python3.5/site-packages/sklearn/externals/joblib/_multiprocessing_helpers.py", line 25, in <module>
    _sem = mp.Semaphore()
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/context.py", line 81, in Semaphore
    return Semaphore(value, ctx=self.get_context())
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/synchronize.py", line 127, in __init__
    SemLock.__init__(self, SEMAPHORE, value, SEM_VALUE_MAX, ctx=ctx)
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/synchronize.py", line 59, in __init__
    kind, value, maxvalue, self._make_name(),
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/synchronize.py", line 117, in _make_name
    return '%s-%s' % (process.current_process()._config['semprefix'],
AttributeError: 'Worker' object has no attribute '_config'
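
The last frame of the traceback shows what goes wrong: the semaphore name is built from `_config['semprefix']` on the current process object, and the `Worker` object does not have a `_config` attribute. Here is a minimal sketch of that lookup in isolation. `FakeWorker` and `sem_prefix` are hypothetical names made up for this illustration, they are not part of Celery or multiprocessing, and the real `_make_name` does a bit more than this.

```python
from multiprocessing import current_process


class FakeWorker:
    """Hypothetical stand-in for the process object inside a Celery worker.

    Like the 'Worker' in the traceback, it has no `_config` attribute.
    """


def sem_prefix(proc):
    # Simplified version of the lookup in the last traceback frame.
    return proc._config["semprefix"]


def try_sem_prefix(proc):
    # Return the prefix, or a short description of the failure.
    try:
        return sem_prefix(proc)
    except AttributeError as exc:
        return "failed: %s" % exc


# An ordinary Python process carries a `_config` dict, so this works.
main_result = try_sem_prefix(current_process())

# The stand-in worker does not, so the same lookup fails.
worker_result = try_sem_prefix(FakeWorker())

print(main_result)
print(worker_result)
```

Note that `_config` is a private attribute of multiprocessing, so this only illustrates the failure seen above and is not something to rely on in application code.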

It’s a known issue that Celery doesn’t work well with multiprocessing inside tasks. One solution is to refactor your code so that it doesn’t use multiprocessing at all, since Celery already distributes the work for you.

However, if you are using libraries like scikit-learn or NLTK, which use multiprocessing underneath, you can try the following workaround:

# some_task.py
from celery.signals import worker_process_init


@worker_process_init.connect
def fix_multiprocessing(**_):
    # Runs once in each worker process when it starts.
    from multiprocessing import current_process
    try:
        current_process()._config
    except AttributeError:
        # Supply the semaphore name prefix that multiprocessing looks up.
        current_process()._config = {'semprefix': '/mp'}
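
If you want to check the patch logic without starting a worker, one option is to pull it into a small helper that takes the process object as an argument. This is only a sketch: `ensure_semprefix` and `FakeWorker` are hypothetical names used for illustration, and the stand-in class just imitates a process object that lacks `_config`.

```python
class FakeWorker:
    """Hypothetical stand-in for a Celery pool process without `_config`."""


def ensure_semprefix(proc, prefix="/mp"):
    # Same logic as fix_multiprocessing above, but the process object is
    # passed in so the function can be exercised outside a worker.
    try:
        proc._config
    except AttributeError:
        proc._config = {"semprefix": prefix}
    return proc


# A process object without `_config` gets the default prefix.
patched = ensure_semprefix(FakeWorker())

# A process object that already has `_config` is left untouched.
already_ok = FakeWorker()
already_ok._config = {"semprefix": "/custom"}
ensure_semprefix(already_ok)

print(patched._config)
print(already_ok._config)
```

Because the signal handler is connected when its module is imported, the file that defines `fix_multiprocessing` has to be imported by the worker for the fix to take effect.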

Zhanzhao Deo Liang