
Celery worker errors using eventlet on Solaris


I'm running a standard Celery worker with the eventlet pool and concurrency set to 8. The workers are usually quite busy when the errors occur (though the errors may also occur when they are not busy; it's hard to tell).

I know I don't have any leaks in my task, and I have run it with max tasks per child set just in case.

I'm getting the errors below, though, and I have no idea why. They appear to be random as far as I can tell. Is it something in my environment? Are these known issues? Unless I am misunderstanding, these do not seem to be errors caused by the task itself. Am I wrong about that? Any ideas would be great!

Also, I get none of these errors when running with the prefork pool.

Error 1:

[2015-05-02 06:13:37,452: WARNING/MainProcess] /opt/app/thisapp/software/python/lib/python2.7/site-packages/celery-3.1.16-py2.7.egg/celery/app/trace.py:364:
 RuntimeWarning: Exception raised outside body: SystemError('error return without exception set',):
Traceback (most recent call last):
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/celery-3.1.16-py2.7.egg/celery/app/trace.py", line 283, in trace_task
    uuid, retval, SUCCESS, request=task_request,
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/celery-3.1.16-py2.7.egg/celery/backends/base.py", line 248, in store_result
    request=request, **kwargs)
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/celery-3.1.16-py2.7.egg/celery/backends/base.py", line 481, in _store_result
    self.set(self.get_key_for_task(task_id), self.encode(meta))
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/celery-3.1.16-py2.7.egg/celery/backends/cache.py", line 126, in set
    return self.client.set(key, value, self.expires)
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/memcache.py", line 584, in set
    return self._set("set", key, val, time, min_compress_len)
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/memcache.py", line 835, in _set
    return _unsafe_set()
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/memcache.py", line 827, in _unsafe_set
    return(server.expect("STORED", raise_exception=True)
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/memcache.py", line 1196, in expect
    line = self.readline(raise_exception)
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/memcache.py", line 1182, in readline
    data = recv(4096)
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/eventlet-0.16.0.dev-py2.7.egg/eventlet/greenio.py", line 325, in recv
    timeout_exc=socket.timeout("timed out"))
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/eventlet-0.16.0.dev-py2.7.egg/eventlet/greenio.py", line 200, in _trampoline
    mark_as_closed=self._mark_as_closed)
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/eventlet-0.16.0.dev-py2.7.egg/eventlet/hubs/__init__.py", line 159, in trampoline
    return hub.switch()
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/eventlet-0.16.0.dev-py2.7.egg/eventlet/hubs/hub.py", line 293, in switch
    return self.greenlet.switch()
SystemError: error return without exception set

Error 2:

Traceback (most recent call last):
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/celery-3.1.16-py2.7.egg/celery/worker/__init__.py", line 227, in _process_task
    req.execute_using_pool(self.pool)
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/celery-3.1.16-py2.7.egg/celery/worker/job.py", line 263, in execute_using_pool
    correlation_id=uuid,
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/celery-3.1.16-py2.7.egg/celery/concurrency/base.py", line 156, in apply_async
    **options)
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/celery-3.1.16-py2.7.egg/celery/concurrency/eventlet.py", line 144, in on_apply
    self.getpid)
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/eventlet-0.16.0.dev-py2.7.egg/eventlet/greenpool.py", line 106, in spawn_n
    self.sem.acquire()
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/eventlet-0.16.0.dev-py2.7.egg/eventlet/semaphore.py", line 96, in acquire
    hubs.get_hub().switch()
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/eventlet-0.16.0.dev-py2.7.egg/eventlet/hubs/hub.py", line 293, in switch
    return self.greenlet.switch()
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/eventlet-0.16.0.dev-py2.7.egg/eventlet/greenpool.py", line 93, in _spawn_n_impl
    self._spawn_done(coro)
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/eventlet-0.16.0.dev-py2.7.egg/eventlet/greenpool.py", line 125, in _spawn_done
    self.coroutines_running.remove(coro)
KeyError: <greenlet.greenlet object at 0xc759e0>

Error 3:

[2015-05-02 08:41:48,786: WARNING/MainProcess] /opt/app/thisapp/software/python/lib/python2.7/site-packages/celery-3.1.16-py2.7.egg/celery/app/trace.py:364:
 RuntimeWarning: Exception raised outside body: TypeError('sequence item 1: expected string, NoneType found',):
Traceback (most recent call last):
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/celery-3.1.16-py2.7.egg/celery/app/trace.py", line 253, in trace_task
    I, R, state, retval = on_error(task_request, exc, uuid)
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/celery-3.1.16-py2.7.egg/celery/app/trace.py", line 201, in on_error
    R = I.handle_error_state(task, eager=eager)
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/celery-3.1.16-py2.7.egg/celery/app/trace.py", line 85, in handle_error_state
    }[self.state](task, store_errors=store_errors)
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/celery-3.1.16-py2.7.egg/celery/app/trace.py", line 118, in handle_failure
    req.id, exc, einfo.traceback, request=req,
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/celery-3.1.16-py2.7.egg/celery/backends/base.py", line 121, in mark_as_failure
    traceback=traceback, request=request)
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/celery-3.1.16-py2.7.egg/celery/backends/base.py", line 248, in store_result
    request=request, **kwargs)
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/celery-3.1.16-py2.7.egg/celery/backends/base.py", line 481, in _store_result
    self.set(self.get_key_for_task(task_id), self.encode(meta))
  File "/opt/app/thisapp/software/python/lib/python2.7/site-packages/celery-3.1.16-py2.7.egg/celery/backends/base.py", line 406, in get_key_for_task
    self.task_keyprefix, key_t(task_id), key_t(key),
TypeError: sequence item 1: expected string, NoneType found

Solution

  • Try lowering the Celery concurrency.

    We don't have a good concurrency implementation (hub) for Solaris, except pyevent, which is not really supported. So Eventlet would use select or poll, and those are limited to watching 1024 file descriptors. One task may use zero or more file descriptors, so I suggest setting the concurrency lower than it is now; I have no particular number in mind.

    Upgrading to the latest Eventlet version is also always a good idea. As of 2015-05-04, that is 0.17.3.
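
To get a rough feel for how close a worker process is to the file descriptor limit mentioned above, something like the following could be run inside the worker (for example from a debugging task). This is only a sketch using the standard library on a Unix-like system; the names `list_open_fds`, `fd_headroom` and `ASSUMED_FD_LIMIT` are made up for illustration and are not part of Celery or Eventlet, and the 1024 figure is simply the number quoted in the answer.

```python
import os
import resource

# Assumption: the limit quoted in the answer for select/poll based hubs.
ASSUMED_FD_LIMIT = 1024


def list_open_fds(scan_limit=65536):
    """Return the file descriptor numbers currently open in this process.

    Probes each candidate descriptor with os.fstat(), which raises OSError
    for descriptors that are not open. The scan is capped at scan_limit so
    that a very large (or unlimited) RLIMIT_NOFILE does not make it slow.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft > scan_limit:
        soft = scan_limit
    fds = []
    for fd in range(soft):
        try:
            os.fstat(fd)
        except OSError:
            continue  # descriptor is not open
        fds.append(fd)
    return fds


def fd_headroom(limit=ASSUMED_FD_LIMIT):
    """Return how many more descriptors fit under the assumed limit."""
    return limit - len(list_open_fds())


if __name__ == "__main__":
    fds = list_open_fds()
    highest = max(fds) if fds else -1
    print("open descriptors: %d, highest: %d" % (len(fds), highest))
    print("headroom below %d: %d" % (ASSUMED_FD_LIMIT, fd_headroom()))
```

If the headroom shrinks toward zero while the worker is busy, that would be consistent with the explanation above, and lowering the worker's concurrency (the `-c` option) is the first thing to try.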