Talk: Python + Celery

Meetup Python Grenoble - 2019/10/30

Manage your asynchronous tasks with Celery!

About me

Romain Clement

  • Meetup Python Grenoble co-organizer
  • Freelance Software Engineer
  • CTO @ Sylha (FinTech startup)
  • Website
  • GitHub

Why asynchronous tasks?

More and more use cases every day call for non-blocking processes:

  • Sending e-mails (single, batch, scheduled)
  • Data pipelines (ETL)
  • Machine learning
  • Micro-services communication
  • Logging
  • ...

Is it that hard?

Mmmh, kinda, see for yourself:

  • Concurrency model (threading, etc.)
  • Result storage (databases, etc.)
  • Scaling strategy (pool/cluster of workers)
  • Scheduling (cron-jobs)
  • Retry-on-failure strategies
  • Distributing tasks across multiple services
  • Task broadcasting
  • ...

Why Celery?

Celery is a Python package that abstracts task definition and invocation, using a message broker and a result backend behind the scenes:

  • Choose a message broker (Redis, RabbitMQ, etc.) and a result backend (Redis, SQLAlchemy, Mongo, etc.), if any
  • Define your tasks using Python code
  • Define your cron-jobs using Python code
  • Define retry-on-failure strategies for each task
  • Call your tasks (almost) like regular function calls!

Celery Overview

Show me some code!

Initialize Celery

In [ ]:
import os

from typing import Optional
from celery import Celery

celery_broker_url: Optional[str] = os.environ.get("CELERY_BROKER_URL")
celery_result_backend: Optional[str] = os.environ.get("CELERY_RESULT_BACKEND")

celery_app: Celery = Celery("example")

celery_app.conf.update(
    broker_url=celery_broker_url,
    result_backend=celery_result_backend,
)
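
For local experiments, the two environment variables could for instance point at a local Redis instance (example values, adjust to your setup):

export CELERY_BROKER_URL="redis://localhost:6379/0"
export CELERY_RESULT_BACKEND="redis://localhost:6379/1"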

Define a simple task

In [ ]:
def say_hi(name: str) -> None:
    print(f"Hi, {name}!")
In [ ]:
celery_app.task(
    say_hi,
    name="tasks.say_hi",
    ignore_result=True,
)

Call a simple task

Start a Celery worker beforehand:

celery worker \
    --app example.celery:celery_app \
    --loglevel=info
In [ ]:
celery_app.send_task("tasks.say_hi", args=["Joe"])

Get a result from a task

In [ ]:
from datetime import datetime

def talking_clock() -> datetime:
    # note: returning a datetime assumes a result serializer that can
    # handle it (e.g. pickle); the default JSON serializer cannot
    return datetime.now()
In [ ]:
celery_app.task(
    talking_clock,
    name="tasks.talking_clock",
    ignore_result=False
)
In [ ]:
from datetime import datetime
from celery.result import AsyncResult

talking_clock_task: AsyncResult = celery_app.send_task("tasks.talking_clock")
# store `talking_clock_task.id` somewhere for further use
In [ ]:
talking_clock_task_result = AsyncResult(talking_clock_task.id, app=celery_app)
now: datetime = talking_clock_task_result.get()

print(f"result: {now}")

Auto-retry failing tasks

In [ ]:
def failing_task() -> None:
    raise RuntimeError
In [ ]:
celery_app.task(
    failing_task,
    name="tasks.failing_task",
    ignore_result=True,
    autoretry_for=(RuntimeError,),
    max_retries=5,
    retry_backoff=True,
    retry_jitter=True,
)
In [ ]:
celery_app.send_task("tasks.failing_task")

Scheduling tasks

In [ ]:
def send_report(email: str) -> None:
    print(f"Sending report to {email}")
In [ ]:
celery_app.task(send_report, name="tasks.send_report", ignore_result=True)
In [ ]:
from celery import schedules

celery_app.conf.update(
    enable_utc=True,
    timezone="UTC",
    beat_schedule={
        "send-report-minutely": {
            "task": "tasks.send_report",
            "kwargs": {"email": "admin@domain.com"},
            "options": {"queue" : "default"},
            "schedule": schedules.crontab(
                minute="*",
                hour="*",
                day_of_week="*",
                day_of_month="*",
                month_of_year="*",
            ),
        },
    }
)

Start a Celery beat process (acts as an autonomous producer):

celery beat \
    --app example.celery:celery_app \
    --loglevel=info

Warning: when running several beat replicas (e.g. in a distributed deployment), make sure only one of them actually schedules tasks, otherwise tasks will be enqueued multiple times! (use an alternative scheduler implementation; one option is sketched below)
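
One such option (an assumption here, not covered in this talk) is the third-party celery-redbeat scheduler: it stores the schedule in Redis and holds a lock so that only one beat instance is active at a time:

celery beat \
    --app example.celery:celery_app \
    --scheduler redbeat.RedBeatScheduler \
    --loglevel=info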

Splitting tasks in queues

When starting a Celery worker, specify which queues to listen to:

celery worker \
    --app example.celery:celery_app \
    --queues=mailing,broadcast \
    --loglevel=info
In [ ]:
celery_app.send_task(
    "mailing.send_email",
    queue="mailing",
    kwargs={"to": "joe@doe.com"}
)
In [ ]:
celery_app.conf.update(
    task_routes={
        # route all "mailing.*" tasks to the "mailing" queue by default
        "mailing.*": "mailing",
    },
)

Pipelining tasks

Lots of workflows are possible using the "canvas" feature; the main primitives are:

  • chain: chain tasks (with or without passing arguments)
  • group: parallelize tasks
  • chord: parallelize, then execute a finalizing task

Eliminates the problem of a task having to call or wait on other tasks! (group and chord are sketched below; the full example after that uses chain)
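
A minimal sketch of group and chord, calling tasks by name through signatures; the task names tasks.add and tasks.sum_results are hypothetical placeholders, not defined elsewhere in this talk:

In [ ]:
from celery import chord, group, signature

# group: run ten hypothetical "tasks.add" invocations in parallel
parallel = group(
    signature("tasks.add", args=(i, i)) for i in range(10)
)

# chord: run the group, then pass the list of its results to a
# hypothetical "tasks.sum_results" finalizing task
chord(parallel)(signature("tasks.sum_results"))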

In [ ]:
from dataclasses import dataclass
from typing import List

# a dataclass, so User instances can be built with keyword arguments;
# passing them between tasks assumes a serializer that supports them
# (e.g. pickle)
@dataclass
class User:
    name: str
    email: str

def get_all_users() -> List[User]:
    return [
        User(name="John Doe", email="john@doe.com"),
        User(name="Jane Doe", email="jane@doe.com"),
    ]

celery_app.task(
    get_all_users,
    name="users.get_all",
    ignore_result=False
)
In [ ]:
def send_newsletter_to_all(all_users: List[User]) -> None:
    for u in all_users:
        print(f"Sending newsletter to {u.name}: {u.email}")
    
celery_app.task(
    send_newsletter_to_all,
    name="mailing.send_newsletter_to_all",
    ignore_result=True
)
In [ ]:
from celery import chain, signature

def send_newsletter():
    user_sig = signature(
        "users.get_all",
        queue="users"
    )
    newsletter_sig = signature(
        "mailing.send_newsletter_to_all",
        queue="mailing"
    )
    # chain the two signatures: the users list returned by the first
    # task is passed as argument to the second
    task_chain = chain(user_sig, newsletter_sig)
    task_chain.apply_async()

celery_app.task(
    send_newsletter,
    name="tasks.send_newsletter",
    ignore_result=True
)

Things to consider

  • Python only (experimental Node.js client with node-celery)
  • No asyncio (async / await) support for task results yet (announced for celery >= 5.0.0)
  • Some regressions were recently introduced in celery and kombu releases; pin your versions carefully

Follow-up

  • Storing tasks results in SQL/NoSQL database
  • Testing Celery tasks (mocks; a quick eager-mode sketch follows)
  • Monitoring (flower)
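
As a teaser for the testing topic, a minimal sketch using Celery's built-in eager mode, which executes tasks locally and synchronously without a broker (note that send_task by name bypasses eager mode, hence the registry lookup):

In [ ]:
# execute tasks locally and synchronously, without a broker
celery_app.conf.update(
    task_always_eager=True,
    task_eager_propagates=True,  # re-raise task exceptions in tests
)

# send_task() bypasses eager mode, so fetch the registered task
# object from the app registry instead
say_hi_task = celery_app.tasks["tasks.say_hi"]
say_hi_task.delay("Joe")  # runs immediately and prints "Hi, Joe!"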

Alternatives

Q&A