Remarks on “Setting Up Celery, Flower, & RabbitMQ for Airflow”
Automating a useful article
Introduction
One of my primary learning resources is Medium. I spend a great deal of my day there and on the feed aggregators Feedly and Inoreader, where I maintain my Data- and Developer-related feeds respectively. They are very important to me, since through them I find very interesting articles of a tutorial nature. One of these was that one. It sat on my TODO list for a long time, but now that I am switching jobs I had the opportunity to follow it line by line, word by word. Before proceeding with this article you are strongly advised to read that article, at least quickly. Our purpose is to automate its manual steps. This article is the documentation of my adventure.
My first reading
On my first read I quickly realized that starting RabbitMQ and MySQL should be completely dockerized, otherwise I would run the risk of getting lost in the details. So I quickly downloaded the Bitnami docker-compose for MySQL and opted to “enhance” it by adding a section for the RabbitMQ docker image. The instructions were clear, and they include a useful section on what should be mapped to a docker volume, and voilà: my first docker-compose file that starts both RabbitMQ and MySQL, as the article requires, was a reality. No Erlang and no other downloads.
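A minimal sketch of the shape of that file, assuming the official RabbitMQ image alongside the Bitnami MySQL one; the image tags and volume names are placeholders, the credentials come from the connection URLs used later in the article, and the real file in the repository is more complete:

```yaml
services:
  mysql:
    image: bitnami/mysql:8.0              # placeholder tag
    environment:
      - MYSQL_ROOT_PASSWORD=root_password # placeholder
      - MYSQL_DATABASE=airflow_db
      - MYSQL_USER=sql_username
      - MYSQL_PASSWORD=sql_password
    ports:
      - "3306:3306"
    volumes:
      - mysql_data:/bitnami/mysql/data    # persist the database files
  rabbitmq:
    image: rabbitmq:3-management          # includes the management UI
    environment:
      - RABBITMQ_DEFAULT_USER=admin
      - RABBITMQ_DEFAULT_PASS=admin
    ports:
      - "5672:5672"                       # AMQP
      - "15672:15672"                     # management UI
    volumes:
      - rabbitmq_data:/var/lib/rabbitmq   # persist broker state
volumes:
  mysql_data:
  rabbitmq_data:
```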
On my Mac laptop, Docker Desktop reports:
So now I was able to start these services and follow along with the article. Soon I was able to use Celery and Flower with Airflow. Quickly, though, I realized that I had to start lots of Airflow services manually. This is a suboptimal solution because:
- It gets boring and impairs the learning experience
- Automation is what people are after
- Reproducibility is no longer a given
My second reading
Obviously, the very helpful folks of the Airflow project provide a nice docker-compose for beginners. With a catch. The article is about RabbitMQ + MySQL and Celery/Flower; the compose file, though, is about Redis + PostgreSQL and Celery/Flower. So it was time to “modify” the official recipe with the URLs of the article and to incorporate the sections I contributed to my previous docker-compose. This way I came up with a comparably convenient solution. The airflow.cfg portions of the article should be incorporated into the docker-compose, and after a while I arrived at this solution. Of course, the environment variables corresponding to the airflow.cfg settings must change so that they refer to the proper service URL and not just localhost (a sketch of how they land in the compose file follows the list). The mappings are:
- mysql+pymysql://sql_username:sql_password@localhost/airflow_db -> mysql+pymysql://sql_username:sql_password@mysql/airflow_db
- amqp://admin:admin@localhost/ -> amqp://admin:admin@rabbitmq
- db+mysql+pymysql://sql_username:sql_password@localhost/airflow_db -> db+mysql+pymysql://sql_username:sql_password@mysql/airflow_db
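In the compose file these URLs end up in the Airflow configuration environment variables. A minimal sketch, assuming the standard AIRFLOW__SECTION__KEY naming used by the official file (older Airflow versions spell the first one AIRFLOW__CORE__SQL_ALCHEMY_CONN):

```yaml
x-airflow-common:
  &airflow-common
  environment:
    &airflow-common-env
    # Metadata database: the mysql service instead of localhost
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: mysql+pymysql://sql_username:sql_password@mysql/airflow_db
    # Celery broker (rabbitmq service) and result backend (mysql service)
    AIRFLOW__CELERY__BROKER_URL: amqp://admin:admin@rabbitmq
    AIRFLOW__CELERY__RESULT_BACKEND: db+mysql+pymysql://sql_username:sql_password@mysql/airflow_db
```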
Last but not least, the “depends_on” section needed modifications too, roughly as sketched below. Please do not run it yet; it is not working correctly.
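In the official file the Airflow services share a common depends_on block pointing at redis and postgres; it has to point at our services instead. A sketch, reusing the anchor names of the official file:

```yaml
x-airflow-common:
  &airflow-common
  # ... image and environment as above ...
  depends_on:
    &airflow-common-depends-on
    rabbitmq:
      condition: service_healthy
    mysql:
      condition: service_healthy
```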
This blind incorporation has two big flaws. The first is line 106, which should be commented out. The second is that the Docker image does not ship with MySQL support by default. We need to extend the image for MySQL, comment out line 53, and uncomment line 54, so that docker-compose builds the extended image instead of pulling the stock one. The extended Dockerfile is:
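The actual Dockerfile lives in the repository; as a sketch, the idea is to extend the official apache/airflow base image (the tag below is a placeholder, match it to the compose file) with the pymysql driver implied by the connection URLs above:

```dockerfile
# Placeholder tag; use the same version as the official docker-compose file.
FROM apache/airflow:2.5.1

# Add the MySQL driver used by the mysql+pymysql:// connection strings.
RUN pip install --no-cache-dir pymysql
```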
Now we can run our docker-compose.
docker-compose build && docker-compose --profile flower up
We cross our fingers and... we fail 😞!!!!
dependency failed to start: container for service "rabbitmq" has no healthcheck configured
We need to add a healthcheck to rabbitmq. The documentation has a hint: we can use “rabbitmq-diagnostics -q ping”. I first tried the typically used way.
It did not work out; it seems to be related to the docker compose version. So I switched to this one, which seems to work fine:
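The exact stanza that worked is in the repository; roughly, a rabbitmq healthcheck built around that command has this shape (the interval, timeout, and retries values are placeholders, and depending on the docker compose version either the exec-array or the plain string form of test may be the one that works):

```yaml
  rabbitmq:
    # ... image, ports, volumes as before ...
    healthcheck:
      test: ["CMD", "rabbitmq-diagnostics", "-q", "ping"]
      # alternative string form: test: rabbitmq-diagnostics -q ping
      interval: 30s
      timeout: 10s
      retries: 5
```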
Unfortunately, MySQL is now reported unhealthy, even with the Bitnami code. So I did something similar:
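A sketch of that, assuming mysqladmin is on the PATH inside the Bitnami image and reusing the credentials from the connection URLs; again, the values are placeholders and the repository has the real thing:

```yaml
  mysql:
    # ... image, environment, volumes as before ...
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-u", "sql_username", "-psql_password"]
      interval: 30s
      timeout: 10s
      retries: 5
```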
Now the cluster starts fine (the code is here). But... is it working fine?
Manual tests
The first manual test is a simple run of the Getting Started example for the Celery Executor. Here is the code, adjusted to correspond to our docker setup; the file is tasks.py.
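The original file is embedded in the repository; a minimal sketch, assuming the broker URL from the article and the rpc:// result backend from Celery's guide (the actual file may use the MySQL result backend instead):

```python
# tasks.py -- minimal Celery app for the manual test
from celery import Celery

app = Celery(
    "tasks",
    # RabbitMQ from the docker-compose file, reached from the host via the mapped port
    broker="amqp://admin:admin@localhost/",
    # A result backend so result.ready()/result.get() work below
    backend="rpc://",
)


@app.task
def add(x, y):
    return x + y
```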
Please run it with
celery -A tasks worker --loglevel=INFO
and, in the same folder as tasks.py so that the file is picked up, open a Python interpreter and follow the “Getting Started” guide. You should see this after execution:
➜ celery-airflow-medium-article python3
Python 3.11.2 (main, Feb 15 2023, 18:41:04) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from tasks import add
>>> result = add.delay(4, 4)
>>> result.ready()
True
>>> result.get(timeout=1)
8
>>>
and by going to the task list of the local Flower server:
Do you see our task at the top? Yay!!! You can also follow the second part of the original article. Don’t forget to drop “celery_executor_demo.py” into the dags folder so it gets picked up; the folder is mapped through the docker-compose file. Here you are (at the top, username/password = airflow/airflow):
I also suggest having a look here.
Concluding
I learnt some new tricks by trying to automate this useful article while learning from it. I am very interested in having a local Airflow setup, and I wanted to have two ways to run it. Now, you also have two ways to run it. The big gain here is the smooth learning experience. As usual, the code is on GitHub. Feel free to download it and play with it, report problems, and read the two beautiful articles I mentioned.