Airflow
Production tales
Eran Shemesh - Senior Big Data Developer
Pipeline
Airflow’s Architecture
Why?
The cron way
[Diagram: three flows (Spark, Update DB; Http, Spark, Update DB, Send emails; Http, Spark, Update DB), with each step started by its own cron entry: 0 * * * *, 5 * * * *, 15 * * * *, 50 * * * *, 55 * * * *. Step durations shown: 30m-50m, 5m-10m, 1h-1.5h, 10 sec, 1m-3m, 20m-40m, 10 sec.]
■ Each valid flow takes longer than it should
■ Each job must be aware of the buffer between its execution time and its working time
■ If a task in the flow needs a retry, the whole flow can fail
■ What if the time buffer is sometimes not enough?
■ What if one of the systems that runs a cron job was down for one or more runs?
■ What if the input data to a flow was incorrect?
■ What if a product requirement change means I need to re-run the past X runs?
■ Visibility
The airflow way
■ Tasks are truly dependent on each other
■ Easily scalable
■ Web UI
■ Can recover from downtime
The airflow way: the cron concerns revisited
■ Each valid flow takes longer than it should
■ Each job must be aware of the buffer between its execution time and its working time
■ If a task in the flow needs a retry, the whole flow can fail
■ What if the buffer is sometimes not enough?
■ What if one of the systems that runs a cron job was down for one or more runs?
■ What if the input data to a flow was incorrect?
■ What if a product requirement change means I need to re-run the past X runs?
Hello Airflow
■ An HTTP request to invoke a job on Databricks (SimpleHttpOperator)
■ Extract the Databricks task_id from the response (PythonOperator)
■ Monitor task progress by task id (HttpSensor)
■ In case of success, get the result (SimpleHttpOperator)
■ Extract the result from the HTTP response (PythonOperator)
SimpleHttpOperator -> PythonOperator -> HttpSensor -> SimpleHttpOperator -> PythonOperator
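The Python-side steps of this flow can be sketched without any Airflow imports. This is a minimal illustration, not the deck's code: the JSON field names (`task_id`, `state`, `result`) and the state values are hypothetical, and the functions take the response body as text so that the example stays self-contained. In a real DAG the first and last functions would roughly correspond to the PythonOperator callables, and the middle one to the check an HttpSensor performs on each poll.

```python
import json


def extract_task_id(response_text):
    """Step 2: pull the remote job's id out of the submit response."""
    return json.loads(response_text)["task_id"]  # hypothetical field name


def is_finished(response_text):
    """Step 3: sensor-style check, True once the job has succeeded.

    Returning False means "poll again"; a failed job raises so that the
    sensor task itself fails instead of waiting forever.
    """
    state = json.loads(response_text)["state"]  # hypothetical field name
    if state == "FAILED":
        raise RuntimeError("remote job failed")
    return state == "SUCCESS"


def extract_result(response_text):
    """Step 5: pull the job result out of the final response."""
    return json.loads(response_text)["result"]  # hypothetical field name
```

Keeping the parsing in small pure functions like these makes each step easy to test outside the scheduler.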
Fyber - airflow best practices in production
Subdags
Use with caution!
Sub - DAGs
■ An operator like any other, which runs a group of tasks by itself
■ Better visualisation
■ Reusable components
■ Encapsulation
// Previous code
Retryable Sub Dags
■ There is no retry mechanism at the DAG level, only at the task level
■ Out of the box, a sub DAG does not retry well
■ We used the sub DAG's on_retry_callback to implement its retry mechanism when needed
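The idea of hooking cleanup into a retry callback can be shown with a toy model. Everything below is hypothetical and uses no Airflow code: `run_with_retries` stands in for a task-level retry loop, `ToySubDag` for a group of inner tasks, and the model assumes that a retried group would otherwise find its inner tasks still marked as failed. It is a sketch of the pattern, not a description of how the deck's callback or Airflow's internals work.

```python
def run_with_retries(task, retries, on_retry_callback=None):
    """Toy task-level retry loop that calls a callback before each retry."""
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == retries:
                raise
            if on_retry_callback is not None:
                on_retry_callback({"attempt": attempt + 1, "exception": exc})


class ToySubDag:
    """A group of inner tasks whose states survive between attempts."""

    def __init__(self, load_failures):
        self.states = {"extract": None, "load": None}
        self.load_failures = load_failures  # how many times "load" fails

    def reset_failed(self, context):
        """Retry callback: clear failed inner tasks so they can run again."""
        for name, state in self.states.items():
            if state == "failed":
                self.states[name] = None

    def run(self):
        for name in self.states:
            if self.states[name] == "success":
                continue  # already done in an earlier attempt
            if self.states[name] == "failed":
                raise RuntimeError(f"{name} is still marked failed")
            if name == "load" and self.load_failures > 0:
                self.load_failures -= 1
                self.states[name] = "failed"
                raise RuntimeError("load failed")
            self.states[name] = "success"
        return dict(self.states)
```

With the callback, `run_with_retries(sub.run, retries=2, on_retry_callback=sub.reset_failed)` recovers from one failure of `load`; without it, every retry hits the stale failed state.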
Airflow’s Architecture
Sub dags - use with caution!
[Diagrams: a DAG (task, subdag, task, task) feeds a worker with a fixed concurrency level. Across successive slides the worker's six slots go from subdag, task, task, subdag, task, task, to four subdags and two tasks, to six subdags. A final diagram shows four subdags and two tasks on the worker, plus a thread pool running further tasks.]
Airflow 1.10's default solution:
SequentialExecutor (one process to run them all)
Second option: add more workers!
[Diagram: three workers, each with its own concurrency level. Worker 1 holds six subdags, Worker 2 holds task, subdag, task, and Worker 3 holds task, task.]
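The slot arithmetic behind these diagrams can be written down as a toy model. It assumes, as the diagrams suggest, that every running subdag operator holds one worker slot for as long as its inner tasks run; the function names are hypothetical and nothing here calls Airflow.

```python
def slots_left_for_tasks(worker_slots, subdag_operators):
    """Slots still free for ordinary tasks once subdag operators are running."""
    occupied = min(worker_slots, subdag_operators)
    return worker_slots - occupied


def is_starved(worker_slots, subdag_operators):
    """True when subdags hold every slot, so their inner tasks cannot start."""
    return (
        subdag_operators > 0
        and slots_left_for_tasks(worker_slots, subdag_operators) == 0
    )
```

With six slots, two subdags leave four slots and six subdags leave none. Tripling the slots (three workers of six) brings the same six subdags back to twelve free slots, which is the effect of the "add more workers" option.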
Monitoring
And auto fixing...
Pipeline
Monitoring pipeline
A typical flow
Each task (or a group of tasks) is followed by a monitoring task
Each monitoring task is itself a group of tasks for monitoring and auto fixing
Building modules
DAG extensions
■ A template of tasks and the dependencies between them
■ Using the template method design pattern, the module dictates the general flow, which is implemented by different business logic subclasses
■ Most commonly used inside a sub dag, as in the monitoring example
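A minimal sketch of the template method idea, in plain Python with no Airflow imports. The class names, the validate/fix/revalidate steps and the row-count logic are all hypothetical; the point is only that the base class fixes the task order and dependencies while a subclass supplies the business logic.

```python
from abc import ABC, abstractmethod


class ModuleTemplate(ABC):
    """Base module: dictates the flow validate -> fix -> revalidate."""

    def build(self, prefix):
        """Return the module's tasks and the dependency edges between them."""
        names = [f"{prefix}.{step}" for step in ("validate", "fix", "revalidate")]
        callables = [self.validate, self.fix, self.validate]
        tasks = dict(zip(names, callables))
        edges = list(zip(names, names[1:]))  # each task depends on the previous
        return tasks, edges

    @abstractmethod
    def validate(self):
        """Return True when the monitored output looks correct."""

    @abstractmethod
    def fix(self):
        """Attempt an automatic repair."""


class RowCountModule(ModuleTemplate):
    """Hypothetical business logic: the output must have an expected row count."""

    def __init__(self, expected_rows):
        self.expected_rows = expected_rows
        self.actual_rows = 0

    def validate(self):
        return self.actual_rows == self.expected_rows

    def fix(self):
        self.actual_rows = self.expected_rows  # stand-in for a real reload
        return True
```

In an Airflow setting each entry of `tasks` would become an operator and each edge a dependency, typically wrapped in a sub dag as the bullets describe.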
Creating a template for a set of tasks
Further extending this template when needed
Some dev paradigms
Use case 1: Skipping daily tasks
Hourly and daily flow
■ Each hourly run calculates the hourly aggregation and then the daily aggregation
■ When fixing data, or when task runs are delayed, it is unnecessary to calculate partial daily aggregations
■ Using the ShortCircuitOperator, we check whether the next execution should already have happened
■ If it has, we skip all the following tasks in the same dag run
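The check described above can be sketched as a plain function of the kind a ShortCircuitOperator takes as its callable (downstream tasks are skipped when the callable returns a falsy value). The function name and parameters are hypothetical, and the timing rule is an assumption: it follows the Airflow convention that a run for a given execution date starts once its interval has ended, so the next run is due two intervals after the current execution date.

```python
from datetime import datetime, timedelta


def should_run_daily_tasks(execution_date, now, interval=timedelta(hours=1)):
    """Return True only if the next scheduled run is not yet due.

    Assumption: the run for `execution_date` starts at
    `execution_date + interval`, so the next run is due at
    `execution_date + 2 * interval`. If that moment has passed, this run
    is a delayed or re-run one, and the daily aggregation is left to a
    later run.
    """
    next_run_due = execution_date + 2 * interval
    return now < next_run_due
```

For an hourly run with execution date 10:00, the function returns True at 11:30 and False from 12:00 onwards.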
Use case 2: Programmatically clearing DAG
S3/{bucket_name}/day=23
S3/{bucket_name}/day=22
S3/{bucket_name}/day=21
S3/{bucket_name}/day=10
■ Create a DAG that executes a single day's flow
■ That DAG is scheduled by another DAG (and not by Airflow's scheduler)
■ The scheduling DAG would:
○ Create a new run for each day in the target DAG
○ Clear the target DAG runs for the previous 14 days
A second DAG is used to clear the above DAG's runs for the last 14 days.
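The date arithmetic for the 14-day window can be sketched on its own. This is only an illustration of which days the scheduling DAG would target, with a hypothetical function name; the actual clearing of runs through Airflow is not shown.

```python
from datetime import date, timedelta


def days_to_clear(run_day, lookback_days=14):
    """Return the `lookback_days` days before `run_day`, oldest first."""
    return [
        run_day - timedelta(days=offset)
        for offset in range(lookback_days, 0, -1)
    ]
```

For a run on the 24th of a month this yields the 10th through the 23rd, which matches the range of `day=` partitions shown on the earlier slide.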
Tips and best practices
■ Create only idempotent tasks
■ Note that the worker only creates an OS process for each task
■ Always configure retries on a task: the workers can fail!
■ Use connections to store passwords and secret keys (for encryption)
■ Note that your Python files are executed constantly by the scheduler
■ Use a docker compose environment on your dev machine
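As an illustration of the first tip, here is a minimal sketch of an idempotent task body. The function name, the JSON format and the `day=` file layout are assumptions for the example: the output location depends only on the day being processed, and the file is replaced rather than appended to, so a retry or a re-run leaves the same result.

```python
import json
import os


def write_partition(base_dir, day, rows):
    """Write one day's rows, overwriting any output from a previous attempt."""
    path = os.path.join(base_dir, f"day={day}.json")
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as handle:
        json.dump(rows, handle)
    # Replace the target in one step so a crash never leaves a half-written file.
    os.replace(tmp_path, path)
    return path
```

Running the function twice with the same arguments produces a single file with the same content, which is what makes retries safe.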
Thanks!
