Exploring Apache Airflow on Different Cloud Providers

The SOUTHWORKS dev team discusses deployment options for Apache Airflow in the cloud and which provider might be the best fit for your use case.

By Cristian Uehara, Eduardo Aragon and Juan Alejandro Arguello, SOUTHWORKS

Introduction

As the adoption of Apache Airflow grows, so does the variety of its deployment options in the cloud. Among the leaders in this domain are Azure's Managed Airflow, Google's Cloud Composer, Amazon's Managed Workflows for Apache Airflow (MWAA), and the Astronomer platform. Each provider brings its unique spin on Airflow, tailoring its services to meet the specific needs of its user base.

In this article, we aim to give you a clearer perspective on which might be the best fit for your specific use case.

Architecture

Azure Managed Airflow

Azure's managed service offers a streamlined Airflow experience tailored to its cloud ecosystem.

  • Core functionality: Managed version of Airflow hosted on Azure Data Factory
  • Setup: Flexible choice of Airflow version for easy initialization
  • Scalability: Automatically adjusts resources based on demand
  • Security: Backed by Azure Active Directory; robust monitoring and alerting capabilities

Azure Managed Airflow Architecture

Cloud Composer

Google Cloud's managed Airflow solution emphasizes adaptability, with multiple architectural options to cater to varying needs.

  • Architecture Flexibility: Offers three configurations: Public IP, Private IP, and Highly Resilient Private IP
  • Redundancy: The Highly Resilient Private IP configuration minimizes single-point-of-failure risks
  • Database: Utilizes Cloud SQL with primary and standby instances for high reliability
  • Distributed Components: Paired configurations of Airflow schedulers, web servers and triggerers (if enabled) spread across zones
  • Workers: At least two instances dispersed between zones for uninterrupted operations during outages
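The zone-spreading idea behind the Highly Resilient configuration can be illustrated with a small, self-contained check. This is only a sketch under stated assumptions, not Cloud Composer's actual placement logic: the zone names, the `PLACEMENT` mapping and both helper functions are hypothetical, and each component is modeled simply as a list of the zones that host one of its instances.

```python
# Hypothetical placement: component name -> zones hosting one instance each.
# Zone names are illustrative only and not taken from a real environment.
PLACEMENT = {
    "scheduler": ["zone-a", "zone-b"],
    "web_server": ["zone-a", "zone-b"],
    "triggerer": ["zone-a", "zone-b"],
    "worker": ["zone-a", "zone-b"],
}


def components_lost(placement, failed_zone):
    """Return the components left with no instance if failed_zone goes down."""
    return sorted(
        name
        for name, zones in placement.items()
        if all(zone == failed_zone for zone in zones)
    )


def survives_single_zone_outage(placement):
    """True if every component keeps at least one instance in any one-zone outage."""
    all_zones = {zone for zones in placement.values() for zone in zones}
    return all(not components_lost(placement, zone) for zone in all_zones)


if __name__ == "__main__":
    print(survives_single_zone_outage(PLACEMENT))

    # Collapse the workers into a single zone to see the failure mode.
    single_zone_workers = dict(PLACEMENT, worker=["zone-a"])
    print(components_lost(single_zone_workers, "zone-a"))
```

With every component paired across two zones, no single-zone outage removes a component entirely. Once the workers are collapsed into one zone, losing that zone stops task execution even though the schedulers and web servers are still running.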

Highly resilient Private IP Cloud Composer environment resources in the tenant project and the customer project
Large-scale network setup in a non-shared VPC scenario

MWAA (Managed Workflows for Apache Airflow)

Amazon Web Services provides a dedicated environment for Airflow, deeply integrated with its suite of tools and services.

  • Execution: Scheduler and Workers operate within AWS Fargate containers, linked to Amazon VPC private subnets
  • Metadata database: Each MWAA environment has its own dedicated metadata database managed by AWS
  • Access: Provides options for public or private network access, governed by AWS Identity and Access Management (IAM)

MWAA Infrastructure

Astronomer

This platform is designed around simplicity and efficiency, aiming to make Apache Airflow management an effortless task.

  • UI: Astro UI offers a web-based interface for easy management
  • Command-Line Interface: Astro CLI offers an interactive experience
  • Houston: Manages workflows and provides a GraphQL API
  • Commander: Interfaces with Kubernetes and Helm
  • Load Balancing: Nginx handles traffic and service discovery
  • Communication: NATS/STAN enables inter-component messaging
  • Container Management: Dedicated Docker container registry

Astronomer Platform K8s components

Network and Security

All four deployment options provide the following:

  • Private endpoints
  • RBAC for managing access to Airflow resources
  • Audit logging for tracing all changes to Airflow resources

Logging and Monitoring

Azure Managed Airflow: Offers robust capabilities for monitoring and logging workflows. By integrating with Azure Monitor and Azure Log Analytics, users gain valuable insights into their Airflow environment. Crucial performance metrics like task execution times, DAG run statuses, and resource utilization can be tracked, and alerts can be configured based on these metrics, ensuring proactive workflow management. The integration with Azure Application Insights provides even more detailed telemetry data, streamlining troubleshooting and pipeline optimization.

Cloud Composer: Seamlessly embeds logging and monitoring into workflow management. Through Cloud Monitoring (formerly Stackdriver Monitoring), users gain real-time visibility into DAG performance metrics, task durations and resource usage. Custom dashboards can be built, and alerts can be set to fire when specific thresholds are crossed. Cloud Logging (formerly Stackdriver Logging) centralizes access to logs, enabling insights into DAG run history, task outputs, and potential issues. Together, Cloud Monitoring and Cloud Logging bolster the reliability and efficiency of Composer workflows.

MWAA (Managed Workflows for Apache Airflow): Delivers robust logging and monitoring features for Apache Airflow workflows. Amazon CloudWatch serves as a tool to collect, visualize, and analyze performance metrics, task execution details and durations. CloudWatch Alarms send notifications for predefined events or thresholds. The integration with CloudWatch Logs streamlines the review of logs associated with DAGs, tasks, and Airflow components, all from one central location. The combination of CloudWatch Metrics and Logs enables efficient monitoring, diagnosis and resolution of workflow issues.

Astronomer: Provides comprehensive logging and monitoring capabilities for Apache Airflow deployments. Through integration with Prometheus and Grafana, users can capture and visualize detailed performance metrics, task execution times, and resource consumption. Customizable Grafana dashboards enable focused monitoring of key Airflow aspects. Astronomer also supports centralized log collection via integrations with tools like ELK (Elasticsearch, Logstash, Kibana), simplifying the analysis of DAG run history, task logs, and errors. This integrated approach ensures effective management and optimization of Airflow workflows.
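Whichever backend ends up collecting the logs, the task side can look much the same, because Airflow tasks typically log through Python's standard `logging` module. Below is a minimal sketch of that idea. The `log_task_run` decorator, the `pipeline.tasks` logger name, the JSON line format and the `extract_orders` function are all hypothetical choices made for illustration; none of them is part of Airflow or of any provider's API.

```python
import functools
import json
import logging
import time

# Hypothetical logger name; any name would do.
logger = logging.getLogger("pipeline.tasks")


def log_task_run(func):
    """Log one JSON line per call with the callable's name, status and duration."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "failed"  # assume failure until the call returns normally
        try:
            result = func(*args, **kwargs)
            status = "success"
            return result
        finally:
            # Runs on both success and exception, so failures are logged too.
            record = {
                "task": func.__name__,
                "status": status,
                "duration_seconds": round(time.perf_counter() - start, 3),
            }
            logger.info(json.dumps(record))

    return wrapper


@log_task_run
def extract_orders(batch_size):
    # Placeholder for real work; returns the number of rows "processed".
    return batch_size


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    extract_orders(100)
```

In a DAG, a function decorated this way could be passed as the `python_callable` of a `PythonOperator`. Emitting one structured line per run is a design choice rather than a requirement: it keeps durations and statuses easy to filter, chart, or alert on in whichever log backend the provider ships them to.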

Conclusion

When is Azure | AWS | GCP Managed Airflow a better alternative than Astronomer?

A cloud provider's managed Airflow is the better choice when you want a fully managed service: it takes care of all the underlying infrastructure and operations, so you can focus on your data pipelines. Also, if all your systems already run on that particular cloud provider, the integration with its ecosystem is seamless.

When is Astronomer a better fit than the rest?

Astronomer shines in multi-cloud scenarios, especially if you're aiming for cloud-agnostic deployments. Its Astro Hybrid model, as visually depicted above, makes it an ideal choice for implementing Kubernetes-based multi-cloud strategies. If you're looking to abstract away some complexities while retaining the flexibility of multi-cloud deployments, Astronomer might be your best bet.

Finally, from the table below we can conclude that the granularity provided by Astronomer gives you an edge in Logging and Monitoring as well as Scaling and Tuning. But if you are not going to exploit those capabilities, you could be well served by the other managed services.