Airflow in Action: Insights From Scaling Multiple Workloads Across A Shared Environment at Rakuten Kobo
Founded in 2009 in Canada and acquired by global technology conglomerate Rakuten in 2012, Kobo offers an audio and eBook catalog of over five million titles, accessed via its eReaders and multi-platform web and mobile apps.
At the 2024 Airflow Summit, Spencer Tollefson, Team Lead Data Engineering at Rakuten Kobo, shared his company’s approach to enabling multiple teams to self-service their Airflow needs in a shared environment.
The session outlined their strategy for delineating responsibilities between teams, building guardrails for new developers, and implementing scalable monitoring and access control systems. The talk concluded with insights into their achievements to date and plans for evolving the company’s Apache Airflow® usage.
Airflow at Kobo: A Shared Responsibility
Rakuten Kobo’s data engineering team maintains a single Apache Airflow environment supporting multiple business teams including data science, finance, marketing, and customer support. Each team has a developer responsible for authoring Airflow DAGs. These developers bring business domain expertise but are often not Airflow experts.
The responsibilities across teams in the company are clearly delineated:
- Data Engineers: Maintain the Airflow environment, perform code reviews for best practices, and manage resource usage and deployments.
- Business Developers: Write and maintain DAGs with support from data engineers through pair programming and code reviews. They focus on business logic while relying on data engineers for Airflow-specific optimizations.
- Operations Team: Handle the infrastructure setup, including hosting on-premises Kubernetes, implementing access controls, and secrets management.
Figure 1: Rakuten Kobo’s shared Airflow environment supporting multiple business teams. Image source.
Guardrails for New Developers
Kobo has implemented guardrails to support new developers and maintain consistency across the shared environment:
- Comprehensive Documentation: A detailed “Using Airflow at Kobo” README provides developers with essential guidance.
- Code Linters and Pre-Commit Hooks: Tools like pre-commit automate syntax and formatting checks, ensuring code quality before review.
- Local Development Environment: Developers can spin up Docker environments identical to the production setup in minutes, enabling quick iteration.
- Staging Environment: A staging environment mirrors production, allowing teams to test their code in a realistic setting without impacting live systems.
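As a rough illustration of the kind of syntax check a pre-commit hook can automate, the sketch below parses each DAG file and reports any that fail to compile. It is not Kobo's actual tooling: the function name `find_syntax_errors` and the name-to-source input shape are assumptions made for this example.

```python
import ast
from typing import Dict, List


def find_syntax_errors(files: Dict[str, str]) -> List[str]:
    """Return one message per file whose source does not parse as Python.

    `files` maps a file name to its source text (a hypothetical input shape
    chosen to keep the sketch self-contained).
    """
    problems = []
    for name, source in files.items():
        try:
            # ast.parse only checks syntax; it never executes the DAG code.
            ast.parse(source, filename=name)
        except SyntaxError as exc:
            problems.append(f"{name}:{exc.lineno}: {exc.msg}")
    return problems
```

A hook built on a check like this would exit with a non-zero status when the returned list is non-empty, which stops the commit before the code reaches review.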
Alerts, Monitoring, and Access Control
Kobo’s team-based alerting system ensures rapid response to pipeline failures. All DAGs are required to implement Airflow’s on_failure_callback methods, triggering alerts to team-specific Slack channels via PagerDuty. This setup facilitates efficient collaboration between data engineers and business developers when issues arise.
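A minimal sketch of how a team-specific failure callback could be structured is shown below. Airflow calls an `on_failure_callback` with a context dictionary, and the sketch turns that context into a small alert payload. The channel names, the `TEAM_CHANNELS` mapping, and the `send` function are hypothetical stand-ins; the talk summary does not describe Kobo's actual PagerDuty or Slack integration.

```python
from typing import Any, Callable, Dict

# Hypothetical mapping from owning team to its Slack channel.
TEAM_CHANNELS = {
    "finance": "#finance-airflow-alerts",
    "marketing": "#marketing-airflow-alerts",
}
DEFAULT_CHANNEL = "#data-eng-airflow-alerts"  # assumed fallback


def build_alert(context: Dict[str, Any], team: str) -> Dict[str, str]:
    """Turn an Airflow callback context into a small alert payload."""
    ti = context["task_instance"]
    return {
        "channel": TEAM_CHANNELS.get(team, DEFAULT_CHANNEL),
        "summary": f"Task {ti.task_id} in DAG {ti.dag_id} failed",
        "run_id": str(context.get("run_id", "unknown")),
    }


def make_failure_callback(
    team: str, send: Callable[[Dict[str, str]], None]
) -> Callable[[Dict[str, Any]], None]:
    """Build a callback bound to one team.

    `send` is whatever function delivers the alert (hypothetical here).
    """

    def _on_failure(context: Dict[str, Any]) -> None:
        send(build_alert(context, team))

    return _on_failure
```

In a DAG file, the result could be passed as `default_args={"on_failure_callback": make_failure_callback("finance", send)}` so that every task in the DAG reports to the owning team's channel.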
To avoid cross-team interference, Kobo codifies access control at the DAG level using Airflow’s access_control parameter. Each team has permissions only for their DAGs, enforcing clear boundaries within the shared environment. This setup is powered by integration with Active Directory and custom roles for each team.
Figure 2: Codifying DAG access controls to enforce isolation between different teams in the shared Airflow environment. Image source.
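The DAG-level access control described above can be sketched as a small helper that builds the dictionary passed to a DAG's `access_control` parameter, which maps role names to sets of permitted actions. The role names and the helper `team_access_control` are hypothetical; the action names follow Airflow 2's DAG permission naming, and Kobo's real roles come from its Active Directory integration.

```python
from typing import Dict, Iterable, Set

# DAG-level actions, named as in Airflow 2's DAG permissions.
OWNER_ACTIONS = {"can_read", "can_edit"}
VIEWER_ACTIONS = {"can_read"}


def team_access_control(
    owner_role: str, viewer_roles: Iterable[str] = ()
) -> Dict[str, Set[str]]:
    """Build an access_control mapping for one team's DAG.

    The owning team's role can read and edit; any extra roles listed in
    `viewer_roles` (an assumption of this sketch) get read-only access.
    """
    access = {owner_role: set(OWNER_ACTIONS)}
    for role in viewer_roles:
        access[role] = set(VIEWER_ACTIONS)
    return access
```

A DAG definition could then include `access_control=team_access_control("finance_team")`, so that roles absent from the mapping receive no DAG-specific permissions for that DAG.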
Achievements and the Road Ahead
By streamlining responsibilities and building robust guardrails, Rakuten Kobo has empowered its teams to self-service their Airflow workflows efficiently. This approach has minimized bottlenecks, ensured code quality, and improved collaboration between teams.
Looking forward, Spencer and team are exploring new ways to further enhance their Airflow environment:
- YAML-based DAG Authoring: Using tools like Astronomer’s DAG Factory to abstract Python coding, further simplifying DAG creation for developers.
- Multi-tenancy: With first-class multi-tenancy support emerging in Airflow, Kobo aims to explore the latest capabilities that will provide even greater separation and scalability for their teams.
Next Steps
To learn more about how Kobo has scaled Airflow usage across its organization, watch the replay of Spencer’s Summit session, “Empowering More Teams in your Organization to Self-service their Airflow Needs.”
The upcoming Airflow 3.0 release will provide more features for teams that want to run multi-tenant environments. These include task isolation, which provides stronger security and simplifies multi-tenancy because tasks no longer have direct access to the Airflow metadatabase. Additionally, for each team running Airflow in a multi-tenant environment, Astro Observe defines clear ownership for data assets and data products, ensuring faster remediation when errors or issues are encountered.
The best way to get started with Airflow and to be ready for the 3.0 release is to use Astro, the industry’s leading Airflow managed service. You can start out with the free Astro trial here.