The effectiveness of Generative AI (GenAI) depends heavily on the quality and orchestration of data. GenAI models are increasingly versatile and knowledgeable, but to be useful to the business and to power truly differentiated applications, they need access to rich, proprietary datasets and real-time operational data streams.
As discussed in our earlier post, The Dividing Line Between Generative AI Success and Failure, Apache Airflow®, a standard for data orchestration, plays a crucial role in managing complex data and machine learning workflows, enabling teams to build GenAI apps grounded in their enterprise data. The post also highlighted a number of engineering teams already using Apache Airflow®, managed by Astronomer's Astro data platform, to release enterprise-grade GenAI apps faster, with higher quality, and at lower cost.
One of the most common questions we get asked when discussing data orchestration for GenAI is how to get started. That is what our new GenAI Cookbook is designed to answer.
Why a cookbook?
As the state of the art in AI advances, the stack of technologies needed to build an enterprise-grade GenAI application is becoming more complex and is evolving rapidly. Showing how and where data orchestration fits into that stack was the primary driver behind developing the cookbook.
In the cookbook, we demonstrate how Airflow is the foundation for the reliable delivery of AI applications through six common GenAI use cases:
- Support automation
- E-commerce product discovery
- Product insight from customer reviews
- Customer churn risk analysis
- Legal document summarization and categorization
- Dynamic cluster provisioning for image creation
For each use case, we discuss its benefits to the business and the common technical challenges before presenting a detailed reference architecture.
Each reference architecture is built on a full stack of GenAI technologies — from embedding and foundation models to vector databases, search engines, retrieval frameworks, and cloud services. Don’t worry if you don’t see your own preferred technology included in a specific reference architecture. Because the Astronomer Registry curates Airflow providers for many components of the AI stack and Airflow allows for any custom Python code to run in a task, you can easily swap out one technology or cloud platform for your preferred option.
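To make the swap-out point concrete, here is a minimal sketch in plain Python. It is not taken from the cookbook. The `VectorStore` interface, the `InMemoryStore` stand-in, and the `toy_embed` placeholder are all hypothetical names introduced for illustration. In a real pipeline, a function like `ingest` could be wrapped as an Airflow task (for example with the `@task` decorator), and the stand-in could be replaced by a provider hook or client for your preferred vector database.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical interface: any backend that offers upsert() fits."""

    def upsert(self, doc_id: str, vector: list[float]) -> None: ...


@dataclass
class InMemoryStore:
    """Stand-in backend for illustration only, not a real vector database."""

    rows: dict[str, list[float]] = field(default_factory=dict)

    def upsert(self, doc_id: str, vector: list[float]) -> None:
        self.rows[doc_id] = vector


def toy_embed(text: str) -> list[float]:
    """Placeholder embedding: [character count, vowel count].

    A real pipeline would call an embedding model here.
    """
    vowels = sum(ch in "aeiou" for ch in text.lower())
    return [float(len(text)), float(vowels)]


def ingest(docs: dict[str, str], store: VectorStore) -> int:
    """Embed each document and load it into whichever store is passed in.

    This is ordinary Python, so it could serve as the body of an Airflow task.
    """
    for doc_id, text in docs.items():
        store.upsert(doc_id, toy_embed(text))
    return len(docs)


store = InMemoryStore()
loaded = ingest({"faq-1": "Reset your password", "faq-2": "Update billing"}, store)
```

Because `ingest` depends only on the `upsert` method, changing the backend means passing a different object. The embedding and loading logic stays the same.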
A few tasters from the cookbook
We’ve worked to incorporate a cross section of the most common generative AI use cases we encounter in the community. To give you a taster of what to expect, we’ve extracted two examples below.
Support Automation
The first use case is an example of conversational AI: a user-facing chatbot, powered by GenAI, that answers support questions. Rather than showcasing a simple prototype, our reference architecture has the chatbot learn from interactions to continuously improve its performance.