Customizing LLMs Through Astro
Introducing Laurel
Laurel is an AI timekeeping platform for time professionals to record, review, and manage their time. Laurel’s current customers include Global 100 law firms and Big-4 accounting firms (including Ernst & Young).
Accurate and timely time tracking is essential for lawyers and accountants. For example, lawyers have to track the time they spend on client-related tasks such as client meetings, court appearances, case research, and drafting legal documents. However, this process is often cumbersome, highly manual, and time consuming.
Laurel brings automation and precision to this process by collecting digital activities and presenting them as ready-to-review timesheets. Central to this automation is Airflow, which orchestrates the workflows needed to train and deploy Laurel’s machine learning models efficiently.
The next section discusses Laurel’s GenAI strategy in more detail and how
Airflow fits into that strategy.
Laurel’s GenAI Strategy
Laurel’s strategy to automate timesheets relies on allowing users to
easily create billable entries from their work activity as identified by
the Laurel Assistant. This is achieved by continuously learning how to
present that work in an increasingly intuitive way to match users’ billing
style. Collected activity includes emails, calendar meetings,
teleconference calls, browsing history, local document editing, etc.
Transforming these numerous digital activities into billable entries is
done in successive steps, each leveraging an ensemble of models:
- Clustering: work activities are grouped into meaningful units of work
- Classification: each unit of work is associated with the relevant project and assigned the appropriate work codes (read more here)
- Summarization: a succinct natural language description is generated to communicate the work performed in adherence with compliance guidelines
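For illustration, the toy sketch below walks a handful of fake activities through these three steps. The feature extraction, clustering parameters, and the stubbed classifier and summarizer are placeholders for this example only, not Laurel’s actual ensembles.

```python
# A toy, end-to-end illustration of the three steps above. The feature
# extraction, clustering parameters, and the stubbed classifier and
# summarizer are placeholders for this example, not Laurel's actual ensembles.
from dataclasses import dataclass
from typing import Dict, List

from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer


@dataclass
class Activity:
    user_id: str
    text: str  # e.g. email subject, meeting title, document name


def cluster_activities(activities: List[Activity]) -> Dict[int, List[Activity]]:
    """Step 1: group raw digital activities into candidate units of work."""
    embeddings = TfidfVectorizer().fit_transform([a.text for a in activities])
    labels = DBSCAN(eps=0.9, min_samples=1, metric="cosine").fit_predict(embeddings)
    clusters: Dict[int, List[Activity]] = {}
    for label, activity in zip(labels, activities):
        clusters.setdefault(int(label), []).append(activity)
    return clusters


def classify_unit(unit: List[Activity]) -> str:
    """Step 2: map a unit of work to a project / work code (stubbed)."""
    return "project-unknown"  # a real classifier would be loaded per firm or user


def summarize_unit(unit: List[Activity], project: str) -> str:
    """Step 3: draft a narrative for the unit of work (stubbed LLM call)."""
    return f"[{project}] Worked on: " + "; ".join(a.text for a in unit)


if __name__ == "__main__":
    raw = [
        Activity("u1", "Email: draft NDA for Acme"),
        Activity("u1", "Meeting: Acme NDA review call"),
        Activity("u1", "Research: precedent on indemnification clauses"),
    ]
    for unit in cluster_activities(raw).values():
        print(summarize_unit(unit, classify_unit(unit)))
```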
While all these models differ, each core model creation step can be
abstracted into the general framework outlined below. Specifically,
dataset generation, model training, and model deployment are all
orchestrated through Airflow, running on Astronomer’s fully managed
platform, Astro.
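As a rough sketch of that general framework, the skeleton below chains dataset generation, training, and deployment with Airflow’s TaskFlow API. The task names, weekly schedule, and storage paths are illustrative assumptions, not Laurel’s actual pipeline.

```python
# Skeleton of the general framework: dataset generation -> training -> deployment.
# Schedule, task bodies, and URIs are placeholders for illustration only.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@weekly", start_date=datetime(2024, 1, 1), catchup=False)
def model_creation():
    @task
    def generate_dataset() -> str:
        # Collect and label captured activity; return a dataset location.
        return "s3://example-bucket/datasets/latest"  # placeholder path

    @task
    def train_model(dataset_uri: str) -> str:
        # Train or fine-tune the model on the generated dataset.
        return "s3://example-bucket/models/latest"  # placeholder artifact

    @task
    def deploy_model(model_uri: str) -> None:
        # Promote the trained model to the serving environment.
        print(f"Deploying {model_uri}")

    deploy_model(train_model(generate_dataset()))


model_creation_dag = model_creation()
```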
Airflow provides many benefits as a core component of the modeling infrastructure; Laurel leverages Airflow to increase experimentation, enable model personalization, and provide scalable, cost-effective model inference.
Airflow for Personalization
While all of Laurel’s models share a similar structure, each model has its own data and training requirements. This is especially important for security and compliance: the sensitive nature of the processed information requires strict data isolation policies across, and in some cases within, firms. As a result, all of these models are trained on a firm-by-firm, if not user-by-user, basis. Airflow helps streamline this process by serving as a unified orchestration layer while enabling dynamic model retriggering based on data policies. This framework extends to the individual user level as well, which has proven particularly useful in preventing data leakage when providing users with autocomplete suggestions.
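One common Airflow pattern for this kind of isolation, shown below as a hypothetical sketch rather than Laurel’s actual code, is to generate one DAG per firm (or per user) from a shared template so that each tenant’s data never flows through another tenant’s pipeline.

```python
# Dynamic DAG generation: one isolated training DAG per firm, built from a
# shared template. The firm list, schedule, and task bodies are placeholders.
from datetime import datetime

from airflow.decorators import dag, task

FIRMS = ["firm_a", "firm_b"]  # in practice this would come from configuration


def build_training_dag(firm_id: str):
    @dag(
        dag_id=f"train_models_{firm_id}",
        schedule="@weekly",
        start_date=datetime(2024, 1, 1),
        catchup=False,
        tags=[firm_id],
    )
    def train_models():
        @task
        def generate_dataset() -> str:
            # Read only this firm's activity, enforcing the isolation policy.
            return f"s3://example-bucket/{firm_id}/dataset"  # placeholder path

        @task
        def train_and_deploy(dataset_uri: str) -> None:
            print(f"Training {firm_id} models from {dataset_uri}")

        train_and_deploy(generate_dataset())

    return train_models()


# Register one DAG per firm in the module namespace so Airflow picks them up.
for firm in FIRMS:
    globals()[f"train_models_{firm}"] = build_training_dag(firm)
```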
Additionally, there have been significant performance gains from personalizing models. This is largely motivated by the significant data drift associated with the ever-evolving nature of work. Each new project comes with unique guidelines and requirements. Every time a user begins working on a new project, the Laurel Assistant needs to identify which work is relevant to the new project and adapt quickly to its billing style. Airflow makes adapting the model retraining cadence particularly smooth. In fact, the model fine-tuning, evaluation, and deployment processes can be kept roughly constant, with most of the personalization arising within the dataset generation task, keeping the rest of the DAGs unchanged.
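One way to express this separation, sketched below under assumptions rather than as Laurel’s actual implementation, is Airflow’s data-aware scheduling: a per-user dataset generation DAG publishes an Airflow Dataset, and a shared fine-tune/evaluate/deploy DAG runs whenever that dataset is updated, so the downstream DAG never changes.

```python
# Data-aware scheduling (Airflow 2.4+): personalization lives in the dataset
# generation DAG; the downstream training DAG stays unchanged. All names,
# schedules, and URIs below are placeholders.
from datetime import datetime

from airflow.datasets import Dataset
from airflow.decorators import dag, task

user_dataset = Dataset("s3://example-bucket/user_123/dataset")  # placeholder URI


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def generate_user_dataset():
    @task(outlets=[user_dataset])
    def build_dataset() -> None:
        # All per-user personalization (billing style, new projects, cadence)
        # is concentrated here.
        print("Rebuilding this user's training dataset")

    build_dataset()


@dag(schedule=[user_dataset], start_date=datetime(2024, 1, 1), catchup=False)
def finetune_evaluate_deploy():
    @task
    def finetune() -> None:
        print("Fine-tuning on the refreshed dataset")

    @task
    def evaluate_and_deploy() -> None:
        print("Evaluating and deploying the new model")

    finetune() >> evaluate_and_deploy()


producer_dag = generate_user_dataset()
consumer_dag = finetune_evaluate_deploy()
```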
Airflow for GenAI Cost Management
Writing an accurate and compliant summary is a critical step in generating
time entries. Laurel makes use of large language models (LLMs) through a
retrieval augmented generation (RAG) approach to compose such summaries.
This method consists of dynamically generating prompts by integrating relevant context: every time a summary is generated, Laurel draws from previously billed entries, company guidelines, and metadata from captured work to build a detailed prompt.
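As a minimal sketch of what that prompt assembly can look like (the retrieval helpers, context sources, and prompt wording here are hypothetical, not Laurel’s templates):

```python
# Illustrative RAG-style prompt assembly: combine past billed entries, firm
# guidelines, and captured-work metadata into a single prompt for the LLM.
from typing import List


def retrieve_similar_billed_entries(activity_text: str) -> List[str]:
    # In practice this would query a store of the user's previously billed entries.
    return [
        "Reviewed and revised draft NDA for client",
        "Telephone conference with client regarding NDA terms",
    ]


def retrieve_firm_guidelines(project_id: str) -> str:
    # In practice this would look up the client's billing and compliance guidelines.
    return "Use past tense; one sentence per entry; do not name third parties."


def build_summary_prompt(activity_text: str, project_id: str) -> str:
    examples = "\n".join(f"- {entry}" for entry in retrieve_similar_billed_entries(activity_text))
    guidelines = retrieve_firm_guidelines(project_id)
    return (
        "You draft billing narratives for a timekeeper.\n"
        f"Firm guidelines:\n{guidelines}\n\n"
        f"Examples of this user's past entries:\n{examples}\n\n"
        f"Captured work metadata:\n{activity_text}\n\n"
        "Write one compliant, succinct billing narrative for this work."
    )


print(build_summary_prompt("Email thread and 30-minute call about Acme NDA edits", "acme-nda"))
```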
Deploying LLMs at this scale requires careful consideration, especially given the size of these prompts in conjunction with the volume of work processed throughout the day. Airflow has proven to be a powerful asset in scaling this feature in a cost-sensitive manner.
To manage costs, different models are used for inference at varying frequencies based on user behavior. While some users keep their time contemporaneously, others review and create their timesheets after a few days. These patterns inform which models need to be used for whom and when. For example, lighter and cheaper models are called during the day, with additional daily DAG runs orchestrated to process the work holistically using more performant models. The frequency of these workflows can easily be configured in Airflow to match user behavior, ensuring that users receive an enhanced experience in a cost-sensitive manner. This has resulted in cost savings of over $40,000 per month on LLM API expenses for a single feature. Choosing the right models is facilitated by the fact that Airflow DAGs’ idempotency makes them particularly effective for backfills. This provides a rapid way to explore new ideas, like cohort-level LLM fine-tuning, and ensures that compute cost is judiciously allocated.
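The tiering described above can be sketched as two scheduled DAGs, one frequent and cheap, one daily and more capable. The schedules and model identifiers below are placeholders, not Laurel’s configuration.

```python
# Cost tiering sketch: a cheap model on frequent intraday runs, a more capable
# model on a daily holistic pass. Schedules and model names are placeholders.
from datetime import datetime

from airflow.decorators import dag, task

INTRADAY_MODEL = "small-cheap-model"
DAILY_MODEL = "large-performant-model"


@dag(schedule="0 * * * *", start_date=datetime(2024, 1, 1), catchup=False)
def intraday_summaries():
    @task
    def summarize_recent_work() -> None:
        # Fast, inexpensive drafts for users who keep their time contemporaneously.
        print(f"Summarizing new activity with {INTRADAY_MODEL}")

    summarize_recent_work()


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_holistic_pass():
    @task
    def resummarize_day() -> None:
        # Reprocess the full day's work with the stronger (and costlier) model.
        print(f"Reprocessing the day's work with {DAILY_MODEL}")

    resummarize_day()


intraday_dag = intraday_summaries()
daily_dag = daily_holistic_pass()
```

Because each run is idempotent, the same daily DAG can also be backfilled over past days to evaluate a new model tier before rolling it out more broadly.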
Summary
Astro and Airflow serve as a foundational platform for LLM customization at Laurel, powering the personalization engine that ensures strict data privacy and enhances model performance, all in a cost-sensitive fashion. You are welcome to join Laurel at the Airflow Summit, where this work will be covered in more detail.