Customizing LLMs Through Astro
Introducing Laurel
Laurel is an AI timekeeping platform for time professionals to record, review, and manage their time. Laurel’s current customers include Global 100 law firms and Big-4 accounting firms (including Ernst & Young).
Accurate and timely time tracking is essential for lawyers and accountants. For example, lawyers have to track the time they spend on client-related tasks such as client meetings, court appearances, case research, and drafting legal documents. However, this process is often cumbersome, highly manual, and time consuming.
Laurel brings automation and precision to this process by collecting digital activities and presenting them as ready-to-review timesheets. Central to this automation is Airflow, which orchestrates the workflows needed to train and deploy Laurel’s machine learning models efficiently.
The next section discusses Laurel’s GenAI strategy in more detail and how
Airflow fits into that strategy.
Laurel’s GenAI Strategy
Laurel’s strategy to automate timesheets relies on allowing users to
easily create billable entries from their work activity as identified by
the Laurel Assistant. This is achieved by continuously learning how to
present that work in an increasingly intuitive way to match users’ billing
style. Collected activity includes emails, calendar meetings,
teleconference calls, browsing history, local document editing, etc.
Transforming these numerous digital activities into billable entries is
done in successive steps, each leveraging an ensemble of models:
- Clustering: work activities are grouped into meaningful units of work
- Classification: each unit of work is associated with the relevant project and assigned the appropriate work codes (read more here)
- Summarization: a succinct natural language description is generated to communicate the work performed in adherence with compliance guidelines
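For illustration, the toy sketch below walks a handful of fake activities through these three steps. The feature extraction, clustering parameters, and the stubbed classifier and summarizer are placeholders for this example only, not Laurel’s actual ensembles.

```python
# A toy, end-to-end illustration of the three steps above. The feature
# extraction, clustering parameters, and the stubbed classifier and
# summarizer are placeholders for this example, not Laurel's actual ensembles.
from dataclasses import dataclass
from typing import Dict, List

from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer


@dataclass
class Activity:
    user_id: str
    text: str  # e.g. email subject, meeting title, document name


def cluster_activities(activities: List[Activity]) -> Dict[int, List[Activity]]:
    """Step 1: group raw digital activities into candidate units of work."""
    embeddings = TfidfVectorizer().fit_transform([a.text for a in activities])
    labels = DBSCAN(eps=0.9, min_samples=1, metric="cosine").fit_predict(embeddings)
    clusters: Dict[int, List[Activity]] = {}
    for label, activity in zip(labels, activities):
        clusters.setdefault(int(label), []).append(activity)
    return clusters


def classify_unit(unit: List[Activity]) -> str:
    """Step 2: map a unit of work to a project / work code (stubbed)."""
    return "project-unknown"  # a real classifier would be loaded per firm or user


def summarize_unit(unit: List[Activity], project: str) -> str:
    """Step 3: draft a narrative for the unit of work (stubbed LLM call)."""
    return f"[{project}] Worked on: " + "; ".join(a.text for a in unit)


if __name__ == "__main__":
    raw = [
        Activity("u1", "Email: draft NDA for Acme"),
        Activity("u1", "Meeting: Acme NDA review call"),
        Activity("u1", "Research: precedent on indemnification clauses"),
    ]
    for unit in cluster_activities(raw).values():
        print(summarize_unit(unit, classify_unit(unit)))
```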
While all these models differ, each core model creation step can be
abstracted into the general framework outlined below. Specifically,
dataset generation, model training, and model deployment are all
orchestrated through Airflow, running on Astronomer’s fully managed
platform, Astro.
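As a rough sketch of that general framework, the skeleton below chains dataset generation, training, and deployment with Airflow’s TaskFlow API. The task names, weekly schedule, and storage paths are illustrative assumptions, not Laurel’s actual pipeline.

```python
# Skeleton of the general framework: dataset generation -> training -> deployment.
# Schedule, task bodies, and URIs are placeholders for illustration only.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@weekly", start_date=datetime(2024, 1, 1), catchup=False)
def model_creation():
    @task
    def generate_dataset() -> str:
        # Collect and label captured activity; return a dataset location.
        return "s3://example-bucket/datasets/latest"  # placeholder path

    @task
    def train_model(dataset_uri: str) -> str:
        # Train or fine-tune the model on the generated dataset.
        return "s3://example-bucket/models/latest"  # placeholder artifact

    @task
    def deploy_model(model_uri: str) -> None:
        # Promote the trained model to the serving environment.
        print(f"Deploying {model_uri}")

    deploy_model(train_model(generate_dataset()))


model_creation_dag = model_creation()
```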
Airflow provides many benefits as a core component of the modeling infrastructure; Laurel leverages Airflow to increase experimentation, enable model personalization, and provide scalable, cost-effective model inference.
Airflow for Personalization
While all of Laurel’s models share a similar structure, each model has its own data and training requirements. This is especially important for security and compliance: the sensitive nature of the processed information requires strict data isolation policies across, and in some cases within, firms. As a result, all of these models are trained on a firm-by-firm, if not user-by-user, basis. Airflow helps streamline this process by serving as a unified orchestration layer while enabling dynamic model retriggering based on data policies. This framework extends to the individual user level as well, which has proven particularly useful in preventing data leakage when providing users with autocomplete suggestions.
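One common Airflow pattern for this kind of isolation, shown below as a hypothetical sketch rather than Laurel’s actual code, is to generate one DAG per firm (or per user) from a shared template so that each tenant’s data never flows through another tenant’s pipeline.

```python
# Dynamic DAG generation: one isolated training DAG per firm, built from a
# shared template. The firm list, schedule, and task bodies are placeholders.
from datetime import datetime

from airflow.decorators import dag, task

FIRMS = ["firm_a", "firm_b"]  # in practice this would come from configuration


def build_training_dag(firm_id: str):
    @dag(
        dag_id=f"train_models_{firm_id}",
        schedule="@weekly",
        start_date=datetime(2024, 1, 1),
        catchup=False,
        tags=[firm_id],
    )
    def train_models():
        @task
        def generate_dataset() -> str:
            # Read only this firm's activity, enforcing the isolation policy.
            return f"s3://example-bucket/{firm_id}/dataset"  # placeholder path

        @task
        def train_and_deploy(dataset_uri: str) -> None:
            print(f"Training {firm_id} models from {dataset_uri}")

        train_and_deploy(generate_dataset())

    return train_models()


# Register one DAG per firm in the module namespace so Airflow picks them up.
for firm in FIRMS:
    globals()[f"train_models_{firm}"] = build_training_dag(firm)
```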
Additionally, there have been significant performance gains from personalizing models. This is largely motivated by the significant data drift associated with the ever-evolving nature of work. Each new project comes with unique guidelines and requirements. Every time a user begins working on a new project, the Laurel Assistant needs to identify which work is relevant to the new project and adapt quickly to its billing style. Airflow makes adapting the model retraining cadence particularly smooth. In fact, the model fine-tuning, evaluation, and deployment processes can be kept roughly constant, with most of the personalization arising within the dataset generation task, keeping the rest of the DAGs unchanged.
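One way to express this separation, sketched below under assumptions rather than as Laurel’s actual implementation, is Airflow’s data-aware scheduling: a per-user dataset generation DAG publishes an Airflow Dataset, and a shared fine-tune/evaluate/deploy DAG runs whenever that dataset is updated, so the downstream DAG never changes.

```python
# Data-aware scheduling (Airflow 2.4+): personalization lives in the dataset
# generation DAG; the downstream training DAG stays unchanged. All names,
# schedules, and URIs below are placeholders.
from datetime import datetime

from airflow.datasets import Dataset
from airflow.decorators import dag, task

user_dataset = Dataset("s3://example-bucket/user_123/dataset")  # placeholder URI


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def generate_user_dataset():
    @task(outlets=[user_dataset])
    def build_dataset() -> None:
        # All per-user personalization (billing style, new projects, cadence)
        # is concentrated here.
        print("Rebuilding this user's training dataset")

    build_dataset()


@dag(schedule=[user_dataset], start_date=datetime(2024, 1, 1), catchup=False)
def finetune_evaluate_deploy():
    @task
    def finetune() -> None:
        print("Fine-tuning on the refreshed dataset")

    @task
    def evaluate_and_deploy() -> None:
        print("Evaluating and deploying the new model")

    finetune() >> evaluate_and_deploy()


producer_dag = generate_user_dataset()
consumer_dag = finetune_evaluate_deploy()
```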
Airflow for GenAI Cost Management
Writing an accurate and compliant summary is a critical step in generating
time entries. Laurel makes use of large language models (LLMs) through a
retrieval augmented generation (RAG) approach to compose such summaries.
This method consists of dynamically generating prompts by integrating relevant context: every time a summary is generated, Laurel draws from previously billed entries, company guidelines, and metadata from captured work to build a detailed prompt.
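As a minimal sketch of what that prompt assembly can look like (the retrieval helpers, context sources, and prompt wording here are hypothetical, not Laurel’s templates):

```python
# Illustrative RAG-style prompt assembly: combine past billed entries, firm
# guidelines, and captured-work metadata into a single prompt for the LLM.
from typing import List


def retrieve_similar_billed_entries(activity_text: str) -> List[str]:
    # In practice this would query a store of the user's previously billed entries.
    return [
        "Reviewed and revised draft NDA for client",
        "Telephone conference with client regarding NDA terms",
    ]


def retrieve_firm_guidelines(project_id: str) -> str:
    # In practice this would look up the client's billing and compliance guidelines.
    return "Use past tense; one sentence per entry; do not name third parties."


def build_summary_prompt(activity_text: str, project_id: str) -> str:
    examples = "\n".join(f"- {entry}" for entry in retrieve_similar_billed_entries(activity_text))
    guidelines = retrieve_firm_guidelines(project_id)
    return (
        "You draft billing narratives for a timekeeper.\n"
        f"Firm guidelines:\n{guidelines}\n\n"
        f"Examples of this user's past entries:\n{examples}\n\n"
        f"Captured work metadata:\n{activity_text}\n\n"
        "Write one compliant, succinct billing narrative for this work."
    )


print(build_summary_prompt("Email thread and 30-minute call about Acme NDA edits", "acme-nda"))
```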
Deploying LLMs at this scale requires careful consideration, especially given the size of these prompts in conjunction with the volume of work processed throughout the day. Airflow has proven to be a powerful asset in scaling this feature in a cost-sensitive manner.
To manage costs, different models are used for inference at varying frequencies based on user behavior. While some users keep their time contemporaneously, others review and create their timesheets after a few days. These patterns inform which models need to be used for whom and when. For example, lighter and cheaper models are called during the day, with additional daily DAG runs orchestrated to process the work holistically using more performant models. The frequency of these workflows can easily be configured in Airflow to match user behavior, ensuring that users receive an enhanced experience in a cost-sensitive manner. This has resulted in cost savings of over $40,000 per month on LLM API expenses for a single feature. Choosing the right models is facilitated by the fact that Airflow DAGs’ idempotency makes them particularly effective for backfills. This provides a rapid way to explore new ideas, like cohort-level LLM fine-tuning, and ensures that compute cost is judiciously allocated.
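The tiering described above can be sketched as two scheduled DAGs, one frequent and cheap, one daily and more capable. The schedules and model identifiers below are placeholders, not Laurel’s configuration.

```python
# Cost tiering sketch: a cheap model on frequent intraday runs, a more capable
# model on a daily holistic pass. Schedules and model names are placeholders.
from datetime import datetime

from airflow.decorators import dag, task

INTRADAY_MODEL = "small-cheap-model"
DAILY_MODEL = "large-performant-model"


@dag(schedule="0 * * * *", start_date=datetime(2024, 1, 1), catchup=False)
def intraday_summaries():
    @task
    def summarize_recent_work() -> None:
        # Fast, inexpensive drafts for users who keep their time contemporaneously.
        print(f"Summarizing new activity with {INTRADAY_MODEL}")

    summarize_recent_work()


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_holistic_pass():
    @task
    def resummarize_day() -> None:
        # Reprocess the full day's work with the stronger (and costlier) model.
        print(f"Reprocessing the day's work with {DAILY_MODEL}")

    resummarize_day()


intraday_dag = intraday_summaries()
daily_dag = daily_holistic_pass()
```

Because each run is idempotent, the same daily DAG can also be backfilled over past days to evaluate a new model tier before rolling it out more broadly.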
Summary
Astro and Airflow serve as a foundational platform for LLM customization at Laurel, powering the personalization engine that ensures strict data privacy and enhances model performance, all in a cost-sensitive fashion. You are welcome to join Laurel at the Airflow Summit, where this work will be covered in more detail.