How should you adopt LLMs? @ Irrational Exuberance

Hi folks,

This is the weekly digest for my blog, Irrational Exuberance. Reach out with thoughts on Twitter at @lethain, or reply to this email.


Posts from this week:

- How should you adopt LLMs?


How should you adopt LLMs?

Whether you’re a product engineer, a product manager, or an engineering executive, you’ve probably been pushed to consider using Large Language Models (LLM) to extend your product or enhance your processes. 2023-2024 is an interesting era for LLM adoption, where these capabilities have transitioned into the mainstream, with many companies worrying that they’re falling behind despite the fact that most integrations appear superficial.

That context makes LLM adoption a great topic for a strategy case study. This document is an engineering strategy document determining how a hypothetical company, Theoretical Ride Sharing, could adopt LLMs.

Building out the scenario a bit before diving into the strategy: Theoretical has 2,000 employees, 300 of which are software engineers. They’ve raised $400m, are doing $50m in annual revenue, and are operating in 200 cities across North America and Europe. They are a ride sharing business, similar to Uber or Lyft, but have innovated on the formula by using larger vehicles (also known as, they’ve reinvented public transit).


This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.

Reading this document

To apply this strategy, start at the top with Policy. To understand the thinking behind this strategy, read sections in reserve order, starting with Explore, then Diagnose and so on. Relative to the default structure, this document has been refactored in two ways to improve readability: first, Operation has been folded into Policy; second, Refine has been embedded in Diagnose.

More detail on this structure in Making a readable Engineering Strategy document.

Policy

Our combined policy for using LLMs at Theoretical Ride Sharing are:

  • Develop an LLM-backed process for verifying I-9 and US Driver License documents such that we can wholly automate driver onboarding in the United States. Moving from an average onboarding delay of seven days to near-instant onboarding will increase driver supply and allow us to reprioritize the team on servicing rider complaints, which are a major source of concern.

    Verifying I-9 Forms and US Drivers Licenses will be directly useful for accelerating onboarding, and also establish the framework for us to perform document extraction on in other jurisdictions outside the US to the extent that this experiment outperforms our current hybrid automation/services model for onboarding.

    Report on progress monthly in Exec Weekly Meeting, coordinated in #exec-weekly

  • Start with Anthropic. We use Anthropic models, which are available through our existing cloud provider via AWS Bedrock. To avoid maintain multiple implementations, where we view the underlying foundational model quality to be somewhat undifferentiated, we are not looking to adopt a broad set of LLMs at this point.

    Exceptions will be reviewed by the Machine Learning Review in #ml-review

  • Developer experience team (DX) must offer at least one LLM-backed developer productivity tool. This tool should enhance the experience, speed, or quality of writing software in TypeScript. This tool should help us develop our thinking for next year, such that we have conviction increasing (or decreasing!) our investment. This tool should be available to all engineers. Adopting one tool is the required baseline, if DX identifies further interesting tools, e.g. Github Copilot, they are empowered to bring the request to the Engineering Exec team for review. Review will focus on balancing our rate of learning, vendor cost, and data security. We’ve modeled options for measuring LLMs impact on developer experience.

    Vendor approvals to be reviewed in #cto

  • Internal Toolings team (INT) must offer at least one LLM-backed ad-hoc prompting tool. This tool should support arbitrary non-engineering use cases for LLMs, such as text extraction, rewriting notes, and so on. It must be usable with customer data while also honoring our existing data processing commitments. This tool should be available to all employees.

    Vendor approvals to be reviewed in #coo

  • Refresh policy in six months. Our foremost goal is to learn as quickly as possible about a new domain where we have limited internal expertise, then review whether we should increase our investment afterwards.

    Flag questions and suggestions in #cto

Diagnose

The synthesis of the problem at hand regarding how we use LLMs at Theoretical Ride Sharing is:

  1. There are, at minimum, three distinct needs that folks internally are asking us to solve (either separately or with a shared solution):

    1. productivity tooling for non-engineers, e.g. ad-hoc document rewriting,document summarization
    2. productivity tooling for engineers, e.g. advanced autocomplete tooling like Github Copilot
    3. product extensions, e.g. high-quality document extraction in driver onboarding workflows
  2. Of the above, we see product extensions are potential strategic differentiation, and the other two as workflow optimizations that improve our productivity but don’t necessarily differentiate us from wider industry. Some of the opportunities for strategic differentiation we see are:

    1. Faster driver onboarding by processing driver documentation without human involvement, making it possible to bring new driver supply online more quickly, particularly as we move into new regions. We’ve sized the potential impact by developing a model of faster driver onboarding
    2. Improved customer support by increasing the response speed and quality of our responses to customer inquiries
  3. We currently have limited experience or expertise in using LLMs in the company and in the industry. Prolific thought leadership to the contrary, there are very few companies or products using LLMs in scaled, differentiated ways. That’s currently true for us as well

  4. We want to develop our expertise without making an irreversible commitment. We think that our internal expertise is a limiter for effective problem selection and utilization of LLMs, and that developing our expertise will help us become more effective in iterative future decisions on this topic. Conversely, we believe that making a major investment now, prior to developing our in-house expertise, would be relatively high risk and low reward given no other industry players appear to have identified a meaningful advantage at this point

  5. Switching across foundational models and foundational model providers is cheap. This is true both economically (low financial commitment) and from an integration cost perspective (APIs and usage is largely consistent across providers)

  6. Foundational models and providers are evolving rapidly, and it’s unclear how the space will evolve. It’s likely that current foundational model providers will train one or two additional generations of foundational models with larger datasets, but at some point they will become cost prohibitive to train (e.g. the next major version of OpenAI or Anthropic models seem likely to cost $500m+ to train). Differentiation might move into developer-experience at that point. Open source models like LLaMa might become significantly cost-advantaged. Or something else entirely. The future is wide open.

    We’ve built a Wardley map to understand the possible evolution of the foundational model ecosystem.

  7. Training a foundational model is prohibitively expensive for our needs. We’ve raised $400m, and training a competitive foundational model would cost somewhere between $3m to $100m to match the general models provided by Anthropic or OpenAI

Explore

Large Language Models operate on top of a foundational model. Training these foundational models is exceptionally expensive, and growing more expensive over time as competition for more sophisticated models accelerates. Meta allegedly spent $20-30m training LLaMa 2, up from about $3m training costs for LLaMa 1. OpenAI’s GPT-4 allegedly cost $100m to train. With some nuance related to the quality of corpus and its relevance to the task at hand, larger models outperform smaller models, so there’s not much incentive to train a smaller foundational model unless you have a large, unique dataset to train against, and even in that case you might be better off fine-tuning or in-context learning (ICL).

Anthropic charges between $0.25 and $15 per million tokens of input, and a bit more for output tokens. OpenAI charges between $0.50 and $60 per million tokens of input, and a bit more for output tokens. The average English word is about 1.3 tokens, which means you can do a significant amount of LLM work while spending less than most venture funded startups spend on snacks.

There’s significant debate on whether LLMs have reached a point where their performance improvements will slow. Much like the ongoing debate around whether Moore’s Law has died, it’s unclear how much LLM performance will improving going forward. From a cost to train perspective, it’s unlikely that companies can continue to improve foundational models merely by spending more money on compute. A few companies can tolerate a $1B training cost, fewer still a $10B training cost, but it’s hard to imagine a world where any companies are building $100B models. However, algorithmic improvements and investment in datasets may well drive improvements without driving up compute costs. The only high confidence prediction you can make in this space is that it’s likely model improvement will double one or two more times over the next 3 years, after which it might continue doubling at that rate or it might plateau at that level of performance: either outcome is plausible.

For some decisions, there’s a strategic imperative to get it right from the beginning. For example, migrating from AWS to Azure is very expensive due to the degree of customization and lock-in. However, LLMs don’t appear to be in this category. Talking with industry peers, the majority of companies are experimenting with a variety of models from Anthropic, OpenAI and elsewhere (e.g. Mistral). Behaviors do vary across models, but it’s also true that behavior of existing models varies over time (e.g. GPT 3.5 allegedly got “lazier” over time), which means the overhead of dealing with model differences is unavoidable even if you only adopt one. Altogether, vendor lock-in for models is very low from a technical perspective, although there is some lock-in created by regulatory overhead, for example it’s potentially painful to update your Data Processing Agreement multiple times, combined with the notification delay, to support multiple model vendors.

Although there’s an ongoing investment boom in artificial intelligence, most scaled technology companies are still looking for ways to leverage these capabilities beyond the obvious, widespread practices like adopting Github Copilot. For example, Stripe is investing heavily in LLMs for internal productivity, including presumably relying on them to perform some internal tasks that would have previously been performed by an employee such as verifying a company’s website matches details the company supplied in their onboarding application, but it’s less clear that they have yet found an approach to meaningfully shift their product, or their product’s user experience, using LLMs.

Looking at ridesharing companies more specifically, there don’t appear to be any breakout industry-specific approaches either. Uber is similarly adopting LLMs for internal productivity, and some operational efficiency improvements as documented in their August, 2023 post describing their internal developer and operations productivity investments using LLMs and May, 2024 post describing those efforts in more detail.


That's all for now! Hope to hear your thoughts on Twitter at @lethain!


This email was sent to you
why did I get this?    unsubscribe from this list    update subscription preferences
Will Larson · 77 Geary St · co Calm 3rd Floor · San Francisco, CA 94108-5723 · USA

Email Marketing Powered by Mailchimp

Older messages

Load-bearing / Career-minded / Act Two rationales @ Irrational Exuberance

Wednesday, May 8, 2024

Hi folks, This is the weekly digest for my blog, Irrational Exuberance. Reach out with thoughts on Twitter at @lethain, or reply to this email. Posts from this week: - Load-bearing / Career-minded /

Constraints on giving feedback. @ Irrational Exuberance

Wednesday, May 1, 2024

Hi folks, This is the weekly digest for my blog, Irrational Exuberance. Reach out with thoughts on Twitter at @lethain, or reply to this email. Posts from this week: - Constraints on giving feedback.

Notes on how to use LLMs in your product. @ Irrational Exuberance

Saturday, April 13, 2024

Hi folks, This is the weekly digest for my blog, Irrational Exuberance. Reach out with thoughts on Twitter at @lethain, or reply to this email. Posts from this week: - Notes on how to use LLMs in your

Ex-technology companies. @ Irrational Exuberance

Wednesday, March 27, 2024

Hi folks, This is the weekly digest for my blog, Irrational Exuberance. Reach out with thoughts on Twitter at @lethain, or reply to this email. Posts from this week: - Ex-technology companies. Ex-

Leadership requires taking some risk. @ Irrational Exuberance

Wednesday, March 20, 2024

Hi folks, This is the weekly digest for my blog, Irrational Exuberance. Reach out with thoughts on Twitter at @lethain, or reply to this email. Posts from this week: - Leadership requires taking some

You Might Also Like

Maximizing Sponsorships: Lessons from a Coaching Session with Justin Moore

Monday, June 24, 2024

A castle out of a fairy tale in Potsdam, Germany Hi Reader, Here are three updates from me before we dive into the newsletter: 🙋‍♀️ Did a fun AMA on Reddit about what I learned writing the Forte Labs

[Shiny Dime] How to Revise with Your Shiny Dime 

Monday, June 24, 2024

Week 7: Learn how to write with objectivity, using your Shiny Dime as a razor for revising your work. Write of Passage logo transparent-1 The Shiny Dime Challenge A Shiny Dime is a specific and

•  Author Promo • Kindle Vella Ads via Social Media •

Sunday, June 23, 2024

Promote your Vella episodes. We want to help you get your VELLA series out on front of readers. Our VELLA SERIES PROMO Packages can get you there! VELLA SERIES PROMOTIONS by ContentMo We want to help

Best ways to beat depression (NEW RESEARCH) ⋆ AI of the week ⋆ Uncanny Valley

Sunday, June 23, 2024

Dancing tops new depression-fighting strategies, Anthropic's Claude 3.5 Sonnet leads AI advancements, and the uncanny valley hypothesis explores our evolutionary caution toward near-human entities

Landing page optimization isn't guesswork

Sunday, June 23, 2024

Use real audience insight ‌ ‌ ‌ Don't guess what you should be optimizing on your landing pages. Instead, use real audience insights to analyze and evaluate landing pages to know exactly what needs

My New Favorite Writing Process

Sunday, June 23, 2024

Plus, how to approach your dream customers ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌

Career wisdom from Brazil sustainability leader, Mentorship with LinkedIn top voice, Pacific political coordinator at ​Greenpeace

Sunday, June 23, 2024

The Bloom Issue #175, June 23 ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌

🧙‍♂️ [Wizard’s Guild] Answering your questions...

Sunday, June 23, 2024

What is sponsorship coaching? ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌

Food for Agile Thought #448: Reclaiming Agile’s Core Values, Product Pitfalls, Splitting User Stories

Sunday, June 23, 2024

Also: SCREAM! Spotting Failure Early, Closing Loops, Amoeba Management ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌

The new status symbols

Saturday, June 22, 2024

Bye bye, Mercedes Benz ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌