In this issue:
- Tools, Agents, Interfaces, and Synthetic Luck: AI Interaction Models—You're using AI in many different ways, and the more aware of this you are, the more likely you are to benefit from it.
- Lyft Got Fined For Telling the Truth and is Now Compelled to Lie to Drivers—Not a great decision from the FTC.
- Ozempic and Signaling—A drug so good one of its salient downsides is that it makes good health a less meaningful signal by making it more common.
- Wargaming and Pivots—The joy of finding out you were building for the wrong customer.
- Failure Modes—When a product you use gets acquired by cost-cutting PE firms, it's the worst possible outcome except for all the others.
- ESG, Etc.—If you make your brand synonymous with something competitors can copy, they'll copy it until it's too popular and there's a backlash.
Today's issue of The Diff is brought to you by our sponsor, Manifold Markets.
You're an early adopter of AI, whether you like it or not, because AI is being deployed in more or less visible ways in all sorts of products that you use.
And we're early.
But watching this transition means staying aware of the four ways end users actually interact with AI, all of which are somewhat overlapping. The taxonomy is:
- Tools are the obvious way we use it, specifically LLM-based chatbots, code completion tools, generative image/audio/video services, etc. What sets these apart is that most of the relevant metadata is outside the AI-specific application—there's what the user wants (and what they think about the output) and there's external data—still mostly human-generated, or generated from automatic processes that were designed by humans. So it's a self-contained approach. If tools were the only way we used AI, AI-based existential risk would be a much less interesting topic.
- Agents complicate things immensely. In a way, every AI tool is an "agent," since it's going to interpret your request, sometimes passing through multiple filters, before fulfilling it. But what sets agents apart is that they operate in multiple contexts, and they iterate until they're done or have given up. Anthropic launched a nice one last week, and Google is working on one, too ($, The Information). OpenAI's o1 is not strictly an agent, since it doesn't interact outside of its sandbox, but it does have the agency flavor of being able to break a broad task into subtasks that use different tools (to answer the question "how many times does the letter 'r' appear in 'strawberry'," a good first step is to think of it as characters in a string; a minimal sketch of that decomposition follows this list).
- Interfaces: Most of what you do with a computer can be modeled as, and often literally is, a series of database updates—create, read, update, or delete describes most of what we're doing, though there is of course plenty of interesting detail under the surface. We have various ways of doing this, including different tools we might use on the same file. (Maybe you do your data analysis in Excel, but only after the source CSV has been through a few iterations of Pandas filtering and aggregation.) An evolution toward AI-as-interface is the natural result of LLMs' great strength: they're a great place to start when you know what outcome you want but don't know what the first step is. A smart tool knows which other tools—either comparatively dumb ones (your inbox, your calendar) or more specialized, smart ones (an LLM trained exclusively on your company's own source code, or one trained on a single professional domain)—it needs to accomplish the job. AI-as-interface works backward from what you ask it to do, works forward from what its capabilities are, and solves problems somewhere in the middle.
- The most ubiquitous and thus most underrated way people interact with AI is that it's a machine for manufacturing luck in software interactions. There's always some randomness about which email happens to be at the top of your inbox when you check, which ad or post you see in your feed, which long-tail search result comes up for some query, or what pops up in the "related products" section of a site. The tools for shrinking the scope of this randomness keep improving, and there's a feedback loop where the sites with better recommendations get higher usage, which means more training data for the next round of recommendations. So when the UberEats push notification hits the moment your stomach growls, or the Waymo reroutes itself, not based on past traffic patterns or current traffic but based on a model of how other drivers will respond to traffic (something Google can probably measure better than anyone else), it's AI at work. When Reels went from a surreal cross-section of TikTok cross-posts to a reasonable source of short-form video, that was the direct result of Meta buying more GPUs than it strictly needed; they'd eventually be repurposed for more generative-AI-flavored tasks.
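To make the "strawberry" point concrete, here's a minimal Python sketch of that kind of decomposition. The routing and helper names are purely illustrative assumptions, not any real agent framework's API; the idea is just that the model's job is to recognize a string-counting problem and hand the counting off to deterministic code:

```python
# Illustrative sketch only: the "agent" step here is hard-coded, standing in
# for a model that would parse the request and pick the right tool.

def count_letter(word: str, letter: str) -> int:
    """Subtask: treat the word as a sequence of characters and count matches."""
    return sum(1 for ch in word if ch.lower() == letter.lower())

def answer_question(question: str) -> str:
    """Toy 'agent' step: route the question to a deterministic string tool."""
    word, letter = "strawberry", "r"  # a real agent would extract these from the question
    return f"The letter '{letter}' appears {count_letter(word, letter)} times in '{word}'."

if __name__ == "__main__":
    print(answer_question("How many times does the letter 'r' appear in 'strawberry'?"))
```

Running it prints that 'r' appears 3 times; the interesting part is the division of labor, not the arithmetic.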
Many people are using, or will be using, all of the above. But usage follows different skill curves. This is most legible for tools; after a while, it's easy to get a sense for what's worth doing manually, what's worth Googling, where Claude or ChatGPT can help, etc. Adding agents to the mix will continue the process whereby every knowledge worker becomes a team lead whose team is mostly AI.
And every step of this adoption means the disconcerting sensation that you've just explicitly identified a skill of yours that was once a valuable contributor to your wage premium and is now priced in watt-hours. But don't despair: consider the compensation of the average person who mostly executes decisions made by other people, and compare it to the compensation of people who are mostly paid to make decisions executed by others. It's a very good idea to lean into your own obsolescence if it means promoting yourself to a version of you that does a lot more of the interesting work, delegates the rest, and uses some of the time savings to identify new efficiencies.
As you move down the list, there's a qualitative change: AI tools are very much about you telling the AI exactly what to do, and deciding exactly which parts of your mental work are going to be automated and which need the human touch. But every stage past this means deferring more to the algorithms, until you get to the invisible function of AI—shaping all of the data inputs you use that you haven't explicitly sought on your own.
That can be valuable, at least in context; recommendation algorithms have to have a good hit-rate on average to make it out of testing and into widespread production. But if someone using tools and agents is constantly introspecting about the comparative advantage of humans and algorithms, someone using AI as an interface or viewing AI-generated content recommendations has fully outsourced the introspection part. There is still someone thinking about your media diet and how to optimize it, but the main thing they're optimizing for is user-hour retention as a proxy for potential revenue. Outsourcing a growing share of decisions to an increasingly powerful set of tools whose workings are even more opaque to you than they are to the owners of those tools is just not a great way to align reading, watching, and spending habits with your own long-term interests.
So it's an increasingly good idea to take an Amish-style barbell approach to technology: wholehearted adoption but only following due consideration. The Amish are thinking of the feedback loop between technology, behavior, and civilization, but with software, especially AI software, there's a tighter feedback loop between user behavior and what tools offer them. This shift is still net good, but it's one with unavoidable risks. The more powerful such tools get, the more they become a broad interface between a small number of people who use them quite effectively and a large number of people whose behavior is predicted by and guided by the algorithms they've helped train.
Disclosure: Long META.
Prediction markets are finally growing up. Manifold Markets has launched "sweepcash"—real-money rewards for accurate forecasts—while keeping their original play-money system intact. The dual-currency model is already generating fascinating divergences: their flagship 2024 presidential market shows Trump at 61%, notably higher than their mana play-money market. Wondering where those odds come from? Manifold offers markets on numerous other outcomes—swing states, polling error, and other factors—so you can either see what assumptions the market is making or spot an arbitrage opportunity. US residents can get in on the election action here.
Elsewhere
Lyft Got Fined For Telling the Truth and is Now Compelled to Lie to Drivers
Lyft has agreed to pay a $2.1m fine and change its marketing in a settlement with the FTC. There is room for analysis of what incentives this creates, but first, a bit of outrage. The main complaints are:
- Lyft said that drivers in some markets could earn "up to" some hourly amount. But someone might have misread "up to" as "an average of" (as it turns out, Lyft was using the 80th percentile of earnings, which is actually below what I'd expect an "up to" number in an ad to be; a toy illustration of that gap follows this list).
- Lyft offered guaranteed earnings for driving in particular periods, like telling drivers they "would make $975 if they completed 45 rides in a weekend." And then Lyft, apparently deceptively, did exactly that. Some drivers may have thought the guaranteed amount would be added on top of their ride earnings, rather than serving as a floor their pay would be topped up to.
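For a rough sense of the gap between an 80th-percentile figure and the maximum that "up to" usually implies, here's a toy Python illustration with invented hourly earnings (the numbers are hypothetical, not Lyft's data):

```python
# Made-up hourly earnings for ten hypothetical drivers; the only point is that
# an 80th-percentile figure sits well below the distribution's maximum.
import numpy as np

hourly_earnings = np.array([12, 14, 15, 16, 18, 19, 21, 24, 28, 40])

p80 = np.percentile(hourly_earnings, 80)
print(f"80th percentile: ${p80:.2f}/hr")                  # 24.80 with these numbers
print(f"Maximum:         ${hourly_earnings.max():.2f}/hr")  # 40.00 with these numbers
```

With these invented numbers the 80th percentile is $24.80/hr against a $40/hr maximum—the kind of gap that makes an 80th-percentile "up to" figure look conservative rather than inflated.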
The optimistic reading is that this is Mission Accomplished for the FTC. If the biggest anticompetitive abuse, one that needs to be targeted at the Federal level, is a company making accurate statements that someone might misread, then antitrust enforcement has overwhelmingly accomplished its mission.
But that would be taking this too literally; a better reading is that there were complaints from Lyft drivers, the FTC knew that these complaints were meritless, but Lyft knew that a drawn-out process would allow the FTC to ask pointed questions about what Lyft knew, or thought it knew, about prospective drivers' reading comprehension—so settling was the cheaper option. It's entirely possible that many of them did misread the ads, but it's very hard for businesses to operate in an environment where strictly truthful claims are forbidden on the grounds that some people aren't equipped to evaluate them. (Note that all of these people have legal permission from the government to operate heavy machinery, specifically a kind of machine that kills 42,000 people in the US annually.)
The ad raises a good question: is median driver performance even the relevant number? And the median of which group? Drivers are heterogeneous: some of them sign up for Lyft on a whim, or because they lost another job, or because they live paycheck-to-paycheck and need extra funds for a one-time expense. And then there may be a set of drivers who are working full-time and treating Lyft as their main source of income. More committed drivers, who have accumulated more five-star reviews, are likely to have higher hourly incomes. There will be some part of the driver income distribution that represents them, and it's unlikely to be exactly the median.
A last irony in the settlement is that Lyft did not just get fined for saying completely true things. As part of the settlement, they also promised to lie to drivers about their future earnings while recruiting them. Specifically, Lyft agreed not to include tips in its earnings estimates in future ads. Tips are earnings, though. There will be people who, as a result of this settlement, accept the second-best job available because the best is Lyft-including-tips but the information they have is on Lyft-excluding-tips. Lyft will adapt their marketing to this, and may get to fight future skirmishes with the FTC over how prominently the ads must disclose that tips are not included. One possible net result is that they'd actually raise base driver pay and cut tips. That helps them with recruiting, at the cost of drivers ultimately being paid less. In this case, the company has to weigh the interests of its workers against the legal risk of offering them a deal and then adhering to it.
Ozempic and Signaling
The Economist has a fun piece on the long-term, very positive consequences of Ozempic, one of which is that it's harder to use thinness as a proxy for conscientiousness, high energy, and other valuable traits, so the wage premium thin people get will decrease ($). If you can control your appetite with a drug that also seems to reduce the appeal of other bad habits, how do people who have naturally high self-control demonstrate it? There are obvious methods like running marathons or having a nice-looking Github contribution graph. But one effect of this change in signaling is that it raises the value of institutions that put people through the wringer. The easier it is to fake some trait, the more valuable the hard-to-fake measures of it are.
Wargaming and Pivots
It's hard to design a simulation of some real-world scenario because what players want in theory is accuracy, but in reality what keeps them playing is fun. And much of the real world, even the high-status parts, is just not designed to be gamified. But there are also some players who really do appreciate, or even obsess over, hyper-accurate simulations. Catering to that audience is how a British game publisher, Slitherine Software, ended up publishing Command: Professional Edition, a military simulation game popular with the actual military ($, WSJ). The game originally expanded to that customer base because it had such comprehensive data on the specifications of various pieces of real-world military equipment. Today, the game's marketing copy does not exactly sound like it's still targeting consumers (among the features: a Monte Carlo mode for having the AI play against itself over and over to test a strategy's range of outcomes, and the option to integrate "your own sensitive/proprietary material" into unit types). It's a validation of the idea that building something really great for a niche audience can end up solving a problem in a different domain.
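The "Monte Carlo mode" described there is a standard simulation idea, and easy to sketch in generic form. The Python below is an illustration of the technique under simplified assumptions—a strategy with random outcomes is replayed many times to estimate the range of results—not a representation of the game's actual engine:

```python
# Generic Monte Carlo sketch: replay a noisy scenario many times and look at
# the spread of outcomes rather than a single run. Purely illustrative.
import random
import statistics

def run_once(hit_probability: float, shots: int) -> int:
    """One simulated engagement: count how many of `shots` randomly hit."""
    return sum(random.random() < hit_probability for _ in range(shots))

def monte_carlo(hit_probability: float, shots: int, trials: int = 10_000):
    """Repeat the scenario `trials` times and summarize the outcome distribution."""
    results = [run_once(hit_probability, shots) for _ in range(trials)]
    return statistics.mean(results), statistics.stdev(results), min(results), max(results)

if __name__ == "__main__":
    mean, sd, worst, best = monte_carlo(hit_probability=0.3, shots=20)
    print(f"mean hits {mean:.1f}, stdev {sd:.1f}, range {worst}-{best}")
```

With enough trials, the spread of outcomes, not just the average, is what tells you whether a strategy is robust—presumably the thing professional users care about most.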
Failure Modes
Bloomberg has a good profile of software mini-PE firm Bending Spoons, whose model is to find companies that didn't quite reach escape velocity but still have active users, and then figure out just how much those users will pay if the pace at which features get shipped drops enough. This is part of the implicit social contract of using any subscription-based product: choose the right one, and it will keep getting better, aside from the ways the product gets worse for early adopters as it caters to a bigger and more normal audience. If you choose the wrong product, it'll either go out of existence entirely or get expensive and worse while still fulfilling whatever core function it had that prevented you from switching in the first place. PE firms partly exist to speed up that cleanup process, by figuring out exactly how many people are willing to pay to keep the product they love alive.
ESG, Etc.
BlackRock has moved away from talking about ESG so much ($, FT). It's interesting to look back at why this policy blew up (as a marketing tool, not a trade—ESG mandates tend to filter out fewer big tech companies, so an ESG approach has, for reasons weakly related to its justification, actually improved returns). BlackRock liked ESG because it allowed them to continue offering passive products, but to differentiate themselves from all the other companies doing pure passive. But it was trivial for competitors to launch similar products, so instead of positioning itself as "ESG-friendly," BlackRock had to go for being the most ESG-friendly. And this made them more synonymous with those policies than they really deserved to be. It was a broad trend whose name got attached to the biggest player who participated in it, a common problem for big brands. There are ways to position a product in the market that work better for challengers than incumbents, and sometimes that's only clear in retrospect.
Diff Jobs
Companies in the Diff network are actively looking for talent. See a sampling of current open roles below:
- A hyper-growth startup that’s turning customers’ sales and marketing data into revenue is looking for a product engineer with a track record of building, shipping, and owning customer delivery at high velocity. (NYC)
- A well funded startup founded by two SpaceX engineers that’s building the software stack for hardware companies is looking for a staff product manager with 5+ years experience building and managing data-intensive products. (LA, Hybrid)
- Ex-Ramp founder and team are hiring a high-energy junior full-stack engineer to help build the automation layer for the US healthcare payor-provider ecosystem. (NYC)
- A well-funded startup that’s building the universal electronic cash system by taking stablecoins from edge cases to the mainstream is looking for a senior full-stack engineer. Experience with information dense front-ends is a strong plus. (NYC, London, Singapore)
- An AI startup building tools to help automate compliance for companies in highly regulated industries is looking for a director of information security and compliance with 5+ years of info sec related experience at a software company. Experience with HIPAA, FedRAMP a plus. (NYC)
Even if you don't see an exact match for your skills and interests right now, we're happy to talk early so we can let you know if a good opportunity comes up. If you’re at a company that's looking for talent, we should talk! Diff Jobs works with companies across fintech, hard tech, consumer software, enterprise software, and other areas—any company where finding unusually effective people is a top priority.