Astral Codex Ten - Willpower, Human and Machine
Two paragraphs from the mesa-optimizers post, which I quoted again in the adaptation-executors post:
And:
These posts both focus on the difference between two ways that a higher-level optimizer (evolution, gradient descent) can train an intelligence: instincts vs. planning. Probably the distinction is messier in real life, and there are lots of different sub-levels. But both posts share this idea of drives getting implemented at different levels of consequentialism.

How does this relate to willpower? It sure feels like one could tell a story where “I” “am” “the planning module” of my mind. I come up with kind-of-consequentialist, long-term plans for achieving goals represented at a high level of abstraction. Then I fight against various instincts represented at lower levels of abstraction. The winner depends on a combination of hard-coded rules, and on which of us (the planning module vs. the lower-level instincts) have been better at getting reinforced in the past.

I don’t know how true this story is. “I am the planning module” seems not exactly the same as “I am the global workspace” or “I am a sampling from a probability distribution coherent enough to create working memory out of” (though it doesn’t really contradict those, either). Maybe the “I” of willpower/agency isn’t exactly the same as the “I” of conscious access? After all, the I of conscious access can clearly feel the desire to enact instinctual drives (eg binge on Doritos), even if the I of agency is trying to exert willpower to avoid doing it. But this generally fits my current best guess at how willpower works.

One corollary of this model is that future AIs may suffer weakness of will, the same as humans. Suppose an AI is trained on some task through gradient descent. It first learns the equivalent of “intuitive”/“instinctual” hacks and “reflexes” for doing the task. Later (if the mesa-optimizer literature is right), some of these combine/evolve into a genuine “consequentialist” “agent” or planning module, which is “superimposed upon” the original instincts.

But the planning module will start out less effective than the original instincts at most things, and the overall mind design will have to come up with a policy for when to use the instincts vs. the planning module. At the beginning, this will be heavily weighted in favor of the instincts. Later, as the planning module gets better, with enough training it should learn to favor the planning module more. But lots of things happen with “enough” training, and real AIs could potentially still have situations where their agentic parts defer to their instinctual parts.

Many stories of AI risk focus on how single-minded AIs are: how they can focus literally every action on the exact right course to achieve some predetermined goal. Such single-minded AIs are theoretically possible, and we’ll probably get them eventually. But before that, we might get AIs that have weakness of will, just like we do.
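For readers who like to see this kind of story as code, here is a toy sketch of the instincts-vs-planner policy described above. Nothing in it comes from the mesa-optimizer literature or from any real training setup: the function names (`planner_skill`, `simulate`), the learning curve, and every constant are made up for illustration. It only shows how a hard-coded bias toward instincts, combined with a reinforcement history, could leave the planner losing some fraction of contests even after a lot of training.

```python
import math


def planner_skill(step, ceiling=0.9, rate=0.01):
    """Hypothetical learning curve: the planner's payoff rises toward a ceiling."""
    return ceiling * (1.0 - math.exp(-rate * step))


def simulate(steps=1000, instinct_skill=0.6, instinct_bias=2.0,
             sharpness=10.0, lr=0.05):
    """Return, for each training step, the probability that the arbiter
    hands control to the planner rather than to the instincts.

    All parameters are illustrative assumptions, not measured quantities.
    """
    # Running estimate of how well each module has been reinforced so far.
    # Instincts start out proven; the planner starts out with no track record.
    value = {"instinct": instinct_skill, "planner": 0.0}
    history = []
    for step in range(steps):
        # Reinforcement history: nudge the planner's estimate toward its
        # current (expected) payoff. Deterministic, so runs are repeatable.
        value["planner"] += lr * (planner_skill(step) - value["planner"])
        # Hard-coded rule: a fixed bias toward instincts, on top of the
        # learned difference in track record.
        logit = sharpness * (value["planner"] - value["instinct"]) - instinct_bias
        history.append(1.0 / (1.0 + math.exp(-logit)))
    return history


if __name__ == "__main__":
    history = simulate()
    for step in (0, 100, 300, 999):
        print(f"step {step:4d}: P(planner wins) = {history[step]:.3f}")
```

With these invented numbers the planner almost never wins early on, wins more often as its track record improves, and still stays well short of winning every time at the end of the run. That residual deference to instinct is the toy version of weakness of will; different constants would give a different picture.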