Astral Codex Ten - Willpower, Human and Machine
Two paragraphs from the mesa-optimizers post, which I quoted again in the adaptation-executors post:
And:
These posts both focus on the difference between two ways that a higher-level optimizer (evolution, gradient descent) can train an intelligence: instincts vs. planning. Probably the distinction is messier in real life, and there are lots of different sub-levels. But both posts share this idea of drives getting implemented at different levels of consequentialism.
How does this relate to willpower?
It sure feels like one could tell a story where “I” “am” “the planning module” of my mind. I come up with kind-of-consequentialist, long-term plans for achieving goals represented at a high level of abstraction. Then I fight against various instincts represented at lower levels of abstraction. The winner depends on a combination of hard-coded rules and on which of us (the planning module vs. the lower-level instincts) has been better at getting reinforced in the past.
I don’t know how true this story is. “I am the planning module” seems not exactly the same as “I am the global workspace” or “I am a sampling from a probability distribution coherent enough to create working memory out of” (though it doesn’t really contradict those, either). Maybe the “I” of willpower/agency isn’t exactly the same as the “I” of conscious access? After all, the I of conscious access can clearly feel the desire to enact instinctual drives (eg binge on Doritos), even if the I of agency is trying to exert willpower to avoid doing it. But this generally fits my current best guess at how willpower works.
One corollary of this model is that future AIs may suffer weakness of will, the same as humans.
Suppose an AI is trained on some task through gradient descent. It first learns the equivalent of “intuitive”/“instinctual” hacks and “reflexes” for doing the task. Later (if the mesa-optimizer literature is right), some of these combine/evolve into a genuine “consequentialist” “agent” or planning module, which is “superimposed upon” the original instincts.
But the planning module will start out less effective than the original instincts at most things, and the overall mind design will have to come up with a policy for when to use the instincts vs. the planning module. At the beginning, this will be heavily weighted in favor of the instincts. Later, as the planning module gets better, the mind should, with enough training, learn to favor it more. But lots of things happen with “enough” training, and real AIs could potentially still have situations where their agentic parts defer to their instinctual parts.
Many stories of AI risk focus on how single-minded AIs are: how they can focus literally every action on the exact right course to achieve some predetermined goal. Such single-minded AIs are theoretically possible, and we’ll probably get them eventually. But before that, we might get AIs that have weakness of will, just like we do.