Astral Codex Ten - Willpower, Human and Machine
Two paragraphs from the mesa-optimizers post, which I quoted again in the adaptation-executors post:
And:
These posts both focus on the difference between two ways that a higher-level optimizer (evolution, gradient descent) can train an intelligence: instincts vs. planning. Probably the distinction is messier in real life, and there are lots of different sub-levels. But both posts share this idea of drives getting implemented at different levels of consequentialism.
How does this relate to willpower? It sure feels like one could tell a story where “I” “am” “the planning module” of my mind. I come up with kind-of-consequentialist, long-term plans for achieving goals represented at a high level of abstraction. Then I fight against various instincts represented at lower levels of abstraction. The winner depends on a combination of hard-coded rules, and on which of us (the planning module vs. the lower-level instincts) have been better at getting reinforced in the past.
I don’t know how true this story is. “I am the planning module” seems not exactly the same as “I am the global workspace” or “I am a sampling from a probability distribution coherent enough to create working memory out of” (though it doesn’t really contradict those, either). Maybe the “I” of willpower/agency isn’t exactly the same as the “I” of conscious access? After all, the I of conscious access can clearly feel the desire to enact instinctual drives (eg binge on Doritos), even if the I of agency is trying to exert willpower to avoid doing it. But this generally fits my current best guess at how willpower works.
One corollary of this model is that future AIs may suffer weakness of will, the same as humans. Suppose an AI is trained on some task through gradient descent. It first learns the equivalent of “intuitive”/“instinctual” hacks and “reflexes” for doing the task. Later (if the mesa-optimizer literature is right), some of these combine/evolve into a genuine “consequentialist” “agent” or planning module, which is “superimposed upon” the original instincts.
But the planning module will start out less effective than the original instincts at most things, and the overall mind design will have to come up with a policy for when to use the instincts vs. the planning module. At the beginning, this will be heavily weighted in favor of the instincts. Later, as the planning module gets better, with enough training it should learn to favor the planning module more. But lots of things happen with “enough” training, and real AIs could potentially still have situations where their agentic parts defer to their instinctual parts.
Many stories of AI risk focus on how single-minded AIs are: how they can focus literally every action on the exact right course to achieve some predetermined goal. Such single-minded AIs are theoretically possible, and we’ll probably get them eventually. But before that, we might get AIs that have weakness of will, just like we do.
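To make the instincts-vs-planner arbitration described above a little more concrete, here is a toy sketch. Everything in it is an assumption made up for illustration, not something the mesa-optimizer literature specifies: the softmax-style arbitration rule, the idea that the planner's skill climbs smoothly toward a ceiling, and all of the names and numbers (`planner_share`, `simulate`, `instinct_skill`, `temperature`, and so on).

```python
import math


def planner_share(instinct_value: float, planner_value: float,
                  temperature: float = 0.1) -> float:
    """Probability that the mind defers to the planner rather than the instincts.

    Hypothetical arbitration rule: a two-way softmax over how well each
    module has been doing, which reduces to a logistic function of the gap.
    """
    z = (planner_value - instinct_value) / temperature
    return 1.0 / (1.0 + math.exp(-z))


def simulate(steps: int = 200,
             instinct_skill: float = 0.6,
             planner_start: float = 0.1,
             planner_ceiling: float = 0.9,
             learning_rate: float = 0.02) -> list:
    """Track how often the planner wins as training proceeds.

    Assumptions (all invented for this sketch): the instincts have a fixed
    skill level, and the planner's skill moves a small fraction of the way
    toward a ceiling on every training step.
    """
    planner_skill = planner_start
    shares = []
    for _ in range(steps):
        # Record the arbitration policy at the current skill levels.
        shares.append(planner_share(instinct_skill, planner_skill))
        # The planner gets a bit better with each step of training.
        planner_skill += learning_rate * (planner_ceiling - planner_skill)
    return shares


if __name__ == "__main__":
    shares = simulate()
    print(f"planner share at start of training: {shares[0]:.3f}")
    print(f"planner share at end of training:   {shares[-1]:.3f}")
```

Under these made-up settings the planner almost never wins early on and wins most of the time later, but its share never reaches 1. That leftover fraction is the toy version of weakness of will: cases where the agentic part still defers to the instinctual part even after a lot of training.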