Astral Codex Ten - Willpower, Human and Machine
Two paragraphs from the mesa-optimizers post, which I quoted again in the adaptation-executors post:
And:
These posts both focus on the difference between two ways that a higher-level optimizer (evolution, gradient descent) can train an intelligence: instincts vs. planning. Probably the distinction is messier in real life, and there are lots of different sub-levels. But both posts share this idea of drives getting implemented at different levels of consequentialism.

How does this relate to willpower?

It sure feels like one could tell a story where “I” “am” “the planning module” of my mind. I come up with kind-of-consequentialist, long-term plans for achieving goals represented at a high level of abstraction. Then I fight against various instincts represented at lower levels of abstraction. The winner depends on a combination of hard-coded rules and on which of us (the planning module vs. the lower-level instincts) has been better at getting reinforced in the past.

I don’t know how true this story is. “I am the planning module” seems not exactly the same as “I am the global workspace” or “I am a sampling from a probability distribution coherent enough to create working memory out of” (though it doesn’t really contradict those, either). Maybe the “I” of willpower/agency isn’t exactly the same as the “I” of conscious access? After all, the I of conscious access can clearly feel the desire to enact instinctual drives (eg binge on Doritos), even if the I of agency is trying to exert willpower to avoid doing it. But this generally fits my current best guess at how willpower works.

One corollary of this model is that future AIs may suffer weakness of will, the same as humans. Suppose an AI is trained on some task through gradient descent. It first learns the equivalent of “intuitive”/“instinctual” hacks and “reflexes” for doing the task. Later (if the mesa-optimizer literature is right), some of these combine/evolve into a genuine “consequentialist” “agent” or planning module, which is “superimposed upon” the original instincts.
But the planning module will start out less effective than the original instincts at most things, and the overall mind design will have to come up with a policy for when to use the instincts vs. the planning module. At the beginning, this will be heavily weighted in favor of the instincts. Later, as the planning module gets better, with enough training it should learn to favor the planning module more. But lots of things happen with “enough” training, and real AIs could potentially still have situations where their agentic parts defer to their instinctual parts.

Many stories of AI risk focus on how single-minded AIs are: how they can focus literally every action on the exact right course to achieve some predetermined goal. Such single-minded AIs are theoretically possible, and we’ll probably get them eventually. But before that, we might get AIs that have weakness of will, just like we do.
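For anyone who wants the instincts-vs-planner arbitration story in concrete form, here is a toy sketch. It is an illustration of the story only, not a claim about how brains or trained networks actually work. Every name and number in it is an assumption made up for the example: the function `simulate`, a fixed instinct payoff of 0.6, a planner whose skill starts at 0.1 and grows a little each time it is used, and a forced trial of the planner every fifth step.

```python
def simulate(steps=200, explore_every=5, lr=0.2):
    """Toy arbiter choosing between an 'instinct' and a 'planner' module.

    All quantities are hypothetical. The arbiter keeps a running estimate
    of how well each module has paid off and usually picks the higher one.
    """
    # Arbiter's running payoff estimates; the prior favors the instincts.
    value = {"instinct": 0.5, "planner": 0.0}
    planner_skill = 0.1  # assumed: the planner starts out weak
    choices = []
    for t in range(steps):
        if t % explore_every == 0:
            choice = "planner"  # periodic forced trial so the planner gets practice
        else:
            choice = max(value, key=value.get)  # otherwise trust the better track record
        if choice == "planner":
            reward = planner_skill
            # assumed: each use makes the planner slightly better, up to a cap
            planner_skill = min(0.9, planner_skill + 0.02)
        else:
            reward = 0.6  # assumed: instincts give a steady, mediocre payoff
        # Reinforcement: move the chosen module's estimate toward the observed reward.
        value[choice] += lr * (reward - value[choice])
        choices.append(choice)
    return choices


if __name__ == "__main__":
    history = simulate()
    print("instinct share, first 100 steps:", history[:100].count("instinct") / 100)
    print("planner share, last 20 steps:", history[-20:].count("planner") / 20)
```

Under these made-up settings the arbiter defers to the instincts on almost every early step, and only hands control to the planner once the planner's track record overtakes theirs. The long stretch in the middle, where the planner exists but keeps losing the vote, is the toy version of weakness of will.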