What Can Fetish Research Tell Us About AI?
I.

Arguing about gender is like taking OxyContin. There can be good reasons to do it. But most people don’t do it for the good reasons. And even if you start doing it for good reasons, you might get addicted and ruin your life. Walk through San Francisco if you want to see people who ruined their lives with opioids; browse Substack to get a visceral appreciation of the dangers of arguing about gender.

Still, I’ve been debating autogynephilia fetishes with Michael Bailey, tailcalled, Zack Davis, and Aella (Bailey and Davis think they’re deeply involved in transgender identity; tailcalled, Aella and I mostly don’t); I’ve also studied BDSM and lactation fetishes, and Aella has done even more fetish-ology work. In a world that might be on the verge of radical, even unimaginable changes, how do we justify spending time on such an unsavory field?

The real answer is - we don’t justify it. I’m easily nerd-sniped just like everyone else, and I assume the same is true of Aella, tailcalled, etc. This post is about a fake answer which I think is funny, but which also has just enough truth to be worth thinking about: I think fetish research can help us understand AI and AI alignment.

II.

We try to explain AI alignment by analogy to human alignment. Evolution “created” humans. Its “goal” is for humans to spread their genes by (approximately) having as many children as possible. It couldn’t directly communicate that goal to humans - partly because it’s an abstract concept that can’t talk, and partly because for most of biological history it was working with lemurs and ape-men who couldn’t understand words anyway. Instead, it tried to give us instincts that align us with that goal. The most relevant instinct is sex: most humans want to have sex, an action that potentially results in pregnancy, childbearing, and genes being spread to the next generation. This alignment strategy succeeded well enough that human populations remain high as of 2023.
We’ve talked before about a major failure: humans can invent contraception. Evolution’s main alignment strategy was totally unprepared for this. It made us interested in a certain type of genital friction, which was a good proxy for its goal in the ancestral environment. But once we became smarter, we got new out-of-training-distribution options available, and one of those was inventing contraception so that we could get the genital friction without the kids. This is a big part of why average-children-per-couple is declining from 8+ in eg pioneer times to ~1.5 in rich countries today, even though modern rich people have more child-rearing resources available than the pioneers.

Another major alignment failure is porn. Giving evolution a little more credit, it didn’t just make people want genital friction - if that had been the sole imperative, we would have died out as soon as someone invented the dildo/fleshlight. People want genital friction associated with attractive people and certain emotions relating to complex relationships. But now we can take pictures of attractive people and write stories that evoke the complex emotions, while using a dildo/fleshlight/hand to provide the genital friction, and that does substitute for sex pretty well. There’s still debate over whether porn makes people less likely to go out and form real relationships, but it’s at least plausibly another factor in the rich-country fertility decline. At the very least it doesn’t scream “well-thought-out alignment strategy robust to training-vs-deployment differences”.

But these are boring examples. These are like 2015-level alignment concerns, from back when we thought the big problem was AIs seizing control of their reward centers or something. I think we might genuinely be able to avoid problems shaped like these.
Unlike evolution, which had to work with lemurs, even weak GPT-level modern AIs are able to understand language and complicated concepts; we can tell them to want children instead of using genital friction as a proxy. 2023 alignment concerns are more about failed generalization - that is, about fetishes.

III.

Evolution’s alignment problem isn’t just that humans have learned to satiate their libido in ways other than procreative sex. It’s that some humans’ libidos are fundamentally confused. For example, some men, instead of wanting to have sex with women, mostly want to spank them, or be whipped by them, or kiss their feet, or dress up in their clothes. None of these things are going to result in babies! You can’t trivially blame this on the shift from training to deployment (ie from the environment of evolutionary adaptedness to the modern world) - women had feet in the ancestral environment too. This is a different kind of failure.

Here’s a simple story of fetish formation: evolution gave us genes that somehow unfold into a “sex drive” in the brain. But the genome doesn’t inherently contain concepts like “man”, “woman”, “penis”, or “vagina”. I’m not trying to make a woke point here: the genome is just a bunch of the nucleotides A, T, C, and G in various patterns, but concepts like “man” and “woman” are learned during childhood as patterns of neural connections. We assume that the nucleotides are a program telling the body to do useful things, but that has to be implemented through deterministic pathways of proteins, and the brain’s neural connections are too complex to trivially influence that way (see here for more). The genome probably contains some nucleotides that are supposed to refer to the concepts “man” and “woman” once the brain gets them, but there are a lot of fallible proteins in between those two levels.
So the simple story of fetish formation is that the genome contains some message written in nucleotides saying “have procreative sex with adults of the opposite sex”, some galaxy-brained Rube Goldberg plan for translating that message into neural connections during childhood or adolescence, and sometimes the plan fails. Here are some zero-evidence just-so-story speculations for how various fetishes might form, more to give you an idea what I’m talking about than because I claim to have useful knowledge on this topic:
Combine this with equivalent animal “fetishes” - things like beetle species where the females have red dots on their backs, and the males try to mate with anything that has a red dot - and you get a picture where evolution tries to communicate a lot of contingent features of sex in the hopes that one of them will stick, then tells you to be attracted to whatever is most associated with those features.

At least for men, I think the features communicated in the genomic message are simple things like curves and thrusting and genitals and smooth skin, plus something that somehow picks out the concept of “woman” (except in 3% of the male population, where it picks out the concept of “men” instead, plus another 3% where it doesn’t pick out a sex at all). Real procreative sex usually matches enough features of the genomic message to be attractive to most people, but if the original triggers were associated with some contingent characteristics, the brain might misinterpret that as part of the target - for example, if it was a cartoon animal, the brain might think the target includes cartoon animals. Other times, something that isn’t procreative sex matches the genomic message closely enough to be misinterpreted as the center of the target (eg getting whipped); usually procreative sex is somewhere in the target space, but maybe not the exact center, and a few people have such strong fetishes that procreative sex doesn’t register as erotic at all.

The process of forming the category “sexually attractive things” is just a special case of the process of forming categories at all. I discuss the formation of categories like “happiness” and “morality” in The Tails Coming Apart As Metaphor For Life. Society feeds us some labeled data about what is good or bad - for example, we might see someone commit murder on TV, and our parents tell us “No! That’s bad! Don’t do that!” (and the other TV characters hate and punish that character).
Then we try to extrapolate such incidents to a broader moral system. If we’re philosophers, we might go further and try to formally describe that moral system, eg Kantianism, utilitarianism, divine command theory, natural law, etc. All of these correctly predict the training data (eg “murder is bad”) while having different opinions on out-of-distribution environments. Which one you choose is just a function of some kind of mysterious intellectual preference for how to generalize inherently ungeneralizable things - what I previously described as “extrapolating a three-dimensional shape from its two-dimensional reinforcement-learning shadow”.

Fetishes are the same way. Here the evolutionary message provides semi-labeled data, giving people weird feelings when they see certain kinds of curvy, smooth-skinned people. Then people try to generalize that into an idea of what’s sexy. Usually their category is centered (in the sense that the category “bird” is centered around “sparrow” and not “ostrich”) around something close to procreative heterosexual sex. Other times they generalize in some very unexpected way, and are only attracted to cartoon mice.

I think if we understood the laws of generalization, this would make sense. It would seem like a reasonable mistake that someone using Occam’s Razor and all the rest of the information-theoretic toolkit for generalization could make. But we don’t really understand those laws beyond faint outlines, so instead we’re reduced to YKINMKBYKIOK.

IV.

How does this relate to AI alignment?

First, might the genome’s surprising ability to send a message in nucleotides that gets translated into brain wires help us encode something in a neural net?

I think probably not. For one thing, this method seems very unreliable. For another, it’s solving a problem we don’t have. Evolution controls the genetic code but not the reinforcement environment.
Humans have the option of training AIs directly, a much higher-bandwidth and less lossy communication channel.

But it’s still fascinating that evolution accomplishes this difficult thing at all. Is there some sense in which evolution “solved the interpretability problem”, such that it can pick out connections in a neural net and edit them to try to get a message across? If so, figuring out how might help solve our interpretability problem, even though once we had a solution we’d want to exploit it differently from the way evolution did.

Second, what do fetishes teach us about generalization? Assuming that the evolutionary message operates by reinforcing people (with pleasurable sexual arousal) when they see certain sex-related characteristics, what can we learn from the fact that some people generalize this reinforcement into the intended concept, and other people misgeneralize it into fetishes?

For example: autistic people seem to have more fetishes than neurotypicals; you can find studies showing this, it’s confirmed by the SSC survey, and it’s further confirmed by my anecdotal experience around autistic people. Is this because something about the autistic ultralocal processing style favors misgeneralization? Is there some equivalent in AI parameters that could make them more or less autistic, and would that change how correct (or maybe how consistent) their category generalization is?

I think this is an actually potentially fruitful line of research. Most of the really neat results will come from the next generation of AIs, but looking at human fetishes can give us more than zero useful information.
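
For readers who want the generalization point from section III in concrete form, here is a minimal Python sketch. It is a toy under loud assumptions: the two feature names, the second training example, and both "theories" are invented for illustration and are not taken from any real model or dataset. It only shows that two rules can reproduce every labeled example and still disagree on a case the labels never covered.

```python
# Toy illustration: two "moral theories" that agree on every labeled example
# but disagree on a case outside the training distribution.
# The feature names and both rules are hypothetical, invented for this sketch.

# Labeled training data: (situation, label society gave us).
TRAINING_DATA = [
    ({"harms_someone": True, "breaks_rule": True}, "bad"),    # e.g. murder on TV
    ({"harms_someone": False, "breaks_rule": False}, "ok"),   # e.g. sharing lunch
]


def harm_based(situation):
    """Generalize from the data as: bad means someone gets hurt."""
    return "bad" if situation["harms_someone"] else "ok"


def rule_based(situation):
    """Generalize from the same data as: bad means a rule was broken."""
    return "bad" if situation["breaks_rule"] else "ok"


def fits_training_data(theory):
    """True if the theory reproduces every label in TRAINING_DATA."""
    return all(theory(situation) == label for situation, label in TRAINING_DATA)


# A situation the training data never covered: a rule is broken, nobody is hurt.
NOVEL_CASE = {"harms_someone": False, "breaks_rule": True}

if __name__ == "__main__":
    for theory in (harm_based, rule_based):
        print(theory.__name__, fits_training_data(theory), theory(NOVEL_CASE))
```

Both rules pass `fits_training_data`, yet `harm_based` calls the novel case "ok" and `rule_based` calls it "bad". Nothing in the labeled data favors one over the other, which is the sense in which the choice between generalizations has to come from somewhere outside the training data.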