What Can Fetish Research Tell Us About AI?
I.

Arguing about gender is like taking OxyContin. There can be good reasons to do it. But most people don’t do it for the good reasons. And even if you start doing it for good reasons, you might get addicted and ruin your life. Walk through San Francisco if you want to see people who ruined their lives with opioids; browse Substack to get a visceral appreciation of the dangers of arguing about gender.

Still, I’ve been debating autogynephilia fetishes with Michael Bailey, tailcalled, Zack Davis, and Aella (Bailey and Davis think they’re deeply involved in transgender; tailcalled, Aella and I mostly don’t); I’ve also studied BDSM and lactation fetishes, and Aella has done even more fetish-ology work.

In a world that might be on the verge of radical, even unimaginable changes, how do we justify spending time on such an unsavory field?

The real answer is - we don’t justify it. I’m easily nerd-sniped just like everyone else, and I assume the same is true of Aella, tailcalled, etc. This post is about a fake answer which I think is funny, but which also has just enough truth to be worth thinking about: I think fetish research can help us understand AI and AI alignment.

II.

We try to explain AI alignment by analogy to human alignment.

Evolution “created” humans. Its “goal” is for humans to spread their genes by (approximately) having as many children as possible. It couldn’t directly communicate that goal to humans - partly because it’s an abstract concept that can’t talk, and partly because for most of biological history it was working with lemurs and ape-men who couldn’t understand words anyway. Instead, it tried to give us instincts that align us with that goal. The most relevant instinct is sex: most humans want to have sex, an action that potentially results in pregnancy, childbearing, and genes being spread to the next generation. This alignment strategy succeeded well enough that human populations remain high as of 2023.
We’ve talked before about a major failure: humans can invent contraception. Evolution’s main alignment strategy was totally unprepared for this. It made us interested in a certain type of genital friction, which was a good proxy for its goal in the ancestral environment. But once we became smarter, we got new out-of-training-distribution options available, and one of those was inventing contraception so that we could get the genital friction without the kids. This is a big part of why average-children-per-couple is declining from 8+ in eg pioneer times to ~1.5 in rich countries today, even though modern rich people have more child-rearing resources available than the pioneers.

Another major alignment failure is porn. Giving evolution a little more credit, it didn’t just make people want genital friction - if that had been the sole imperative, we would have died out as soon as someone invented the dildo/fleshlight. People want genital friction associated with attractive people and certain emotions relating to complex relationships. But now we can take pictures of attractive people and write stories that evoke the complex emotions, while using a dildo/fleshlight/hand to provide the genital friction, and that does substitute for sex pretty well. There’s still debate over whether porn makes people less likely to go out and form real relationships, but it’s at least plausibly another factor in the rich-country fertility decline. At the very least it doesn’t scream “well-thought-out alignment strategy robust to training-vs-deployment differences”.

But these are boring examples. These are like 2015-level alignment concerns, from back when we thought the big problem was AIs seizing control of their reward centers or something. I think we might genuinely be able to avoid problems shaped like these.
Unlike evolution, which had to work with lemurs, even weak GPT-level modern AIs are able to understand language and complicated concepts; we can tell them to want children instead of using genital friction as a proxy. 2023 alignment concerns are more about failed generalization - that is, about fetishes.

III.

Evolution’s alignment problem isn’t just that humans have learned to satiate their libido in ways other than procreative sex. It’s that some humans’ libidos are fundamentally confused. For example, some men, instead of wanting to have sex with women, mostly want to spank them, or be whipped by them, or kiss their feet, or dress up in their clothes. None of these things are going to result in babies! You can’t trivially blame this on the shift from training to deployment (ie from the environment of evolutionary adaptedness to the modern world) - women had feet in the ancestral environment too. This is a different kind of failure.

Here’s a simple story of fetish formation: evolution gave us genes that somehow unfold into a “sex drive” in the brain. But the genome doesn’t inherently contain concepts like “man”, “woman”, “penis”, or “vagina”. I’m not trying to make a woke point here: the genome is just a bunch of the nucleotides A, T, C, and G in various patterns, but concepts like “man” and “woman” are learned during childhood as patterns of neural connections. We assume that the nucleotides are a program telling the body to do useful things, but that has to be implemented through deterministic pathways of proteins, and the brain’s neural connections are too complex to trivially influence that way (see here for more). The genome probably contains some nucleotides that are supposed to refer to the concepts “man” and “woman” once the brain gets them, but there are a lot of fallible proteins in between those two levels.
So the simple story of fetish formation is that the genome contains some message written in nucleotides saying “have procreative sex with adults of the opposite sex”, some galaxy-brained Rube Goldberg plan for translating that message into neural connections during childhood or adolescence, and sometimes the plan fails. Here are some zero-evidence just-so-story speculations for how various fetishes might form, more to give you an idea what I’m talking about than because I claim to have useful knowledge on this topic:
Combine this with equivalent animal “fetishes” - things like beetle species where the females have red dots on their backs, and the males try to mate with anything that has a red dot - and you get a picture where evolution tries to communicate a lot of contingent features of sex in the hopes that one of them will stick, then tells you to be attracted to whatever is most associated with those features.

At least for men, I think the features communicated in the genomic message are simple things like curves and thrusting and genitals and smooth skin, plus something that somehow picks out the concept of “woman” (except in 3% of the male population, where it picks out the concept of “man” instead, plus another 3% where it doesn’t pick out a sex at all). Real procreative sex usually matches enough features of the genomic message to be attractive to most people, but if the original triggers were associated with some contingent characteristics, the brain might misinterpret that as part of the target - for example, if it was a cartoon animal, the brain might think the target includes cartoon animals. Other times, something that isn’t procreative sex matches the genomic message closely enough to be misinterpreted as the center of the target (eg getting whipped); usually procreative sex is somewhere in the target space, but maybe not the exact center, and a few people have such strong fetishes that procreative sex doesn’t register as erotic at all.

The process of forming the category “sexually attractive things” is just a special case of the process of forming categories at all. I discuss the formation of categories like “happiness” and “morality” in The Tails Coming Apart As Metaphor For Life. Society feeds us some labeled data about what is good or bad - for example, we might see someone commit murder on TV, and our parents tell us “No! That’s bad! Don’t do that!” (and the other TV characters hate and punish that character).
Then we try to extrapolate such incidents to a broader moral system. If we’re philosophers, we might go further and try to formally describe that moral system, eg Kantianism, utilitarianism, divine command theory, natural law, etc. All of these correctly predict the training data (eg “murder is bad”) while having different opinions on out-of-distribution environments. Which one you choose is just a function of some kind of mysterious intellectual preference for how to generalize inherently ungeneralizable things - what I previously described as “extrapolating a three-dimensional shape from its two-dimensional reinforcement-learning shadow”.

Fetishes are the same way. Here the evolutionary message provides semi-labeled data, giving people weird feelings when they see certain kinds of curvy, smooth-skinned people. Then people try to generalize that into an idea of what’s sexy. Usually their category is centered (in the sense that the category “bird” is centered around “sparrow” and not “ostrich”) around something close to procreative heterosexual sex. Other times they generalize in some very unexpected way, and are only attracted to cartoon mice.

I think if we understood the laws of generalization, this would make sense. It would seem like a reasonable mistake that someone using Occam’s Razor and all the rest of the information-theoretic toolkit for generalization could make. But we don’t really understand those laws beyond faint outlines, so instead we’re reduced to YKINMKBYKIOK.

IV.

How does this relate to AI alignment?

First, might the genome’s surprising ability to send a message in nucleotides that gets translated into brain wires help us encode something in a neural net? I think probably not. For one thing, this method seems very unreliable. For another, it’s solving a problem we don’t have. Evolution controls the genetic code but not the reinforcement environment.
Humans have the option of training AIs directly, a much higher bandwidth and less lossy communication channel.

But it’s still fascinating that evolution accomplishes this difficult thing at all. Is there some sense in which evolution “solved the interpretability problem”, such that it can pick out connections in a neural net and edit them to try to get a message across? If so, figuring out how might help solve our interpretability problem, even though once we had a solution we’d want to exploit it differently from the way evolution did.

Second, what do fetishes teach us about generalization? Assuming that the evolutionary message operates by reinforcing people (with pleasurable sexual arousal) when they see certain sex-related characteristics, what can we learn from the fact that some people generalize this reinforcement into the intended concept, and other people misgeneralize it into fetishes?

For example: autistic people seem to have more fetishes than neurotypicals; you can find studies showing this, it’s confirmed by the SSC survey, and it’s further confirmed by my anecdotal experience around autistic people. Is this because something about the autistic ultralocal processing style favors misgeneralization? Is there some equivalent in AI parameters that could make them more or less autistic, and would that change how correct (or maybe how consistent) their category generalization is?

I think this is an actually potentially fruitful line of research. Most of the really neat results will come from the next generation of AIs, but looking at human fetishes can give us more than zero useful information.
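
For readers who think better in code, here is a toy sketch of the misgeneralization point. It is not a model of brains or of any real AI system, just the smallest example I could think of. Everything in it is an assumption made up for illustration: the feature names `intended` and `contingent`, the four training examples, and the two hand-written rules. The only thing it is meant to show is that two rules can agree on every training example and still disagree as soon as the two features come apart.

```python
# Toy illustration only: two rules that fit the same training data perfectly
# but generalize differently. All names and data here are hypothetical.
from typing import Callable, Dict, List, Tuple

Example = Dict[str, bool]
Rule = Callable[[Example], bool]

# In "training", the intended feature and a contingent feature always co-occur,
# so the labels cannot tell them apart.
TRAIN: List[Tuple[Example, bool]] = [
    ({"intended": True, "contingent": True}, True),
    ({"intended": False, "contingent": False}, False),
    ({"intended": True, "contingent": True}, True),
    ({"intended": False, "contingent": False}, False),
]

# Out-of-distribution inputs: the two features come apart.
OOD: List[Example] = [
    {"intended": True, "contingent": False},
    {"intended": False, "contingent": True},
]


def rule_intended(x: Example) -> bool:
    """The generalization the 'designer' wanted."""
    return x["intended"]


def rule_contingent(x: Example) -> bool:
    """A different generalization that latches onto the contingent feature."""
    return x["contingent"]


def accuracy(rule: Rule, data: List[Tuple[Example, bool]]) -> float:
    """Fraction of labeled examples the rule classifies correctly."""
    return sum(rule(x) == y for x, y in data) / len(data)


def disagreements(a: Rule, b: Rule, inputs: List[Example]) -> List[Example]:
    """Inputs on which the two rules give different answers."""
    return [x for x in inputs if a(x) != b(x)]


if __name__ == "__main__":
    print("train accuracy (intended):  ", accuracy(rule_intended, TRAIN))
    print("train accuracy (contingent):", accuracy(rule_contingent, TRAIN))
    print("OOD disagreements:", disagreements(rule_intended, rule_contingent, OOD))
```

Nothing in the training set can tell you which rule a learner picked; you only find out when you show it an input where the features separate. A real learner is not choosing between two hand-written rules, so treat this as a cartoon of the problem rather than a description of it.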