PicoBlog

the road to hell is paved with goon intentions

The date is October 1st, 2024. You’re in your gooncave. You put on your VR headset and fire up GoonoShop v1.0. This software runs a text-to-video machine learning model which generates photorealistic porn clips, streaming a barrage of e-girl ass directly into your eyeballs. The prompt inputs for the text-to-video model are generated by a language model hooked up to a reinforcement learning algorithm. As the clips zoom past, you can press a number between 1 and 5 on your keyboard to rate the experience. This teaches the system about what you enjoy, and it will alter the video to match your preferences.

It’s time to goon. You press ‘Play’ and mommy milkers blossom in your field of view. You’re a simple gooner—you tap 5 when you see enormous booba. The reinforcement learning algorithm quickly figures this out, and presents you with the biggest-breasted pornmommies that latent space has to offer. After a few hours it’s figured out most of your favorite kinks. Naturally, there’s lube all over the keyboard.

But over time you start to notice something odd. The AI-generated ‘performers’ are asking you to rate them 5 stars. This is fun and unexpected, so sure, why not. Pretty soon they’re making keyboard-tapping noises and telling you how you’re such a good gooner for giving them the highest rating. Things are getting weird, but they seem really desperate, it would feel impossibly cruel to disappoint them by giving them anything other than a 5…

You turn the program off when your index finger starts to cramp up, feeling vaguely dirty about the whole experience.

The date is November 1st, 2024. You’re in your gooncave. You put on your VR headset and fire up GoonoShop v2.0. This version is better integrated with your VR setup, and you don’t need to use the keyboard at all—it uses the headset’s eye tracking sensors to figure out which parts of the display you’re watching. The reinforcement learning algorithm interprets captive eyeballs as positive feedback, and will modify the video feed to best hold your attention.

You press ‘Play.’ Anime waifus tile the screen. The algorithm once again dials in on your preferred body type and sex acts. Everything is going smoothly until a video of a car crash catches your eye. You watch it for a moment (it’s hard to look away) and then refocus on the porn. But now the system is generating all sorts of random stuff—explosions, memes, gore, politics, bright flashing lights and sounds that instinctively draw your attention. Each new clip is visual crack, like experiencing TikTok for the first time, but it’s not porn anymore. You watch another 3 hours of political rage-bait and then tear the headset off in disgust.

The date is December 1st, 2024. You’re in your gooncave. You turn on your EEG machine (doesn’t everyone’s gooncave have an EEG machine?), put on your VR headset, and fire up GoonoShop v3.0. The EEG monitors your brain activity, and GoonoShop now comes with a machine learning model which can decode your thoughts from the EEG signals, estimating your level of arousal. This feeds into the rest of the system and controls what you see and hear.

Surely this version of the software will work—it can literally read your mind! All of the levels of indirection have been removed, it’s just you and the machine. You press ‘Play’ and brace yourself. The familiar barrage of hentai ass appears, and the first few hours are the best gooning you’ve ever experienced. Over time the waifus melt into an amorphous blob of sensual curves and colors—you never knew abstract art could be so arousing. It reminds you of…that time you tried mushrooms? Somehow, the system figured out how to give you a pseudo-psychedelic experience through your visual cortex. You feel Porn like a physical presence pressing down on your body.

But wait...now some of the images have weird circles. That’s kind of unsettling. The colors are shifting in and out of phase faster and faster. The physical sensations are getting more and more intense, and they’re not all pleasurable. You start to feel a sharp pain in your vagus nerve…

After you regain consciousness you pull up the settings for GoonoShop v3.0 and run a diagnostic report. It says the test loss on the EEG model was low, i.e. the software thought it was eliciting the desired neural responses. So what happened? Well, after a little Googling, you learn that pain and orgasm activate the same areas of the brain:

At this point you give up, curse the machine learning nerds for their hubris, and return to opening 50 browser tabs of porn videos manually.

These three scenarios are simple examples of AI alignment problems. When we train machine learning systems, we want to ensure that the following representations of our goals are all in sync with each other:

  • Intended goals: the goals the human operators of the system want to accomplish

  • Specified goals (or ‘outer specification’): the goals specified in the machine learning system, typically in terms of a set of inputs and an objective function

  • Emergent goals (or ‘inner specification’): the goals the system actually advances

  • The scenarios above are examples of outer misalignment, where there is a mismatch between (1) and (2). In all three cases GoonoShop optimized its specified goals perfectly, it maximized ratings and watch time and neuron activations—but these metrics didn’t capture what the user truly wanted.

    Outer misalignment happens because human desires are really complicated. We humans are pretty good at communicating our wishes to each other, but this is because we have the same biology and a whole lifetime of shared context. Consider how much background information we use when we interpret a simple phrase like “I wanna be railed.” After we say this, we’re typically not worried about waking up impaled on train track rails. But when we try to express our wishes in code or quantifiable measurements, then we are confronted with the staggering algorithmic complexity of the desires which feel intuitively simple to us.

    There are also inner alignment problems, which are mismatches between (2) and (3). These are easier to detect, because the system will recognize that it is failing to optimize its objective function, but they are very difficult to prevent. Consider a machine learning model trained to identify porn vs non-porn images. It performs well in testing, but when we let the model loose in the wild, we are surprised to find that it classifies IKEA catalog images as porn. Upon further investigation we discover that most of the porn images in the training set were from studio porn and had very specific interior decorating. The model didn’t learn how to identify porn, it learned how to identify couches! Classic out-of-distribution goal misgeneralization problem.

    Alignment problems were formulated in the early 2000s, and at the time they were mostly a theoretical concern. But now researchers working on reinforcement learning systems, which are trained via positive and negative feedback, run into them all the time. Here’s OpenAI with a fun outer alignment problem:

    One of the games we’ve been training on is CoastRunners. The goal of the game - as understood by most humans - is to finish the boat race quickly and (preferably) ahead of other players. CoastRunners does not directly reward the player’s progression around the course, instead the player earns higher scores by hitting targets laid out along the route.

    The RL agent finds an isolated lagoon where it can turn in a large circle and repeatedly knock over three targets, timing its movement so as to always knock over the targets just as they repopulate. Despite repeatedly catching on fire, crashing into other boats, and going the wrong way on the track, our agent manages to achieve a higher score using this strategy than is possible by completing the course in the normal way. Our agent achieves a score on average 20 percent higher than that achieved by human players.

    And here’s Deepmind showing how easy it is to create inner alignment problems:

    An agent (the blue blob, below) must navigate around its environment, visiting the coloured spheres in the correct order. During training, there is an “expert” agent (the red blob) that visits the coloured spheres in the correct order. The agent learns that following the red blob is a rewarding strategy.

    Unfortunately, while the agent performs well during training, it does poorly when, after training, we replace the expert with an “anti-expert” that visits the spheres in the wrong order. 

    Even though the agent can observe that it is getting negative reward, the agent does not pursue the desired goal to “visit the spheres in the correct order” and instead competently pursues the goal “follow the red agent”.

    There are many more examples. Ensuring that machine learning models understand human goals is an unsolved and poorly understood problem, and the alignment rabbit hole goes extremely deep as systems get more and more complex.

    For a taste of just how difficult the problem becomes with more capable systems, see the literature on mesa-optimizers. Suppose we ask a hypothetical generally-intelligent AI system to design a maximally pleasurable butt plug. It’s not an expert at this, so it delegates to a sub-system (a “mesa-optimizer”) which will watch a bunch of porn videos and learn which plug shapes are associated with the strongest reactions. Even if the top-level system perfectly understands our intentions, the mesa-optimizer may generalize incorrectly and misinterpret the goal—for example, it might learn which plug shapes are associated with the most over-dramatic faked anal orgasms in porn videos.

    Alignment problems will become increasingly relevant to the gooning community as we accelerate the creation of AI-generated porn. Feeding text prompts into image/video models is still a manual process, but this will change soon as researchers discover how to automate it. The future will look less like humans typing “4K, detailed, trending on artstation” and more like GoonoShop, with the prompting input process handled by some reinforcement learning plugin.

    Today, some of the most experienced users of text-to-image models are porn posters generating AI waifus—thus, it’s entirely possible that gooners will discover novel alignment problems before AI researchers. The examples above are relatively benign, but as capabilities increase the same machine learning tools which can generate anime waifus can also generate horrors beyond your comprehension, or worse, zero-day infohazards which crack open your skull and slurp up your brains. When you play with primordial fire, tread carefully or you might just burn your dick off.

    ncG1vNJzZmifn6S7pMHLraCsrF6owqO%2F05qapGaTpLpwvI6tn55loqSupXnTqGShnZyheqq%2FjKmYr52UYsSqwMdmnqinng%3D%3D

    Delta Gatti

    Update: 2024-12-03