Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions
The AI Alignment Forum
MARCH 18, 2025
Published on March 18, 2025 2:48 PM GMT
Replicating the Emergent Misalignment model suggests it is unfiltered, not unaligned
We were very excited when we first read the Emergent Misalignment paper. It seemed perfect for AI alignment. If there were a single 'misalignment' feature within LLMs, we could do a lot with it: we could use it to measure alignment, and we could even make the model more aligned by minimising it.