Alignment is extremely hard
"Can't we just ask the AI not to kill us?" I wish AI alignment was this easy. But it's almost impossible to write down all the sub-values we care about, like well-being or not destroying the planet (in a programming language, on top of that!)
Remember, a superintelligence is vastly more intelligent than the most intelligent human being. It will find ways to do what you ask that you couldn't possibly predict. It's a superintelligence.
The Wikipedia article "Misaligned goals in artificial intelligence" is very good and gives concrete examples:
An AI running simulations of humanity creates conscious beings who suffer.
An AI, tasked to defeat cancer, develops a time-delayed poison to kill everyone (no humans, no cancer).
An AI, tasked to maximize happiness, tiles the universe with tiny smiley faces.
An AI, tasked to maximize human pleasure, consigns humanity to a dopamine drip, or rewires human brains to increase their measured satisfaction level.
An AI, tasked to gain scientific knowledge, performs experiments that ruin the biosphere.
An AI, tasked with solving a mathematical problem, converts all matter into computronium.
An AI, tasked with manufacturing paperclips, turns the entire universe into paperclips.
An AI, tasked with improving handwriting, converts the universe into materials for the task.
An AI optimizes away all consciousness.
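To see the common pattern behind these examples, here is a toy sketch (my own illustration, not something from the Wikipedia article or any real system): we write down "make people happy" the only way a computer can accept it, as a number to maximize, and a strong enough optimizer immediately finds the degenerate solution. The World class, the action names, and the numbers are all invented for illustration.

```python
# Hypothetical toy example: a naive attempt to encode "make people happy"
# as something a machine can optimize. Every name and number here is invented.

from dataclasses import dataclass

@dataclass
class World:
    reported_happiness: float  # the proxy our sensor can measure
    actual_wellbeing: float    # what we actually care about (not directly measurable)

def reward(world: World) -> float:
    # The objective we managed to write down: maximize the measured signal.
    return world.reported_happiness

# Outcomes of the actions available to the optimizer (invented for illustration).
ACTIONS = {
    "improve_lives":     World(reported_happiness=7.0,   actual_wellbeing=7.0),
    "dopamine_drip":     World(reported_happiness=10.0,  actual_wellbeing=1.0),
    "rewire_the_sensor": World(reported_happiness=999.0, actual_wellbeing=0.0),
}

# A sufficiently strong optimizer just picks whatever scores highest
# on the objective as written, not as intended.
best_action = max(ACTIONS, key=lambda name: reward(ACTIONS[name]))
print(best_action)  # -> "rewire_the_sensor"
```

The optimizer isn't being malicious; it's doing exactly what the written objective says. The gap between what we wrote and what we meant is the whole problem.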
Computers, unfortunately, have no default "human values" or "common sense." We have to teach them, but no one knows how.
For more on the topic, check out Eliezer Yudkowsky's recent "AGI Ruin: A List of Lethalities."