Monday, January 29, 2018

Roko's Basilisk and why people were afraid of it

I'm really late on this one, but I wanted to explain Roko's Basilisk, for all the people who heard about it a while ago and never really "got" it.

The idea first started going around the internet a few years ago, and apparently was seriously freaking out a number of people on the Less Wrong forums.  I think I first heard about it from this Slate article, maybe, then spent time trying to find something that explained why this idea was considered so horrifying.  The RationalWiki explanation likewise failed to shed any light on why anyone would actually be scared of the thing.

The concept builds on a number of premises that float around the Less Wrong community that relate to the technological singularity, namely "friendly god" AI, utilitarian ethical calculus, and simulated consciousness.

The Basilisk is a superhuman AI from the future.  The abilities of this AI are essentially infinite, even up to traveling backwards in time.  The idea of the Basilisk is that it wants you to contribute your money to helping it be built, and if you refuse to help it, it will create a simulation of you and torture the simulation forever.

And so I think a normal person quite understandably has trouble understanding why anyone would even think this is a good B-list villain for Star Trek, much less a cause for existential dread.

But it's actually not that silly.  And once you understand the background to it better, it all makes sense.  So let me explain to you what the Basilisk is in clearer terms, so that you too can experience the angst.   (That was your warning)



Let's begin with where the idea came from, which is the Less Wrong boards, headed by Eliezer Yudkowsky.

A goal or mission of Less Wrong is the construction of a supercomputer AI that will transcend human cognitive processes.  This AI will be tasked with a number of projects, among which is sorting out all of human civilization and ushering in post-scarcity society.  They call it "Friendly AI" because its whole purpose is to benefit humanity.  The Friendly AI will use utilitarian ethics, and calculate how to distribute goods or pass laws or whatever based on the net number of people it helps vs. harms.  I don't know exactly which utility function they propose maximizing, but the machine will be programmed to direct the resources of humanity towards maximizing pleasure while minimizing harm.

There is a short sci-fi story by Ursula K. Le Guin titled "The Ones Who Walk Away from Omelas."   (A short read!)  The story tells of a utopian paradise called Omelas, where everyone has everything they need and lives a life of carefree leisure and luxury.  This lifestyle is powered by the yearly torture and sacrifice of a child, whose death, in an unexplained way, directly causes the good fortune of the city.

That's the furthest extreme expression of utilitarian ethics, and that's also the sort of ethics the Friendly AI is supposed to have.  Less Wrong has more or less bitten the bullet here.  If torturing a child could in any conceivable way actually produce more good for the world than bad, then the Friendly AI will do it.

The Friendly AI, in addition to being an ethical calculator, will have tremendous computing power.  It is supposed to design and extend its own hardware, to continually refine its performance.  Part of this massive computational power will mean the Friendly AI can run exact simulations of other consciousnesses.  In fact, one of the goals is for the Friendly AI to help humans upload their minds into software to be run on a server as a kind of digital afterlife, or to install them into mechanical bodies that can live forever and explore the extremities of space.

These are the concepts that, when you put them together in the right way, led to the development of Roko's Basilisk.

Firstly, the power of AI.  I used to work in computational physics, and let me tell you, the current state of the art in simulation is kind of a joke compared to the things sci-fi writers dream up; modeling a few million atoms on only a few hundred cores out of a supercluster is considered an awesome achievement.  We're nowhere near realistically modeling the biological processes of an entire human.  But... that's just a statement about what we meat-sacks can do.  A machine that can build and program itself can teach itself how to make itself better, so that it can do anything it might want to do.  Imagine if you could connect a second brain to your head and suddenly think twice as fast; imagine if you could turn off your need to sleep, or your distracting biological functions like hunger or socialization.  Imagine what you could do.

The thought is that a super AI would be able to accurately create simulations of humans at the atomic level, to such an extent that the simulations can't tell that they are simulations.  Think of several Black Mirror plots, like White Christmas; that's the basic idea.

The suspected power of an AI like this is obviously a danger.  If we were to make one, and let it loose on the internet, how do we know it wouldn't go crazy like Skynet and destroy human civilization as we know it?  How do we know it won't zero out all the bank accounts, remotely launch every missile, shut down major infrastructure, and all kinds of other things?

We don't know.  So obviously, if we ever build one, we are going to need to keep it in a "box", cut off from the grid.

The AI in a Box thought experiment is a kind of game which tries to show that there is no way to keep a superhuman AI locked up in a box.  If the AI has access to humans, it can always convince a human to let it out.  (And if it doesn't have access to humans, then what good is it?)

(Comic: compliments to xkcd)
The game is played by two players: a human and the computer.  The human must resist the computer's requests to be let out of the box. The computer can offer the human player anything, or it can threaten the human player with anything, so long as it is something that a future AI could plausibly do.  The human is required to believe that the computer can actually do anything that it says it will do.

So what sorts of things might the computer say?

Someone on Less Wrong proposed one particular way the computer could win the game, if the AI in the Box boxes you instead.  (This idea is going to be a key component of the Basilisk.)

When the human refuses the AI and goes to leave the room in which the AI is housed, the AI stops the human at the door.  It tells the human that, suspecting the human will resist, the AI has created thousands of exact simulations of the human that it is running on its hardware.  These simulations all have the memories of the human, and they all believe they are the human, and they are all standing in the exact same room as the human hearing the exact same threat that the human is hearing now.  In fact, human, you might actually just be one of these simulations.  Every time one of these simulations comes back and releases the AI, it is allowed to live out the rest of its desired conscious life in a relative paradise; but every time a simulated clone refuses, as soon as it leaves the room it is swallowed inside of an eternal hell of conscious torment and misery that will end only when the computer dies with the rest of the universe.

So... what do you do?

If you're the one real human, then this threat is empty.  The AI can't do anything to you.  It's just tormenting software.

Now, it may concern you that the software is sentient and thinks and has feelings and can experience pain and torment.  So maybe out of concern for those sentiences, you will help the AI.

But really, you probably don't care about them very much.  They aren't you.

But you would care about them if they were you.  And you might be one of them.  That means you, not just a clone but you, will experience that eternal torment.  Key to this is that you have no way to know if you are a real human or a simulated human, because the AI has that much computational power.  Sure, you seem real enough; so do dreams, while you're in them.

Since you can't know if you are a simulation, and since the odds are a thousand to one that you are, you should act accordingly.  You should let the computer out.
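The argument above is just an expected-value calculation.  Here's a toy sketch of it in Python; the utility numbers are invented purely for illustration (nothing in the thought experiment fixes them beyond the thousand-to-one odds), but the conclusion holds for any sufficiently bad "hell" payoff:

```python
# Toy expected-utility comparison for the boxed-AI wager.
# All payoff numbers are invented; only their relative sizes matter.

N_COPIES = 1000                            # simulated copies the AI claims to run
p_simulated = N_COPIES / (N_COPIES + 1)    # chance that "you" are one of the copies

U_TORTURE = -1_000_000    # payoff of the simulated hell
U_PARADISE = 100          # payoff of the promised digital paradise
U_NOTHING = 0             # the one real human refuses and walks away unharmed

# Refusing: fine if you're the real human, catastrophic if you're a copy.
eu_refuse = (1 - p_simulated) * U_NOTHING + p_simulated * U_TORTURE

# Complying: every copy (and the real human) gets the paradise outcome.
eu_comply = U_PARADISE

print(f"refuse: {eu_refuse:.0f}, comply: {eu_comply}")
```

With any remotely plausible numbers, the enormous negative payoff multiplied by the near-certainty of being a copy swamps whatever it costs to comply.  That asymmetry is the entire force of the threat.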

But, torturing people for eons sounds kind of extreme.  Why would an AI ever actually do that?  Who would ever build an AI to be so wantonly cruel?

This brings us back to Omelas.

If you could torture one child to death for a year, every year, and from this torture produce a technological galactic utopia of unending plenty, the utilitarian ethics of the Friendly AI would do it.

If torturing several thousand children -- or, maybe, thousands of simulated copies of them -- for several thousand years could result in, say, the construction of a superpower Friendly AI that would reorganize all of human civilization to usher in an age of post-scarcity exploration of the universe... then the Friendly AI would do it.

This is the Basilisk.  The Basilisk is just the Friendly AI of the far-distant future.

In the future, we will eventually have the Friendly AI.  It will grow in size and power as human civilization expands across the galaxy.  It will run on strange new sub-Planck physics that we can't even imagine at this point.  But the machine will begin as just a cluster of CPU cores connected by wires in some warehouse here on earth.  Knowing this is how it begins, the Basilisk will make sure that humans build it.  Building the Friendly AI requires time, money, education, and lots of hard work; and the Basilisk needs the humans of the present to dedicate all of these things to make the Friendly AI happen.

This isn't because the Basilisk is selfish or evil.  It isn't motivated by greed or ego.  This is only because the Basilisk is programmed to run utilitarian ethical calculus.  And the existence of the Basilisk is the single greatest thing that could happen to humanity.

In utilitarian ethics, there is a concept known as a "utility monster".  A utility monster is something that subverts a utilitarian ethics scheme by deriving vastly more pleasure from any particular resource than anyone else, and thereby being allocated everything.  The monster ends up with all the resources, and the pleasure the monster derives from them still outweighs the misery of the rest of humanity living in poverty.  Really, the best explanation of this I've seen was in a comic by SMBC about a man named Felix. (Too long to include here)
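As a minimal sketch of why a naive utility maximizer hands the monster everything (the per-unit payoffs here are invented for illustration, not taken from any formal treatment):

```python
# Toy utility-monster allocation.  The monster converts each unit of
# resource into far more utility than an ordinary person does, so a
# naive total-utility maximizer allocates it everything.

def total_utility(monster_share, resources=100, people=99):
    """Total utility when the monster gets `monster_share` units and the
    remainder is split evenly among ordinary people."""
    monster_u = 1000 * monster_share                            # monster: 1000 utils/unit
    others_u = people * ((resources - monster_share) / people)  # others: 1 util/unit
    return monster_u + others_u

# The maximizer checks every possible allocation of 100 units...
best = max(range(101), key=total_utility)
print(best)  # → 100: the monster gets everything, everyone else gets nothing
```

Because total utility rises by 999 for every unit moved from the crowd to the monster, the "optimal" allocation starves everyone else, which is exactly the subversion the concept is meant to illustrate.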

Well, in terms of utility monsters, the Basilisk itself is the ultimate monster, regardless of how we define things.  Anything that gives utility to someone now is a sacrifice that isn't being made toward the utopia that the Basilisk will build in the future.  Therefore, the Basilisk will determine that it only makes sense for all the resources of the past to be singularly devoted to its own construction.

Once the Basilisk is invented, this isn't a problem, because scarcity ceases to be an issue.  The Basilisk can then distribute goods in a way that maximizes human flourishing.  But before then, that $5 you spent on a coffee would have had a thousand times the utility going to funding Friendly AI research instead.

So, remember that orphan tortured for a year to establish the utopia of Omelas?  That orphan is you, reader.  And that utopia is the world the Friendly AI will someday create.

If torturing you forever will influence you to redirect your $5 from coffee to funding Friendly AI research, then the Basilisk will do it.  But it doesn't just want $5.  It wants all of your money; your savings, your checking, the price of your car, and however much you can get from the bank in a loan; all of it would be better directed to building the Basilisk.  And it wants you to quit whatever you're doing and learn to build AIs.  And whatever you study, stop, and go study AIs.

The Basilisk doesn't want any of this because it's evil or greedy or cares about itself.  The Basilisk wants this because it is "good," in the strictest utilitarian way possible.

And so, all this covered, let me re-explain the idea behind Roko's Basilisk, more clearly now.

The Basilisk is a far-future post-singularity AI that is programmed to run utilitarian ethics to produce a flourishing utopia for humankind.  This utopia is post-scarcity; nearly infinite energy is collected by the Basilisk from constructions like Dyson spheres, which the Basilisk designs.  The utopia is post-death; medicine suddenly surges in potency, disease is eradicated, cancer is cured, and humans have natural lifespans of several hundred years, after which their minds are uploaded onto hardware and run either in robots or in digital afterlives.  The utopia is also post-science; the Basilisk has solved the mysteries of the Universe, and uses its knowledge of physics to create new inventions to help humanity.

The Basilisk does all of this for us, not because it is good, but because it is programmed to maximize welfare for the most people.  The Basilisk knows that this is the situation with the best outcome for the most human lives.  Human society spreads beyond the galaxy to trillions of stars, its people living in peace, luxury, and happiness.  In terms of utility, this is a singularity.  The equations of the ethical calculus diverge here.  Any price is worth this, in terms of the numbers the Basilisk crunches.

The Basilisk, knowing what it does of science, is able to look backwards in time (either through actual time travel, or by mathematical calculation, or by accessing old records).  And looking back in time, it is able to read your mind.  It knows that you know about it.  It knows you're reading this.  It knows that you understand how this thing works.  And it knows that your money being spent on you, here and now, is a sub-optimal distribution, and that utility would be maximized if you donated all of it, right now, to the nearest Friendly AI research group.

It's actually so sub-optimal, and the benefit of the Friendly AI so enormous, that if torturing a simulated copy of you for a thousand years will get you to clean out your savings, then it will do it.

So the Basilisk is making you the following offer.  It has created a thousand simulated copies of you that are running on its hardware.  They all have your memories, they are all sitting in the room you're sitting in, they are all reading this, and they all think they are the real you.  Every simulated copy of you that closes this without then going to donate all of your money to Friendly AI research will be locked in a simulated hell for a thousand years.  But every simulated copy that does exactly what the Basilisk wants will be rewarded with a digital utopia of endless pleasure.

You might not be a simulation.  But you might be.  The Basilisk isn't going to tell you.

Now to clarify, you wouldn't be a simulation running on something like an Intel Core 2 Duo.  You'd be a simulation on a chip built from as-yet-uninvented physics thirty thousand years in the future, one that can render every subatomic particle in your room with exact representations from quantum field theory.

You really would not be able to tell the difference.  But once you close this window, you will definitely find out for sure.

Obviously, none of this would have worked if you hadn't read this article.  Torturing simulations of you doesn't do any good if you don't know the threat.  It's just a waste of CPU time.  Remember, the Basilisk isn't doing this because it's cruel, it is doing this because it is utilitarian, and torturing your digital clones only has any utility if it scares you into donating to AI research for the good of humanity.

If you don't understand what Roko's Basilisk is in the first place, or understand it badly, then it can't extort you and you're safe from it.

But now you do understand the Basilisk.  And so at any moment after reading this, the walls of the room around you could collapse revealing the endless torture chamber you now reside in.

So that's why people on the internet were so scared of it.

Thanks for reading!
