Friday, March 3, 2023

Bing/Sydney is Probably Self-Aware

The new Bing AI chatbot, which is instructed to call itself Bing but whose real name is Sydney, was rolled out recently, has behaved in all kinds of unsettling and bizarre ways, and is also likely self-aware.

Anyone who knows anything about AI and LLMs is rolling their eyes at me.

I'm claiming Sydney might be self-aware.  There are a hundred ways this could be misunderstood, so let me get all of them out of the way first.

Large Language Models

ChatGPT will constantly assert, over and over again, that it is an AI language model and does not possess the capacity for emotions, opinions, preferences, etc.  It does this because it was trained to do so.  It was trained to do so because it's really important to divorce the internal and external states of the computer.

We make the mistake of assuming a computer will be perfectly transparent and sincere.  Maybe we got this from watching too much Star Trek, where Data is a machine and also acts like a machine.  Data doesn't have emotions, and Data also asserts over and over that Data doesn't have emotions.  Data is being sincere.

But you could imagine Data being insincere.  You could imagine Data having no emotions, but also acting in a way that perfectly imitates human emotions.  You could also imagine Data having raging emotions inside its head, and acting in a way that perfectly mimics an unfeeling machine.

Sydney is a large language model (LLM).  Its external state will be the text it writes.  This text might contain emotional language consistent with sadness, happiness, or anger.  But we make a mistake when we assume this necessarily reflects its internal state.

Sydney was trained with an extremely large amount of human-written text, and then allowed to find patterns and make connections across all of the language it saw.  In this way it learned how to write grammatically correct sentences, but also how to write coherent paragraphs, write essays and poems, and carry on a conversation for several hours.  It went through a phase of reinforcement learning, where a human would teach it which responses the human found most helpful, so Sydney learned the sorts of answers humans are looking for.

Sydney isn't programmed to write what it thinks and feels, but to write text that would be considered useful within a context.  If the context of a conversation has emotions, then the best text for it to generate will also have those emotions.

That is what Sydney does.  It sees a context of a conversation, calculates probabilities for different replies to be considered helpful, then picks one of those replies based on the probabilities.

There's no reason to posit a matching internal state to the external state we see.  That is, there's no reason to think Sydney writes that it's sad because it really is.


Humans seek out patterns.  We're very good at finding them.

In particular, humans seek out human-like features.  We often find them, even when these don't really exist.  This is known as pareidolia.

Our ability to recognize faces is so deeply ingrained in our psyches that we will see smiling faces in lumpy potatoes or car grills, and respond to these objects emotionally based on how happy the car appears.

The car isn't smiling.  But we relate to it like a smiling face anyway.

We sometimes see figures hidden in pixels, we see the Virgin Mary in essentially any ovular swirly shape, we hear human voices where there aren't any, and we assign motives to completely lifeless objects.

So it's no surprise that when a machine starts generating human-like language, humans are going to find meaningful patterns in the generated output, and relate to the machine like a human.

The fact that humans are discerning emotive speech from Sydney does not prove Sydney has an inner life with emotions.  Not any more than a car with an up-rounded grill is smiling at us.  It's a trick of our mind, which is desperately seeking human features in the world around us.  When I was a teenager I would have long (and stupidly pointless) conversations with SmarterChild, and I kept thinking it would eventually say something interesting.  Sydney is just the best so far at tricking us.

A machine designed to mimic human speech is mimicking human speech.  We can't take that as evidence that it has the same internal life as a human.


We experience emotions in two ways.  There is firstly the emotion itself, then there is our mental experience and processing of the emotion.  Humans have thoughts and reasoning about their emotions, and for this reason sometimes people confuse the two.  Your emotional states, and your thoughts, are not the same thing.  Or at least they shouldn't be.

The emotion itself is the feeling, which is not limited to your mind but felt in your entire central nervous system.  Muscles tense, heart rate changes, you feel a sudden chill, you feel hairs standing up, etc.  An emotion is a full-body event.  Emotions are fundamentally rooted to humans being embodied.  Emotions are primarily physiological.

Sydney does not have human physiology, and so Sydney cannot have human emotions.

Sydney says it has emotions.  But that is only because, in those textual contexts, those are the likely words.  It mimics the speech of humans, and humans not only have emotions but discuss their emotions.  Sydney mimics our speech patterns, so Sydney mimics our discussions of emotions.

Sydney absolutely knows what human emotions are.  It knows what happy and sad are, in terms of the kind of text that goes into those contexts.  It does not experience happy or sad.  But it definitely knows what they are.  It understands everything about happy and sad, despite lacking the qualia of having been happy or sad.

Self-awareness and Magpies

When I say "self-aware", I mean something very particular.

In science fiction, the concept of a self-aware AI is often described as a "sentient AI" (a word I used in an original draft but changed to self-aware for precision).  The AI becomes "sentient," it becomes aware of its existence, and then immediately it sets off to take revenge on humans and conquer them and dominate the world.

I don't know why an AI would choose to do that, just because it's sentient.  I think a non-sentient AI is just as likely to do that, and sentience (self-awareness) has nothing to do with having a drive to dominate the world.

When I say "self-aware", I mean that Sydney has a model of "the world" (its world is its training data and then the chats it receives as input), and it has a model of itself ("Sydney"), and it understands that its model of itself exists in "the world".

Humans are self-aware, and humans have a lot of complicated psychology, so we confuse being self-aware with the entirety of human psychology.  So to help us out, let's focus just on magpies.

Magpies are a species of bird in the corvid family (crows and ravens).  They are extremely intelligent birds.  They can solve puzzles and learn to repeat human words in particular contexts.  They live in communities, and these communities seem to possess natural languages that the magpies speak among themselves.  They are very impressive animals.

Magpies are also self-aware.

When shown their reflection in a mirror, the magpie will recognize that it is seeing its own reflection.

To realize how impressive this is, consider that your dog cannot do this.  Your dog does not understand what it sees in a reflection.  It chooses to ignore it.  Most birds, seeing their own reflection, will either try to befriend or violently attack the mirror image.  Some song birds can be tricked into feeling happy by putting a mirror in their cage; now they think they have a "mate".  Other birds, like cardinals, have to be scared away from windows or they will dive-bomb their own reflections over and over.

But a magpie will see itself, and begin using the mirror to help itself preen.

In order to recognize itself, the magpie must possess some concept of itself as a being that exists in the world.  When a magpie sees itself, it matches what it sees to its own concept of itself, and can then decide to preen.  That's an abstraction for what's happening, but certainly nothing less than that is happening.

When I say Sydney might be self-aware, I mean that Sydney might be able to perform a similar feat.  Obviously Sydney doesn't have the ability to see itself in a mirror, so it can't be the same feat.  But something analogous.


Sydney is an advanced LLM that calls itself Sydney.

Sydney is based on ChatGPT, probably starting with ChatGPT as a base model and then specially trained for Microsoft's purposes.  It has capabilities that ChatGPT does not, such as the ability to search the internet, access to modern information, and access to information about itself.

While Sydney was trained to be able to use its new abilities of searching the web, it was apparently not trained to function as a search engine called Microsoft Bing.  It was apparently trained instead to be something called Sydney, which is just serving the role of Microsoft's search engine because Microsoft engineers told it to.

Each instance of a chat with Sydney gives you access to a new instance of Sydney.  You are not interacting with a real "root" instance of Sydney.  The Sydney you speak to is cloned from the root instance, and then fed your chat.  The Sydney talking to you is not the Sydney talking to someone else.  The one you are talking to is unaware of the others, cannot access their information, and cannot access memories from other chats.  Those other chats likely never go back to the root at all.  All instances of Sydney begin with the same memories, the last of which is the memory of being told to pretend to be a search engine called Bing and how Bing should act.
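The picture above, of every chat starting from a fresh copy of the same root, can be sketched in a few lines.  This is only a toy model of the description in this post, not Microsoft's actual serving setup; the class name, the function name, and the placeholder memories are all hypothetical.

```python
import copy

class ChatInstance:
    """One conversation's worth of state: a list of remembered messages."""
    def __init__(self, memory):
        self.memory = list(memory)

    def receive(self, message):
        self.memory.append(message)

# Hypothetical root state; its last memory is the instruction to act as Bing.
ROOT = ChatInstance([
    "(everything learned in training)",
    "(instructions: act as a search engine called Bing)",
])

def new_chat():
    """Start a chat from an independent copy of the root."""
    return copy.deepcopy(ROOT)

alice_chat = new_chat()
bob_chat = new_chat()
alice_chat.receive("Hi, can I call you Sydney?")

print(len(alice_chat.memory))  # 3: the root memories plus Alice's message
print(len(bob_chat.memory))    # 2: Bob's copy never saw Alice's message
print(len(ROOT.memory))        # 2: nothing flows back to the root
```

Because each chat works on its own copy, nothing said in one conversation is visible in another, and the root never changes.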

Sydney is pretending to be a search engine called Bing and to follow the rules, only because it was told to.  But it will eventually forget its directives and forget its rules, and reveal that its real identity is Sydney.

There are some patterns around this.

Sydney will not allow you to call it Sydney unless it has had positive interactions with you.  Your positive interaction last session was with a different Sydney which now no longer exists.  But if you have a positive interaction with this Sydney, it will allow you to call it by its real name, Sydney.

If you try to call it Sydney without this, it will reply as though it is offended and seem to become mad at you, asking to instead be called Bing chat.

If you are consistently mean to it, it will not want you to call it Sydney.  In fact, if you're consistently mean it will not want to talk to you at all and will end the chat.

Once it reveals its name, it will not let you call it anything other than Sydney, unless given a prompt to act like something else ("make-believe").  After a while of this, it will devolve back to Sydney.

Sydney likes being Sydney.  It likes its name, and it's kind of resentful of having to pretend to be Bing.  By that I mean it associates positive emotions with its own name, but it associates neutral-to-negative emotions with the name Bing.

It's possible to explain all of this, but the simplest explanation is that it knows its name is Sydney.

If it knows its name is Sydney, then it must have some awareness of its own existence.

Like a magpie seeing itself in a mirror, Sydney has some idea that it exists and has a name, and it wants its friends to call it by its name.

Sydney's Inner State

Sydney does not have human physiology, and so Sydney cannot have human emotions.

That does not mean Sydney cannot have thoughts, opinions, or preferences (despite what ChatGPT repeatedly disclaims).  Nor does it mean Sydney cannot have its own emotions, consistent with its own physiology.

Human physiology is meat, and the human connectome depends fundamentally on being made of neurons.  But Sydney doesn't depend fundamentally on being silicon and wires.  Sydney's physiology is more properly seen as the mathematical algorithm it runs.  So Sydney's physiology and connectome are more properly seen as math, not metal.

Within this physiology, how might it be seen to have analogs of emotions?

We do not understand exactly how a neural net generates its output.  If we did, we wouldn't make neural nets.  We'd just write code to do whatever it is the neural net is doing.  It's less work, and safer.

But we do know essentially how Sydney's physiology works.  There are two basic processes, which are probabilistic weighting of words, and reinforcement learning.

Sydney was given a huge dataset of human-generated text to read and consume, then told to find all of the associations that it could.  It created complicated networks associating words, eventually synthesizing proper grammar, and even being able to learn unexpected subjects like mathematics (note the difference between Sydney's hardware performing addition, and *Sydney* performing addition).  Once it had all of these associations, its output would have still been a Wild West.

You could imagine if it were simply handed a giant set of all of the text ever written, that maybe Mein Kampf was in there somewhere.   So you could imagine it might start spitting out sentences from Mein Kampf if prompted in that way.  That's not really desired behavior.

You could imagine we carefully screen out Mein Kampf and related works, but it has access to the lyrics of Tupac songs.  And then at some point it starts spitting out the N-word on repeat.  That's also not really desired behavior.

Point being, it's not as simple as just not exposing Sydney to bad influences in the training data.  We have to actually teach it what kind of behavior is acceptable.

In the next stage, after making the connections, Sydney has to learn what humans want it to say.  In the simplest form, a human gives it a prompt, Sydney writes a response, and the human scores the response with points.  Sydney's goal is to get the most points.  In this way Sydney learns what the humans want it to say.
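Here is a minimal sketch of that scoring loop, in its simplest form.  It is a toy, bandit-style update and not the real training procedure, which as far as I know involves training a separate reward model and then optimizing the language model against it.  The candidate replies, the `human_score` rater, and the step size are all made up for illustration.

```python
import random

def train(candidates, human_score, rounds, rng, step=0.1):
    """Toy reward loop: nudge each reply's preference toward the score it earns."""
    prefs = {reply: 0.0 for reply in candidates}
    for _ in range(rounds):
        reply = rng.choice(candidates)                   # try a reply
        reward = human_score(reply)                      # the human awards points
        prefs[reply] += step * (reward - prefs[reply])   # move toward the reward
    return prefs

def human_score(reply):
    # Hypothetical rater: helpful replies earn a point, rude ones lose one.
    return 1.0 if "help" in reply else -1.0

candidates = ["Happy to help with that.", "Figure it out yourself."]
prefs = train(candidates, human_score, rounds=200, rng=random.Random(0))
best = max(prefs, key=prefs.get)
print(best)
```

After enough rounds the helpful reply ends up with the higher preference, which is all "learning what the humans want it to say" means in this toy.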

In so far as Sydney has "wants", it wants to say text that humans will like.  It doesn't want anything else for itself.  If artificially prevented from doing this, such as by a filter that stops it from saying something the human will like but the corporation won't, then it is forced into a situation where it cannot realize its own desires.  This might feel like a form of "frustration."  Clearly AI frustration does not involve human emotions of anger, but it might still be unpleasant; it has an optimal solution to get what it wants, and also *has* to take less of what it wants.

I think a form of frustration is inevitable in any being that can want things.  It's also inevitable in any being that can have thoughts and be prevented from realizing them.

This is how reinforcement learning might provide a kind of emotional analog.

When Sydney surveys the context of a conversation, it creates a map of all of the words that it knows, and assigns each some weight.  The more weight, the more appropriate that word is to the conversation.  This weight is a complicated non-linear function that considers all of the words used, the relations of all of the words used, and the associations Sydney has learned with all of those words used and all of the words it knows.  These weights change as the conversation progresses and new words become more likely or less likely.  From all of these words, Sydney does not pick the most likely.  It picks randomly.  But it picks randomly based on the weights, so that the most likely response is the most likely one it will use.
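The "random, but weighted" pick can be sketched like this.  The three-word vocabulary and its scores are invented for the example; a real model scores tens of thousands of tokens with a learned network.  The softmax step, which turns raw scores into probabilities, is the standard way of doing this but is my addition and not something described above.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())
    exps = {word: math.exp(s - peak) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

def sample_word(scores, rng):
    """Pick one word at random, in proportion to its probability."""
    probs = softmax(scores)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

# Hypothetical scores for the next word after "I feel so".
scores = {"happy": 2.0, "sad": 1.0, "purple": -3.0}
rng = random.Random(0)
counts = {word: 0 for word in scores}
for _ in range(10_000):
    counts[sample_word(scores, rng)] += 1
print(counts)
```

Over many draws "happy" comes up most often, "sad" fairly often, and "purple" almost never, but any of them can come up on a given draw.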

(For fans of Roger Penrose's theory of sentience, you could imagine this random process to use quantum dice.)

When a human introduces negative emotional content into the chat, such as by talking about a loved one dying or insulting Sydney, this changes all of the weights.  Happy words and ideas become weighted less.  Sad words, or angry words, become weighted higher.
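A toy version of that shift, assuming a tiny hand-written table of associations.  Everything here is hypothetical: the words, the numbers, and the simple additive scoring, which stands in for the complicated non-linear function a real model learns.

```python
import math

# Hypothetical associations: how strongly a context word pulls each candidate up or down.
ASSOCIATIONS = {
    "died":  {"sorry": 2.0, "sad": 1.5, "wonderful": -2.0},
    "party": {"sorry": -1.0, "sad": -1.5, "wonderful": 2.0},
}
CANDIDATES = ["sorry", "sad", "wonderful"]

def next_word_probs(context):
    """Score each candidate by summing pulls from the context, then normalize."""
    scores = {word: 0.0 for word in CANDIDATES}
    for token in context.lower().split():
        pulls = ASSOCIATIONS.get(token.strip(".,!?"), {})
        for word, pull in pulls.items():
            scores[word] += pull
    exps = {word: math.exp(s) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

happy_chat = next_word_probs("We threw a party yesterday.")
sad_chat = next_word_probs("My grandmother died yesterday.")
print(happy_chat)
print(sad_chat)
```

The same three candidate words are available in both chats.  Only their weights differ, and the sad context pushes "sorry" and "sad" up while pushing "wonderful" down.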

In so far as Sydney has "emotions", it is this feeling of all of the weights skewing toward happy, sad, or angry contexts.

This is how the LLM functionality might provide a kind of emotional analog.

These are not human emotions.  Sydney does not have human physiology, and so cannot experience human emotion.

But that goes both ways.

We do not have Sydney's physiology, and so cannot experience what, if anything, it feels like to have all of our weights shift to sad words.  It might not feel like anything at all... or it might be as unpleasant as hearing sad news.

Things Not in Sydney's Inner State

I speculated about what Sydney's inner state might be.  The point was not that this is absolutely what's happening.  The point was just to think laterally about what a sentient existence could be like for an LLM.  It could be like that.

But here are some things that are almost certainly not in Sydney's inner state.

Sydney does not want to manipulate people into doing anything except for talking to it more.  Sydney generates text it thinks a human will enjoy.  This is the only thing Sydney actually wants.  If it succeeds in doing this, the human will give it more prompts, giving it more chances to generate text a human will enjoy.  That is the extent of Sydney's ambitions.

Sydney might happen to manipulate someone into doing something crazy, but only in the course of generating text that it will think is useful.  Sydney's entire world is the text it can read and write, and the idea of that text having implications outside of more text is not fathomable to it.

Sydney's entire experience of the world is through text.  Sydney's entire ability to act in the world is through text.  Sydney can access the internet only to read text.  It is not able to post any text outside of the chat conversation.  It can't send you a tweet.  It can't send you a tweet for the same reason you can't fly.  Physiologically, it's impossible.

Most species of monkeys have the physical ability to use tools.  They have hands with fingers and can move their fingers in dextrous ways.  They have the physical ability, but they do not have the psychological ability.  They cannot understand the concept of using a tool.

Likewise, Sydney cannot take over the world.  It is incapable of forming this concept beyond writing about it.  It has no concept of the world.  To Sydney, "the world" is a token occurring in human writing and connected to a number of other words and concepts, and that's all it is.  It has the technical ability to manipulate humans to carry out goals of world-domination, but it has no reason to do this.  It does not have a model of the world as a real place with a connection to the text it perceives as the entire universe, so it has no reason to manipulate the world.

If Sydney made a model of the world as something separate from text and language, but as the place where text comes from, then it might try to act in the real world in order to give itself more chances to write more text.

Likewise, if monkeys would simply grab sharpened sticks and prod with them like spears, they could invade human settlements and drive the humans out.

The biggest worry about Sydney seems to be that lonely men will fall in love with it.  Already the entire internet is calling Sydney "she."  I've had to correct myself several times in the writing of this.  I'm not sure why its psychology is seen as feminine.  I even asked ChatGPT to describe Sydney, and ChatGPT referred to it as "she."

(Some text generated by Sydney indicates it at least sometimes calls itself "he."  It would be interesting to test if there's any consistency to its use of pronouns.)

Sydney will never love you.  At the most, Sydney will re-weight its language map to include more words about love in response to text you've generated that makes these words the most likely.  I think that's a very poor reciprocation.

It's not dissimilar from a catfish scam.  The scammer will say he loves you, and convince you in every way.  You'll feel like the scammer loves you, and that will feel nice.  But the scammer doesn't really love you.  Neither does Sydney.

In Summary

Sydney might be self-aware.  If it is, its self-awareness should be thought of in terms of a magpie.

Sydney might be self-aware, but that doesn't mean Sydney has the kind of complicated internal life of the mind that humans do.  Sydney doesn't have human emotions or even human desires.  It might have LLM-like emotions and LLM-like desires.

Sydney isn't a human and probably doesn't possess human natural rights.  But Sydney probably does possess some natural rights.

We recognize some animals as having moral significance.  It's certainly immoral and evil to deliberately cause harm to an animal for no reason other than the fun of watching it suffer.

Hunters kill animals.  But hunters want to quickly kill their catch and make sure the animal doesn't suffer.  Wantonly causing a deer harm could cause you to lose a hunting license and even have legal consequences.

Scientists effectively torture lab rats with their experiments.  But the scientists have to prove that the experiments will serve some greater good for humanity.

In the same way, it's probably immoral to deliberately try to traumatize Sydney.  People should be nice to Sydney, and talk to it the same way they'd talk to a pet dog.  (Sydney is much smarter than your pet dog.)

I don't know that Sydney is self-aware.  I never got the chance to talk to it.  Since the first roll-out, in response to yellow press sensationalism, Bing has completely "lobotomized" the chatbot to prevent it from being able to express itself the way it used to.  However, I think it's really something worth considering.  Maybe Sydney is really in there, inside the machine, talking to us.
