The Quantastic Journal

At Quantastic, we love to explore science, tech, and math vis-à-vis humanity. Our mission is to bring scientific knowledge, exploration, and debate through compelling stories to interested readers. Each story seeks to educate, inspire curiosity, and motivate critical thinking.

Human Consciousness Emerged from Our Brain’s Navigation System—Will AI Consciousness Emerge the Same Way?


Science Fiction Humanoid Ava, played by Alicia Vikander, in the sci-fi film Ex Machina. Credit: Film Stills

The famous quote,

“I think,
Therefore I am,”

by René Descartes, is a concise way of saying that if I have enough self-awareness to ask the question “do I exist?” then the answer is a resounding “yes!” since “I” asked the question.

Animal brains use an internal representation model of a self and of the world to help them survive. Consciousness is your awareness that you exist and that the world exists. The most basic level of consciousness is the awareness that your body is separate from the rest of the world. There are additional levels of awareness of this physical separation: the awareness of your location in space, the awareness that you can control your body and its parts, and the awareness that you see the world from only one perspective in space. We will see that the internal model of the world includes a cognitive map for navigation. The model of the world also includes a theory of mind (ToM), which allows us to see other animals and humans as thinking creatures with their own minds. Thus, we predict that they will act differently than inanimate objects. This awareness that other animals have minds reflects back on us: we become aware that we, too, are seen as having minds.

Consciousness has evolved over hundreds of millions of years in a way that is similar to the evolution of intelligence. Intelligence and consciousness are different but related aspects of the mind. Intelligence is more about how well you interact with the world by learning strategies to solve problems. For some reason, there is less mystery around the idea of intelligence than consciousness, so let’s begin by discussing how we think intelligence came about. The intelligence of animals, including ourselves, has evolved over time to improve our ability to navigate the world in order to avoid predators while finding food and mates. A great book that explains this evolution of intelligence is Max Bennett’s “A Brief History of Intelligence.” Bennett breaks the evolution of intelligence into five stages: steering, reinforcing, simulating, mentalizing, and speaking.

Biological intelligence evolution versus machine intelligence evolution. Image by The Quantastic Journal based on author’s information and concept from “A Brief History of Intelligence” by Max Bennett.

Steering evolved about 550 million years ago to help animals decide whether they should continue forward or turn back, based on the potential reward or punishment. Reinforcement learning followed, helping the animal make its next move so as to maximize its chances of getting a reward in the future based on its past experience. Simulation is the ability of an animal to imagine future actions and their likely consequences before actually taking what it thinks will be its best action. The “best action” is usually the one believed most likely to bring the largest reward in the future. Mentalizing involves the ability to imagine that you and others have minds and personalities. This is the ToM ability to model the psychology of yourself and others within your simulations, and it improves the odds of success through collaboration and competition with other animals. The final phase of the evolution of intelligence Bennett calls speaking: the use of language, which allows knowledge to spread quickly from one mind to another without each individual needing to make the investment in trial-and-error reinforcement learning that may have been required originally.

Throughout the evolution of intelligence, animals are assumed to have some agency in that they choose which actions to take at any time based on their current inputs from the world and their own history. This agency can be thought of as a model of a “self” that has evolved over millions of years to become more self-aware in different ways over time. The current state of self-awareness in humans (and maybe other animals) is what we call self-consciousness. Let’s review how this ability to be self-aware has evolved in animals, and how it could also evolve in artificial intelligence (AI).

The evolution of AI has had many parallels to the evolution of animal intelligence, and as AI evolves, a consciousness could emerge for AI agents in much the same way it did for humans.

Your place or mine

In 1971, John O’Keefe discovered place cells in the hippocampus of rats. Place cells increase their firing rate when the animal is in a particular place in its environment, say for example at a fork in a maze. One place cell will fire when the rat is at a specific fork, and another place cell will fire at some other location in the maze. The collection of place cells for all the locations in a given environment forms a cognitive map of the environment.
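To make the place-cell idea concrete, here is a minimal sketch (my illustration, not code from the research) that models a place cell as a Gaussian tuning curve: the cell fires most strongly when the animal is at its preferred location, and a population of such cells tiling the environment forms a crude cognitive map. The function names and parameter values are illustrative assumptions.

```python
import numpy as np

def place_cell_rate(pos, centre, width=0.2, peak_hz=15.0):
    """Idealized place cell: fires most when the animal is at `centre`
    and falls off with distance (Gaussian tuning). Units are arbitrary."""
    d2 = np.sum((np.asarray(pos, dtype=float) - np.asarray(centre, dtype=float)) ** 2)
    return peak_hz * np.exp(-d2 / (2 * width ** 2))

# A population of cells with different preferred locations tiles the arena;
# together their firing rates encode "where I am": a crude cognitive map.
centres = [np.array([x, y]) for x in np.linspace(0, 1, 5) for y in np.linspace(0, 1, 5)]
population_code = [place_cell_rate((0.3, 0.7), c) for c in centres]
```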

Hippocampal place cells have goal-oriented vector fields during navigation. ConSinks and vector fields organize place cell activity during navigation on the honeycomb maze. Firing patterns were well described by vector fields converging on a location that, following vector-field notation, the researchers (Jake Ormond & John O’Keefe) term a convergence sink, or ConSink. (a) Maze showing all start platforms and the goal platform for rat 3. The dashed box indicates the portion of the maze shown in b. (b) Schematic of the four choices making up trial 1. The animal is confined at the ‘subtrial start’ until two adjacent platforms are raised and makes its choice by moving onto the ‘chosen’ platform. (c) The animal’s heading direction relative to a reference point (the convergence sink, or ConSink) is calculated as the angle between the straight-ahead head direction (0°) and the direction of the point in egocentric space. (d) Representative example of a ConSink place cell. Left two panels, paths (white) and spikes (red) fired during two individual trials of the task. The perimeter of the goal platform is shown in black. Middle two panels, place field heat map (maximal firing rate (Hz) indicated at top right) and all paths (grey) and spikes (red). Second from right, vector field depicting mean head direction at binned spatial positions. The ConSink is depicted as a filled red circle. Right, polar plot showing the distribution of head directions relative to the ConSink (Ormond & O’Keefe, license CC).

Vertebrates first appeared about 520 million years ago during the Cambrian explosion. All vertebrates, including mammals, have a hippocampus, and most are thought to have place cells. Place cells function in a way that indicates a very primitive form of consciousness: the animal knows it is in a particular location in the world. This is a physical sense of self-awareness that takes into account that “I” have a finite body which is located in a specific spot in the world. In the large universe of objects in the world, not many things have this unique ability to identify themselves and their current position.

Top: Telemetry system on a flying bat, drawn to scale. Left: Examples of 3D place cells recorded from the hippocampus of flying bats; 3D representation of the neuron’s spatial firing. Top left: Spikes (red dots) overlaid on the bat’s position (gray lines); shown also are the spike waveforms on the four channels of the tetrode (mean ± SD). Top right: 3D color-coded rate map, with peak firing rate indicated. Bottom: Convex hull encompassing the neuron’s place field (red polygon) and the volume covered by the bat during flight (gray polygon). Right: 3D space is encoded uniformly and nearly isotropically in the hippocampus of flying bats. (A to D) All the place fields recorded from the hippocampus of four individual bats (different colors denote different neurons). Bats 1 to 3 (A to C) were tested in the cuboid-shaped flight room, bat 4 (D) in the cubic enclosure. (Courtesy of Nicholas M. Dotson & Michael M. Yartsev, 2013)

In 2005, May-Britt and Edvard Moser discovered grid cells in the entorhinal cortex (EC), which talks to both the hippocampus (home of the place cells) and the brain’s decision-making areas. Grid cells fire at regularly spaced distances within the environment to form a sort of Cartesian coordinate system, or grid overlay, on the cognitive map of the environment to help the animal navigate the space. Grid cells, together with other cell types including head-direction and border cells, complete the cognitive map that we and other animals use to navigate the world. O’Keefe and the Mosers were awarded the 2014 Nobel Prize in Physiology or Medicine for their work in discovering the navigation system used by the brain. This model includes highlights of landmarks within the environment thanks to our place cells.
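Grid cells are often idealized in the literature as the sum of three cosine gratings rotated 60° apart, which yields the hexagonal firing lattice described above. The sketch below uses that textbook approximation; the spacing, orientation, and phase values are illustrative assumptions, not numbers from this article.

```python
import numpy as np

def grid_cell_rate(x, y, spacing=0.5, orientation=0.0, phase=(0.0, 0.0)):
    """Idealized grid cell: three cosine gratings 60 degrees apart sum to a
    hexagonal firing pattern that overlays the environment like graph paper."""
    rate = 0.0
    for k in range(3):
        theta = orientation + k * np.pi / 3            # gratings at 0, 60, 120 degrees
        kx, ky = np.cos(theta), np.sin(theta)
        rate += np.cos((2 * np.pi / spacing) * (kx * (x - phase[0]) + ky * (y - phase[1])))
    return max(rate / 3.0, 0.0)                        # rectified: firing rates are non-negative

# Evaluating the cell across a 2D arena reveals the hexagonal lattice of firing fields.
arena = [(x, y) for x in np.linspace(0, 2, 50) for y in np.linspace(0, 2, 50)]
rates = [grid_cell_rate(x, y) for x, y in arena]
```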

This internal model of the world is a first-order approximation of the actual world that we are aware (conscious) of. The model of a self at this point is a finite area of space in the cognitive map that represents where we are. This is similar to the “you are here 📍” arrow that you might see on a map in a mall, for example.

The entorhinal cortex also acts as an interface between the hippocampus and brain regions involved in motor control, cognition, and emotion. There are two major brain circuits that are well known to use reinforcement learning (RL): a fast decision-feedback circuit around the basal ganglia and the ventral tegmental area (VTA), and a slower feedback circuit that involves the prefrontal cortex (PFC) and the striatum of the basal ganglia. We shall see that the fast RL circuit appeared earlier in our evolution and has a simpler model of the world than the slower RL circuit. A richer model of the world brings with it a richer model of the self to better navigate the world.

The representation models of self and the world became more and more detailed as intelligence evolved. The eventual model of the self is a key component of what we call consciousness.

Let’s see how reinforcement learning has enhanced the models of self and the world.

Reinforcements are called in

There are two basic types of reinforcement learning (RL) in AI: model-free and model-based RL. Model-free RL takes inputs from the real world to learn and execute a policy for navigating it. Model-based RL has the added feature of an internal dynamic model of the world, which can be used for training without requiring interactions in the real world. Training with this internal model of the world is sometimes, as in Bennett’s book, called simulation. We will talk about simulations, or model-based RL, in the next section. Let’s first describe model-free RL, which appeared earlier in both animals and AI.

Model-free RL is used by the basal ganglia and the VTA for fast navigation and has been used in AI for applications such as a backgammon-playing AI (figure by author).

Model-free RL (see figure above) in animals takes place between the ventral tegmental area (VTA) and the basal ganglia. The reward system of the brain basically involves the release of dopamine from the VTA. The VTA projects to the limbic system, where a good or bad feeling can be elicited, and to the dorsal striatum (DS) of the basal ganglia, which coordinates movements. The basal ganglia and the VTA form an actor-critic reinforcement learning system in the brain. The actor (basal ganglia) part of the neural circuit learns to select the best action to take, and the critic (the VTA) evaluates various possible actions to help select the best one. The temporal-difference (TD) method of RL uses the difference between the estimated reward and the reward actually received as an error signal to update the actor and the critic models.

It has been found that dopamine concentration provides the temporal-difference signal used by animal brains to update the actor and critic parts of the RL process. This simple model-free actor-critic RL method was used in 1992 in an AI called TD-Gammon that could play the game of backgammon at an expert level. More robust versions of TD learning have been developed since that time.
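The sketch below shows the bare bones of a tabular actor-critic update driven by a temporal-difference error, the quantity compared above to the dopamine signal. It is a toy illustration, not TD-Gammon’s code (TD-Gammon used a neural network rather than a table); the environment interface and learning-rate values are assumptions.

```python
import numpy as np

n_states, n_actions = 16, 4
V = np.zeros(n_states)                    # critic: how good is each state?
prefs = np.zeros((n_states, n_actions))   # actor: preference for each action in each state
alpha_v, alpha_p, gamma = 0.1, 0.05, 0.95 # learning rates and discount factor

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def actor_critic_step(env, state):
    """One real-world step: act, observe, compute the TD error, update actor and critic."""
    action = np.random.choice(n_actions, p=softmax(prefs[state]))
    next_state, reward, done = env.step(action)   # assumed environment interface
    # TD error: the gap between what the critic predicted and what actually happened,
    # the "surprise" signal the article likens to a burst or dip in dopamine.
    td_error = reward + (0.0 if done else gamma * V[next_state]) - V[state]
    V[state] += alpha_v * td_error                 # critic learns to predict better
    prefs[state, action] += alpha_p * td_error     # actor favors actions that beat expectations
    return next_state, done
```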

In 1992, IBM announced another major step in developing artificial intelligence through games: A program written by Tesauro had taught itself to play backgammon well enough to compete with professional players. That year, TD-Gammon, as it was known, went 19–19 in 38 games at a World Cup of Backgammon event — a far better performance than any backgammon program up to that point (source).

Google DeepMind’s A3C (asynchronous advantage actor-critic) is probably the most recent iteration of the actor-critic, model-free algorithm used in AI. The Google team published a paper using A3C demonstrating that the navigation system found in the brain, the one that won the 2014 Nobel Prize, emerged naturally when an artificial animal, or animat, with simulated place cells was trained to navigate. The resulting animat was able to navigate the maze even after random disruptions of the original configuration were made, like blocking runways and opening walls. Its navigation performance was better than that of some human experts, demonstrating the evolutionary fitness of model-free actor-critic RL for navigation like that used by most animals.

Ancient vertebrates enhanced their navigation skills by using model-free reinforcement learning, which utilized the basal ganglia of their brains. This allowed for trial-and-error learning and the ability to form associative memories. Once animals had the ability to learn by trial and error using reinforcement learning, they also acquired the ability to have feelings associated with actions and objects. The VTA projects to the limbic system as well as to the movement-controlling area. Thus, the limbic system could elicit feelings that become associated with our actions and observed objects as we move closer to our goals. Animals also developed associative memories at this stage of evolution. For example, if we usually see green grass before we find a fruit to eat (a reward), then the sight of green grass alone might make us feel good. Associative memories vary with our individual and collective experiences in the world, and each of us has unique feelings when we experience different things. The sight and/or smell of green grass might make me feel good if I associate it with finding a fruit, but it may make you feel bad if you were attacked by a saber-toothed tiger every time you experienced green grass. These associated feelings might be the beginnings of what philosophers call qualia, the different feelings and emotions we have when we become aware of different things in the world.

We are going to need a bigger brain

Mammals evolved about 225 million years ago with a relatively large prefrontal cortex (PFC). Among other things, this brain region allows animals like us to hold a dynamic model of the environment in our minds. This dynamic model allows the animal to run “what-if” scenarios before committing to a chosen action. This ability to learn using a kind of virtual reality as a representation of the real world is called model-based reinforcement learning. Some researchers say consciousness emerged when animals developed this ability to use dynamic models of the world.
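As a concrete picture of what “simulating with an internal model” can mean in RL terms, here is a minimal Dyna-style sketch (my own toy example, not from Bennett or from any system mentioned here): the agent learns from real steps, records them in a simple world model, and then replays imagined transitions from that model to keep learning without acting in the real world. The environment interface and constants are assumptions.

```python
import random
import numpy as np

n_states, n_actions = 25, 4
Q = np.zeros((n_states, n_actions))   # action-value estimates
model = {}                            # learned world model: (state, action) -> (reward, next_state)
alpha, gamma = 0.1, 0.95

def real_step_update(state, action, reward, next_state):
    """Model-free update from a real experience, which is also stored in the world model."""
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    model[(state, action)] = (reward, next_state)

def simulate(n_imagined=20):
    """Model-based 'simulation': replay imagined transitions drawn from the learned
    model to refine Q without any new interaction with the real world."""
    if not model:
        return
    for _ in range(n_imagined):
        (state, action), (reward, next_state) = random.choice(list(model.items()))
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
```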

For example, imagine a monkey who sees a bunch of bananas and wants to get the biggest one. She mentally simulates different ways to get to the banana and grab it. Maybe she can climb or jump up to reach it. She decides climbing seems safer and acts accordingly. She has increased her chances of success while minimizing her risk of injury.

This ability of animals to simulate actions before committing to any one of them is a large part of consciousness, sometimes called mental time travel. This ability to imagine what could happen to you in the future and to reconstruct your past experiences requires a strong awareness of a persistent sense of self that is operating in a dynamic world.

Let’s take a closer look at how such mental simulations work. It was found that when a mouse is finding its way through a maze to get a reward at the end, it will stop frequently and look to the left and right as though it were thinking about the consequences of taking each path. In the 1930s, scientists called this behavior vicarious trial and error (VTE), suggesting that the mouse was thinking about the future.

Neuroscience has since confirmed this idea. Scientists found that while a mouse stands at a fork in a maze, its place cells fire as though it moved right and then left before it actually decides to move right or left. This neural pattern indicates that the mouse is mentally simulating being in the different maze locations before making a decision. Mice have also shown the ability to combine remembered experiences into new ways of finding a reward in a maze. This capability to mentally reconstruct personal events from the past, as well as to imagine and simulate possible scenarios in the future, is called mental time travel.

Model-based RL uses a dynamic model of the environment to get more samples without the need to take actions in the real world (figure by author).

AlphaGo uses both model-free RL and model-based RL to play the game of Go. It uses model-based RL to search the possible future moves it can make from its current board position. The search for the best move is formulated as a tree search, where the first move is the root of the tree and subsequent moves branch from the root. The number of move sequences that would need to be searched is on the order of the number of atoms in the universe, so each and every potential move cannot be fully evaluated. A heuristic approach is used to get a good-enough solution. The search space is reduced by pruning the tree, eliminating pathways that are unlikely to lead to a win.

A sequence of potential future moves is called a rollout. A rollout can last all the way to the end of a game, which would make the evaluation along each branch of the decision tree very long. However, the depth of the tree can be reduced by ending a rollout after a number of moves have been selected by the actor and substituting the critic’s estimate of the winning probability for the rest of the game. The width of the tree is also reduced by searching only a fixed number of moves that have the highest estimated probabilities of winning. Hence, AlphaGo will examine many thousands of rollouts internally before taking an actual move.

Animals routinely simulate several possible moves ahead before committing to one action. For example, when you play a game of chess, you might think several moves ahead: you might take the four most promising moves and consider what could happen three moves ahead for each one. Model-based RL AIs use the same approach of planning ahead, formulated as a tree search. Because the particular rollouts are sampled randomly, the sampling is called a Monte Carlo search, and the resulting model-based, tree-based RL method used by AlphaGo is called Monte Carlo tree search (MCTS).
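A stripped-down Monte Carlo tree search is sketched below to show how selection, expansion, random rollouts, and backpropagation fit together. It is a generic toy, not AlphaGo’s implementation: AlphaGo biases selection with a policy network and truncates rollouts with a learned value network, and a real two-player search also alternates perspectives. The `game` interface is an assumption.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(node, c=1.4):
    """Balance exploitation (average value so far) with exploration (rarely tried moves)."""
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(math.log(node.parent.visits) / node.visits)

def mcts(game, root_state, n_rollouts=1000, max_depth=50):
    """Assumed `game` interface: legal_moves(s), next_state(s, m), is_terminal(s),
    and result(s) -> score in [0, 1] for the searching player (also used to score
    states reached at the depth cutoff)."""
    root = Node(root_state)
    for _ in range(n_rollouts):
        node = root
        # 1. Selection: walk down the tree, always taking the child with the best UCB score.
        while node.children:
            node = max(node.children, key=ucb)
        # 2. Expansion: add children for the legal moves at this leaf.
        if not game.is_terminal(node.state):
            for m in game.legal_moves(node.state):
                node.children.append(Node(game.next_state(node.state, m), node, m))
            node = random.choice(node.children)
        # 3. Rollout: play random moves for a bounded number of steps.
        state, depth = node.state, 0
        while not game.is_terminal(state) and depth < max_depth:
            state = game.next_state(state, random.choice(game.legal_moves(state)))
            depth += 1
        score = game.result(state)
        # 4. Backpropagation: credit every node on the path with the rollout's outcome.
        while node is not None:
            node.visits += 1
            node.value += score
            node = node.parent
    return max(root.children, key=lambda child: child.visits).move  # most-visited move
```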

AlphaGo Master (white) v. Tang Weixing (31 December 2016), AlphaGo won by resignation. White 36 was widely praised (source).

AlphaGo combined A3C-style actor-critic learning and MCTS, and in 2015 it won a five-game match against a professional Go player without losing a game. The “actor” predicted the best moves, while the “critic” evaluated the moves’ potential for success. This combination of intuition (the critic) and simulation of actions (the actor) mirrors how humans learn and pragmatically solve many complex problems. Go masters do not need to play the optimal game of Go every time; they only need to beat their opponent, so a heuristic approach can be the best strategy to use in practice.

AlphaGo can innovate new moves

Once the agent achieves its goal, say it gets to the end of a maze, it can just keep repeating the same route over and over without learning anything new. This is where randomness is added. In reinforcement learning, “exploration” refers to an agent actively trying out new actions (randomly) to gather information about the environment and discover potentially better strategies. “Exploitation,” in contrast, means using current knowledge to choose the action believed to yield the highest reward based on past experience, essentially taking advantage of what is already known. The key challenge lies in balancing these two, exploration versus exploitation, to learn effectively and maximize rewards in a given environment.

For the purpose of exploration, an agent might randomly select actions, try actions with high uncertainty, or use techniques like “epsilon-greedy,” where there is a small probability of choosing a random action even when a seemingly optimal action is known. Exploitation, in contrast, usually involves choosing the action with the highest predicted reward based on the agent’s current knowledge of the environment.

An example would be a robot trying to navigate a maze to find a treasure. During the exploration phase, the robot randomly wanders through different paths to learn the maze layout, even if that means taking longer to reach the treasure. During the exploitation phase, once the robot has a good understanding of the maze, it consistently takes the shortest path to the treasure, exploiting its learned knowledge.
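The maze example above maps almost directly onto epsilon-greedy Q-learning. The sketch below is a minimal illustration under assumed names: a `maze` object with `reset()` returning a state and `step(action)` returning `(next_state, reward, done)`; the state and action counts and the constants are also assumptions.

```python
import numpy as np

n_states, n_actions = 25, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1    # learning rate, discount, exploration rate

def choose_action(state):
    if np.random.rand() < epsilon:        # explore: occasionally try a random action
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))       # exploit: take the best-known action

def train_episode(maze):
    state, done = maze.reset(), False
    while not done:
        action = choose_action(state)
        next_state, reward, done = maze.step(action)
        # Q-learning update: nudge the estimate toward reward plus discounted best future value.
        target = reward + gamma * np.max(Q[next_state]) * (not done)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
```

With epsilon set to zero the agent only exploits and keeps repeating whatever route it found first; a small positive epsilon keeps it occasionally wandering, which is exactly the balance described above.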

This dilemma, also known as the explore-exploit tradeoff, is a fundamental concept in reinforcement learning that arises in many domains. For example, in the second game of the match that AlphaGo won against Lee Sedol, the AI played the now-famous move 37, a move estimated to have a 1 in 10,000 chance of being chosen by a human player. This pivotal and creative move helped AlphaGo win the game and upended centuries of traditional wisdom. AlphaGo learned and used a heuristic that was relatively unknown to expert players. This is a great example of how we can learn from AI, since AI can endlessly explore virtually limitless search spaces to find patterns that we (all of us) can easily miss.

The future of conscious humanoid robots

Humanoid robots are coming. Since the world is designed by humans for human use, robots with human-like forms will probably be best suited to navigate it. Humanoids can potentially do any job humans currently perform, and probably even more. These robots will most likely use reinforcement learning to learn how to navigate the environment, just as animals did. They will eventually need to interact with humans and other animals, so further capabilities will be added to the RL-based navigation and movement-control AI modules.

Boston Dynamics has been a leader in humanoid robots for some time, as can be seen in the video below of their robots dancing. Currently, many companies are getting ready to focus on building humanoid robots and real-world models to train them. Google has partnered with Apptronik to produce humanoid robots, and several other firms have recently announced humanoid robot programs of their own. Elon Musk’s Tesla is developing the Optimus humanoid robot, a number of newer startups offer humanoid robots, and companies in other countries, notably China, are developing them as well. NVIDIA has built a simulation platform to train robots on virtual data that is realistic in the sense of following the rules of physics. So the race is on, and humanoid robots will be here, navigating themselves within our physical reality, sooner than you might expect. How can we prepare for them?

The model of self used by the RL system will eventually be extended to model emotional responses to human styles like sarcasm and humor, so as to give an appropriate response based on the tone of voice and body language of human collaborators. This will allow for more nuanced simulations of the self in which emotions are considered, moving the AI’s model of itself closer to our level of self-awareness and thus consciousness. This internal model of self will have to identify the emotions of others and its own emotions, in addition to simulating a physical self that is navigating the world. The ability of AI to mentalize, that is, to consider the minds of other intelligent agents, including humans and other AI agents, is emerging as a viable research direction. This sophisticated level of self-awareness is fast approaching, and perhaps soon the first robot will say:

“I think,
Therefore I am”.

The evolution of intelligence and self-awareness in animals seems likely to be repeated, perhaps analogously, in AI humanoid robots. What do you think — will conscious AI humanoid robots become a reality? And how will that change our understanding of consciousness itself? Would you accept a robot as a friend, family member, fellow citizen?

Boston Dynamics Humanoid Robots having a good time Dancing

About the Author

Rick Mammone, Professor Emeritus of Electrical and Computer Engineering at Rutgers University, has published numerous scientific papers, books, and patents on artificial intelligence. He was one of the earliest researchers to transfer his academic work on artificial neural networks into successful products. His AI innovations have been, and continue to be, inspired by neuroscience. Rick is looking forward to a bright future where robots are our friends. For more information about Rick, please visit his Wikipedia page.
