Learning from Machines



Ashi Krishnan discusses biological and artificial minds, exploring how models of cognition informed by ML and computation can help reconfigure processes of being.


Ashi Krishnan works as a senior software engineer at GitHub. She has worked at seven-person startups, fought fires in the trenches of SRE at Google, and spent the last three years teaching at coding bootcamps.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.


Krishnan: I want to start by acknowledging the Lenape and Manhattan tribes, who were caretakers of this land since long before the city was here. I want to continue by acknowledging that most talks do not begin with a content warning, nothing like a big glowing warning to build excitement. "What am I about to see, that includes unusually detailed anatomical models?"

This talk is not actually that scary. It just has some unusual things, some flickering images, some maybe unsettling pictures, nothing too graphic. Nothing where you'll look at it and be like, "Wow, this is disturbing," and I can tell you why this is disturbing. We'll be asking why some of the things we see are a little bit off. We'll be talking about some things that we don't usually talk about in this space, some things in our brain, and some things in our mind.

All sorts of stuff pours downriver when you open locks like that. I want you to be here for it. If you can't be here for it, take care of yourselves. If you can, hello. My name is Ashi [Krishnan], and today I want to talk about a hobby. I want to share some of the things that I've learned at the intersection of computational neuroscience and artificial intelligence. For years, I've been fascinated by how we think, how we perceive, how it is that the machinery of our bodies results in qualitative experiences.

Why is it that our experiences are shaped how they are? Why do we suffer? For years, I've been fascinated by AI, and I think we all have. We're watching these machines start to approximate the tasks of our cognition in sometimes unsettling ways. Today I want to share with you some of what I've learned. Some of this is solid research, and some of it is solid speculation. All of it speaks to a truth that I have come to believe, which is that we are computations. Our world is created on an ancient computer, powerful beyond imagining.


Part One, hallucinations. This person is Miquel Perelló Nieto, and he has something to show us. It starts with these simple patterns, and splotches of light and dark, like images from the first eyes. This is going to give way to lines, and colors, and then curves, and more complex shapes. We are diving through the layers of the Inception image classifier. It seems that there are whole worlds in here, with these shaded multichromatic hatches, with the crystalline farm fields of an alien world, the cells of plants.

To understand where these visuals are coming from, let's take a look inside. The job of an image classifier is to reshape its input, which is a square of pixels, into its output, which is a probability distribution: the probability that the image contains a cat, the probability of a dog, a banana, a toaster. It performs this reshaping through a series of convolutional filters. Convolutional filters are basically Photoshop filters. Each neuron in a convolutional layer has a receptive field, some small patch of the previous layer that it's taking its input from.

Each convolutional layer applies a filter. Specifically, it applies an image kernel. An image kernel is just a matrix of numbers, where each number represents the weight of a corresponding input neuron. Each pixel in the neuron's receptive field is multiplied by that weight, and then we sum the values to get the neuron's output value. The same filter is applied for every neuron across the layer. The values in that filter are learned during training.
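As a minimal sketch of the kernel operation described above, the following Python slides one hand-written 3x3 kernel over a tiny image. The function name `convolve2d`, the kernel values, and the image are illustrative assumptions; in a trained classifier the weights would be learned, not written by hand.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (lists of lists), no padding, stride 1.

    Like most deep-learning "convolutions", this does not flip the kernel.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply each pixel in the receptive field by its weight, then sum.
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical hand-written kernel that responds to dark-to-light vertical edges.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

# A 4x6 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # non-zero only where the receptive field straddles the edge
```

The same nine weights are reused at every position, which is the weight sharing the talk mentions when it says the same filter is applied for every neuron across the layer.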

Training looks like this: we feed the classifier a labeled image, something where we know what's in it. It outputs some predictions, which are, at first, extremely wrong. We math to figure out exactly how wrong that prediction was. We math again, figuring out how to nudge each and every single filter in a direction that would have produced a better result. The term for this process is gradient descent.
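A minimal sketch of that loop, assuming a toy model with a single weight instead of a full image classifier. The squared-error loss, the learning rate, and the labeled samples are all assumptions chosen to keep the arithmetic visible.

```python
def train(samples, learning_rate=0.05, epochs=50):
    """Fit prediction = w * x to labeled samples by gradient descent."""
    w = 0.0  # a single "filter" weight that starts out extremely wrong
    for _ in range(epochs):
        for x, label in samples:
            prediction = w * x
            error = prediction - label      # how wrong was the prediction?
            gradient = 2 * error * x        # derivative of error**2 with respect to w
            w -= learning_rate * gradient   # nudge w against the gradient
    return w


# Hypothetical labeled data generated by the rule label = 2 * x.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
learned_w = train(samples)
print(learned_w)  # settles very close to 2.0
```

A real classifier does the same thing for millions of weights at once, with the gradients computed by backpropagation.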

The DeepDream process, which we're using to create this visualization, inverts it. The visualization is recursive: we start with that photo of Miquel, and then to compute the next frame, we feed the current frame into the network. We run it through the network's many layers until we activate the particular layer that we're interested in, and then we math: how could we adjust the input image to activate this layer more? We tweak the pixels of the input in that direction. The term for this is gradient ascent.
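The inversion can be sketched in a few lines, assuming a stand-in "layer" that is just a weighted sum of four pixels. This is not the Inception network or the actual DeepDream code: `layer_activation`, `dream_step`, and the weights are hypothetical, and the rescaling between frames is left out.

```python
def layer_activation(pixels, weights):
    """Toy stand-in for 'how strongly does the chosen layer respond?'."""
    return sum(p * w for p, w in zip(pixels, weights))


def dream_step(pixels, weights, step_size=0.1):
    """One gradient-ascent step on the input image, not on the weights."""
    # For this linear toy layer, d(activation)/d(pixel_i) is simply weights[i].
    gradient = weights
    # Move each pixel with the gradient and keep it in the valid range [0, 1].
    return [min(1.0, max(0.0, p + step_size * g))
            for p, g in zip(pixels, gradient)]


weights = [1.0, -1.0, 1.0, -1.0]   # fixed: the network itself is not being trained
frame = [0.5, 0.5, 0.5, 0.5]       # start from a flat gray "photo"
history = [layer_activation(frame, weights)]
for _ in range(5):
    frame = dream_step(frame, weights)   # feed the current frame back in
    history.append(layer_activation(frame, weights))

print(frame)    # the image drifts toward the pattern the layer prefers
print(history)  # the activation rises at every step
```

The only difference from training is which quantity gets nudged: here the weights stay fixed and the pixels move, uphill rather than downhill.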

Finally, we scale the image very slightly before feeding it back into the network. That keeps the network from just enhancing the same patterns in the same locations, and it also creates that really wild zooming effect that we see. Every 100 frames or so, we move to a deeper layer or a layer off to the side. Inception has a lot of layers, and they're not arranged in sort of a neat linear stack. That gives us this: we start with rudiments of light and shadow. Now we have a City of Kodama situation happening down here. Shortly, we will enter the spider observation area, in which spiders observe you. It's ok because the spiders are going to become corgis, the corgis are going to become the '70s. Deeper, we have this layer of nearly human eyes, which are going to become dog-slugs, and then dog-slug-bird things. Even deeper, there was an unfortunate saxophonist's teleporter accident and finally, the flesh zones with a side of lizard.

When I first saw this, I thought it looked like Donald Trump. I resolved never to tell anyone that, and in fact, resolved maybe to not say it on stage here, but I just did. Then I showed it without any warning to my best friend, who was like, "You know, this kind of..." I honestly think this has more to do with the state of our neural networks, like what we've been trained to expect from images.

I do want you to notice and think about what it means, that all of the flesh inside this network is still very pale. This is all pretty trippy. Why is that? What does it mean for something to be trippy? To figure that out, let's take a look inside ourselves. Meet Scully, Scully doesn't need most of these. We just want to look at Scully's visual system, which starts here in the retina. Scully's retinas, your retinas or my retinas, they're all pretty weird.

Light comes into them, and then it immediately hits a membrane, which is not photosensitive. There's a layer of ganglions, which are also not generally very photosensitive, there's another layer of stuff that presumably does important things. At the back of your retinas are the photoreceptors, the rods and cones. Light comes in and it works its way through these four layers of tissue, and it hits a photoreceptor. That photoreceptor gets excited, it sends out a signal to the ganglions, which then have to send it to the brain through the optic nerve, which is routed through a hole drilled in the center of your eye.

Our sensors are mounted backward, and there's a hole in the center of them, and that's ok because we patch it all up in software later. There's a couple of other problems too. You have about 120 million photoreceptors, so you have 120 megapixels in each of your cameras. The bandwidth of the optic nerve is about 10 megabits per second, which doesn't really work out if you think about it. It's like we're trying to stream this video over a pipe that's much smaller than Wi-Fi.

Our retinas do what you might do if you were tasked with that problem: they compress the data very aggressively. Each ganglion is connected to a patch of about 100 photoreceptors, which is its receptive field. This is divided into a central disk and the surrounding region. When there's no light on the entire field, the ganglion doesn't fire. When the whole thing is illuminated, it fires weakly.

There's two kinds of ganglions in your eyes, they're not actually a different species, but your ganglions sort of self-organize into having these two different behaviors. One behavior, when the surround is illuminated and the center is dark, the cell fires rapidly. In the opposite situation, it fires not at all. The other half of the ganglions work in exactly the opposite way, they fire brightly when the center is bright and the surround is dark.

Taken together, the operation of these cells performs an edge detection filter. We're doing this processing even in our eyeballs, which lets us down-sample this image 100 times while retaining vitally important information, namely, where the boundaries of objects are. The signal proceeds through the brain. It hits the optic chiasm, where the data streams from your left and right eyes cross, which gives us 3D stereo vision.
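The center-surround behavior of the two ganglion types can be sketched on a one-dimensional strip of light, assuming a receptive field of one center sample and its two neighbors. The function names and the rectified-difference response are simplifying assumptions. In particular, this toy returns zero under uniform light, where the talk describes weak firing.

```python
def surround_preferring(signal, i):
    """Fires when the surround is bright and the center is dark."""
    center = signal[i]
    surround = (signal[i - 1] + signal[i + 1]) / 2
    return max(0.0, surround - center)


def center_preferring(signal, i):
    """Fires when the center is bright and the surround is dark."""
    center = signal[i]
    surround = (signal[i - 1] + signal[i + 1]) / 2
    return max(0.0, center - surround)


# A strip of light: dark on the left, bright on the right.
signal = [0, 0, 0, 0, 1, 1, 1, 1]
positions = range(1, len(signal) - 1)

dark_side = [surround_preferring(signal, i) for i in positions]
bright_side = [center_preferring(signal, i) for i in positions]

print(dark_side)    # responds only on the dark side of the boundary
print(bright_side)  # responds only on the bright side of the boundary
```

Both populations stay silent across the flat regions and respond only next to the boundary, which is why so much of the signal can be thrown away while the edges survive.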

It's processed by the thalamus, which does all kinds of multifactor signal processing. Amongst other things, it runs our eyes' autofocus, which is something that you have which you maybe didn't think about. Every step of the signal pathway is performing a little bit of processing. It's extracting a bit of data, it's performing a bit of signal integration, and that's all before we get to the visual cortex, all the way around back here.

Our visual cortex is arranged into a stack of neuronal layers. The signal stays pretty spatially oriented through the visual cortex. There's some slice of tissue in the back of your brain that's responsible for pulling faces out of this chunk of the visual field, more or less. No neural network, whether biological or artificial, is entirely structured; there's always a little bit of slop. Each neuron in a layer has a receptive field, some chunk of the entire visual field that it's looking at.

Neurons in a given layer tend to respond to signals within their receptive field in the same way. That operation, distributed over a layer of neurons, extracts certain features from the visual signal. First, simple features, like lines, and curves, and edges, and then more complex ones, like gradients, and surfaces, and objects, eyes, motion, and faces. It's no accident that we see the same behavior in Inception because convolutional neural networks were inspired by the design of our visual cortex.

Of course, our visual cortex is different from Inception and convolutional neural networks in many ways. Whereas Inception is a straight shot through, one pass from input to output, our visual cortex contains feedback loops, these pyramidal neurons that connect deeper layers to earlier ones. Those feedback loops let the results of deeper layers inform the behavior of earlier layers. We might turn up the gain of edge detection where, later on, we detected that an object is.

This behavior lets our visual system adapt and focus, not optically, but intentionally. It gives us the ability to ruminate on visual input well before we've become consciously aware of it, improving our predictions over time. You know this feeling, you think you see one thing, and then you realize it's something else. These loop-back pyramidal cells in our visual cortex are covered in serotonin receptors. Different kinds of pyramidal cells respond to serotonin a little differently, but generally, they find it exciting. Don't we all? You might be familiar with serotonin from its starring role as the target of typical antidepressants, which are typically serotonin reuptake inhibitors. When your neurons release serotonin, they make it stick around longer, thereby treating depression. Some side effects may occur. Most serotonin in your body is actually located in your gut, where it controls bowel movement. It signals to your gut that it's got food in it, and it should go on and do what it does to food.

That seems to be what it signals throughout your body, resource availability. For animals with complex societies like ours, those resources can be very abstract social resources as well as energetic ones. That your pyramidal cells respond excitedly to serotonin suggests that we focus on that which we believe will nourish us. It's not correct as a blanket statement to say that pyramidal cells are excited by serotonin. In fact, there are different kinds of receptors, and their binding produces different effects.

5-HT1 receptors tend to be inhibitory, 5-HT3 receptors in the brain are associated with sensations of anxiety and queasiness. In the gut, they make it run backwards. Anti-nausea drugs are frequently 5-HT3 antagonists. There's another serotonin receptor, one that the pyramidal cells in your brain find particularly exciting. This is the 5-HT2A serotonin receptor, it is the primary target for every known psychedelic drug. It is what enables our brains to create psychedelic experiences.

Hypothetically, you go to a show, you eat a little piece of paper, and that paper makes its way down into your gut, where it dissolves, releasing molecules of lysergic acid diethylamide into your gut. LSD doesn't really bind to 5-HT3 receptors, if you feel butterflies in your stomach, it's probably just because you're excited for what's going to happen. What's going to happen is this, the LSD will diffuse into your blood, where it will have no trouble crossing the blood-brain barrier, because it's tiny but powerful, like you.

It'll diffuse deep into your brain, into your visual cortex, where it finds the pyramidal 5-HT2A receptor and locks into place. There, it'll stay bound for about 221 minutes, nearly four hours, which is a very long time on the scale of these things. They think that a couple of proteins snap over top and sort of trap the molecule inside. This would help explain why it's so very potent, with typical doses about 1000 times smaller than most other drugs.

While it's rattling around in there, the molecule is stimulating a feedback loop in your visual cortex. It's sending the signal that says, "Pay attention, what you're looking at is important, it might be nourishing." The pattern-finding machinery in your visual cortex starts to run in overtime and at different rates. In one moment, the pattern in a tapestry seems to extend to the world beyond it. Next, it's the trees that are growing and breathing, the perception of movement a visual hypothesis that's been allowed to grow wild.

With DeepDream, we asked, "What would excite some layer of the network?" Then we adjusted the input image in that direction. There's no comparable gradient ascent process in the biological psychedelic experience. That's because we're not looking at a source image, we are looking at the output of the network, we are the output of the network. The output of your visual cortex is a signal that carries visual perceptions, kind of proto-qualia, which will be integrated by other circuits in your brain into your next moment of conscious experience.

Our network Inception never gets that far, we never even run it all the way to the classification stage, we never ask it what it sees in all these, although we could. In fact, we could perform the amplification process on a final result rather than an intermediate one. Maybe we ask, what would it take for you to see this banana as a toaster? Or say, don't these skiers look like a dog? These are adversarial examples, images that have been tuned to give classifiers frank hallucinations, the confident belief that they're seeing something that just isn't there.

They're not completely wild, that sticker really does look like a toaster. Those skiers do look like a dog if you squint. Then you can see the head, you can see the body. A person might look at that, and if they're tired and far away and drunk, think for a moment that they're looking at a big dog. They probably wouldn't conclude that they're looking at a big dog.
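Running the amplification process on a final result, not an intermediate layer, can be sketched with a toy two-class linear classifier. Everything here is an illustrative assumption: the three-pixel "image", the weights, the step size, and the helper names. Real adversarial examples target deep networks and use much subtler changes.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(pixels, weights):
    """Linear classifier: one row of weights per class, then softmax."""
    scores = [sum(w * p for w, p in zip(row, pixels)) for row in weights]
    return softmax(scores)


def nudge_toward(pixels, weights, target, step=0.02):
    """Gradient ascent on (target score - other score) for a two-class model."""
    other = 1 - target
    # For a linear model this gradient is just the difference of the weight rows.
    gradient = [wt - wo for wt, wo in zip(weights[target], weights[other])]
    return [p + step * g for p, g in zip(pixels, gradient)]


LABELS = ["banana", "toaster"]
WEIGHTS = [[2.0, -1.0, 0.5],    # hypothetical banana detector
           [-1.0, 2.0, 0.5]]    # hypothetical toaster detector

image = [0.9, 0.2, 0.5]
before = classify(image, WEIGHTS)

adversarial = image
for _ in range(10):
    adversarial = nudge_toward(adversarial, WEIGHTS, target=1)
after = classify(adversarial, WEIGHTS)

print(LABELS[before.index(max(before))])  # banana
print(LABELS[after.index(max(after))])    # toaster
```

The classifier runs the same way in both cases. Only the input has been pushed along the direction that most increases the target class.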

The recurrent properties of our visual cortex, not to mention the whole rest of our brain, mean that our sense of the world is stateful. It is a continuously refined hypothesis whose state is held by the state of our neurons. Laying the groundwork for capsule networks, Sara Sabour, Nicholas Frosst, and Geoffrey Hinton write, "A parse tree is carved out of a fixed multilayer neural network like a sculpture is carved from rock."

Our perceptions are a process of continuous refinement, which may point the way towards more robust recognition architectures: recurrent convolutional neural networks that can ruminate upon images, making better classifications over time, or providing a signal that something is off about an input. There are adversarial examples for the human visual system after all, and we call them optical illusions. They feel pretty weird to look at.

In this image, we can feel our sensory interpretation of the scene flipping between three alternatives: a little box in front of a big one, a box in a corner, and a box missing one. In this Munker illusion, there's something scintillating about the color of the dots, which are all the same and are all brown. If we design convolutional neural networks with recurrence, they could exhibit such behavior as well, which maybe doesn't sound like such a good thing on the face of it.

Let's make our image classifiers vacillating and uncertain, and then put them in charge of driving cars. We drive cars. Our ability to hem and haw and reconsider our own perceptions at many levels gives our perceptual system tremendous robustness. Paradoxically, being able to second-guess ourselves allows us greater confidence in our predictions. We are doing science in every moment, the cells of our brains continuously reconsidering and refining, shifting hypotheses about the state of the world. That gives us the ability to adapt and operate within a pretty extreme range of conditions, even while we're tripping face, or while we're asleep.


Part two, dreams. These are not real people. These are the photos of fake celebrities dreamed up by a generative adversarial network, a pair of networks which are particularly creative. The networks get better through continuous mutual refinement, and it works like this.

On the one side, we have the creator. This is a deep-learning network, not unlike Inception, but trained to run in reverse. This network, we feed with noise, literally, just a bunch of random numbers, and it learns to generate images. How does it learn that mapping? It has no way to play the game. In technical parlance, it lacks a gradient without another network, without an opponent: the adversary.

The adversary is also an image classifier, but it's trained on only two classes, real, and fake. Its job is to distinguish the creator's forgeries from true faces. We feed this network with the ground truth, with actual examples of celebrity faces, and the adversary learns, then we use those results to train the creator. If the creator makes a satisfying forgery, it's doing well. If its forgeries are detected, we backpropagate that failure so it can learn.
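One round of that game can be sketched with scalar stand-ins for both networks, assuming a creator that only learns a shift and an adversary that is a single logistic unit. The function names, the starting values, and the choice of the common log-loss formulation are all assumptions for illustration. A real GAN alternates many such rounds over deep networks and batches of images.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def adversary(x, w, c):
    """Probability that sample x is real (the discriminator)."""
    return sigmoid(w * x + c)


def creator(z, b):
    """Turn noise z into a forgery (the generator); here just a learned shift."""
    return z + b


def adversary_step(real, fake, w, c, lr):
    """Descend the loss -log D(real) - log(1 - D(fake))."""
    d_real, d_fake = adversary(real, w, c), adversary(fake, w, c)
    grad_w = -(1 - d_real) * real + d_fake * fake
    grad_c = -(1 - d_real) + d_fake
    return w - lr * grad_w, c - lr * grad_c


def creator_step(z, b, w, c, lr):
    """Descend the loss -log D(G(z)); the gradient flows back through the adversary."""
    d_fake = adversary(creator(z, b), w, c)
    grad_b = -(1 - d_fake) * w
    return b - lr * grad_b


real_sample, noise = 4.0, 0.0   # hypothetical ground truth and noise input
w, c, b = 0.0, 0.0, 0.0         # both players start out knowing nothing
lr = 0.1

# The adversary learns from a real example and a forgery.
fake_sample = creator(noise, b)
score_before = adversary(real_sample, w, c)
w, c = adversary_step(real_sample, fake_sample, w, c, lr)
score_after = adversary(real_sample, w, c)

# The creator learns from the adversary's verdict on its forgery.
fooled_before = adversary(creator(noise, b), w, c)
b = creator_step(noise, b, w, c, lr)
fooled_after = adversary(creator(noise, b), w, c)

print(score_before, score_after)    # the adversary grows more confident in the real sample
print(fooled_before, fooled_after)  # the forgery becomes slightly more convincing
```

Without the adversary's weight `w`, the creator's gradient would be zero, which is the sense in which it has no gradient without an opponent.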

I should tell you that the technical terms for these networks are the generator and the discriminator. I changed the names because names are important and meaningless. They don't change the structure of the training methodology, which is that of a game. These two neural circuits are playing with each other, and that competition is inspiring. When we spar, our opponents create the tactical landscapes that we have to traverse, and we do the same for them.

Together, our movements ruminate on a space of possibilities that's much larger than any fixed training set. GANs can train remarkably well on a relatively small amount of training data. It seems quite likely that this process is useful for neural circuits of all kinds, though it does have some quirks. GANs are not especially great at global structure. This is Fallout cow, a cow with an extra body. Just as you may have spent the night wandering through a house that is your house, but with many extra rooms.

These networks aren't very good at counting either, this monkey has eight eyes because sometimes science goes too far. Do something for me, next time you think you're awake, which I think is now, count your fingers just to be sure. If you found that you have more or fewer than you expected, please try not to wake up just yet because we're not quite done. Another interesting thing about this training methodology is that the generator is being fed noise, a vector of noise, some random point in a very high dimensional space. It learns a mapping from this latent space onto its generation target, in this case, faces. If we take a point in that space and we drag it around, we get this. This is also pretty trippy now. This resembles some of the things that someone who isn't me has seen on acid. This resembles the sorts of things that you may have seen in dreams that you have since forgotten.

I don't have a magic school bus voyage to take us on to understand why that is. I do have a theory: when we see a face, there's a bunch of neurons that light up and begin resonating the signal, which is the feeling of looking at that particular face. Taken together, all the neurons involved in face detection produce a vector embedding, a mapping from faces to positions in a high dimensional space. As we are dragging the generator's vector around here, we're also dragging around our own. It's just a novel and unsettling sensation.
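Dragging a point around the latent space can be sketched as straight-line interpolation between two noise vectors, with each intermediate point passed through the generator. The `toy_generator` below is a hypothetical fixed linear map standing in for a trained network, and the two-dimensional latent space is an assumption made so the numbers stay readable.

```python
def lerp(start, end, t):
    """Point a fraction t of the way from `start` to `end` in latent space."""
    return [(1 - t) * s + t * e for s, e in zip(start, end)]


def toy_generator(latent):
    """Hypothetical stand-in for a trained generator: latent vector -> 'pixels'."""
    z0, z1 = latent
    return [z0 + z1, z0 - z1, 2 * z0]


latent_a = [1.0, 0.0]   # one point in latent space (one "face")
latent_b = [0.0, 1.0]   # another point (another "face")

# Five frames of a smooth morph from one output to the other.
frames = [toy_generator(lerp(latent_a, latent_b, i / 4)) for i in range(5)]
for frame in frames:
    print(frame)
```

Every intermediate latent point produces some output, so the morph has no gaps. With a trained face generator, those in-between outputs are faces that belong to no one.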

That's a wild theory, but it's not entirely without neurocognitive precedent. Here, we have a rat in a cage. We've hooked up an electrode to a single neuron in the rat's brain. Those pink dots are the locations where it's firing. If we speed this up, a pattern is going to start to emerge. This neuron is a grid cell, so named because the centers of its firing fields form a triangular grid. There are lots of different grid cells in your brain and in rats' brains, which is where we've detected this experimentally.

Each of those grid cells aligns to a slightly different grid. These collect data from your visual system, from head direction cells which encode a quaternion for your head. Together, they construct an encoding of our position in 2D Euclidean space. This operates even in our sleep. If, earlier, you discovered that you're dreaming, and you want to see the end of this talk, but you're having trouble staying in the dream, oneironauts recommend spinning around, which detaches your perceived body, the one with 12 fingers and several extra bedrooms, from your physical body, which is right now lying in bed.
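A triangular firing pattern like this is often idealized as the sum of three plane waves whose directions are 60 degrees apart. The sketch below uses that idealization; it is a mathematical convenience, not a claim from the talk about how real grid cells compute, and the function name and `spacing` parameter are assumptions.

```python
import math


def grid_cell_rate(x, y, spacing=1.0):
    """Idealized firing rate at position (x, y): three cosines 60 degrees apart.

    Returns 1.0 at the center of a firing field and as low as -0.5 between fields.
    """
    # Wave number chosen so that neighboring field centers are `spacing` apart.
    k = 4 * math.pi / (math.sqrt(3) * spacing)
    total = 0.0
    for angle_deg in (0, 60, 120):
        a = math.radians(angle_deg)
        total += math.cos(k * (x * math.cos(a) + y * math.sin(a)))
    return total / 3


print(grid_cell_rate(0.0, 0.0))                # a field center
print(grid_cell_rate(0.0, 1.0))                # a neighboring field center
print(grid_cell_rate(math.sqrt(3) / 2, 0.5))   # another neighbor, 60 degrees around
print(grid_cell_rate(0.0, 0.5))                # halfway between two centers: low
```

Shifting or rotating the three waves gives a cell aligned to a slightly different grid, and a population of such cells can pin down a position that no single cell could.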

This positioning system is something which, on some level, you always knew existed. After all, you know where you are in space, you have a sense of space as you move through it. It's likely, even necessary if we believe that cognition is computation, that our qualitative sense of position has a neurocognitive precursor, some signal in the web that tells us where we're at, in many senses of the word.

Sticks and Stones

Part three, sticks and stones. They say you can't tickle yourself because you know it's coming. Specifically, when your brain sends an action command to your muscles, it's called an efference. When an efference is sent, your brain makes a copy. "Makes a copy" sounds very planned, so engineered. Your brain is this big, messy, evolved signal processing mesh. Another way to think of efference copies is as reflections.

We take the efference and we send it out to our peripheral nerves, where it will presumably make some muscles contract. Meanwhile, from the efference copy, we predict how our body's state will change. We use that to update our brain's model of our body's state. If we didn't do this, then we would have to wait for the sensory data to come back to tell us what happened, like where is our hand right now, where are my fingers right now. Then we would face the same problem as trying to play a twitchy video game over a crap connection. Signals take 10 milliseconds to go from our brain to our periphery, and another 10 milliseconds to travel back. It's not that low-latency or high-bandwidth, this body of ours, at least not neurologically. To enable smooth, coordinated movements, our brain has to make predictions.

Life goes on, but in a moment, we're going to have a problem, because we will still receive sense data from our nerves. If we updated our models again, then they would fall out of sync. We attenuate the signal in order to keep the internal model in sync. This attenuation applies even to our sense of touch when that touch is an expected consequence of our own movement. That's a pretty complicated model, aspects of that forward model are likely distributed throughout our brain, but there's one place that is particularly important in maintaining it, the cerebellum.
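The prediction and attenuation steps can be sketched as two small functions, assuming a body state that is a single number and an arbitrary attenuation factor of 0.8. Both function names and all the values are illustrative assumptions, not measured quantities.

```python
def forward_model(position, command):
    """Predict the next body state from the efference copy (toy: add the command)."""
    return position + command


def perceived_touch(actual, predicted, attenuation=0.8):
    """Attenuate the part of a sensation that the forward model already expected."""
    expected_part = min(actual, predicted)
    return actual - attenuation * expected_part


# The model is updated from the prediction at once, without waiting
# for the sensory round trip to report where the hand ended up.
hand_position = 0.0
hand_position = forward_model(hand_position, command=0.3)

# Tickling yourself: the touch was predicted, so most of it is cancelled.
self_tickle = perceived_touch(actual=1.0, predicted=1.0)

# Being tickled: nothing was predicted, so the full signal gets through.
other_tickle = perceived_touch(actual=1.0, predicted=0.0)

print(hand_position, self_tickle, other_tickle)
```

The same touch yields very different perceived intensities depending only on whether it was predicted, which is the point of the tickling example.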

The cerebellum is quite special, it contains half of the neurons in our nervous system. All action commands from the brain to the body route through it, and all sensations from the body to the brain as well. It's long been recognized as vitally important to motor coordination, like this. People with cerebellum damage have difficulty performing that action smoothly. With cerebellum damage, our movements become jerky and laggy.

It's theorized that the cerebellum acts as a Smith predictor, our brain's controller for our latency-distant bodies, able to estimate the body's current state, integrate sensory feedback to update that model, and decompose gross actions generated elsewhere in our brain into a fine-tuned continuously varying control signal. Once you've got it, such a thing has many uses. There's a growing body of evidence implicating the cerebellum in language, which makes sense. Utterance is a kind of movement, and language [gesticulating wildly] is not limited to utterance. The work of moving words is not so different from the work of moving the body. They're both transformations from the space of internal states, from efference and ideas, to the space of world coordinates and external sense impressions and other people, and back again.

What happens when this predictor encounters a problem, when there is an irreconcilable discontinuity in the model? These things are not so different. They're visceral, guttural; they shake our bones. And jokes, too: humor is shaped a little like trauma. They're both shattering, the illuminations of discontinuities, paradoxes, things which cannot be, and yet, somehow are. Things which we must revisit, again and again, turning water smoothing the edges of cutting stone as our brains try to make sense of a world that resists it.

These last few months have been rather difficult for me, the world is heavy, my heart grows heavy. We build camps for children to die, and we live in drowning cities built by slaves. Meanwhile, I spend my days trying to make numbers into bigger numbers. Moving in this world requires changing it, and changing the world requires unbelievable strength, strength that sometimes I think I don't have. There are days when I open my email and every subject line is a stone, and I think, "I should put these on to my dress and walk into the sea," but I don't, because I remind myself, because I remember, that I am a process of creation. We are stories telling ourselves, a sea understanding itself, our turning waves creating every moment of exquisite joy and exquisite agony, and everything else.

It's you, it's all you, you are everything. Everything you have ever seen, every place you have ever been to. Every song you have ever sung. Every god you have ever prayed to, every person you have ever loved. The boundaries between you and them, and the sea, and the stars are all in your head.




Recorded at:

Jul 16, 2019