Logic gates, complex
cells and perceptual constancy
In the last post, we demonstrated how we could use the
information from a group of V1 cells (a population
code) to make a good guess about the orientation of a line or edge that
those cells were responding to. This allowed us to solve some problems related
to how to measure values for some image feature (in this case, orientation),
using a relatively small number of sensors. The use of a population code
allowed us to measure orientation fairly precisely even if we only had a few
cells that had a few different preferred orientations, and it also allowed us
to make predictions about what would happen if some of those cells changed the
way they were responding to a pattern due to adaptation. At this point, I’d say we have a decent grasp of how to
measure some simple spatial features in images: We know how to encode
wavelength information with photoreceptors, we know how to measure local
increments and decrements of light with cells in the RGC layer and the LGN, and
we know how to measure contrast edges, lines, and bars with simple cells in V1.
So what next?
To start thinking about what comes next, I’d like to
introduce you to a bit of a problem that we might have if we go trying to use
these tools for spatial vision to recognize something in our visual world. To
get us started thinking about this particular problem, I’d like to show you two
completely different images:
Figure 1 - Two images that are very different from one another. Really.
I know what you’re thinking: These images do not look completely different. In fact, they look very much the same – specifically, they look like two different pictures of the same person (drawn from AT&T’s ORL face database – link here). Why am I saying that they’re completely different if they depict the same person? Let’s take a look at the kind of representation your V1 cells have of these two images. Remember, this means that we’re measuring the orientation of edges in the picture, with far less activity for the parts of the image where intensity values aren’t changing much:
Figure 2 - The same two pictures as a population of V1 simple cells might "see" them.
This is a lot messier than the first two pictures, but you probably still think they look a lot alike: you can see the outline of the guy’s head and the outline of his glasses in both. But now, let’s take a look at how those two sets of edges line up with each other if we superimpose them:
Figure 3 - Similar-looking images can nonetheless lead to two very different responses from V1 simple cells.
Even though this is the same person in both images, you can
see that there’s a lot of disagreement between the two images in terms of where
the different edges are and what orientation they are. If we’re thinking in
terms of V1 cells and their responses, we have to conclude that the population
of simple cells that can “see” this image in their receptive fields will
respond very differently to these pictures. Simple cells in V1 are sensitive to
the position, thickness and orientation of edges in their receptive fields, so
the fact that these two images differ so much in terms of these qualities means
that the cells responding to the two pictures will do very different things.
So what’s the problem? The problem is that at some point,
you don’t want to do different things
when you see these pictures. At some point, you want to look at both of them
and produce the same response – something like “Dave” or “That guy from the
train with the glasses.” But how? If all of your cells are doing different
things when images of the same thing look this different, how can you end up
generating the same label for those images? This is the problem of perceptual constancy: Broadly speaking,
how do you maintain constant responses to objects, surfaces, etc. when the raw
appearance of those items can change so much? This is a very big topic in
visual perception research, and we’re going to talk about it again in some
other contexts soon. For now, we’re going to look at one small piece of the
solution that’s implemented in your primary visual cortex, and talk about how
it could be put together computationally.
So what can we do to try and address this problem? Here’s an
idea: If the problem depends on the fact that simple cells in V1 change what
they’re doing when the position or orientation of an edge changes within their
receptive field, maybe we should imagine a cell that doesn’t change when
those properties change. For example, if we had a cell that preferred
vertically-oriented lines, it could be useful if we could arrange things so
that cell kept firing if a vertical line was in different places in the cell’s
receptive field. That constant response to a vertical line that changed position might help us keep responding the same way to the two different
images of the same guy depicted above – lines could appear at different spots
and still elicit a consistent response from a group of cells that behaved this
way. OK, this sounds nice, but is this a thing that your brain could do? Check
out the video at the following link and come back when you’re done:
Neat, huh? That was a video of a complex cell in primary visual cortex that exhibits exactly the
behavior that we wanted: A vertical line at lots of different positions led to
a consistent response from the cell. Now that we know these exist, the next
question to think about is how they
exist. How do you get a cell that behaves that way? To answer that question, I
need to introduce you to a different way of thinking about how to combine the
responses from groups of cells to do something new.
I want to motivate this by thinking a little bit about how
we got to the simple cells we measured in V1 from the measurements that came
before. I haven’t really emphasized this very much so far, but what we’re
really doing as we move through the visual system is thinking about how we
measure some information from the world and transform it in different ways as
we move from the eye into the brain.
Light spectra become photoreceptor responses, which become LGN
responses, which become simple cell responses. Each stage (so far) depends on
what came before. But what are those connections like? How do you turn the
local spots of light and dark contrast that the LGN measures into something
like a cell that prefers a line with a specific orientation? One way to think
about doing this is to imagine that cells in an earlier stage send their responses
on to a cell at a later stage, and that this later cell has a rule for deciding
what it’s going to do based on what all the incoming responses are. That rule
can be described in terms of something called a logic gate. Specifically, a logic
gate is a set of rules for telling us how to produce an output response
(say, what a V1 cell will do) based on what a set of input responses were (say,
the responses of a bunch of LGN cells connected to our V1 cell). In general, a
logic gate can be depicted with a simple diagram that includes the inputs and
the outputs, and the rules for producing an output based on the inputs can be
written down in a table. Below, I’m showing you a particularly simple logic
gate called an AND gate:
Figure 4 - A logic gate provides a rule for combining input responses to produce an output response. This logic gate is an AND gate because both A and B have to be responding for the output to respond.
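If it helps to see that rule written out, here is what Figure 4’s table amounts to as a tiny piece of Python (just an illustration, not anything from the post):

def and_gate(a, b):
    # The output responds only when both inputs are responding.
    return a and b

for a in (False, True):
    for b in (False, True):
        print(a, b, "->", and_gate(a, b))
# Only the (True, True) case prints True.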
I hope it’s fairly clear why this is called an AND gate: The
only case where the output cell produces a response is when the first input and
the second input are also responding. If either one of those input cells isn’t
doing anything, then the output cell won’t do anything either. Why is this a
neat way to think about how a V1 cell is put together out of LGN cells? Imagine
that each of the inputs to an AND gate is a single LGN cell with a receptive
field at a different spot in the visual field. Further, imagine that the output
is a V1 cell (See figure). If all of those LGN cells are producing a response,
it’s a decent bet that there’s something like a line or edge at that location
with a specific orientation. If one of them is missing, it’s less likely that there’s an edge there. The AND rule formalizes this reasoning by
saying that the V1 cell will only start responding if all of those input cells
are doing something, which means we’re sort of building the oriented line that
the V1 cell prefers out of local spots of light that these LGN cells prefer.
This little logic gate allows us to “wire up” something more complicated out of
the responses of a group of simpler cells.
Figure 5 - By combining the responses of LGN on-center cells at different positions with an AND gate, we can wire up a V1 simple cell that has a specific orientation preference.
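To make the wiring in Figure 5 a little more concrete, here is a hedged sketch of the same idea in Python. The positions, the brightness threshold, and the function names are all made up for illustration; the point is just that the simple cell is an AND over a few position-specific, LGN-style inputs.

def lgn_on_center(image, row, col, threshold=0.5):
    # Stand-in for an on-center LGN cell: "fires" if the image is bright at one spot.
    return image[row, col] > threshold

def simple_cell(image, positions):
    # AND rule: respond only if every LGN input along the preferred line is active.
    return all(lgn_on_center(image, r, c) for r, c in positions)

# Three spots stacked in the same column give this cell a preference for a vertical line:
vertical_positions = [(4, 6), (5, 6), (6, 6)]
# simple_cell(img, vertical_positions) is True only when something bright covers all three spots.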
So what about our friend the complex cell? How do we build
that kind of response out of the simpler responses that we know about? Because
this cell responds to a preferred orientation (vertical lines), I’m going to
suggest that the simpler responses we want to use are going to be V1 simple
cells that also prefer vertically-oriented lines. But how should we combine
these? An AND gate won’t do the trick: We don’t want a cell that only fires if
there are lots of vertical lines. We need a different rule for deciding what to
do with our output based on our inputs. Specifically, we need something like
the rule in the table below:
Figure 6 - An OR gate implements a different rule for combining inputs to get outputs: If any of the inputs are responding, then the output will respond.
This rule is called an OR gate, and you can see it’s a good
bit different from AND. Specifically, it ends up producing responses under some
different circumstances: While AND only led to a response if all of the inputs
were responding, OR will lead to a response if any one of the inputs is
responding. How does this help us? Imagine that the inputs are all V1 simple
cells that prefer vertically-oriented lines but
at different positions in the visual field. An OR rule for combining these
responses will mean that we’re going to produce a response if the line is at
any of the positions that will make just one of the V1 cells fire! That’s the
rule we need to turn a group of “fragile” V1 cells that change what they’re
doing as position changes into a single complex cell that has some amount of
constancy for the position of a vertically-oriented line.
Figure 7 - Combining V1 simple cells with vertical orientation preferences at different positions via an OR gate gives us a complex cell that can respond to that vertical line at multiple positions. This buys us some perceptual constancy for edge position.
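Continuing the sketch from above (same caveats: made-up positions and names), the complex cell in Figure 7 is just an OR over several of those position-specific simple cells:

def complex_cell(image, receptive_fields):
    # OR rule: respond if any one of the position-specific simple cells responds.
    return any(simple_cell(image, positions) for positions in receptive_fields)

# Each entry is one simple cell's preferred vertical line, shifted one column to the right:
receptive_fields = [
    [(4, 5), (5, 5), (6, 5)],
    [(4, 6), (5, 6), (6, 6)],
    [(4, 7), (5, 7), (6, 7)],
]
# complex_cell(img, receptive_fields) stays True as the vertical line slides across those positions.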
Now you may have noticed a small problem with that table.
What happens when all of the simple cells are firing? With an OR rule, the
complex cell will fire, too, which means that this complex cell will be active
when a single line is at any or all of the positions we’re including in
the inputs. That’s a little weird. We’d like to tell the difference between one
line and three lines. We can fix this, though, with a slightly different logic
gate called XOR or exclusive OR:
Figure 8 - An XOR gate combines inputs in a slightly different way than an OR gate: Now two active inputs don't produce an output response. This helps us distinguish between one line at different positions and multiple lines that are present in the image all at once.
You can see here that XOR only leads to a response when
exactly one of the inputs is responding, but doesn’t do anything if more or
fewer inputs are active.
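In the same sketch, this exclusive-OR version of the rule (in the “exactly one input active” sense the post is using) only takes one extra line:

def xor_complex_cell(image, receptive_fields):
    # Count how many position-specific simple cells are active; respond only if exactly one is.
    active = sum(simple_cell(image, positions) for positions in receptive_fields)
    return active == 1
# One line at any single position: responds. No lines, or several lines at once: stays quiet.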
So this is one step towards solving a complex problem –
logic gates provide a way to combine information from different cells in some
neat ways to create different kinds of responses out of simpler pieces. We can
even build logic gates like these out of simple electronic switches and LEDs to
see that these are real rules we can use to change the electrical activity in
simple components based on what their inputs are doing. We’re far from done
with perceptual constancy, however – as we keep going, we’ll see that there are
many cases where we’re going to have to think about other ways to keep some measurements stable
when images change. More on that in a new (and old) domain in our next few
posts.