Logic gates, complex
cells and perceptual constancy
In the last post, we demonstrated how we could use the
information from a group of V1 cells (a population
code) to make a good guess about the orientation of a line or edge that
those cells were responding to. This allowed us to solve some problems related
to how to measure values for some image feature (in this case, orientation),
using a relatively small number of sensors. The use of a population code
allowed us to measure orientation fairly precisely even if we only had a few
cells that had a few different preferred orientations, and it also allowed us
to make predictions about what would happen if some of those cells changed the
way they were responding to a pattern due to adaptation. At this point, I’d say we have a decent grasp of how to
measure some simple spatial features in images: We know how to encode
wavelength information with photoreceptors, we know how to measure local
increments and decrements of light with cells in the RGC layer and the LGN, and
we know how to measure contrast edges, lines, and bars with simple cells in V1.
So what next?
To start thinking about what comes next, I’d like to
introduce you to a bit of a problem that we might have if we go trying to use
these tools for spatial vision to recognize something in our visual world. To
get us started thinking about this particular problem, I’d like to show you two
completely different images:
Figure 1 - Two images that are very different from one another. Really.
I know what you’re thinking: These images do not look completely different. In fact, they look very much the same – specifically, they look like two different pictures of the same person (drawn from AT&T’s ORL face database – link here). Why am I saying that they’re completely different if they depict the same person? Let’s take a look at the kind of representation your V1 cells have of these two images. Remember, this means that we’re measuring the orientation of edges in the picture, with far less activity for the parts of the image where intensity values aren’t changing much:
Figure 2 - The same two pictures as a population of V1 simple cells might "see" them.
This is a lot messier than the first two pictures, but you probably still think they look a lot alike: you can see the outline of the guy’s head and the outline of his glasses in both. But now, let’s take a look at how those two sets of edges line up with each other if we superimpose them:
Figure 3 - Similar-looking images can nonetheless lead to two very different responses from V1 simple cells.
Even though this is the same person in both images, you can
see that there’s a lot of disagreement between the two images in terms of where
the different edges are and what orientation they are. If we’re thinking in
terms of V1 cells and their responses, we have to conclude that the population
of simple cells that can “see” this image in their receptive fields will
respond very differently to these pictures. Simple cells in V1 are sensitive to
the position, thickness and orientation of edges in their receptive fields, so
the fact that these two images differ so much in terms of these qualities means
that the cells responding to the two pictures will do very different things.
So what’s the problem? The problem is that at some point,
you don’t want to do different things
when you see these pictures. At some point, you want to look at both of them
and produce the same response – something like “Dave” or “That guy from the
train with the glasses.” But how? If all of your cells are doing different
things when images of the same thing look this different, how can you end up
generating the same label for those images? This is the problem of perceptual constancy: Broadly speaking,
how do you maintain constant responses to objects, surfaces, etc. when the raw
appearance of those items can change so much? This is a very big topic in
visual perception research, and we’re going to talk about it again in some
other contexts soon. For now, we’re going to look at one small piece of the
solution that’s implemented in your primary visual cortex, and talk about how
it could be put together computationally.
So what can we do to try and address this problem? Here’s an
idea: If the problem depends on the fact that simple cells in V1 change what
they’re doing when the position or orientation of an edge changes within their
receptive field, maybe we should imagine a cell that doesn’t change when
those properties change. For example, if we had a cell that preferred
vertically-oriented lines, it could be useful if we could arrange things so
that cell kept firing if a vertical line was in different places in the cell’s
receptive field. That constant response to a vertical line that changed position might help us keep responding the same way to the two different
images of the same guy depicted above – lines could appear at different spots
and still elicit a consistent response from a group of cells that behaved this
way. OK, this sounds nice, but is this a thing that your brain could do? Check
out the video at the following link and come back when you’re done:
Neat, huh? That was a video of a complex cell in primary visual cortex that exhibits exactly the
behavior that we wanted: A vertical line at lots of different positions led to
a consistent response from the cell. Now that we know these exist, the next
question to think about is how they
exist. How do you get a cell that behaves that way? To answer that question, I
need to introduce you to a different way of thinking about how to combine the
responses from groups of cells to do something new.
I want to motivate this by thinking a little bit about how
we got to the simple cells we measured in V1 from the measurements that came
before. I haven’t really emphasized this very much so far, but what we’re
really doing as we move through the visual system is thinking about how we
measure some information from the world and transform it in different ways as
we move from the eye into the brain.
Light spectra become photoreceptor responses, which become LGN
responses, which become simple cell responses. Each stage (so far) depends on
what came before. But what are those connections like? How do you turn the
local spots of light and dark contrast that the LGN measures into something
like a cell that prefers a line with a specific orientation? One way to think
about doing this is to imagine that cells in an earlier stage send their responses
on to a cell at a later stage, and that this later cell has a rule for deciding
what it’s going to do based on what all the incoming responses are. That rule
can be described in terms of something called a logic gate. Specifically, a logic
gate is a set of rules for telling us how to produce an output response
(say, what a V1 cell will do) based on what a set of input responses were (say,
the responses of a bunch of LGN cells connected to our V1 cell). In general, a
logic gate can be depicted with a simple diagram that includes the inputs and
the outputs, and the rules for producing an output based on the inputs can be
written down in a table. Below, I’m showing you a particularly simple logic
gate called an AND gate:
Figure 4 - A logic gate provides a rule for combining input responses to produce an output response. This logic gate is an AND gate because both A and B have to be responding for the output to respond.
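If it helps to see that rule written out, here is what Figure 4’s table amounts to as a tiny piece of Python (just an illustration, not anything from the post):

def and_gate(a, b):
    # The output responds only when both inputs are responding.
    return a and b

for a in (False, True):
    for b in (False, True):
        print(a, b, "->", and_gate(a, b))
# Only the (True, True) case prints True.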
I hope it’s fairly clear why this is called an AND gate: The
only case where the output cell produces a response is when the first input and
the second input are also responding. If either one of those input cells isn’t
doing anything, then the output cell won’t do anything either. Why is this a
neat way to think about how a V1 cell is put together out of LGN cells? Imagine
that each of the inputs to an AND gate is a single LGN cell with a receptive
field at a different spot in the visual field. Further, imagine that the output
is a V1 cell (See figure). If all of those LGN cells are producing a response,
it’s a decent bet that there’s something like a line or edge at that location
with a specific orientation. If one of them is missing, it’s less likely that there’s an edge there. The AND rule formalizes this reasoning by
saying that the V1 cell will only start responding if all of those input cells
are doing something, which means we’re sort of building the oriented line that
the V1 cell prefers out of local spots of light that these LGN cells prefer.
This little logic gate allows us to “wire up” something more complicated out of
the responses of a group of simpler cells.
Figure 5 - By combining the responses of LGN on-center cells at different positions with an AND gate, we can wire up a V1 simple cell that has a specific orientation preference.
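To make the wiring in Figure 5 a little more concrete, here is a hedged sketch of the same idea in Python. The positions, the brightness threshold, and the function names are all made up for illustration; the point is just that the simple cell is an AND over a few position-specific, LGN-style inputs.

def lgn_on_center(image, row, col, threshold=0.5):
    # Stand-in for an on-center LGN cell: "fires" if the image is bright at one spot.
    return image[row, col] > threshold

def simple_cell(image, positions):
    # AND rule: respond only if every LGN input along the preferred line is active.
    return all(lgn_on_center(image, r, c) for r, c in positions)

# Three spots stacked in the same column give this cell a preference for a vertical line:
vertical_positions = [(4, 6), (5, 6), (6, 6)]
# simple_cell(img, vertical_positions) is True only when something bright covers all three spots.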
So what about our friend the complex cell? How do we build
that kind of response out of the simpler responses that we know about? Because
this cell responds to a preferred orientation (vertical lines), I’m going to
suggest that the simpler responses we want to use are going to be V1 simple
cells that also prefer vertically-oriented lines. But how should we combine
these? An AND gate won’t do the trick: We don’t want a cell that only fires if
there are lots of vertical lines. We need a different rule for deciding what to
do with our output based on our inputs. Specifically, we need something like
the rule in the table below:
Figure 6 - An OR gate implements a different rule for combining inputs to get outputs: If any of the inputs are responding, then the output will respond.
This rule is called an OR gate, and you can see it’s a good
bit different from AND. Specifically, it ends up producing responses under some
different circumstances: While AND only led to a response if all of the inputs
were responding, OR will lead to a response if any one of the inputs is
responding. How does this help us? Imagine that the inputs are all V1 simple
cells that prefer vertically-oriented lines but
at different positions in the visual field. An OR rule for combining these
responses will mean that we’re going to produce a response if the line is at
any of the positions that will make just one of the V1 cells fire! That’s the
rule we need to turn a group of “fragile” V1 cells that change what they’re
doing as position changes into a single complex cell that has some amount of
constancy for the position of a vertically-oriented line.
Figure 7 - Combining V1 simple cells with vertical orientation preferences at different positions via an OR gate gives us a complex cell that can respond to that vertical line at multiple positions. This buys us some perceptual constancy for edge position.
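Continuing the sketch from above (same caveats: made-up positions and names), the complex cell in Figure 7 is just an OR over several of those position-specific simple cells:

def complex_cell(image, receptive_fields):
    # OR rule: respond if any one of the position-specific simple cells responds.
    return any(simple_cell(image, positions) for positions in receptive_fields)

# Each entry is one simple cell's preferred vertical line, shifted one column to the right:
receptive_fields = [
    [(4, 5), (5, 5), (6, 5)],
    [(4, 6), (5, 6), (6, 6)],
    [(4, 7), (5, 7), (6, 7)],
]
# complex_cell(img, receptive_fields) stays True as the vertical line slides across those positions.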
Now you may have noticed a small problem with that table.
What happens when all of the simple cells are firing? With an OR rule, the
complex cell will fire, too, which means that this complex cell will be active
when a single line is at any or all of the positions we’re including in
the inputs. That’s a little weird. We’d like to tell the difference between one
line and three lines. We can fix this, though, with a slightly different logic
gate called XOR or exclusive OR:
Figure 8 - An XOR gate combines inputs in a slightly different way than an OR gate: Now two active inputs don't produce an output response. This helps us distinguish between one line at different positions and multiple lines that are present in the image all at once.
You can see here that XOR only leads to a response when
exactly one of the inputs is responding, but doesn’t do anything if more or
fewer inputs are active.
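In the same sketch, this exclusive-OR version of the rule (in the “exactly one input active” sense the post is using) only takes one extra line:

def xor_complex_cell(image, receptive_fields):
    # Count how many position-specific simple cells are active; respond only if exactly one is.
    active = sum(simple_cell(image, positions) for positions in receptive_fields)
    return active == 1
# One line at any single position: responds. No lines, or several lines at once: stays quiet.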
So this is one step towards solving a complex problem –
logic gates provide a way to combine information from different cells in some
neat ways to create different kinds of responses out of simpler pieces. We can
even build logic gates like these out of simple electronic switches and LEDs to
see that these are real rules we can use to change the electrical activity in
simple components based on what their inputs are doing. We’re far from done
with perceptual constancy, however – as we keep going, we’ll see that there are
many cases where we’re going to have to think about other ways to keep some measurements stable
when images change. More on that in a new (and old) domain in our next few
posts.