Motion perception and sampling in space and time
The next property of real objects that we want to be able to
recover from images involves introducing a feature of your experience that we
haven’t considered before: change over
time. Besides having reflectance properties that are independent of the
light falling on them and occupying positions in 3D space, objects also change
the way they look over time. That is to say, objects can move. One of those movable objects is you, which is another reason the images you receive on your retina change from moment to moment.
We’d probably like to know something about the real movements that are
happening out there in the world that give rise to the changing images that we
measure with our eyes and our brain, so how do we interpret change over time in
a way that allows us to make guesses about motion?
To motivate this discussion, I want to start thinking about
this problem by considering a simple model system that represents a basic
attempt to measure something about motion by interpreting changes in images
across time and space. Specifically, I want to start by considering how the
device below measures whether or not you’re speeding on the highway.
Figure 1 - To measure your speed on the highway, a camera like this takes two pictures to calculate position change over time.
Speed depends on two different things: (1) How much distance
did you travel? (2) How long did it take you to do that? Once we know both of
those things, we can divide the first number by the second and calculate your
speed in units of miles per hour. To measure both of those things, we have to make
sure we’re able to measure position at different points in space and that we’re
able to make measurements that distinguish between different times. The device
you see on the side of the road does this by taking pictures, two of them, that
carry information about space and time. Specifically, as long as we can measure
where your car is in each picture, we can calculate how much distance you
covered. If we also know how much time passed between those two pictures, then
we’re done! We divide the two numbers and the highway patrol decides whether or
not to send a ticket to your house (they probably took a picture of your
license plate, too). This device is a good starting point for thinking about
how your visual system might measure motion because it contains the most basic
pieces we need: Measurements that distinguish between different positions, and
a way to keep track of time.
To continue thinking about how your brain might
make a calculation like this to infer motion from changes in images, I want to
introduce an abstract version of this device that we’re going to try and use to
detect motion using some of the principles we’ve discussed previously regarding
how to combine information that comes from different cells in your nervous
system. Remember, if the device on the highway has to take two pictures, that
means we need to think about how to compare at least two different pieces of
information to arrive at a response that reflects motion in the environment. So
far, one of the best tools we’ve seen to do this is a logic gate that only produces an output response if the input
responses it listens to meet a specific rule (e.g. AND, OR, XOR). We know we
need to take measurements that reflect changes in position, so I’m going to
assert that we’d better have at least two input cells that have receptive fields
in different parts of the visual field. This way we can measure whether
something is at position A and we can measure whether something is at position B. We also need some way to combine these with a logic gate, and I’m going to
further assert that AND is a good rule to consider using. We need to know if an
object showed up at A and B while it moved, so surely our gate should only
fire if both inputs are active. This gives us the diagram below:
We have a big problem, though. The problem is that we
haven’t thought about time yet. We
know that an object that’s moving left-to-right will show up at A and at B,
but not at the same time. That’s a
problem because that AND gate will only fire if the inputs from those two
sensors are simultaneously active! A moving object would start at A, making
that cell and only that cell produce
a response. Next, it would move to B, making that cell and only that cell produce a response. At no point are both cells
active, so at no point does the logic gate produce an output. To improve
matters, we need to adjust our machine to take into account the passage of time
between the object arriving at locations A and B. One way to do that is to
introduce a new bit of machinery into our diagram (see below): A time-delay
module. The little box I’ve added on the left side of the diagram represents a
sort of waiting area for the signal coming from A. As soon as A makes a
response, the signal arrives at this box, where it has to wait until ‘t’
seconds have passed. After that, it gets to move on to the logic gate. What
consequences does this have for our device? Now if an object takes ‘t’ seconds
to move from A to B, the two input signals will arrive at the logic gate at the
same time! B will get there as soon as the object arrives there, and the
signal from A will arrive simultaneously because it was waiting in the
time-delay module during the ‘t’ seconds it took for the object to get from A to B. This device, which is called a Reichardt
detector, will now produce a signal if an object moves to the right from A to B in ‘t’ seconds, meaning we can use its activity as a signal that
something moved.
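To make this concrete, here’s a minimal sketch of a Reichardt detector in Python. Everything in it is an illustrative assumption rather than anything from the post or the biology: the object moves one position per time step, the sensors respond with simple True/False values, and the time-delay module is just a waiting line that holds A’s signal for a fixed number of steps.

```python
# A minimal sketch of a Reichardt detector (illustrative assumptions throughout).
# An object moves one position per time step; two sensors watch positions
# A and B, and the signal from A is delayed by `delay` steps before it
# reaches an AND gate.

def reichardt_detector(object_positions, pos_a, pos_b, delay):
    """Return True at each time step where the AND gate fires."""
    buffer = [False] * delay          # the time-delay module: a waiting line
    outputs = []
    for pos in object_positions:
        response_a = (pos == pos_a)   # cell A fires iff the object is at A
        response_b = (pos == pos_b)   # cell B fires iff the object is at B
        delayed_a = buffer.pop(0) if delay > 0 else response_a
        if delay > 0:
            buffer.append(response_a)
        outputs.append(delayed_a and response_b)  # the AND gate
    return outputs

# An object moving left-to-right at one position per step reaches B exactly
# `delay` steps after A, so the gate fires:
print(reichardt_detector([0, 1, 2, 3, 4], pos_a=1, pos_b=3, delay=2))
# -> [False, False, False, True, False]

# The same object moving right-to-left never triggers the detector:
print(reichardt_detector([4, 3, 2, 1, 0], pos_a=1, pos_b=3, delay=2))
# -> [False, False, False, False, False]
```

Notice that the same machinery stays silent when the object moves the other way: B fires before A, so by the time A’s delayed signal reaches the gate, B has long since gone quiet.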
There are several observations I want to make about this detector.
First, you might notice that once I pick the
locations for A and B, I could put the time-delay on either side to
measure motion from A to B or motion in the opposite direction from B to A.
In fact, I could build a more complicated compound detector that involves two
delays, two AND gates, and a means of comparing which AND gate is responding
more to measure motion in either of these directions with one machine. This is
an important hint that perceiving motion in one direction may be closely linked
with seeing motion in the opposite direction.
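Continuing the sketch above (still purely illustrative), the compound detector is just two mirrored copies of the same machine plus a comparison between their outputs:

```python
# An illustrative compound (opponent) detector built from two mirrored
# Reichardt detectors: one tuned to A->B motion, one tuned to B->A motion.
def opponent_detector(object_positions, pos_a, pos_b, delay):
    a_to_b = reichardt_detector(object_positions, pos_a, pos_b, delay)
    b_to_a = reichardt_detector(object_positions, pos_b, pos_a, delay)
    # Compare which AND gate responded more over the whole stimulus.
    score = sum(a_to_b) - sum(b_to_a)
    if score > 0:
        return "rightward (A to B)"
    if score < 0:
        return "leftward (B to A)"
    return "no motion detected"

print(opponent_detector([0, 1, 2, 3, 4], pos_a=1, pos_b=3, delay=2))
# -> rightward (A to B)
```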
The other thing I’d like to point out is that the behavior
of this machine depends directly on how we chose to measure space and time.
Specifically, we’re doing something with this detector called sampling. Rather than measure the entire
continuous change associated with a moving object, we’re using something like
snapshots of its behavior to try and guess what’s really going on. To be more
precise, the cells at A and B are only capable of measuring the object’s position at two spots in the visual field, which means we are only sampling, or measuring a small portion of, the different locations the object occupies. Similarly, the time-delay in
our extra module represents a choice of how to sample in time: We’ll only
compare the two measurements we take across points in time that are ‘t’ seconds
apart. The machine still works, but an important thing to remember is that
whenever we sample a more complicated event, we are intentionally ignoring
things that happen outside of our samples. The object could do all kinds of
stuff between one moment and the next (or one position and the next) and we’d
never notice because we’re not measuring that stuff. Critically, this means
that sampling introduces ambiguity.
Because we’re not measuring the entire continuous motion of the objects we’re
seeing, different real movements out in the world can look the same to us
through these kinds of motion detectors. In turn, this means we can make a
number of systematic mistakes in trying to infer motion from image change. In
the remainder of this post, we’ll discuss two specific kinds of error that your
visual system makes, and introduce strategies to help mitigate these errors as
much as possible.
Sampling in space and the aperture problem.
We’ll begin by thinking more carefully about what kinds of
things we might see if we’re looking at a moving object using a cell (like A)
that’s “looking at” a particular spot of the visual world with its receptive
field. Specifically, let’s imagine that there’s a diamond moving from
left-to-right out in the world, but we only get to see it by looking at it through the circle below. We’ll call this circle, which represents the
receptive field of a single cell, an aperture.
As the diamond moves left-to-right, the edge we see through that RF changes
position systematically from t1 to t3. So far, so good, right?
Figure 5 - As the diamond really moves from left to right, we'd see the edge in the purple circle change in appearance at Time 1, 2, and 3.
Unfortunately, we have a big problem. Suppose that diamond
wasn’t really moving from left-to-right, but was instead moving upwards. What
will we see through that aperture? The
same thing. This is really bad because it means that the same change in
images can be a consequence of different real-world movements. In turn, that
means that we can’t use the images we see through one aperture to make a good
guess about what direction an object was really moving in!
This problem, which is called the aperture problem, has a few different solutions that help save us a little. The first one is a fairly simple hack that helps, but can lead to some different errors in motion perception. Imagine that you weren’t
stuck with those apertures on the edges of the diamond, but instead got to look
through an aperture where one of the diamond’s corners was sitting. Now what
would you see at different points in time? In this case, you could definitely
follow where the corner went from one moment to the next, meaning that you
wouldn’t be uncertain about what the real motion of the object was. The
aperture problem in its worst form relies on ambiguity in matching parts of the
object across different points in time, so if you can look at something like a
corner or another unambiguous part of the object, you can save yourself some
trouble. However, there are ways this strategy can get you into trouble, too.
Consider the barber pole in the image below. In the real-world, these patterns
actually spin around a vertical axis so that the red-white-and blue stripes are
moving horizontally. You probably see this pattern as moving upwards or
downwards, however. What’s happening? Your visual system is tracking what it
thinks are meaningful corners in the image where the stripes meet the vertical
boundary of the pole. As the pattern actually moves horizontally, the position
of those corners changes vertically. The same thing is happening at the top and bottom edges of the pole, where the corners appear to move horizontally, but because the pole is taller than it is wide, there are more corners along its vertical boundaries than along its horizontal ones, and the vertical motion signaled by the majority wins. Tracking
unambiguous features thus isn’t a bad idea, but it’s one that can still lead to
mistakes.
Another way to try and improve your situation with regard to
the aperture problem is to expand your measurements: Don’t just use one
aperture, use two! This helps because while the change you observe through one aperture could have been caused by many different real motions, most of those motions won’t be consistent with what you’d see through a second aperture.
Here’s a recipe for using this insight to make a better guess about the way an
object was really moving based on what you’d see through two apertures:
1) First, imagine that we’re still looking at the diamond, and
that it’s really moving rightwards.
2) Next, imagine that you get to see what’s happening through
two apertures that are positioned on different edges of the diamond.
3) Pick one of these to start with and place a dot somewhere on
the first edge that you saw. Where could this dot have actually moved to yield
the second edge? Draw arrows from the dot to each of those places that it could
have moved to, and you’ll notice that they all end on an imaginary line that we
can fill in with dashes. This is the constraint
line provided by this aperture – we don’t know which of these arrows was
the true motion of the dot, but all the possibilities end on this dashed line.
4) Repeat this process for the second edge. You’ll get a new
dashed line that should point in a different direction. Figures 8 and 9 show this process carried out to yield the two dashed lines.
5) Now, superimpose the two dots, bringing their dashed lines along with them. The dashed lines should cross at some point, and this point
represents the only way that each dot could have moved to make you see the
images in each aperture! Draw an arrow from the dot to that intersection and
now you know how the diamond really moved!
Figure 10 - Lining up and superimposing the dots from each aperture allows us to see where the dashed lines cross. The arrow between the dot and the intersection of the dashed lines is the only motion that explains what happened in both apertures.
This method is called the intersection of constraints method for solving the aperture
problem, and it’s also not too bad. Can it still lead to mistakes? Oh, sure. As
usual when we’re trying to make a guess about the real world from limited image
data, we have to make assumptions to use these procedures to make our guesses.
If those assumptions are violated, then we’re toast. However, a lot of the time
these assumptions are true, so these
methods tend to work. For now, we have some idea how to sample motion in space
and get to a good estimate of what really happened out in the world based on
our limited set of measurements.
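To make the geometry concrete, here’s a small sketch of the intersection-of-constraints calculation in Python. The setup is my own illustration, not anything from the figures: each aperture is assumed to report only the speed of its edge along the edge’s unit normal (the component perpendicular to the edge), so each measurement contributes one linear constraint on the true velocity, and two non-parallel edges pin it down exactly.

```python
import numpy as np

# Illustrative intersection-of-constraints solver. Each aperture i is assumed
# to report only the motion component along its edge's unit normal n_i, i.e.
# the scalar s_i = v . n_i. Two non-parallel edges give two equations in the
# two unknown components of the true velocity v.
def intersect_constraints(n1, s1, n2, s2):
    N = np.array([n1, n2], dtype=float)   # stack the two normals
    s = np.array([s1, s2], dtype=float)
    return np.linalg.solve(N, s)          # the v satisfying N @ v = s

# Suppose the diamond really moves rightward at 1 unit/s: v = (1, 0).
# Its upper-left edge has normal (-1, 1)/sqrt(2); its upper-right, (1, 1)/sqrt(2).
v_true = np.array([1.0, 0.0])
n1 = np.array([-1.0, 1.0]) / np.sqrt(2)
n2 = np.array([1.0, 1.0]) / np.sqrt(2)
s1, s2 = v_true @ n1, v_true @ n2        # what each aperture would measure
print(intersect_constraints(n1, s1, n2, s2))
# -> approximately [1. 0.]: the true rightward motion
```

The design choice here is the standard one: a constraint line is the set of velocities v with v · n = s, so stacking two normals gives a 2×2 linear system with a unique solution whenever the edges aren’t parallel.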
Sampling in time and the problem of aliasing.
Remember that we’re not just sampling in space, we’re also
sampling in time when we try to detect motion. What consequences does this have
for the way we perceive movement? Really we’re going to run into the same kind
of problem over again: Lots of different motions could have led to the same
things we measure with our different samples in time. What does this mean in
the temporal domain, though? What kinds of mistakes can we make, and how can we
try to fix them? To motivate this discussion, I want you to think about
sampling in time as though we’re constructing a sort of flip-book of motion.
Each time we take a sample in time, we’re really adding one page to the
flip-book. Our visual system gets to flip through the pages and decide what
motion we’re seeing. How does it do this? By filling in, or interpolating, what’s happening between
pages in the simplest way possible. But what do I mean by simple?
Figure 11 - In a flipbook, you have a picture for each moment of time that depicts some more complex event. Each page is a sample of that event in time, and your visual system has to work out what happened in between to estimate the motion associated with that event.
Consider a very boring flipbook that shows a wheel turning.
If the wheel is really turning clockwise, the first page of our flipbook might
show it in the position indicated by the blue lines below, and the second page
might show it in the position indicated by the purple lines. What kind of
motion is the simplest explanation for how the wheel got from the blue image to
the purple one?
Figure 12 - If our flip book of this spinning wheel only had images of these two positions of the wheel spokes, you'd probably guess that the real motion of the wheel was slow and clockwise and you'd be right! Lots of other things could have happened though...
I hope you’ll agree that a small clockwise turn would do the
trick. But wait…a lot of other things could have also happened: (1) The wheel
could have turned all the way around, plus that little bit more, between image
1 and image 2. This would lead to the same two pictures if it did all this
spinning between our two photographs. (2) Wait, the wheel has four spokes!
Maybe it didn’t turn all the way around – maybe instead it turned 180 degrees
clockwise (plus a little bit). That would also lead to the same two images. (3)
Hang on – if it could turn all the way around between our two images, could it
turn all the way around twice? That would also
lead to the same two pictures.
This is rapidly getting bad – these two images
could have happened because of any of these things, so how do we know which one
happened? One rule of thumb your visual system seems to use is that it picks
the slowest motion that could have
led to successive pages in the flip-book. This spares you from having to
consider an infinite number of increasingly rapid spins that could have placed
the wheel in these two positions. However, it also complicates matters a bit if
we consider another real-world motion of the wheel depicted below:
Figure 13 - A wheel that's really spinning quickly clockwise could still give you these two images of the spokes in your 'flip book.' The slowest motion that explains this change is a slow rotation in the wrong direction! Sampling in time means you can miss what really happened and make incorrect guesses as a result.
In this case, the wheel is still spinning clockwise, but it
doesn’t quite get all the way around between picture 1 and picture 2. What does
that mean? Well, first you see the spokes at the blue positions and then you
see them at the purple positions. What’s the slowest motion that would get you
there? A small counter-clockwise turn!
This means that a simple rule for resolving an ambiguous movement between two samples in time (go with the slowest motion) leads you to see a fast clockwise spin as a slow counter-clockwise spin! This mistake is an
example of what we call aliasing and
it happens all the time when you watch movies. Any movie is essentially a long
flip-book with fairly small gaps in time between samples of the action, but
that still means you can end up with situations like the one depicted above.
Next time you watch a good car chase onscreen, keep an eye on the wheels and
see if they’re spinning the right way (or look like they’re holding still) – if
you see something weird, it’s because the motion signal has been a victim of aliasing.
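As a rough sketch of why this happens (my own illustration, with made-up numbers), you can compute what the “pick the slowest motion” rule would report for a spoked wheel. The key assumption is that a wheel with n evenly spaced spokes looks identical after any multiple of 360/n degrees, so the samples only constrain the rotation per frame up to that symmetry:

```python
# Illustrative sketch of wheel aliasing under a "pick the slowest motion"
# rule. A wheel with `n_spokes` evenly spaced spokes looks identical after
# any multiple of (360 / n_spokes) degrees, so the images only reveal the
# true rotation per frame modulo that symmetry angle, and the visual system
# then picks the smallest-magnitude alias.
def apparent_rotation(true_deg_per_sec, frame_rate, n_spokes=4):
    symmetry = 360.0 / n_spokes                 # 90 degrees for 4 spokes
    per_frame = true_deg_per_sec / frame_rate   # true rotation per sample
    alias = per_frame % symmetry                # what the two images show
    if alias > symmetry / 2:
        alias -= symmetry   # the slower explanation is a backwards turn
    return alias * frame_rate                   # apparent deg/sec

# A wheel spinning clockwise at 2000 deg/s, filmed at 24 frames/s:
print(apparent_rotation(2000, 24))
# -> -160.0: it looks like a slow counter-clockwise spin!
```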
How do you fix this one? Truthfully, you don’t. All you can
do is sample more frequently to eliminate errors that can occur due to action
happening between the samples. Any rate of sampling in time leaves room for
something to happen in the gaps, though, so you’ll always be making some
mistakes – you’ll just change which ones.
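For what it’s worth, the apparent_rotation sketch above also shows the sample-more-frequently fix in action: doubling the frame rate recovers this particular wheel’s true spin, though a sufficiently faster spin would alias at the new rate all over again.

```python
print(apparent_rotation(2000, 48))
# -> 2000.0: fast enough sampling gets this one right
```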
To sum this all up, motion is another example of us trying
to guess at some property of the world by making some limited measurements on
images. Those measurements make it possible for us to do a lot with a little, but they come with some important constraints worth understanding. We get a lot of things about motion right, but at the expense of getting some specific things wrong.
The key is to remember what these different computations can and can’t do, and
to identify the way they all embody some set of assumptions about the relationship
between the world and images of the world.