
Motion perception and sampling in space and time
The next property of real objects that we want to recover from images involves introducing a feature of your experience that we haven’t considered before: change over time. Besides having reflectance properties that are independent of the light falling on them and occupying positions in 3D space, objects also change the way they look over time. That is to say, objects can move. One of those objects that can move is you, which is another reason the images you receive on your retina change from moment to moment. We’d like to know something about the real movements happening out in the world that give rise to the changing images we measure with our eyes and brain, so how do we interpret change over time in a way that allows us to make guesses about motion?

To motivate this discussion, I want to start thinking about this problem by considering a simple model system that represents a basic attempt to measure something about motion by interpreting changes in images across time and space. Specifically, I want to start by considering how the device below measures whether or not you’re speeding on the highway.


Figure 1 - To measure your speed on the highway, a camera like this takes two pictures to calculate position change over time.
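As a preview of the arithmetic we'll walk through next, here is a minimal sketch (in Python) of the calculation a camera like this performs. The positions, timestamps, and function name are made-up values for illustration, not the actual device's software.

def estimate_speed_mph(position_1_miles, position_2_miles, time_1_hours, time_2_hours):
    """Estimate speed from two position measurements taken at two different times."""
    distance = position_2_miles - position_1_miles  # how far the car moved
    elapsed = time_2_hours - time_1_hours           # how long it took
    return distance / elapsed                       # miles per hour

# Two "pictures": the car moved 0.02 miles in one second (1/3600 of an hour).
speed = estimate_speed_mph(0.00, 0.02, 0.0, 1.0 / 3600.0)
print(f"Estimated speed: {speed:.0f} mph")  # 72 mph -- probably a ticket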

Speed depends on two different things: (1) How much distance did you travel? (2) How long did it take you to do that? Once we know both of those things, we can divide the first number by the second and calculate your speed in units of miles per hour. To measure both of those things, we have to be able to measure position at different points in space and to make measurements that distinguish between different times. The device you see on the side of the road does this by taking two pictures that carry information about space and time. As long as we can measure where your car is in each picture, we can calculate how much distance you covered. If we also know how much time passed between those two pictures, then we’re done! We divide the two numbers and the highway patrol decides whether or not to send a ticket to your house (they probably took a picture of your license plate, too).

This device is a good starting point for thinking about how your visual system might measure motion because it contains the most basic pieces we need: measurements that distinguish between different positions, and a way to keep track of time. To continue thinking about how your brain might make a calculation like this to infer motion from changes in images, I want to introduce an abstract version of this device that we’ll use to detect motion, built from some of the principles we’ve discussed previously about combining information from different cells in your nervous system. Remember, if the device on the highway has to take two pictures, that means we need to think about how to compare at least two different pieces of information to arrive at a response that reflects motion in the environment. So far, one of the best tools we’ve seen for doing this is a logic gate that only produces an output response if the input responses it listens to meet a specific rule (e.g. AND, OR, XOR).

We know we need to take measurements that reflect changes in position, so I’m going to assert that we’d better have at least two input cells with receptive fields in different parts of the visual field. This way we can measure whether something is at position A and whether something is at position B. We also need some way to combine these with a logic gate, and I’m going to further assert that AND is a good rule to consider using. We need to know if an object showed up at A and at B while it moved, so surely our gate should only fire if both inputs are active. This gives us the diagram below:

Figure 2 - This simple detector can measure the presence of a stimulus at both position A and B. That "Compare" box at the bottom could easily be an AND gate.

We have a big problem, though. The problem is that we haven’t thought about time yet. We know that an object that’s moving left-to-right will show up at A and at B, but not at the same time. That’s a problem because that AND gate will only fire if the inputs from those two sensors are simultaneously active! A moving object would start at A, making that cell and only that cell produce a response. Next, it would move to B, making that cell and only that cell produce a response. At no point are both cells active, so at no point does the logic gate produce an output.

To improve matters, we need to adjust our machine to take into account the passage of time between the object arriving at locations A and B. One way to do that is to introduce a new bit of machinery into our diagram (see below): a time-delay module. The little box I’ve added on the left side of the diagram represents a sort of waiting area for the signal coming from A. As soon as A makes a response, the signal arrives at this box, where it has to wait until ‘t’ seconds have passed. After that, it gets to move on to the logic gate. What consequences does this have for our device? Now if an object takes ‘t’ seconds to move from A to B, the two input signals will arrive at the logic gate at the same time! B’s signal will get there as soon as the object arrives at B, and the signal from A will arrive simultaneously because it was waiting in the time-delay module during the ‘t’ seconds it took for the object to get from A to B. This device, which is called a Reichardt detector, will now produce a signal if an object moves to the right from A to B in ‘t’ seconds, meaning we can use its activity as a signal that something moved.

Figure 3 - By adding a time delay to one of the input connections, we can ensure that the signal from A arrives at the gate at the same time as the signal from B if the dot moves from A to B after some fixed time delay.
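To make the delay-and-compare logic concrete, here is a minimal sketch in Python of the detector in Figure 3. The discrete time steps, the toy stimulus, and the delay of two steps are assumptions chosen for illustration, not a model of real neurons.

def detect_rightward_motion(a_responses, b_responses, delay_steps):
    """AND the delayed signal from A with the current signal from B at each time step."""
    output = []
    for t in range(len(b_responses)):
        delayed_a = a_responses[t - delay_steps] if t >= delay_steps else 0
        output.append(1 if (delayed_a and b_responses[t]) else 0)
    return output

# A rightward-moving object passes position A at time step 1 and reaches B at time step 3.
a = [0, 1, 0, 0, 0, 0]
b = [0, 0, 0, 1, 0, 0]
print(detect_rightward_motion(a, b, delay_steps=2))  # fires at time step 3

# A leftward-moving object passes B first, then A: the detector stays silent.
a_left = [0, 0, 0, 1, 0, 0]
b_left = [0, 1, 0, 0, 0, 0]
print(detect_rightward_motion(a_left, b_left, delay_steps=2))  # never fires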

There are several observations I want to make about this detector. First, you might notice that once I pick the locations for A and B, I could put the time-delay on either side to measure motion from A to B or motion in the opposite direction from B to A. In fact, I could build a more complicated compound detector that involves two delays, two AND gates, and a means of comparing which AND gate is responding more, so that one machine can measure motion in either of these directions. This is an important hint that perceiving motion in one direction may be closely linked with seeing motion in the opposite direction.


 Figure 4 - We can make more out of the measurements we're taking at positions A and B. By using delayed and non-delayed connections to different AND gates (the first row of "Compare" boxes) we can ask which of the two gates is more active with a final "Compare" box - this would allow us to detect both leftward and rightward motion with one device.
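Here is one way the compound detector in Figure 4 might be sketched in code, again with toy signals and a made-up scoring rule standing in for the final "Compare" box.

def anded_with_delay(first, second, delay_steps):
    """AND the delayed 'first' signal with the current 'second' signal at each time step."""
    return [
        1 if (t >= delay_steps and first[t - delay_steps] and second[t]) else 0
        for t in range(len(second))
    ]

def opponent_motion_detector(a_responses, b_responses, delay_steps):
    rightward = anded_with_delay(a_responses, b_responses, delay_steps)  # A then B
    leftward = anded_with_delay(b_responses, a_responses, delay_steps)   # B then A
    score = sum(rightward) - sum(leftward)  # the final "Compare" box
    if score > 0:
        return "rightward"
    if score < 0:
        return "leftward"
    return "no motion detected"

# The same toy stimulus as before, and its mirror image.
print(opponent_motion_detector([0, 1, 0, 0], [0, 0, 0, 1], delay_steps=2))  # rightward
print(opponent_motion_detector([0, 0, 0, 1], [0, 1, 0, 0], delay_steps=2))  # leftward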

The other thing I’d like to point out is that the behavior of this machine depends directly on how we chose to measure space and time. Specifically, we’re doing something with this detector called sampling. Rather than measure the entire continuous change associated with a moving object, we’re using something like snapshots of its behavior to try and guess what’s really going on. To be more precise, the cells at A and B are only capable of measuring the object’s position at two spots in the visual field, which means we are only sampling, or measuring a small portion of, the different locations the object occupies. Similarly, the time-delay in our extra module represents a choice of how to sample in time: We’ll only compare the two measurements we take across points in time that are ‘t’ seconds apart. The machine still works, but an important thing to remember is that whenever we sample a more complicated event, we are intentionally ignoring things that happen outside of our samples. The object could do all kinds of stuff between one moment and the next (or one position and the next) and we’d never notice, because we’re not measuring that stuff. Critically, this means that sampling introduces ambiguity. Because we’re not measuring the entire continuous motion of the objects we’re seeing, different real movements out in the world can look the same to us through these kinds of motion detectors. In turn, this means we can make a number of systematic mistakes in trying to infer motion from image change. In the remainder of this post, we’ll discuss two specific kinds of error that your visual system makes, and introduce strategies to help mitigate these errors as much as possible.

Sampling in space and the aperture problem.
We’ll begin by thinking more carefully about what kinds of things we might see if we’re looking at a moving object using a cell (like A) that’s “looking at” a particular spot of the visual world with its receptive field. Specifically, let’s imagine that there’s a diamond moving from left-to-right out in the world, but we only get to see it by looking at it through the circle below. We’ll call this circle, which represents the receptive field of a single cell, an aperture. As the diamond moves left-to-right, the edge we see through that RF changes position systematically from t1 to t3. So far, so good, right?

Figure 5 - As the diamond really moves from left to right, we'd see the edge in the purple circle change in appearance at Time 1, 2, and 3.

Unfortunately, we have a big problem. Suppose that diamond wasn’t really moving from left-to-right, but was instead moving upwards. What will we see through that aperture? The same thing. This is really bad because it means that the same change in images can be a consequence of different real-world movements. In turn, that means that we can’t use the images we see through one aperture to make a good guess about what direction an object was really moving in!



Figure 6 - If the diamond moves upwards, we'd see the same change in edge appearance through the same aperture! Different motions give rise to the same image change over time.

This problem, which is called the aperture problem, has a few different solutions that help save us a little bit. The first one is a fairly simple hack that helps a bit, but can lead to some different errors in motion perception. Imagine that you weren’t stuck with those apertures on the edges of the diamond, but instead got to look through an aperture where one of the diamond’s corners was sitting. Now what would you see at different points in time? In this case, you could definitely follow where the corner went from one moment to the next, meaning that you wouldn’t be uncertain about what the real motion of the object was. The aperture problem in its worst form relies on ambiguity in matching parts of the object across different points in time, so if you can look at something like a corner or another unambiguous part of the object, you can save yourself some trouble.

However, there are ways that this can get you into some trouble, too. Consider the barber pole in the image below. In the real world, these patterns actually spin around a vertical axis so that the red, white, and blue stripes are moving horizontally. You probably see this pattern as moving upwards or downwards, however. What’s happening? Your visual system is tracking what it thinks are meaningful corners in the image where the stripes meet the vertical boundary of the pole. As the pattern actually moves horizontally, the position of those corners changes vertically. The same thing is happening at the top and bottom edges of the pole, where the corners appear to move horizontally, but the fact that there are more vertical corners than horizontal corners means that the vertical motion signaled by the majority gets to win. Tracking unambiguous features thus isn’t a bad idea, but it’s one that can still lead to mistakes.


Figure 7 - A barber pole really has spinning stripes that drift horizontally, but tracking the corners at the edge of the pattern leads you to see vertical motion.

Another way to try and improve your situation with regard to the aperture problem is to expand your measurements: Don’t just use one aperture, use two! This helps because while the change you observe through one aperture could have been caused by many different real motions, most of those changes won’t be consistent with what you’d see through a second aperture. Here’s a recipe for using this insight to make a better guess about the way an object was really moving based on what you’d see through two apertures:

1) First, imagine that we’re still looking at the diamond, and that it’s really moving rightwards.

2) Next, imagine that you get to see what’s happening through two apertures that are positioned on different edges of the diamond.

3) Pick one of these to start with and place a dot somewhere on the first edge that you saw. Where could this dot have actually moved to yield the second edge? Draw arrows from the dot to each of those places that it could have moved to, and you’ll notice that they all end on an imaginary line that we can fill in with dashes. This is the constraint line provided by this aperture – we don’t know which of these arrows was the true motion of the dot, but all the possibilities end on this dashed line.

4) Repeat this process for the second edge. You’ll get a new dashed line that should point in a different direction. Figures 8 and 9 show this process carried out to yield the two dashed lines.


 Figure 8 - What you see through each aperture means that a range of motions could have happened. Combining the information from each aperture will help us work out what really did happen.

Figure 9 - The possible places the dot could have gone to end up on the second edge helps us draw a line of constraint (dashed line) that limits the possibilities for what motion led to the change you saw through each aperture.

Now, line up the two dots with each other along with the two dashed lines. The dashed lines should cross at some point, and this point represents the only way that each dot could have moved to make you see the images in each aperture! Draw an arrow from the dot to that intersection and now you know how the diamond really moved!

Figure 10 - Lining up and superimposing the dots from each aperture allows us to see where the dashed lines cross. The arrow between the dot and the intersection of the dashed lines is the only motion that explains what happened in both apertures.

This method is called the intersection of constraints method for solving the aperture problem, and it’s also not too bad. Can it still lead to mistakes? Oh, sure. As usual when we’re trying to make a guess about the real world from limited image data, we have to make assumptions to use these procedures to make our guesses. If those assumptions are violated, then we’re toast. However, a lot of the time these assumptions are true, so these methods tend to work. For now, we have some idea how to sample motion in space and get to a good estimate of what really happened out in the world based on our limited set of measurements.
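For the numerically inclined, here is a minimal sketch of the intersection-of-constraints calculation, under the common assumption that each aperture reports only the motion component perpendicular to its edge. The edge orientations and speeds below are made-up numbers for a diamond moving rightward at 1 unit per second.

import math

def intersect_constraints(normal_1, speed_1, normal_2, speed_2):
    """Solve n1 . v = s1 and n2 . v = s2 for the true velocity v = (vx, vy)."""
    (a, b), (c, d) = normal_1, normal_2
    det = a * d - b * c
    if abs(det) < 1e-9:
        raise ValueError("Edges are parallel: the constraint lines never cross.")
    vx = (speed_1 * d - speed_2 * b) / det
    vy = (a * speed_2 - c * speed_1) / det
    return vx, vy

# Diamond edges oriented at +45 and -45 degrees; these are their unit normals.
n1 = (math.cos(math.radians(45)), math.sin(math.radians(45)))    # upper-right edge
n2 = (math.cos(math.radians(-45)), math.sin(math.radians(-45)))  # lower-right edge

# A diamond really moving rightward at 1 unit/sec gives each aperture a normal
# speed of about 0.707 along its own edge normal.
v = intersect_constraints(n1, 0.707, n2, 0.707)
print(f"Recovered motion: ({v[0]:.2f}, {v[1]:.2f})")  # ~ (1.00, 0.00): rightward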

Sampling in time and the problem of aliasing.
Remember that we’re not just sampling in space, we’re also sampling in time when we try to detect motion. What consequences does this have for the way we perceive movement? Really we’re going to run into the same kind of problem over again: Lots of different motions could have led to the same things we measure with our different samples in time. What does this mean in the temporal domain, though? What kinds of mistakes can we make, and how can we try to fix them? To motivate this discussion, I want you to think about sampling in time as though we’re constructing a sort of flip-book of motion. Each time we take a sample in time, we’re really adding one page to the flip-book. Our visual system gets to flip through the pages and decide what motion we’re seeing. How does it do this? By filling in, or interpolating, what’s happening between pages in the simplest way possible. But what do I mean by simple?


Figure 11 - In a flipbook, you have a picture for each moment of time that depicts some more complex event. Each page is a sample of that event in time, and your visual system has to work out what happened in between to estimate the motion associated with that event.

Consider a very boring flipbook that shows a wheel turning. If the wheel is really turning clockwise, the first page of our flipbook might show it in the position indicated by the blue lines below, and the second page might show it in the position indicated by the purple lines. What kind of motion is the simplest explanation for how the wheel got from the blue image to the purple one? 

Figure 12 - If our flip book of this spinning wheel only had images of these two positions of the wheel spokes, you'd probably guess that the real motion of the wheel was slow and clockwise and you'd be right! Lots of other things could have happened though...

I hope you’ll agree that a small clockwise turn would do the trick. But wait…a lot of other things could have also happened: (1) The wheel could have turned all the way around, plus that little bit more, between image 1 and image 2. This would lead to the same two pictures if it did all this spinning between our two photographs. (2) Wait, the wheel has four spokes! Maybe it didn’t turn all the way around – maybe instead it turned 180 degrees clockwise (plus a little bit). That would also lead to the same two images. (3) Hang on – if it could turn all the way around between our two images, could it turn all the way around twice? That would also lead to the same two pictures. 

This is rapidly getting bad – these two images could have happened because of any of these things, so how do we know which one happened? One rule of thumb your visual system seems to use is that it picks the slowest motion that could have led to successive pages in the flip-book. This spares you from having to consider an infinite number of increasingly rapid spins that could have placed the wheel in these two positions. However, it also complicates matters a bit if we consider another real-world motion of the wheel depicted below:

Figure 13 - A wheel that's really spinning quickly clockwise could still give you these two images of the spokes in your 'flip book.' The slowest motion that explains this change is a slow rotation in the wrong direction! Sampling in time means you can miss what really happened and make incorrect guesses as a result.

In this case, the wheel is still spinning clockwise, but it doesn’t quite get all the way around between picture 1 and picture 2. What does that mean? Well, first you see the spokes at the blue positions and then you see them at the purple positions. What’s the slowest motion that would get you there? A small counter-clockwise turn! This means that a simple rule for resolving an ambiguous movement between two samples in time (go with the slowest motion) leads you to see a fast clockwise spin as a slow counter-clockwise spin! This mistake is an example of what we call aliasing, and it happens all the time when you watch movies. Any movie is essentially a long flip-book with fairly small gaps in time between samples of the action, but that still means you can end up with situations like the one depicted above. Next time you watch a good car chase onscreen, keep an eye on the wheels and see if they’re spinning the right way (or look like they’re holding still) – if you see something weird, it’s because the motion signal has been a victim of aliasing.

How do you fix this one? Truthfully, you don’t. All you can do is sample more frequently to eliminate errors that can occur due to action happening between the samples. Any rate of sampling in time leaves room for something to happen in the gaps, though, so you’ll always be making some mistakes – you’ll just change which ones.
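If you want to play with this, here is a minimal sketch of the "slowest motion wins" rule applied to a spoked wheel. The spoke count, spin rate, and frame rates are assumptions chosen for illustration; positive numbers mean clockwise.

def apparent_rotation_per_frame(true_deg_per_sec, frames_per_sec, num_spokes=4):
    """With num_spokes identical spokes, the image repeats every 360/num_spokes degrees,
    so the visual system reports the smallest rotation (clockwise or counter-clockwise)
    consistent with what it sees between successive frames."""
    symmetry = 360.0 / num_spokes                  # 90 degrees for a 4-spoke wheel
    per_frame = true_deg_per_sec / frames_per_sec  # true rotation between frames
    return ((per_frame + symmetry / 2) % symmetry) - symmetry / 2

# A wheel spinning clockwise at 2000 degrees/sec, filmed at 24 frames/sec:
print(apparent_rotation_per_frame(2000, 24))   # about -6.7 deg/frame: looks counter-clockwise!
# Sample much faster and the percept matches the real spin direction again:
print(apparent_rotation_per_frame(2000, 240))  # about +8.3 deg/frame: clockwise, as it should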


To sum this all up, motion is another example of us trying to guess at some property of the world by making limited measurements on images. Those measurements make it possible for us to do a lot with a little, but they are subject to some important constraints that we need to understand. We get a lot of things about motion right, but at the expense of getting some specific things wrong. The key is to remember what these different computations can and can’t do, and to identify the way they all embody some set of assumptions about the relationship between the world and images of the world.
