Relative Depth from Binocular Vision
Discussing the phenomenology of color constancy led us to
consider something new about visual perception. Besides simply measuring
patterns of light with the various parts of your visual system, you also use
those patterns of light to make inferences
about the light sources, objects and surfaces in the world that produced them.
In the case of color constancy, this meant thinking about how we might observe
the pattern of light coming from an object or surface and use that to make a
guess about its reflectance and the illumination shining on it. The former
property should be a stable property of the object, while the latter might
change under different conditions. We saw that there were techniques for
estimating the reflectance of an object, but these relied on making assumptions
about the world that might not always be true. In general, this is going to be
the case as we continue to think about how we can recover properties of the
world from patterns of light. The relationship between these two things tends
to be underconstrained, meaning we
don’t have enough information to come up with a unique solution. The only way
forward is to make some guesses about the problem before we even start.
Our next step is to think about a new inverse problem that
involves a different property of objects that we might like to try and infer
from an image: Namely, depth. What I
mean by depth is something very simple – how far away is an object that we are
looking at? Actually, we’re going to be thinking about a slightly different
problem instead. What we’ll be interested in is determining the relative depth of objects that we are
looking at. This means we’d like to be able to look at (or fixate) some object
in a scene and determine which other objects are closer to us than that one or
further away. When you consider your own visual experience, this probably (but
not necessarily!) seems like a fairly easy thing to make judgments about.
Still, this also turns out to be a tricky inverse
problem that is underconstrained. To understand some of the difficulties,
consider the image below and imagine that you’re looking at the green circle.
What is the relative depth of the red and yellow circles? Are they closer to
you than the green circle, or further away?
Figure 1 - Which circle is closest to you? It's probably tough to decide on an answer from just this image.
My guess is that you don’t feel very confident about
answering that question. From this image alone, you don’t have enough
information to say what’s going on! Another way to say this is to point out
that there are many different versions of these circles in the real world that
could have given rise to this image. What if the red circle is incredibly small
but very close to you? Alternatively, what if the yellow one is extremely large
but far away? Any of these things could be true for any of the circles, so you
can’t say what the relative position of any of them might be. The key problem
here is that the real 3-D world has been projected
onto the 2-D surface of your retina, which means that the information about the
distance between you and the objects (and the distances between the objects and
each other) along what we usually refer to as the z-axis is gone (Figure 2).
Figure 2 - The light from 3D objects in the world ends up projected onto the 2D plane of your retina. This means we lose information about the z-axis.
It’s easy to figure out how 3D objects get projected onto a
2D plane, but we want to work backwards: What 3D objects (in which relative
positions) produced the 2D image we're looking at? There are a few things we
can do with just one 2D image to help us make some guesses about this, but for
now we’ll focus on a fact about your visual system that we haven’t talked about
yet: You don’t just have one 2D image to work with, you have two of
them.
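To make that forward direction concrete, here's a minimal sketch in Python of pinhole (perspective) projection. The focal length and the example points are made-up numbers; the point is just that image position depends on x/z and y/z, so depth itself drops out of the picture.

```python
# A minimal sketch of the forward problem: perspective projection of 3D points
# onto a 2D image plane through a pinhole. Focal length and points are assumed.
def project(x, y, z, focal_length=1.0):
    """Image position scales with 1/z, so absolute depth is lost."""
    return (focal_length * x / z, focal_length * y / z)

# A small, nearby object and a large, faraway object can land in the same place:
print(project(x=1.0, y=0.5, z=2.0))    # (0.5, 0.25)
print(project(x=10.0, y=5.0, z=20.0))  # (0.5, 0.25) -- same image point, very different depth
```

Both points end up at the same image location, which is exactly the ambiguity the circles above were illustrating.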
Having two eyes is good for a number of reasons, but we’re
going to focus on two aspects of having two eyes (or binocular vision) that give us some traction on the problem we’ve
set ourselves: Working out the relative depth of objects. The first of these
has to do with a physiological phenomenon called vergence. The second has to do with an optical/geometrical property
of the images in your two eyes called binocular
disparity. In both cases, we’ll see that these two features of binocular
vision allow us to do some amount of work calculating the relative depth of
objects in a scene.
We’ll start with vergence,
which refers to the way your two eyes move in a coordinated fashion to ensure
that an object you’re fixating on ends up in the center (or fovea) of both retinas. To see vergence
in action, hold up a pencil or pen in front of a friend’s face and ask them to
look right at the tip of it as you move it nearer to their face and then
further away. You should notice that their eyes cross more and more as the
object moves closer to them, and then diverge as you move the object further
away. That coordinated turning of the eyes inward or outward is what we mean by
vergence. How does vergence give us a cue to depth? If we can measure the angle
that the eyes have to turn through to fixate the object we’re interested in, we
can use the distance between our two eyes (or a decent estimate of this) to
solve for the distance to the object with a little basic trigonometry (Figure 3). To keep it simple: turning your eyes inward a lot means an object must be close, while turning them only a little means it must be further away. We can be more precise if we want to,
but that’s the gist of how we can use this simple visual behavior to make a
guess about depth using your two eyes.
Figure 3 - Your eyes cross a little bit when you try to fixate on an object close to you. By measuring the angle your eyes turn inward to do so, we can estimate how far away something is.
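If you'd like to see that trigonometry spelled out, here's a small Python sketch. It treats the two lines of sight and the line between the eyes as an isosceles triangle and ignores the details of eye anatomy; the 6.3 cm interocular distance is just a typical assumed value.

```python
import math

ipd_cm = 6.3  # assumed distance between the two eyes (interpupillary distance), in cm

def distance_from_vergence(vergence_deg):
    """Estimate the distance to a fixated object from the total vergence angle.
    Half the baseline divided by the tangent of half the angle gives the distance
    to the fixation point (a deliberate simplification of the real geometry)."""
    half_angle = math.radians(vergence_deg / 2)
    return (ipd_cm / 2) / math.tan(half_angle)

print(distance_from_vergence(12.0))  # eyes turned a lot   -> object is close (~30 cm)
print(distance_from_vergence(1.8))   # eyes turned a little -> object is farther (~200 cm)
```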
To understand our second cue to depth, binocular disparity, try the following little demo. Hold one finger
upright in front of you, and hold a pen or pencil upright some distance behind
your finger. While keeping your eyes on the tip of your finger, alternately
close your left eye and then your right eye. What do you see? Chances are that
you see your finger holding still, but the pencil should jump around as you
alternate between your two eyes. Now bring the pencil forward so that it’s in
front of your finger and try the same thing. You should still see the pencil jumping around while your finger holds still. Why is this happening? The
most basic thing to observe about this is that your two eyes see different versions of the same scene. The object
that you’re fixating on stays in the same position on your retina as you
alternate which eye you’re using to see, but other objects end up at one place
on one retina and somewhere different on the other. To see why, take a look at
the diagram below where we’ve drawn a little sketch of this situation.
Figure 4 - Light from objects at different depths relative to a fixation point (F) ends up at different places in the two retinas depending on how near or far the objects were.
With your eyes turned inward to look at your finger, light
from the pencil can only get to each retina through your eye’s pinhole (or
pupil). This means we can draw a straight line from the pencil through the
pupil and onward to the retina for each eye. You’ll notice that when we do this
for a pencil that’s further away than your finger, the pencil’s light ends up
to the right of the fovea in the left eye, and to the left of the fovea in the
right eye. When we do the same thing for a pencil that’s closer than your
finger, the same procedure leads to the opposite outcome – the pencil’s light
is to the left of the fovea in the left eye, and to the right of the fovea in
the right eye. The difference in the pencil’s position on the left and right
retina is what we mean when we talk about binocular
disparity. Specifically, binocular disparity refers to the difference in
horizontal position for the same object in the two retinas.
Now here’s the neat part: With your eyes turned inward to
look at some point of fixation (which we’ll call F), we can easily draw lines from any point in 3D space into each
retina. This would be the forward problem
of relative depth from binocular disparity. The situation that the visual
system finds itself in is the opposite, however: It gets the two images on the
retina and has to try and work out the depth of an object relative to F. Luckily,
we can still draw the same lines to solve this inverse problem as we
drew to solve the forward problem. Starting with the left eye, we can draw a
line from the object’s location on the retina, out through the pupil and out
into 3D space. At this point, all we can say is that the object has to be
somewhere on this line, but we don’t know where! If we do the same trick with
our right eye image, however, we get a second line that must cross the first
one somewhere, which tells us where the object must be (Figure 5).
Figure 5 - The two views of an object in the left and right eye can be examined for their disparity, then used to estimate the relative depth of objects relative to a point of fixation.
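To make this line-drawing procedure concrete, here's a small Python sketch of both directions in a top-down, 2D version of the set-up. The pupil positions, retina distance, and object location are all assumed numbers; the point is just that two rays drawn from the two retinal images pin down one location in space.

```python
import numpy as np

f   = 1.7   # assumed pupil-to-retina distance, in cm
ipd = 6.3   # assumed distance between the pupils, in cm
pupil_L = np.array([-ipd / 2, 0.0])
pupil_R = np.array([ ipd / 2, 0.0])

def project(obj, pupil):
    """Forward problem: where light from `obj` lands on this eye's retina
    (the retina is modelled, very crudely, as the plane y = -f)."""
    t = -f / obj[1]
    return pupil + t * (obj - pupil)

def triangulate(img_L, img_R):
    """Inverse problem: draw a ray from each retinal point out through its pupil
    and find where the two rays cross."""
    d_L = pupil_L - img_L                       # direction of the left-eye ray
    d_R = pupil_R - img_R                       # direction of the right-eye ray
    # Solve pupil_L + s*d_L == pupil_R + u*d_R for s, then step along the left ray.
    s, _ = np.linalg.solve(np.column_stack([d_L, -d_R]), pupil_R - pupil_L)
    return pupil_L + s * d_L

obj = np.array([2.0, 50.0])                     # an object 50 cm away, a bit to the right
img_L, img_R = project(obj, pupil_L), project(obj, pupil_R)
print(triangulate(img_L, img_R))                # recovers approximately [ 2. 50.]
```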
This is neat, but we should be careful to be precise about what we can say here. What we're really learning is something about the position of this object relative to the point of fixation – is it closer to us than F, or further from us than F? We're going to make one more observation about the geometry of this set-up that will make this point clearer. We've
assumed so far that objects you’re not looking at (objects that aren’t F) will
end up at different places on your two retinas. But what would happen if there were a different object at the same position on both retinas? We can still
draw our two lines, and we can still identify a position for this object in 3D,
but because it doesn’t have disparity (there’s no difference in its position on
the two retinas), we must see it at the
same depth as F. All of the points in 3D space that have zero disparity lie
on a circular curve we call the horopter
(Figure 6). Again, everything on the horopter will look to you like it's at the same depth as F, so it looks neither closer nor further. The consequence of this for our earlier diagrams is this: what we really get to say when we find the position of a point in space from our two rays of light is whether it lies inside the horopter or outside of it. The former case we refer to as an
object having crossed disparity
(you’d have to cross your eyes to fixate on it), and the latter case we refer
to as having uncrossed disparity.
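Putting the pencil example and these new terms together, the qualitative rule is simple enough to write down as a little Python function. The sign convention here (positions measured relative to each fovea, rightward positive) is just the one we used for the pencil example in Figure 4.

```python
def classify_disparity(x_left, x_right):
    """x_left, x_right: horizontal position of an object's image in the left and
    right eye, measured relative to each fovea (negative = left of the fovea,
    positive = right of the fovea)."""
    if x_left == x_right:
        return "zero disparity: on the horopter, seen at the same depth as F"
    if x_left < x_right:
        return "crossed disparity: closer to you than F"
    return "uncrossed disparity: further from you than F"

print(classify_disparity(-0.2, 0.2))  # the nearer pencil from Figure 4
print(classify_disparity(0.2, -0.2))  # the farther pencil from Figure 4
```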
Remember: What this relationship between binocular disparity
on the retinas and relative depth in the world means is that your visual system
can estimate depth from just the two images in your eyes. The visual system
doesn’t actually draw lines the way we do, but it still uses the position of
the same object in the two eyes to make a guess about depth. Seeing the same
object at different positions in the two eyes leads to the perception of an
object being closer or further from us. Something that follows from this is
that if we can create 2D images that have the right disparity information in
them, we can make pictures that look 3D as long as we get the right
relationships between binocular disparity and real-world depth. That is, if we
could make an image of objects in the world that looks like what your left eye
would see AND an image that looks like what your right eye would see, all we
have to do is make sure each eye sees the right picture to make you see in 3D.
This is exactly what’s happening with 3D glasses, whether they’re red/cyan,
polarized, or some other type of technology. In all these cases, you’re
watching two images superimposed on each other in such a way that you only see
one of these images with your left eye and only see the other with your right
eye. The combination of the disparity information in the two images and the
separate presentation of each image to one of your eyes leads your visual
system to infer that there really must be depth in the scene you’re looking at.
Check out the figure below to see that this really works.
Figure 7 - 3D anaglyphs work by layering separate images (here in red and cyan) that have the right disparity information to signal real-world depth. By wearing glasses, you guarantee that each image is only delivered to one eye, after which your visual system uses the disparity cues to estimate depth.
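If you'd like to try building one of these yourself, here's a minimal sketch of how a red/cyan anaglyph can be assembled, assuming you already have two grayscale photos of the same scene taken from two eye-like viewpoints (the filenames are placeholders).

```python
import numpy as np
from PIL import Image

# Assumed inputs: two grayscale views of the same scene, one per eye.
left  = np.asarray(Image.open("left_view.png").convert("L"),  dtype=np.uint8)
right = np.asarray(Image.open("right_view.png").convert("L"), dtype=np.uint8)

# The red channel carries the left-eye view; green and blue (i.e., cyan) carry the
# right-eye view, so red/cyan glasses deliver each view to only one eye.
anaglyph = np.dstack([left, right, right])
Image.fromarray(anaglyph).save("anaglyph.png")
```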
Binocular disparity is pretty neat – it gives us a means of
understanding how your brain perceives relative depth and allows us to make
cool movies. That said, I’ve made it sound much easier than it is, so I’d like
to close by pointing out something that’s quite difficult about using binocular
disparity to compute depth in real scenes. Specifically, there’s an assumption
hidden behind everything we’ve said above about how to compute depth from disparity.
We’ve assumed throughout that we know how to match up the image of an object in
the left eye with the image of the same object in the right eye. This might
sound easy, but in general it’s quite hard. Formally, we call this the correspondence problem, and to see why
it’s hard, consider the figure below. If you see four dots in the left eye and
four in the right, surely they must match up in the obvious way, right? Not
necessarily! What if they don't all have a match in the other eye? Maybe the 3
rightmost dots in the left eye go with the 3 leftmost in the right eye, for
example. Or maybe it’s only the 2 right/leftmost dots that go together. Each of
these choices would lead to different solutions for the relative depth of the
dots, so which one do we pick? There isn't really a single good answer to this
question, but there are more assumptions we can make to help your visual system
along. We won’t talk about those now, but I want you to at least know that they
exist.
Figure 8 - Using disparity for depth estimates relies on knowing which features from the left eye match up with things on the right eye. In general, this isn't always easy to solve.
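Just to give you a sense of how bad this ambiguity gets, here's a short Python sketch that counts the possible interpretations under one simplifying assumption: matches preserve left-to-right order, and any dot is allowed to go unmatched. Even for four dots per eye, there are 70 candidate matchings.

```python
from itertools import combinations

def monotonic_matchings(n_left, n_right):
    """Enumerate every way of pairing dots across the two eyes that preserves
    their left-to-right order, allowing dots in either eye to go unmatched."""
    matchings = []
    for k in range(min(n_left, n_right) + 1):
        for left_dots in combinations(range(n_left), k):
            for right_dots in combinations(range(n_right), k):
                matchings.append(list(zip(left_dots, right_dots)))
    return matchings

print(len(monotonic_matchings(4, 4)))  # 70 different interpretations of 4 dots per eye
```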
Here’s something you might be thinking, though: Why should
it be hard to match objects up between the left eye and the right eye? Why
can’t you just recognize each object in one eye (here’s a cat) and find it in
the other (there’s a cat)? That is, maybe you recognize shapes and objects
first, and do all your binocular disparity work afterwards to take advantage of
knowing what everything is. Not a bad idea (assuming you know how to recognize
shapes, but we’ll deal with that later), but apparently not how your visual system
does it! To be convinced of that, take a look at the image below with some
red/cyan glasses – even though there aren't any objects there for you to
recognize, your visual system still uses the disparity information to solve for
depth. This means you don't need shape recognition to match objects up for
disparity computations. Instead, you can do something else to solve the
correspondence problem, but for our discussion we’re going to leave it at that.
Figure 9 - The fact that you can still see a floating square in the image at the top suggests that depth perception works more like the flow chart on the right than on the left. You don't need to solve for shapes and objects to solve the correspondence problem.
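You can build an image like this yourself. Here's a minimal sketch of the classic random-dot construction: fill both eyes' images with the same random dots, then shift a square region sideways in one of them so that it carries disparity but is invisible to either eye alone. The image size and shift are assumed values.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
h, w, shift = 200, 200, 6                  # image size and disparity, in pixels (assumed)

left = rng.integers(0, 2, size=(h, w)).astype(float)   # random black/white dots
right = left.copy()

# A central square region carries the disparity: shift its dots sideways in the
# right-eye image...
r0, r1, c0, c1 = 60, 140, 60, 140
right[r0:r1, c0 - shift:c1 - shift] = left[r0:r1, c0:c1]
# ...and fill the strip uncovered by the shift with fresh random dots, so neither
# eye's image contains a visible square on its own.
right[r0:r1, c1 - shift:c1] = rng.integers(0, 2, size=(r1 - r0, shift))

# Save the two views side by side for free-fusing (or deliver one view to each eye).
plt.imsave("random_dot_pair.png", np.hstack([left, np.ones((h, 10)), right]), cmap="gray")
```

When the two halves are fused, the square should pop out at a different depth than the background, even though neither half contains anything recognizable on its own.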
I can't leave you without pointing out the best application
of the above phenomenon (depth from images without recognizable structure) is
the autostereogram, or Magic Eye image. These abstract images are actually just
two images placed side by side that differ in terms of disparity – some
features are in slightly different places in the left and right image. If you
can either cross or uncross your eyes by the right amount, you can make those
two halves of the larger picture overlap, which will leave you with the disparity information you need to see the 3D object hiding in the autostereogram. This trick of crossing or uncrossing your eyes by just the right amount, so that each image is delivered to a different eye, is called free-fusing
and it’s not so hard to learn to do! If you can, practice it a bit until you
see a few autostereograms, and you’ll have a new appreciation of how much your
visual system gets out of binocular vision.
Figure 10 - Autostereograms are just two images, set side by side, with different disparity information baked into them. By fusing the two halves, you can see a hidden 3D object.
Now would be a great time to try out Lab #7, which involves
making some 3D images by introducing disparities into images you draw. Check it
out, and we’ll come back to talk about some more cues to depth in complex
images.