Monocular cues for depth perception
In our last post, we discussed how you can use the information from your two eyes to estimate the relative depth of objects in the visual field. Both vergence and binocular disparity provided cues to where objects were situated in depth relative to some fixation point, allowing us to obtain some information about the 3D arrangement of objects in space. Clearly, two eyes are helpful in resolving the ambiguity that follows from projecting the light coming from a three-dimensional scene onto a two-dimensional surface. We said that this projection of light onto the retina made it much harder to make good guesses about depth from one image, and that using two images was a necessary step towards recovering this information from the retinal data. However, consider the picture below:



Figure 1 - A bird in some trees. Some things here look closer to you than others, but how do you tell that when you don't have binocular information to help you?

This image is a flat picture sitting on your screen, which means that your two eyes don’t end up with useful disparity information to make guesses about the 3D layout of these objects in the world. Nonetheless, my guess is that you have some ideas about what things in this image are closer to you than others. This must mean that there actually is some information in a single retinal image that tells you something about depth! In fact, there are multiple monocular depth cues that allow you to make reasonable inferences about relative depth even if you don’t have two eyes to work with. We’ll discuss these here, organizing the different cues using some important distinctions between where the information comes from and what you can use it for. Specifically:

  • Cues can either be object-based, light-based, or geometric depending on the underlying source of the information that allows you to estimate depth.
  • Cues can either provide ordinal or metric information about depth, which means that you either learn about the order of objects in depth (closest, next-closest, furthest, etc.) or you get an actual number that tells you about how far away something is, respectively.

A critical thing to remember about all of these cues is that like all attempts to reason about an underconstrained problem, our use of these cues relies on making assumptions about the nature of the solution, or the information we can use to estimate the property we’re interested in. All of these cues can easily lead us astray if those assumptions are violated, so keep careful track of what we’re assuming, why we’re assuming it, and how that assumption may turn out to be false.

Light-based cues to monocular depth
We’ll begin by talking about some light-based cues to depth. By calling these cues light-based, what I really mean is that we will be using some lawful properties of how light behaves as a function of how far away it’s coming from to make guesses about the depth of objects in a scene. Another way to put this is to say that because of these things that light does, objects that are far away tend to look different than objects that are closer in specific ways. That means we can use those changes in appearance to estimate how far away different objects are.

Scattering of light
A great example of such a property of light is the tendency of light to scatter or change direction as it moves through a medium like the air in our atmosphere. By scattering, we mean pretty much exactly what you think: Light particles don’t just move straight through the air, but instead deviate from a straight path when they encounter molecules in the air or changes in density along their path (Figure 2). This simple fact ends up meaning that a lot of things happen to the way an object that is reflecting light looks if the light coming from it has to travel through more air to get to our eye. More air between our eye and the object means more opportunities to scatter, which changes multiple properties of that light as it makes its way towards us. For example:






Figure 2 - Light scatters when it encounters atmospheric particles. More air between you and an object means more opportunities to scatter.


  • Light that scatters in random directions leads to edges that look blurry or fuzzy. Another way to say this is that the high spatial frequencies available in the pattern of light are reduced, leaving you with only coarse, low spatial frequency information.
  • As light scatters more and more, you end up seeing increasingly equal amounts of light that actually came from the object and light that came from the background around it. Besides making edges blurry, this also reduces the difference between the brightness of the object relative to the brightness of the background, which is also known as the contrast of the object. Objects thus become harder and harder to distinguish from the background as distance increases.
  • Light with shorter wavelengths (blue light) scatters more than light with longer wavelengths (red light). Over a long path, scattered blue light from the sky and the intervening air gets mixed into what reaches your eye, so light that comes from far away will tend to look more blue to you.

Running all of these cues backwards means that you can make the following guesses about an object’s depth based on its appearance: Objects that are blurrier, lower contrast, and more blue are probably further away from you, just based on what probably happened to the light on its way to your eye from the object. Are these cues ordinal or metric? One thing to keep in mind is that while we can probably get a robust depth order out of these differences in appearance, actually guessing at the metric depth of an object will involve assuming a lot about the air between us and the object on a given day. The presence of particulate matter in the air, the temperature, and other environmental factors will change exactly how an object looks as a function of its distance. As it is, we’re also already assuming that objects don’t vary in how blue they are, how blurry or fuzzy they should look, and what their contrast is with the background! Regardless, these light-based cues are useful ways to make guesses about the relative depth of objects in the scene with information from one eye.
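To make the contrast cue concrete, here’s a minimal sketch of how the “run it backwards” logic could work. It assumes an exponential attenuation model (contrast decaying with distance through scattering air), and the scattering coefficient and the object’s true contrast are made-up values, exactly the sort of thing we just said you’d have to assume:

```python
import math

def apparent_contrast(c0, distance_km, beta):
    """Contrast decays roughly exponentially with the amount of
    scattering air between you and the object."""
    return c0 * math.exp(-beta * distance_km)

def distance_from_contrast(c0, c_seen, beta):
    """Inverting the model gives a metric distance, but only if we
    assume both the object's true contrast (c0) and today's
    atmospheric scattering coefficient (beta)."""
    return math.log(c0 / c_seen) / beta

beta = 0.05   # per km; assumed value for a moderately hazy day
c0 = 0.9      # assumed intrinsic contrast of a distant ridge
c_seen = apparent_contrast(c0, 20.0, beta)
print(round(c_seen, 3))                                    # 0.331: badly faded
print(round(distance_from_contrast(c0, c_seen, beta), 1))  # 20.0: distance recovered
```

Notice that if the day is actually foggier than assumed (a larger beta than the observer believes), the recovered distance will be too large: the same assumption failure described above.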


Figure 3 - A great example of aerial perspective. The faraway hills look fainter, blurrier, and bluer than the closer ridges and objects.

Shadows as a cue to depth
Another simple property of light that we can use to make guesses about depth is the presence of shadows in an image. Shadows can be cast or attached, meaning that the shadow either lies on a surface that is separate from the object or lies on the surface of the object itself. In both cases, shadows can provide cues for depth. Cast shadows, for example, will look different as the distance between an object and a surface changes: (1) The size of the shadow relative to the object will change, (2) the blurriness of the shadow’s edges will change, because an extended light source casts a penumbra that widens as the object moves away from the surface, (3) the position of the shadow relative to the object may change depending on where the light source is, and (4) the contrast between the shadow and the background may also tell you about the distance between the object and the surface (Figure 4).



Figure 4 - Fun with drop shadows in PowerPoint. As we manipulate the distance between the object and its cast shadow, the blurriness of the shadow, its size, and its contrast, we manipulate how far it appears to be floating above the surface below.
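The edge-blur part of this cue follows from simple similar-triangles geometry. Here’s a toy calculation; the lamp size and distances are made-up values for illustration:

```python
def penumbra_width(source_size, source_to_object, object_to_surface):
    """Rays from opposite ends of an extended light source that graze
    the same object edge fan out beyond that edge (similar triangles),
    blurring the shadow border. The blur grows as the object lifts
    further off the surface."""
    return source_size * object_to_surface / source_to_object

# Hypothetical desk lamp 0.1 m across, hanging 2 m above a small object:
print(penumbra_width(0.1, 2.0, 0.5))  # object 0.5 m above the table -> 0.025 m of blur
print(penumbra_width(0.1, 2.0, 1.0))  # lift it to 1 m -> 0.05 m: a blurrier shadow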

Attached shadows are also interesting cues to depth, but what they tell you about a scene tends to be somewhat different. Because attached shadows result from an object occluding itself from a light source, their appearance tends to tell you more about the object’s shape than about the distance between the object and another object or surface. This shape information is still depth information, though, because understanding the 3D form of an object means understanding how different parts of the object are positioned in depth relative to one another. The inferences you make about object shape based on attached shadows tend to rely on a simple assumption that is true a lot of the time but can be easily violated: You tend to assume that light comes from above when interpreting shape using attached shadows. To see a simple demonstration of this, consider the figure below: Here, we’ve filled circles with the same transition between light and dark values, but flipped them upside-down on the right. This probably makes the two sets of circles look like they’re either sticking out of the page (bumps) or sunk into the page (dents), because that’s how light from above would behave if those objects really were shaped that way.


Figure 5 - Attached shadows signal 3D shape, and therefore depth. We assume that light comes from above, however, leading to different interpretations of these discs as bumps (left) or dents (right) if the same pattern is inverted.

In all of these cases, knowing some things about the way light behaves independently from our eye allows our visual system to make reasonable guesses about relative depth using the appearance of light coming from objects. Next, we’ll see how there are properties of the objects themselves that allow us to make similar guesses, but these also rely on critical assumptions regarding the things we’re looking at.

Object-based cues to depth
Our previous discussion of monocular depth cues focused on how light behaves when it has to come from further away, or when it interacts with objects at different depths. Next, we’ll consider a slightly different set of cues that are based on how object appearance may also change as depth relationships change in a complex scene. That is, how do objects tend to look different when they are positioned further away from us than other objects? As above, these different cues all rest on different assumptions about the objects we’re looking at, each of which represents an opportunity for us to get the wrong estimate if those assumptions have been violated. These cues also differ in terms of whether we get ordinal or metric information out of the image. We’ll begin by considering a very simple object-based cue to depth.

Occlusion
How do objects at different distances from you tend to look different from one another? One very basic property of object appearance that tends to change with depth is the extent to which parts of an object are occluded, or blocked from view, by something that is between you and the object. Consider the simple shapes below, for example, and see which one you think is in front (Figure 6). My guess is that you’re pretty confident that the square is in front of the circle, because it looks like part of the circle is hidden from view behind the square. Simple, no? Occlusion is a very robust cue to monocular depth that provides an ordinal cue to object position in 3D. You can’t say exactly where the two objects are, but you can put them in some kind of order from closest to furthest.






Figure 6 - The square is probably in front of the circle. That is, as long as you're sure that it's a circle.


This relies on a simple assumption, though, that may seem so basic that you didn’t realize you were assuming it! In reasoning that the circle is further from you than the square, you “completed” the part of the circle that you couldn’t see. That is, you assumed that you were looking at an intact circle that happened to be behind a square rather than a ¾ circle that was positioned right against the edges of the square. We haven’t talked much about the way you make inferences like this about the shape of objects, but it’s worth saying that there’s a lot of evidence that you do. In most cases, you’re probably right about it, but all it takes is an irregularly shaped object to lead you down the wrong path. Even something as simple as object occlusion is based on assuming something to overcome the ambiguity in the data, which opens the door to error and misinterpretation.

Object size
Here’s another simple property of object appearance as a function of depth: Objects tend to take up less space on the retina as they get further away. Of two objects that are the same size in the world, the one that is further from you will project a smaller image than the one that is closer. If you happen to know the size of an object in the real world, you can even use a little bit of trigonometry to work out the distance between you and that object (Figure 7). That makes size potentially a metric cue to depth, but again, make sure you check your assumptions! The quality of your metric estimate will be directly related to the accuracy of your guess regarding the true size of the object: if you’re wrong about how big a car really is, for example, you’ll guess wrong about how far away it is. Even if you’re comparing objects that you’re unfamiliar with and trying to order them in depth, you’re still assuming something, namely that they’re likely to be about the same size in the real world!





Figure 7 - The relationship between visual angle, object size, and distance allows us to estimate the distance to an object if we know its real size and the angle it subtends on the retina.
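The relationship in Figure 7 is easy to turn into numbers. Here’s a toy calculation; the car’s width and the visual angle it subtends are made-up values for illustration:

```python
import math

def distance_from_size(true_size_m, visual_angle_deg):
    """If we know an object's real size and the visual angle it
    subtends, trigonometry gives its distance:
    d = s / (2 * tan(theta / 2))."""
    theta = math.radians(visual_angle_deg)
    return true_size_m / (2 * math.tan(theta / 2))

# A car is about 1.8 m wide (assumed prior knowledge) and subtends
# about 2 degrees of visual angle:
print(round(distance_from_size(1.8, 2.0), 1))  # roughly 51.6 m away

# If our size assumption is wrong (we think it's a 0.9 m toy car),
# the metric estimate is wrong by the same factor:
print(round(distance_from_size(0.9, 2.0), 1))  # roughly 25.8 m
```

The second call shows exactly the failure mode described above: halve your guess about the object’s true size and you halve your distance estimate.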


This matters a great deal when you use object size as a cue for interpreting things like texture patterns, because in this case the “objects” you’re working with are the texture elements that make up the pattern (the checks in a checkerboard, for example). You tend to assume that those elements are really the same size, leading to systematic guesses about the 3D layout of a scene or surface. If they’re not the same size, you’ll get the answer wrong, leading to distorted perceptions of distance or size. Below, you can see how bowerbirds take advantage of these assumptions to make impressive-looking bowers to charm potential mates with: by choosing object size and position carefully, the birds can manipulate the depth cues signaled by the texture gradient leading up to the nest, changing the interpretation of that space in their favor.


Figure 8 - When using object/texture element size to estimate depth relationships we tend to assume that similar-looking objects are the same physical size. This means that we can use regular patterns to estimate depth (left), but that this assumption can be challenged, leading to errors of depth perception and subsequent size perception (right).

Geometric cues to depth
Finally, I want to briefly discuss a set of cues that aren’t directly related to light or to objects in particular, but more broadly reflect the way objects in scenes tend to look as a function of distance. For lack of a better way to describe them, I’ll call these geometric cues to depth, because they’re primarily related to how the projection of 3D scenes onto 2D surfaces works, regardless of whether we’re considering light independently or objects in particular.
Though there are other cues we could talk about, I’m really going to focus here on one of these cues in particular: linear perspective. My guess is that you have some familiarity with this cue, possibly from an art class you took at some point, but may need a bit of a refresher on the details. The most important thing for you to know about linear perspective is this: Lines that are really parallel in the world will meet at a vanishing point in the projection of that scene. Below is an example of a grid drawn with vanishing points in place to guide artists in rendering real-world parallel lines appropriately in an image.



Figure 9 - An empty grid with vanishing points used to establish linear perspective.


Linear perspective is a useful cue to depth because it allows us to link position in the image to depth in the world. For example, consider the simple illusion depicted below: My guess is that the monsters probably look like they’re different sizes to you, even though they’re physically the same size. What’s going on? You’re using the depth information provided by the converging lines in the scene to make inferences about which regions in the image are closer to you (the bottom left) and which regions are probably further from you (the center). This leads you to work out how big the objects must really be if they’re that far away and take up that much space, which in turn induces the illusory percept. Note the various assumptions that you’re making, though: (1) The lines meet at that vanishing point because they’re really parallel, but receding in depth. (2) The position of the objects in the image faithfully reflects where they are in 3D; nothing is floating off of the ground, for example. There’s more where those came from, but hopefully you get the gist: Like all of our other cues, using linear perspective to infer depth depends on assumptions, any of which can be violated. One fantastic violation of these assumptions is the Ames Room, a carefully constructed room built so that perspective relationships lead to a specific interpretation of the scene from a single vantage point (Figure 10). That interpretation leads you to make some incredibly wrong-headed inferences about object size, however, because of the way in which your assumptions about perspective are being challenged.
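A minimal pinhole-projection sketch shows why really-parallel lines converge in the image. The focal length and rail spacing here are arbitrary, made-up values:

```python
def project(x, z, f=1.0):
    """Pinhole projection: a point at lateral position x and depth z
    lands at image coordinate f * x / z."""
    return f * x / z

# Two rails 1.5 m apart in the world (x = -0.75 and +0.75),
# sampled at increasing depths:
for z in (2.0, 10.0, 100.0):
    separation = project(0.75, z) - project(-0.75, z)
    print(z, round(separation, 4))
# The image separation shrinks toward 0 as z grows: the rails head
# for a vanishing point even though they never meet in the world.
```

Dividing by depth is the whole trick: image position carries depth information precisely because every world coordinate gets scaled by 1/z on its way to the image, which is also why assuming the wrong z (as the Ames Room makes you do) corrupts your size estimates.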



Figure 10 - In the Ames Room, careful control of the vantage point and the way projected lines look from that position (left - public domain image from Wikipedia) leads you to make erroneous estimates of depth, which in turn produce misestimations of size (right - image copyright: Tony Marsh).

Across the board then, you really can use just one eye (or monocular cues) to make guesses about relative depth. There’s lots of good information in scenes that provides hints about how objects and surfaces are positioned in 3D. The key is to never forget that these are guesses based on cues that aren’t perfect and that rely on assuming things about the world that might not be true. Regardless, some information is better than none, and your visual system tends to take what it can get.

We’ll continue to confront these kinds of issues as we move on to understanding our last “mid-level” property of visual perception – the perception of motion. Now would be a great time to do Lab #8 to acquaint yourself with some interesting phenomenology related to how you estimate the way things are moving based on how patterns of light change over time.
