Introduction to high-level vision

With this blog post, we begin a new and challenging topic in human vision: high-level vision. Before we go any further, we need to address an important question: How is this different from what we've been talking about before?

To date, we’ve spent a lot of time talking about some of the basic principles of how your visual system measures the light in your environment. For example, you have photoreceptors that transduce light and provide a signal carrying information about the wavelength of the light you saw. Further along in your visual system, cells in the LGN and V1 measure various patterns of light and pass on information about the dots, edges, and lines you may have seen. These kinds of processes, which are mostly concerned with making measurements that describe the incoming light, are what I’d refer to as low-level vision.


Figure 1 - Measuring the wavelength content of a light or the pattern of light that we're seeing are examples of low-level visual processes.

Next, we encountered another set of processes carried out in parts of your visual system that are even further along. The difference between these processes and the low-level measurements that came before them was that in each case, we were interested in recovering some property of the things in the world that gave rise to the pattern of light you could measure with low-level tools. For example, we’ve talked about using that raw information to make good guesses about the color or lightness of the objects in a scene, the position of those objects in 3D space, or, most recently, the way things are moving around you. All of these processes are what I’d refer to as mid-level vision. They rely on the information we measure about light, but they use that information to start making inferences about what things there are out in the world.

Figure 2 - Looking at complex patterns of light (upper left) and using low-level information to estimate the depth (upper right), object reflectance (lower left), and motion (lower right) are examples of mid-level processes.

Now that we’re moving on, what are we moving on to? When we talk about high-level vision, we’re talking about processes that attach labels to parts of our visual world, or that use the information we get from low- and mid-level processes to plan behaviors involving the things we see. This encompasses a lot of different visual tasks and many interesting computational problems. We’ll spend our time discussing just three aspects of high-level vision: (1) recognizing objects, (2) visual search, and (3) using vision to plan actions. In each case, I have to point out something important: We don’t really know how these processes work! That is, while I can give you some ideas about how they might work, these are all active areas of research that remain fairly open, even regarding some basic questions. This isn’t to say that we’ve completely figured out low-level and mid-level vision, but I think it’s fair to say that we have more specific models of how those processes work, built on mathematical concepts that aren’t so hard to understand. For high-level problems, we’re still working toward good behavioral, neural, and computational evidence that points to a clearly described mechanism supporting what you see and do. This means that I can’t give you detailed procedures describing how we do these things, but I can tell you about some good candidates. That’s the plan from here on in, and in each case, I’m going to do my best to tell you about the strengths and weaknesses of particular ideas while being as specific as I can about how they would actually work.

Let’s begin by considering a question that’s easy to ask, but hard to answer: How do you recognize the things that you see?
