Introduction to high-level vision
With this blog post, we begin to talk about a new and
challenging topic in human vision: high-level vision. Before we go any
further, we need to address an important question: How is this different from
what we’ve been talking about before?
To date, we’ve spent
a lot of time talking about some of the basic principles regarding the way your
visual system measures the light in your environment. For example, you have
photoreceptors that transduce light and provide a signal that carries information
about the wavelength of the light you saw. Further on in your visual system,
cells in the LGN and V1 measure various patterns of light and pass along information
about the dots, edges, and lines you may have seen. These kinds of processes
that are mostly concerned with making measurements that describe the incoming
light are what I’d refer to as low-level
vision.
Figure 1 - Measuring the wavelength content of a light or the pattern of light that we're seeing are examples of low-level visual processes.
Next, we encountered another set of processes that we said
were carried out in parts of your visual system that were even further along.
The difference between these processes and the low-level measurements that came
before them was that in each case, we were interested in recovering some
property of the things in the world that gave rise to the pattern of light you
could measure with low-level tools. For example, we’ve talked about using that
raw information to make good guesses about the color or lightness of the
objects in a scene, or the position of those objects in 3D space, or most
recently, the way in which things are moving around you. All of these processes
are what I’d refer to as mid-level vision.
These processes rely on the information we measure about light, but then we use
that information to start making inferences about what things there are out in
the world.
Figure 2 - Looking at complex patterns of light (upper left) and using low-level information to estimate the depth (upper right), object reflectance (lower left), and motion (lower right) are examples of mid-level processes.
Now that we’re moving on, what are we moving on to? When we
talk about high-level vision, we’re
talking about processes that involve attaching labels to parts of our visual
world, or using the information we get from low- and mid-level processes to
plan behaviors involving the things we see. This encompasses a lot of different
visual tasks and many different interesting computational problems. We’ll spend
our time discussing just three aspects of high-level vision: (1) recognizing
objects, (2) visual search, and (3) using vision to plan actions. In each case,
I have to point out something important: We don’t really know how these
processes work! That is, while I can give you some ideas about how they might work, these are all active areas
of research that remain fairly open even regarding some basic questions. This
isn’t to say that we’ve completely figured out low-level and mid-level vision,
but I think it’s fair to say that we have more specific models of how those
processes work that rely on mathematical concepts that aren’t so hard to
understand. For high-level problems, we’re still working towards coming up with
good behavioral, neural, and computational evidence that points towards a
clearly described mechanism supporting what you see and do. This means that I
can’t give you detailed procedures describing how we do these things, but I can
tell you about some good candidates. This is the plan from here on in, and in
each case, I’m going to do my best to tell you about the strengths and
weaknesses of particular ideas while being as specific as I can about how they
would actually work.
Let’s begin by considering a question that’s easy to ask,
but hard to answer: How do you recognize the things that you see?