Two paths to color constancy
Between our last post and the exercises in the Color
Constancy lab, I hope you have a sense that your perception of color is
definitely not just based on the wavelengths that are present in a mixture of
lights. Instead, your visual system is doing something to adjust those
wavelengths to give you a different estimate of color. But what is that
estimate really for and how does your visual system arrive at it? Here, we’re
going to start by trying to formalize the problem within a perceptual constancy framework.
This will involve describing a particular goal for the visual system and then
pointing out why this goal is particularly challenging. Next, we’ll discuss two
different ways to try to achieve this goal, each of which rests on specific assumptions about the way images of real-world objects tend to work. We’ll see that these assumptions aren’t always true, but they’re true often enough that these procedures work.
To describe what the visual system might be trying to do
when it makes these estimates of color based on (but not limited to) the raw
wavelengths that are present in an image, let’s remind ourselves where the
light that we measure at the retina comes from. Specifically, let’s think about
the world beyond our visual system as being composed of light sources and surfaces.
When you see light and transduce it with your photoreceptors, you’re really
measuring the outcome of a process that began at the interface of those two
types of stimulus. An object is only
visible to you because some light source
was shining light on it and some of that light reflected off of the object
towards your eye. The wavelengths of light that get to your eye therefore
depend on two things: (1) The wavelengths that were in the light source, and
(2) The proportion of each wavelength that was reflected by the object.
Figure 1 - Out in the world, the pattern of light you see from an object is the product of the light that shines on it and the light that it reflects.
What does this mean? It means that the same light coming into your eye could have come from many different combinations of light source and object. Suppose I shine a red laser pointer on a whiteboard. The light source only has long wavelengths of light in it, but the object is capable of reflecting light at all wavelengths. The result is that I end up
with long-wavelength light at my retina. But now consider this: Suppose I shine
a white light onto a bright red apple. Now the light source has light at all wavelengths, but the object is only reflecting long
wavelengths of light. This is very different from the first scenario, but it ends the same way – I end up with long-wavelength light at my retina.
Figure 2 - In both of these images, I end up with long-wavelength light getting to my eye. Where that light came from differs a lot though when I consider both illumination and reflectance.
Mathematically, you already know how to think about this
interaction between light sources and objects out in the world. The contents of the light can be described with a light spectrum, and the proportion of light that the object reflects can be described with a reflectance spectrum. This latter list of numbers is essentially the same as a filter – each entry in the list contains a number between 0 and 1 that tells us what proportion of light at that wavelength reflects off of the object rather than getting absorbed. If I gave
you a light spectrum and a reflectance spectrum, it’s no problem to work out
what light gets to your eye: You just multiply all the corresponding pairs of
numbers together and take a look at your new list.
Figure 3 - Revisiting Figure 1 with a final step - the light that reaches our eye can be thought of as a literal product (multiplication) of the illumination and the reflectance.
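To make that multiplication concrete, here’s a minimal sketch in Python with NumPy. The wavelength samples and the two spectra are made-up numbers, purely for illustration:

```python
import numpy as np

# Made-up wavelength samples (in nm) and illustrative spectra.
wavelengths = np.array([450, 500, 550, 600, 650, 700])
light_spectrum = np.array([0.2, 0.5, 1.0, 1.0, 0.9, 0.8])          # light source energy at each wavelength
reflectance_spectrum = np.array([0.05, 0.1, 0.2, 0.7, 0.9, 0.95])  # proportion reflected (0 to 1)

# The light that reaches the eye is the element-wise product of the two lists.
light_at_eye = light_spectrum * reflectance_spectrum
print(light_at_eye)  # [0.01 0.05 0.2  0.7  0.81 0.76]
```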
So what does this have to do with what your visual system is
doing when you perceive color in complex images? One way to think about the
phenomenology of color constancy is
to say that the reflectance spectrum we’re talking about describes a property
of the object that shouldn’t change very much: A red car should always look
red. On the other hand, that light spectrum describes a property of a visual
scene that can change a lot: The amount of each wavelength of light that hits
an object depends on the time of day, the type of light source we’re thinking
about, the presence/absence of shadows, and many other factors. If we want to
know about the real visual world, we’d really like to start with the
measurement of light at our retina and work out what the light spectrum was and
what the reflectance spectrum of the object was. That way, we’d be able to make a good guess about the stable properties of the object and the changing properties of the light.
There’s a problem, though. What we’re really asking for is
something that we don’t have enough information to determine. We’re trying to
solve an inverse problem, which means
there was a process by which illumination and reflectance combined to make a
pattern of light, and we’d like to start with the pattern of light and work
backwards to the things that produced it. However, this inverse problem is also
underconstrained. That is, there’s no
way to work backwards to a single, best solution to our question. To see why,
consider how we said you could get to the light at the retina from the light
source spectrum and the reflectance spectrum – you’d multiply those two lists
of numbers together to get a new list. What we’re trying to do when we work
backward is more or less the same as the following problem:
“I multiplied two numbers together and the product was 120. What are the two numbers?”
It should be obvious why this is an underconstrained
problem! The two numbers could be 12 and 10, 120 and 1, or 120/pi and pi.
Without knowing more, we can’t know which of these answers is correct. So what
can we do? How do we get the two numbers, and similarly, how does the visual
system come up with an estimate of reflectance to give us some measure of color
constancy?
The answer to both questions is the following: We make
assumptions about the nature of our answer, which allows us to narrow down the
possible solutions. For example, if we were thinking about the x*y=120 problem,
we could make our lives easier by making the following assumptions:
1) Both x and y are positive integers.
2) Both x and y are less than 20.
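Here’s a quick sketch of that brute-force search in Python, just to show how these two constraints shrink the solution space for the x*y=120 problem:

```python
# Brute-force search for x * y = 120 under the two assumptions above:
# both factors are positive integers and both are less than 20.
solutions = [(x, y) for x in range(1, 20) for y in range(1, 20)
             if x * y == 120 and x <= y]

print(solutions)  # [(8, 15), (10, 12)]
```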
Now that we have these constraints, there are fewer solutions that fit the bill (only 12 × 10 and 8 × 15, specifically), which means we’re not totally hopeless at coming up with an answer. To solve the problem of getting
the illumination and reflectance out of a luminance pattern, our visual system
also relies on some assumptions that provide constraints on the possibilities
we have to consider. We’ll discuss two different sets of assumptions that help
us draw some conclusions about reflectance and illumination from raw images,
one of which we’ll talk about in the context of color images, and the other
we’ll talk about in the context of grayscale images.
Correcting for global color with “anchoring”
Consider the two images below. Like many of our other
examples in this post and the previous one, we see that regions that would get
the same color label in both pictures nonetheless have very different wavelengths coming from them. The yellow parts look yellow in both images, for
example, but in the rightmost picture there’s obviously much more green light
getting to your retina. Our first look at using assumptions to help us recover
reflectance from images involves making one simple assumption about the reflectance
spectrum in the image that helps us “correct” for the overall greenish cast of
the picture.
To discuss how this one works, I need you to understand
something about the color images you see on your computer. Any such image is
actually three images in one – one that tells you how much red light is at each
location, another that tells you about the green light at each location, and
still one more that tells you how much blue light is at each location. At each
location (or pixel) in the image, you therefore have 3 numbers, R, G, and B, to
describe the amount of each kind of light there. For our purposes, we’re going
to assume that these three numbers are
good stand-ins for the LMS-numbers we talked about recording with your
cone photoreceptors. This means that those numbers are telling us about the
product of illumination and reflectance, and we’d like to know what those are
separately for R, G and B. To do so, we make a simple assumption:
1) The largest intensity of each kind of light that we can find in the image comes from something that reflects 100% of that light.
What does this mean? It means that if we look at all of the
R-values in the image, we’re going to assume that the biggest one we find came
from a part of the image that had a reflectance value of 1. Similarly, if we
look at all of the G-values or B-values, we’d make the same assumption. What
does that mean for our problem? It means that we can use the largest R, G, and
B values we find as an “anchor” or a point of reference, to work out what the
other reflectance values are in the rest of the image. Specifically, if we’d
like to know what the real reflectance is of a pixel at some position in the
image (which I’ll call row i and column j of the picture), then we can
use our largest R, G, and B values (which I’ll call Rmax, Gmax, and Bmax) to
calculate the reflectance at that pixel as follows:
R(i,j)reflectance = R(i,j)original / Rmax
G(i,j)reflectance = G(i,j)original / Gmax
B(i,j)reflectance = B(i,j)original / Bmax
This should work because we’re assuming that each max value
sets an upper bound for the reflectance at each pixel, and we’re basically
adjusting the scale of R, G, and B reflectances everywhere else to agree with that bound. Remember, these numbers that you get back are reflectances, so they’ll be between 0 and 1. To see what color they would look like to you, you can just multiply each number by 255 and feed the R, G, and B numbers you get back to a color picker to see the result.
Figure 5 - A simple RGB color picker will let you turn the new reflectance values you calculate into a color you can visualize.
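Here’s a minimal sketch of this anchoring procedure in Python with NumPy, under the assumption above. The tiny 2×2 image is made up purely for illustration; in practice you’d load a real photo as an H × W × 3 array of R, G, and B values.

```python
import numpy as np

# A made-up 2x2 RGB image (values 0-255), standing in for a real photo.
image = np.array([[[200,  80,  40], [120, 160,  60]],
                  [[ 60, 200,  90], [180, 140,  30]]], dtype=float)

# Assumption: the largest R, G, and B values found anywhere in the image
# each come from something that reflects 100% of that kind of light.
r_max = image[:, :, 0].max()
g_max = image[:, :, 1].max()
b_max = image[:, :, 2].max()

# Divide every pixel's R, G, and B by the corresponding maximum to get
# estimated reflectances between 0 and 1.
reflectance = image / np.array([r_max, g_max, b_max])

# To visualize the result, rescale back to 0-255 and feed the numbers
# into a color picker or image display.
display = (reflectance * 255).astype(np.uint8)
print(display)
```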
Solving for illumination and reflectance by classifying edges
Our first technique for estimating reflectance depended on
correcting something global about the colors in the image. Our second technique
will work a little differently, and involves thinking about local changes in
the image rather than global corrections. Specifically, rather than try to use
the whole image at once to correct for reflectance values, we’re going to
examine small parts of an image to help us obtain a record of illumination and
reflectance separately. To see how this works, we’re going to think about this
algorithm (which is called Retinex)
in the context of grayscale images. What’s more, we’ll make our lives even
simpler by thinking about 1-D descriptions of grayscale images – we’ll assume, in other words, that our images are made of vertical stripes, and only worry about how the picture changes as we move from left to right across it. This means pictures like the ones on top in the figure below turn into
line graphs like the plots underneath. The x-axis tells us where we are in the
original image and the y-axis tells us what the intensity is at that location.
Figure 6 - A look at how illumination and reflectance make a luminance image in grayscale (top). If we plot the intensity of the image from left-to-right across the red line in each image, we can make a 1-D plot of the image that helps illustrate how the Retinex algorithm works.
Again, that number on the y-axis is the product of two
things: The intensity of the light falling on that location and the proportion
of that light being reflected by the surface. We only get to measure one
number, but we’d like to get both of those original numbers back. How do we do
it? We make two more assumptions, but this time about the changes we see across the image:
1) Changes that are due to illumination are gradual.
2) Changes that are due to reflectance are abrupt.
It may be a little harder to see what this buys us, but
let’s walk through what this means for the image we’re seeing in the figure
above. At every location in the image, we can see how quickly the image is
getting either brighter or darker by looking at the slope of the line at that
point. If that slope is very steep, then we’ll assume that it’s because the
reflectance is changing a lot at that point. If that slope is relatively
shallow, we’ll assume that the reflectance is staying the same and the
illumination is changing. Because we’re interested in the reflectance, let’s
just mark down where there are very steep changes in the image – we’ll note
whether the line is going up or down and also how much it’s changing by. Such a
record would look like the image below.
Figure 7 - By only writing down where we see very steep changes in intensity in the image at the top, we can end up with a record of only reflectance changes in the image below. Note that this is subject to our assumption that only reflectance changes are "steep" in these plots.
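Here’s a minimal sketch of that marking-down step in Python with NumPy. The 1-D intensity profile and the threshold for what counts as “steep” are made up for illustration:

```python
import numpy as np

# A made-up 1-D intensity profile: a gradual illumination ramp multiplied
# by a reflectance pattern with two abrupt steps.
x = np.linspace(0, 1, 200)
illumination = 0.5 + 0.4 * x                                        # changes gradually
reflectance = np.where(x < 0.3, 0.2, np.where(x < 0.7, 0.8, 0.4))   # changes abruptly
intensity = illumination * reflectance                              # what we actually measure

# How quickly is the intensity changing from one location to the next?
changes = np.diff(intensity)

# Assumption: only steep changes are reflectance changes. Anything shallower
# than the threshold is treated as illumination and set to zero.
threshold = 0.05
reflectance_record = np.where(np.abs(changes) > threshold, changes, 0.0)
```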
But wait! If this is a record that tells us only where the
reflectance is changing, then we’ve managed to more or less remove the
illumination from the picture! That means we can treat this record of
reflectance changes like a set of instructions for how to make a version of the
image that only has reflectance information in it by moving left-to-right
across the picture. If we imagine that we start with a particular value of
gray, the changes in reflectance tell us when to change to something lighter or
something darker and also how much to change by. This means we can start
“painting” at the left edge of the picture and fill in different grayscale
values as we move to the right. Below, I’m showing you a new line graph that
I’m creating from the record of reflectance changes at left by treating those
changes in reflectance as instructions to change the intensity of my paint.
Figure 8 - Now that we have that record of reflectance changes, we can use it to create a version of the image that has no illumination changes. We move left-to-right adding grayscale values to our image according to our instructions from the reflectance record about when we need to change. When we encounter a positive line, we increase the gray value we're using to paint by that much. When we encounter a negative-going line, we decrease the gray value by that much. The result (at bottom) is a new version of the original image that has the gradual illumination changes removed.
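And here’s the “painting” step, continuing the same made-up example: starting from an arbitrary gray value, we keep a running sum of the recorded reflectance changes as we move left to right.

```python
import numpy as np

# Recreate the made-up intensity profile and reflectance record from the
# previous sketch.
x = np.linspace(0, 1, 200)
intensity = (0.5 + 0.4 * x) * np.where(x < 0.3, 0.2, np.where(x < 0.7, 0.8, 0.4))
changes = np.diff(intensity)
reflectance_record = np.where(np.abs(changes) > 0.05, changes, 0.0)

# "Paint" from left to right: start with an arbitrary gray value and apply
# each recorded reflectance change as we encounter it (a running sum).
start_gray = intensity[0]
rebuilt = start_gray + np.concatenate([[0.0], np.cumsum(reflectance_record)])

# `rebuilt` keeps the abrupt reflectance steps, but the gradual illumination
# ramp has been removed.
```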
This re-created picture that I end up with is what the image
looks like without the gradual changes in appearance that I thought were due to
illumination. That is, it’s what the surface “really” looks like in terms of
its reflectance! By making an assumption about how illumination and reflectance
should change across an image, I can end up with estimates of both of them from
an ambiguous starting point.
Now here’s the thing about making assumptions: You can very
easily be wrong. Both of these algorithms rely on assuming that something is
true out in the world, and there’s no reason it has to be. Take our last
algorithm, for example: You can absolutely apply paint to an object so that
its lightness changes gradually over its surface, or have an object that casts
a sharp shadow onto the ground. In both of these cases, the algorithm can still
be applied, but the answers it comes up with likely won’t be very good ones.
This situation, in which we have a way to solve an underconstrained problem by
making some guesses about the world, is something we’ll have to deal with again
and again as we continue studying perception. Most of the time, our assumptions
will be true and we’ll do alright estimating properties of the world from
measurements of the image. However, there will always be room for us to be
tricked by a world that’s not playing by our rules. Keep this in mind as we
keep going, and be particularly mindful of what assumptions we’re making and
where they come from. These hold the key to understanding both how our vision
works and how our vision fails
sometimes.