Computing Color Constancy

Two paths to color constancy
Between our last post and the exercises in the Color Constancy lab, I hope you have a sense that your perception of color is definitely not just based on the wavelengths that are present in a mixture of lights. Instead, your visual system is doing something to adjust those wavelengths to give you a different estimate of color. But what is that estimate really for and how does your visual system arrive at it? Here, we’re going to start by trying to formalize the problem within a perceptual constancy framework. This will involve describing a particular goal for the visual system and then pointing out why this goal is particularly challenging. Next, we’ll discuss two different ways to try to achieve this goal that rest on some specific assumptions about the way images of real-world objects might work. We’ll see that these assumptions aren’t always true, but they’re true enough of the time that these procedures work.

To describe what the visual system might be trying to do when it makes these estimates of color based on (but not limited to) the raw wavelengths that are present in an image, let’s remind ourselves where the light that we measure at the retina comes from. Specifically, let’s think about the world beyond our visual system as being composed of light sources and surfaces. When you see light and transduce it with your photoreceptors, you’re really measuring the outcome of a process that began at the interface of those two types of stimulus. An object is only visible to you because some light source was shining light on it and some of that light reflected off of the object towards your eye. The wavelengths of light that get to your eye therefore depend on two things: (1) The wavelengths that were in the light source, and (2) The proportion of each wavelength that was reflected by the object.


Figure 1 - Out in the world, the pattern of light you see from an object is the product of the light that shines on it and the proportion of that light that it reflects.

What does this mean? It means that the same light coming in to your eye could have come from many different combinations of light source and object. Suppose I shine a red laser pointer on a whiteboard. The light source only had long wavelengths of light in it, but the object was capable of reflecting light at all wavelengths. The result is that I end up with long-wavelength light at my retina. But now consider this: Suppose I shine a white light onto a bright red apple. Now the light source has light at all wavelengths, but the object is only reflecting long wavelengths of light. This is very different than the first scenario, but it ends the same way – I end up with long-wavelength light at my retina.


Figure 2 - In both of these images, I end up with long-wavelength light getting to my eye. Where that light came from differs a lot though when I consider both illumination and reflectance.

Mathematically, you already know how to think about this interaction between light sources and objects out in the world. The contents of the light can be described with a spectrum and the proportion of light that the object reflects can be described with a reflectance spectrum. This latter list of numbers is essentially the same as a filter – each entry in the list contains a number between 0 and 1 that tells us what proportion of light at that wavelength reflects off of the object rather than getting absorbed. If I gave you a light spectrum and a reflectance spectrum, it’s no problem to work out what light gets to your eye: You just multiply all the corresponding pairs of numbers together and take a look at your new list.


Figure 3 - Revisiting Figure 1 with a final step - the light that reaches our eye can be thought of as a literal product (multiplication) of the illumination and the reflectance. 
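
To make that multiplication concrete, here is a minimal Python sketch of the laser-pointer and apple scenes from earlier. The five-band spectra and the name light_at_eye are made up for illustration; real spectra would have many more entries and less tidy values.

```python
def light_at_eye(illumination, reflectance):
    """Multiply corresponding entries of a light spectrum and a reflectance spectrum."""
    if len(illumination) != len(reflectance):
        raise ValueError("both spectra must cover the same wavelength bands")
    return [light * proportion for light, proportion in zip(illumination, reflectance)]


# Hypothetical spectra with five bands, ordered from short to long wavelengths.
# Scene 1: a red laser pointer shining on a whiteboard.
red_laser = [0.0, 0.0, 0.0, 0.0, 1.0]
whiteboard = [1.0, 1.0, 1.0, 1.0, 1.0]

# Scene 2: a white light shining on a red apple.
white_light = [1.0, 1.0, 1.0, 1.0, 1.0]
red_apple = [0.0, 0.0, 0.0, 0.0, 1.0]

scene_1 = light_at_eye(red_laser, whiteboard)
scene_2 = light_at_eye(white_light, red_apple)

# Very different scenes, identical light at the retina.
print(scene_1 == scene_2)
```

In this toy version the two products match exactly, which is the ambiguity the visual system has to cope with.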

So what does this have to do with what your visual system is doing when you perceive color in complex images? One way to think about the phenomenology of color constancy is to say that the reflectance spectrum we’re talking about describes a property of the object that shouldn’t change very much: A red car should always look red. On the other hand, that light spectrum describes a property of a visual scene that can change a lot: The amount of each wavelength of light that hits an object depends on the time of day, the type of light source we’re thinking about, the presence/absence of shadows, and many other factors. If we want to know about the real visual world, we’d really like to start with the measurement of light at our retina and work out what the light spectrum was and what the reflectance spectrum of the object was. That way, we’d be able to make a good guess about both the stable properties of the object and the changing properties of the illumination.

There’s a problem, though. What we’re really asking for is something that we don’t have enough information to determine. We’re trying to solve an inverse problem, which means there was a process by which illumination and reflectance combined to make a pattern of light, and we’d like to start with the pattern of light and work backwards to the things that produced it. However, this inverse problem is also underconstrained. That is, there’s no way to work backwards to a single, best solution to our question. To see why, consider how we said you could get to the light at the retina from the light source spectrum and the reflectance spectrum – you’d multiply those two lists of numbers together to get a new list. What we’re trying to do when we work backward is more or less the same as the following problem:

“I multiplied two numbers together and the product was 120. What are the two numbers?”

It should be obvious why this is an underconstrained problem! The two numbers could be 12 and 10, 120 and 1, or 120/pi and pi. Without knowing more, we can’t know which of these answers is correct. So what can we do? How do we get the two numbers, and similarly, how does the visual system come up with an estimate of reflectance to give us some measure of color constancy?

The answer to both questions is the following: We make assumptions about the nature of our answer, which allows us to narrow down the possible solutions. For example, if we were thinking about the x*y=120 problem, we could make our lives easier by making the following assumptions:

1)   Both x and y are positive integers.
2)   Both x and y are less than 20.

Now that we have these constraints, there are fewer solutions that fit the bill (only 12 × 10 and 8 × 15, specifically), which means we’re not totally hopeless at coming up with an answer. To solve the problem of getting the illumination and reflectance out of a luminance pattern, our visual system also relies on some assumptions that provide constraints on the possibilities we have to consider. We’ll discuss two different sets of assumptions that help us draw some conclusions about reflectance and illumination from raw images, one of which we’ll talk about in the context of color images, and the other we’ll talk about in the context of grayscale images.
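
If you want to check the effect of those two assumptions yourself, a short Python sketch can list every pair that satisfies them. The function name factor_pairs and its upper_bound parameter are just illustrative choices, and each pair is listed once with the smaller number first.

```python
def factor_pairs(product, upper_bound):
    """List positive-integer pairs (x, y) with x <= y, x * y == product,
    and both numbers strictly less than upper_bound."""
    pairs = []
    for x in range(1, upper_bound):
        if product % x == 0:
            y = product // x
            if x <= y < upper_bound:
                pairs.append((x, y))
    return pairs


# With both assumptions in place, only two candidates survive.
print(factor_pairs(120, 20))  # [(8, 15), (10, 12)]
```

Loosening the second assumption (for example, factor_pairs(120, 121)) brings back all eight whole-number pairs, and dropping the integer assumption brings back infinitely many.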

Correcting for global color with “anchoring.”
Consider the two images below. Like many of our other examples in this post and the previous one, we see that regions that would get the same color label in both pictures nonetheless have very different wavelengths coming from them. The yellow parts look yellow in both images, for example, but in the rightmost picture there’s obviously much more green light getting to your retina. Our first look at using assumptions to help us recover reflectance from images involves making one simple assumption about the reflectance spectrum in the image that helps us “correct” for the overall greenish cast of the picture.

Figure 4 - The two images differ a lot in terms of the wavelengths reaching your eye, but you probably would call these colors the same thing in both images.

To discuss how this one works, I need you to understand something about the color images you see on your computer. Any such image is actually three images in one – one that tells you how much red light is at each location, another that tells you about the green light at each location, and still one more that tells you how much blue light is at each location. At each location (or pixel) in the image, you therefore have 3 numbers, R, G, and B, to describe the amount of each kind of light there. For our purposes, we’re going to assume that these three numbers are good stand-ins for the LMS-numbers we talked about recording with your cone photoreceptors. This means that those numbers are telling us about the product of illumination and reflectance, and we’d like to know what those are separately for R, G, and B. To do so, we make a simple assumption:

1)   The largest intensity of each kind of light that we can find in the image comes from something that reflects 100% of that light.

What does this mean? It means that if we look at all of the R-values in the image, we’re going to assume that the biggest one we find came from a part of the image that had a reflectance value of 1. Similarly, if we look at all of the G-values or B-values, we’d make the same assumption. What does that mean for our problem? It means that we can use the largest R, G, and B values we find as an “anchor” or a point of reference, to work out what the other reflectance values are in the rest of the image. Specifically, if we’d like to know what the real reflectance is of a pixel at some position in the image (which I’ll call row i and column j of the picture), then we can use our largest R, G, and B values (which I’ll call Rmax, Gmax, and Bmax) to calculate the reflectance at that pixel as follows:

R(i,j)reflectance = R(i,j)original/Rmax
G(i,j)reflectance = G(i,j)original/Gmax
B(i,j)reflectance = B(i,j)original/Bmax

This should work because we’re assuming that each max value sets an upper bound for the reflectance at each pixel, and we’re basically adjusting the scale of R, G, and B reflectances everywhere else to agree with that bound. Remember, these numbers that you get back are reflectances, so they’ll be between 0 and 1. To see what color they would look like to you, you can just multiply each number by 255 and feed the R, G, and B numbers you get back to a color picker to see the result.
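
Here is a minimal Python sketch of that anchoring calculation, assuming the image is stored as a list of rows of (R, G, B) tuples. The function names and the tiny 2x2 "greenish" image are invented for illustration; they aren't taken from the lab.

```python
def anchor_reflectance(image):
    """Estimate reflectance by dividing each channel by its largest value.

    `image` is a list of rows, and each row is a list of (R, G, B) tuples.
    """
    pixels = [pixel for row in image for pixel in row]
    # Rmax, Gmax, and Bmax: the largest value of each channel anywhere in the image.
    channel_max = [max(pixel[c] for pixel in pixels) for c in range(3)]
    if any(value == 0 for value in channel_max):
        raise ValueError("a channel is zero everywhere, so it has no anchor")
    return [
        [tuple(pixel[c] / channel_max[c] for c in range(3)) for pixel in row]
        for row in image
    ]


def to_display(reflectance_image):
    """Scale reflectances (0 to 1) up to 0-255 values for a color picker."""
    return [
        [tuple(round(255 * value) for value in pixel) for pixel in row]
        for row in reflectance_image
    ]


# Hypothetical 2x2 image with a greenish cast: Rmax = 100, Gmax = 200, Bmax = 50.
greenish = [
    [(100, 200, 50), (50, 100, 25)],
    [(25, 50, 50), (100, 50, 10)],
]

reflectances = anchor_reflectance(greenish)
corrected = to_display(reflectances)
print(corrected[0][0])  # the brightest pixel becomes white: (255, 255, 255)
```

Notice that the top-left pixel, which looked strongly green in the raw numbers, comes out white once each channel is scaled by its own maximum. That is only the right answer if the anchoring assumption actually holds for the scene.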


Figure 5 - A simple RGB color picker will let you turn the new reflectance values you calculate into a color you can visualize.

Solving for illumination and reflectance by classifying edges
Our first technique for estimating reflectance depended on correcting something global about the colors in the image. Our second technique will work a little differently, and involves thinking about local changes in the image rather than global corrections. Specifically, rather than try to use the whole image at once to correct for reflectance values, we’re going to examine small parts of an image to help us obtain a record of illumination and reflectance separately. To see how this works, we’re going to think about this algorithm (which is called Retinex) in the context of grayscale images. What’s more, we’ll make our lives even simpler by thinking about 1-D descriptions of grayscale images – we’ll assume that our images are made of vertical stripes, in other words, and only worry about what changes in the picture as we move from left to right across the picture. This means pictures like the ones on top in the figure below turn into line graphs like the plots underneath. The x-axis tells us where we are in the original image and the y-axis tells us what the intensity is at that location.


Figure 6 - A look at how illumination and reflectance make a luminance image in grayscale (top). If we plot the intensity of the image from left-to-right across the red line in each image, we can make a 1-D plot of the image that helps illustrate how the Retinex algorithm works.


Again, that number on the y-axis is the product of two things: The intensity of the light falling on that location and the proportion of that light being reflected by the surface. We only get to measure one number, but we’d like to get both of those original numbers back. How do we do it? We make another pair of assumptions, but this time they’re about the changes we see across the image:

1)   Changes that are due to illumination are gradual.
2)   Changes that are due to reflectance are abrupt.

It may be a little harder to see what this buys us, but let’s walk through what this means for the image we’re seeing in the figure above. At every location in the image, we can see how quickly the image is getting either brighter or darker by looking at the slope of the line at that point. If that slope is very steep, then we’ll assume that it’s because the reflectance is changing a lot at that point. If that slope is relatively shallow, we’ll assume that the reflectance is staying the same and the illumination is changing. Because we’re interested in the reflectance, let’s just mark down where there are very steep changes in the image – we’ll note whether the line is going up or down and also how much it’s changing by. Such a record would look like the image below.

Figure 7 - By only writing down where we see very steep changes in intensity in the image at the top, we can end up with a record of only reflectance changes in the image below. Note that this is subject to our assumption that only reflectance changes are "steep" in these plots.
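
As a rough sketch of this bookkeeping step, the Python below walks across a 1-D list of intensities and keeps only the steps that are bigger than some cutoff. The threshold value and the example profile are assumptions I made up for illustration; the post doesn't specify how steep is "steep."

```python
def steep_changes(intensities, threshold):
    """Record the signed size of each left-to-right step larger than `threshold`.

    Smaller steps are treated as illumination changes and recorded as 0.
    """
    record = []
    for left, right in zip(intensities, intensities[1:]):
        step = right - left
        record.append(step if abs(step) > threshold else 0)
    return record


# Hypothetical 1-D image: a slow brightening from left to right (illumination)
# with one sharp jump up and one sharp jump down (reflectance).
profile = [10, 11, 12, 13, 33, 34, 35, 36, 16, 17]

print(steep_changes(profile, threshold=5))
# [0, 0, 0, 20, 0, 0, 0, -20, 0]
```

The record has one entry per pair of neighboring locations, so it is one item shorter than the image itself.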

But wait! If this is a record that tells us only where the reflectance is changing, then we’ve managed to more or less remove the illumination from the picture! That means we can treat this record of reflectance changes like a set of instructions for how to make a version of the image that only has reflectance information in it by moving left-to-right across the picture. If we imagine that we start with a particular value of gray, the changes in reflectance tell us when to change to something lighter or something darker and also how much to change by. This means we can start “painting” at the left edge of the picture and fill in different grayscale values as we move to the right. Below, I’m showing you a new line graph that I’m creating from the record of reflectance changes at left by treating those changes in reflectance as instructions to change the intensity of my paint.

Figure 8 - Now that we have that record of reflectance changes, we can use it to create a version of the image that has no illumination changes. We move left-to-right adding grayscale values to our image according to our instructions from the reflectance record about when we need to change. When we encounter a positive line, we increase the gray value we're using to paint by that much. When we encounter a negative-going line, we decrease the gray value by that much. The result (at bottom) is a new version of the original image that has the gradual illumination changes removed. 

This re-created picture that I end up with is what the image looks like without the gradual changes in appearance that I thought were due to illumination. That is, it’s what the surface “really” looks like in terms of its reflectance! By making an assumption about how illumination and reflectance should change across an image, I can end up with estimates of both of them from an ambiguous starting point.
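
The "painting" step can be sketched in a few lines of Python as a running total. The record of changes and the starting gray value below are hypothetical; the starting value in particular is something we have to pick ourselves, since the record only tells us about changes.

```python
from itertools import accumulate


def paint_from_record(start_value, record):
    """Rebuild a 1-D image by applying each recorded change in turn, left to right."""
    return list(accumulate(record, initial=start_value))


# Hypothetical record of reflectance changes: one jump up of 20, one jump down of 20.
record = [0, 0, 0, 20, 0, 0, 0, -20, 0]

# Start painting with a gray value of 10 at the left edge.
painted = paint_from_record(10, record)
print(painted)
# [10, 10, 10, 10, 30, 30, 30, 30, 10, 10]
```

The result is flat wherever the record says nothing changed, so any gradual ramp in the original image is gone and only the abrupt steps remain.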

Now here’s the thing about making assumptions: You can very easily be wrong. Both of these algorithms rely on assuming that something is true out in the world, and there’s no reason it has to be. Take our last algorithm, for example: You can absolutely apply paint to an object so that its lightness changes gradually over its surface, or have an object that casts a sharp shadow onto the ground. In both of these cases, the algorithm can still be applied, but the answers it comes up with likely won’t be very good ones. This situation, in which we have a way to solve an underconstrained problem by making some guesses about the world, is something we’ll have to deal with again and again as we continue studying perception. Most of the time, our assumptions will be true and we’ll do alright estimating properties of the world from measurements of the image. However, there will always be room for us to be tricked by a world that’s not playing by our rules. Keep this in mind as we keep going, and be particularly mindful of what assumptions we’re making and where they come from. These hold the key to understanding both how our vision works and how our vision fails sometimes.
