Visual Search - What makes it hard to find things?

For our last post (at least, I think it is), we’re going to discuss another problem in high-level vision: visual search. By visual search, I mean more or less what you probably think: The problem of searching for something in a cluttered display. For example, where is “Waldo” in the image below?


Figure 1 - Finding an object in clutter can be challenging. "Where's Waldo?" books play with search difficulty by manipulating a number of properties of search displays.

Naively, you might think that a problem like this boils down to carrying out your procedures for object recognition a bunch of times. To look for Waldo (or your keys, or a particular street corner on a map), don’t you just have to look around the scene and try to recognize him as you go? To some extent, yes. However, there are several ways in which visual search seems to have different properties than we’d expect if we were really just using our object recognition over and over again. Our goal in this post is to discuss this in more depth by doing three things: (1) Define ways to measure search performance carefully, with special attention to describing when a search task is easy and when it is difficult, (2) Point out some easy and difficult search tasks that make it hard to conclude that search is “just” object recognition, and (3) Propose a model of visual search that allows us to make good guesses about when a search task will be easy or difficult. Ideally, this model should allow us to talk about what it’s like to search for Waldo, to look for information in a cluttered user interface (see the picture below), or to predict what people will do in lab settings.


Figure 2 - Search problems are common in real life. Your desk, your closet, or your vehicle (like this cockpit) all may contain objects you want to find amidst a lot of distracting elements.

Let’s begin by formalizing visual search tasks more carefully and identifying ways to measure performance and evaluate how search tasks can be made more or less difficult. A typical visual search task in a laboratory will probably look a lot like the figure below: On any individual trial of an experiment, you would be shown some kind of array of objects (in this case, dots of different colors) and asked to make a judgment about the presence of a target within that array. By a target, we’re referring to an object or objects that differ in some objective way from the other objects in the array, which we will refer to as distractors. Your job in a typical search might be to locate the target, perhaps by pointing to it or using a mouse to click on it. Alternatively, you could be asked to report the presence or absence of a target without actually locating it. That is, the experimenter may choose to present you with some arrays that have a target and other arrays that don’t. In this case, the question is how accurate you are at distinguishing between these two scenarios. In either case, there are a number of things the experimenter may choose to vary across trials to present you with arrays of objects that may be more or less difficult for you to evaluate: (1) The experimenter may change how similar the target is to the distractors, (2) The experimenter may change how many distractors there are, (3) The experimenter may change how different the distractors are from one another. There are lots of other factors they may also change, but these will be the basis for a lot of important stuff for us to try and explain.


Figure 3 - Laboratory search tasks typically look like this: Participants need to either locate a target object, or report its presence or absence in an array of non-targets (distractors).

Now that we know how to run experiments on visual search, the next question is how we measure which tasks are easy and which tasks are hard. Broadly speaking, there are two ways we can measure your skill at finding targets in an array of distractors: (1) We can measure how accurate you are, (2) We can measure how fast you are. While both of these are good ways to measure search difficulty, easily the most widely used measure of search performance is something called the set-size slope, which is a way of relating your speed at finding targets to the number of items in the array. Specifically, the set-size slope is defined as the slope of the line we obtain if we plot your response time to correctly find a target against the number of distractors in the array of objects. The logic here is that in an easy search task, adding more distractors shouldn’t slow you down very much – each new object incurs a small cost to your speed, so the line described above rises slowly. On the other hand, a difficult search task has the opposite property – each new object costs you a lot in terms of speed, so that line rises much more quickly. The nice thing about this measure is that we can easily compute it for any task in which we can vary the number of distractors in an array, and it can vary continuously from very easy tasks to very difficult ones.
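As a rough sketch of how a set-size slope could be computed, the snippet below fits an ordinary least-squares line to mean correct response times at several set sizes. The function name `set_size_slope` and all of the response times are hypothetical, made up for illustration rather than taken from any experiment. Whether you count all items or only distractors on the x-axis doesn't change the slope on target-present trials, since the two counts differ by a constant of one.

```python
def set_size_slope(set_sizes, response_times):
    """Least-squares slope of response time (ms) against set size.

    Only correct trials should go into the response-time averages.
    """
    n = len(set_sizes)
    if n != len(response_times) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(set_sizes) / n
    mean_y = sum(response_times) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(set_sizes, response_times))
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    if sxx == 0:
        raise ValueError("set sizes must vary to estimate a slope")
    return sxy / sxx


# Hypothetical mean correct response times (ms) at each set size; not real data.
sizes = [4, 8, 16, 32]
pop_out = [450, 452, 449, 451]   # roughly flat
serial = [500, 600, 800, 1200]   # rises with every added item

print(set_size_slope(sizes, pop_out))  # close to zero
print(set_size_slope(sizes, serial))   # prints 25.0 (ms per item)
```

With these made-up numbers, the "pop-out" data give a slope near zero, while the "serial" data cost 25 ms of extra search time for every added item.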

Remember, our starting point (a null hypothesis of sorts) is that maybe we can understand visual search just by understanding object recognition. Maybe visual search is just the repeated application of object recognition to tell a target apart from distractors. Almost immediately, however, we run into some difficulty with this story once we do a little work to establish how good people are at search in different settings. Consider the picture below, which shows you the set-size slope for a simple search task: finding a black target in an array of white distractors.


Figure 4 - The set-size slope relates your speed at finding a target to the number of items in the array you needed to search within. A flat slope like this one indicates that adding more items didn't make the task more difficult: The target pops out even among many non-target elements. 

This search is so incredibly easy that the set-size slope is zero! This kind of search is often referred to as a pop-out or parallel search to reflect the fact that there is a sort of all-at-once quality to how you see the target in the larger display. You don’t really have to look for it; it just sort of appears immediately regardless of how many distractors there are. Contrast this with the picture below, which shows the set-size slope for a more difficult search task: finding the black-on-white X amid the white-on-black distractors.


 Figure 5 - A harder search task may lead to a positive set-size slope like this one. In this case, adding more distractors makes the search take longer and longer.

This search is much harder – adding more distractors does make you take longer and longer, indicating that you might be doing something like looking at each item one at a time to see if it’s the target or not. This kind of search is called serial search to reflect that one-item-at-a-time quality.

Together, these two displays have already given us some things to think about. First, whatever is happening in the first kind of task suggests that you are able to evaluate the entire array of objects at once, which seems a little different from the way we thought about object recognition working before. Here’s another thing that’s a little odd: If search were just object recognition, we’d expect search tasks to be easier or harder based on how similar the target is to the distractors, since a similar target should be harder to tell apart from the distractors around it. But how similar is the target to the distractors in the easy search task above? The only difference between them is the color of the two lines, right? OK, now what’s different about the two kinds of X’s in the harder search display? The color of the two lines, right? So why are these two tasks so different in difficulty? We either need to think more carefully about what similarity is (and that’s possible) or we need to consider that search is subject to different principles than object recognition.

Here’s another search task that turns out to be pretty tough. Compare the array on the left to the array on the right. In both cases, you’re looking for the same target (a vertically-oriented white rectangle), but the search tends to be a pop-out, parallel search on the left and a slower, serial search on the right. Again, this is a little strange: If the issue is how similar the target is to each of the distractors, each individual comparison between the target and a distractor in the array on the right is just as easy as the target/distractor comparisons on the left. What makes the display on the right harder to work with, then? This kind of search (called a conjunction search, to reflect the fact that the target is defined by two attributes together) is a good hint that it’s not just the similarity between the target and the distractors that matters; some property of the distractors taken together must also affect what we’re doing when we try to search for a target. Distractors that are different from one another seem to be harder to deal with than distractors that are all very similar.


Figure 6 - Feature search tasks, like the one on the left, are easy because there is only one piece of visual information you need to know to identify the target. Conjunction search tasks, like the one on the right, are harder because you need to know about the conjunction of two pieces of information (white and vertical) to find the target.

How can we use these various observations to develop a technique for guessing how hard a search task will be? We have a number of things to think about that we know affect search difficulty, so let’s recap: (1) The number of distractors may matter, (2) The similarity between the target and the distractors will almost certainly matter (see below), and (3) How different the distractors are from one another may also matter. What we’d like is a way to combine these three properties of an array of objects into some kind of measure that tells us how different the target is likely to look from the distractors – an answer of “very different” should imply that the search task will be easy, and an answer of “very similar” should imply that the search will be difficult.


Figure 7 - In general, the similarity between the target and the distractors will make search easier or harder. This is only one factor that governs how difficult a search task is, however. To understand search more completely, we need a way to take these other factors into account.

The key insight we will use to introduce such a measure is this: Using these properties to quantify how different a target is from the distractors sounds an awful lot like doing statistics. In many statistical tests, we’re asking questions very much like this: How different is a distribution from a single value? How different are two distributions from each other? In the context of statistics, these questions are typically answered by calculating a particular test statistic that encapsulates information about things like the difference between the single value and the average value of a sample (like our target/distractor similarity above), the number of samples (like our number of distractors), and the variability within the sample (like our need to consider how different distractors are from one another). What we’re going to do, then, is adapt a simple technique from statistics to make guesses about how difficult search displays should be. Specifically, we’re going to imagine that visual search is really a single-sample t-test.


Figure 8 - The t-statistic is a way to quantify how different a group (or distribution) of values is from a single value. The magnitude of this quantity depends on the similarity or distance between the target value and the average value of the distribution (the numerator) and the 'spread' of the distribution around its mean (the denominator). A high t-statistic implies a distribution that is quite different from the single value.
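Written out with the symbols this post uses (m for the target's value, X̄ for the average distractor value, s for the sample standard deviation of the distractors, and n for the number of distractors), the statistic looks roughly like this. Taking the absolute value of the difference is an assumption made here, since only the size of the target/distractor difference matters for search; the textbook single-sample t keeps the sign.

$$
t = \frac{\lvert m - \bar{X} \rvert}{s / \sqrt{n}}
$$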

Consider the formula above, which is the expression for the test statistic, ‘t’, that one uses to determine if a distribution of values is different from a single value. First, note that this is more or less what we want to do, except we’d say it a little differently: We want to know if a single item (the target) ‘stands out’ from a sample (the distractors). Second, recall from the statistics class that I hope you’ve taken (and no doubt done very well in, too) that large values of ‘t’ imply just this – that there is a significant difference between the single value and the distribution of values in the sample. So how will we use this to evaluate how difficult a search display is?
We have to begin by having some means of describing the target and the distractors numerically, and this will depend on the search task at hand. If we’re looking for a red target among green distractors, for example, we may want to use the LMS values for each color. If we’re talking about looking for a vertical line among tilted lines, we may use the orientation of each item instead. Regardless, we need to be able to do two things: Say what value belongs to each item (target or distractor) and calculate the difference between those values. That difference is easy to compute when we use a single number per item – we just subtract. If we’re using something like LMS values that assign 3 numbers to each item, we’ll have to do something more complicated like compute the distance between elements in the array rather than just a simple subtraction. Luckily, these are all things you know how to do by now!
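Here is a minimal sketch of those two jobs, assuming each item is described either by a single number or by a tuple of numbers. The helper name `item_distance` is hypothetical, Euclidean distance is an assumed choice for multi-valued items (the text only says to "compute the distance"), and the LMS triples are made-up values in arbitrary units.

```python
import math
from numbers import Real


def item_distance(a, b):
    """Difference between two item descriptions.

    Single numbers (a gray value, an orientation in degrees) are compared by
    absolute difference. Tuples such as (L, M, S) triples are compared by
    Euclidean distance, which is an assumed choice of metric.
    Note: plain subtraction ignores that orientations of 0 and 180 degrees
    describe the same line.
    """
    if isinstance(a, Real) and isinstance(b, Real):
        return abs(a - b)
    return math.dist(a, b)


# One number per item: just subtract.
print(item_distance(192, 60))  # prints 132
# Three numbers per item (hypothetical LMS values, arbitrary units).
print(item_distance((30, 40, 10), (0, 0, 10)))  # prints 50.0
```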
So what are all these terms? Let’s start with the easy ones in the numerator: The symbol m stands for the value assigned to the target, so that’s straightforward. Now, what about the X with a bar over it? This stands for the average value of the distractors. To calculate this, you will add up all the distractor values and divide by the total number of distractors. Again, for a single-valued distractor this is easy, but for a multi-valued one it’s not much harder: You’d just average each LMS number separately, for example. Now that you’ve got an average value for your distractors and a single value for your target, the numerator is just the difference (or distance) between those two. In this way, the numerator captures how the similarity between the target and the group of distractors contributes to the overall difficulty of the search task.
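One way to write the numerator as code is sketched below, under the same assumptions as before (single numbers are subtracted, tuples are compared by Euclidean distance). Averaging tuple-valued distractors component by component follows the per-channel averaging just described; the function names and sample values are illustrative only.

```python
import math
from numbers import Real


def mean_item(values):
    """Average distractor description (X-bar)."""
    n = len(values)
    if all(isinstance(v, Real) for v in values):
        return sum(values) / n
    # Tuple-valued items: average each component (e.g., L, M, S) separately.
    return tuple(sum(component) / n for component in zip(*values))


def numerator(target, distractors):
    """Distance between the target value m and the distractor mean X-bar."""
    x_bar = mean_item(distractors)
    if isinstance(target, Real):
        return abs(target - x_bar)
    return math.dist(target, x_bar)


# Hypothetical gray values for a target and four distractors.
print(numerator(192, [50, 60, 60, 65]))  # prints 133.25
# Hypothetical LMS triples: the mean item is (20.0, 30.0, 20.0).
print(mean_item([(30, 40, 10), (10, 20, 30)]))
```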
What about the denominator? This gets a little trickier, but not by much. First, the easy part: ‘n’ just stands for the number of distractors, so it should be no big deal to take its square root. The ‘s’ stands for the sample standard deviation, which tells you how ‘spread out’ the distractor values are around their own average. You can calculate ‘s’ using the expression below, or use something like Excel, Matlab, or R to compute it for a list of values. Either way, it can be computed for multi-valued distractors as well as single-valued ones. Dividing ‘s’ by the square root of ‘n’ gives you a denominator that tells you how similar the distractors are to each other, which also governs how hard or easy a search task is.
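The denominator can be sketched the same way. For single numbers, this is the ordinary sample standard deviation, which should agree with Python's `statistics.stdev`. For tuple-valued items the text doesn't spell out a formula, so the code assumes one natural generalization: replace each squared difference from the mean with the squared Euclidean distance from the mean item. The function names are hypothetical.

```python
import math
import statistics
from numbers import Real


def sample_sd(values):
    """Sample standard deviation s, dividing by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two distractors")
    if all(isinstance(v, Real) for v in values):
        mean = sum(values) / n
        squared = [(v - mean) ** 2 for v in values]
    else:
        # Assumed generalization for tuple-valued items (e.g., LMS triples):
        # use the squared Euclidean distance from the mean item.
        mean = tuple(sum(c) / n for c in zip(*values))
        squared = [math.dist(v, mean) ** 2 for v in values]
    return math.sqrt(sum(squared) / (n - 1))


def denominator(distractors):
    """s / sqrt(n), where n is the number of distractors."""
    return sample_sd(distractors) / math.sqrt(len(distractors))


d = [50, 60, 60, 65]  # hypothetical gray values
print(round(sample_sd(d), 2))                           # prints 6.29
print(round(denominator(d), 3))                         # prints 3.146
print(math.isclose(sample_sd(d), statistics.stdev(d)))  # prints True
```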

Figure 9 - The sample standard deviation is calculated using this formula, where 'x' is the value of each element in the distribution of distractors and 'n' is the number of distractors. 
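For reference, the sample standard deviation in Figure 9 is usually written as follows, with x_i the value of the i-th distractor, X̄ the average distractor value, and n the number of distractors:

$$
s = \sqrt{\frac{\sum_{i=1}^{n} \left(x_i - \bar{X}\right)^2}{n - 1}}
$$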

With the numerator and the denominator in hand, you can just divide and come up with a value of ‘t’ – the larger it is, the easier the search task will be. Ideally, this will let us predict something like the set-size slope for a range of different tasks, including the ones we’ve already seen and new ones like the one pictured below:


Figure 10 - How hard will it be to find the pink dot in the midst of all these other dots? If we knew the LMS values of each dot, we could use these to calculate a t-statistic that would predict how hard this task should be.

Let’s walk through a simple example of how we might calculate the value of ‘t’ for two different search displays. In each case, we’re going to be looking for a light-gray dot among 4 distractors with different brightnesses. In the first case, all 4 distractors will mostly be dark gray. In the second case, the 4 distractors will have a range of different gray values. Which task should be more difficult?


Figure 11 - We can use the t-statistic to determine which search task should be harder. In both cases, your target is the dot with a gray value of 192.

We need to start by describing each item numerically, and in this case, the intensity of the light being reflected by the dot is probably our best bet. I’ve written these gray values next to each dot in the figure above so you can see them, and these will be the numbers we use to calculate the values in our t-statistic. First, the easy stuff: the target in each case has a gray value of 192, so we just plug that in. Second, the average value of the distractors is also easy to calculate, so let’s just do that. In the first case, we get an average value of 58.75, and in the second case, we get an average value of 112.5, so we can plug those in as well and subtract to get our numerators (133.25 in the first case, and 79.5 in the second). Now for the denominators: we need to calculate the sample standard deviation for each set of distractors. In the first case, this comes out to 6.29, and in the second case, this comes out to 75.44. (I’m not spelling out how to use the formula for this because you can either ask Excel to do it for you, or just plug and chug based on the expression above.) We then divide each of these by the square root of ‘n’; since n is 4, that square root is a nice, round 2. My two denominators are thus 3.145 and 37.72. The final step is to divide each numerator by its denominator, yielding a value of ‘t’: I get about 42.4 in the first case and about 2.1 in the second case. This suggests that the first search should be much easier than the second, since the target is statistically more different from its distractors. Without actually measuring performance, we have a good guess regarding how difficult each task might be.
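The calculation above can be checked with a short script. The actual gray values appear only in Figure 11, so the distractor lists below are hypothetical values chosen to reproduce the summary numbers quoted in the text (means of 58.75 and 112.5, standard deviations of about 6.29 and 75.44); the real figure may use different values. Because the script doesn't round any intermediate steps, its t-values can differ slightly from a hand calculation that rounds along the way. The `search_t` name is hypothetical.

```python
import math
import statistics


def search_t(target, distractors):
    """Single-sample t comparing one target value to the distractor values.

    Uses the absolute difference in the numerator, so a larger t means the
    target stands out more (a predicted easier search).
    """
    n = len(distractors)
    x_bar = statistics.mean(distractors)
    s = statistics.stdev(distractors)  # sample SD (divides by n - 1)
    return abs(target - x_bar) / (s / math.sqrt(n))


TARGET = 192
# Hypothetical gray values chosen to match the text's summary numbers;
# Figure 11's actual values may differ.
similar_distractors = [50, 60, 60, 65]    # mean 58.75, SD about 6.29
varied_distractors = [30, 87, 123, 210]   # mean 112.5, SD about 75.44

t_easy = search_t(TARGET, similar_distractors)
t_hard = search_t(TARGET, varied_distractors)
print(f"{t_easy:.2f} {t_hard:.2f}")  # prints 42.36 2.11
```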

Here’s the thing, though: this model isn’t perfect, and like all of our algorithms, it relies on a number of assumptions. Here at the end of all things, I’m not going to point these out for you, because I hope you’ve come far enough along that you can look at a procedure like this one and think through what might make it do something a little strange that doesn’t match your visual experience! If nothing else, this is one last example of how we can use some simple calculations to estimate how our vision will work: What it can do, what it can’t, and how difficult it will be to make certain judgments.
