Visual Search - What makes it hard to find things?

For our last post (at least, I think it is), we’re going to discuss another problem in high-level vision: visual search. By visual search, I mean more or less what you probably think: the problem of searching for something in a cluttered display. For example, where is “Waldo” in the image below?

Figure 1 - Finding an object in clutter can be challenging. "Where's Waldo?" books play with search difficulty by manipulating a number of properties of search displays.

Naively, you might think that a problem like this more or less boils down to carrying out your procedures for object recognition a bunch of times. To look for Waldo (or your keys, or a particular street corner on a map), don’t you just have to look around a bunch within the scene and try to recognize him as you go? To some extent, yes. However, there are several ways in which visual search seems to have different properties than we’d expect if w
Object Recognition

The first problem in high-level vision that we’ll consider is the problem of object recognition. What I mean by object recognition is the ability to look at an image of something and say what it is. Are you looking at a cat, a dog, a person, a car, or something new that you’ve never seen before? Coming up with a label to assign to part of an image, based on what you think it is, more or less encompasses what object recognition is.

There are several different kinds of labels we might try to assign to the things we look at, and it’s worth distinguishing between these before we start to think about this problem. First, we might think about naming objects using the sort of common labels you use all the time to refer to the things around you: mug, pencil, shoe, etc. Identifying objects using these kinds of labels is what we call basic-level categorization. The definition of the basic level is honestly a bit hand-wavy: We usually define it as th