Department of Philosophy
103 Manchester Hall U-2054
University of Connecticut
Storrs, CT 06269-2054
Early Content Conference, University of Maryland, 22 April 2006
Three different ways to understand the representational content of the feature maps employed in early vision are compared. First is Stephen Kosslyn's claim, entered as part of the debate over mental imagery, that such areas support "depictive" representation, and that visual perception uses them as depictive representations. Reasons are given to doubt this view. Second, an improved version of what I call "feature-placing" is described and advanced. Third, feature-placing is contrasted with the notion that the representational content of those feature maps could be conveyed in a list of sentences about visual objects. Some problems with this last alternative are described.
On the periphery of the vast battle over mental imagery there occurs a minor skirmish that is worthy of our attention. In his recent contributions, Stephen Kosslyn has claimed that visual mental imagery uses some of the same brain mechanisms as does visual perception (in particular V1), and (more importantly for our purposes) that neuroscience shows that those mechanisms use depictive representation. This second claim is worth scrutinizing; it prompted this paper. Kosslyn says:
Without question, topographically organized cortical areas support depictive representations that are used in visual perception. These areas are not simply physically topographically organized, they function to depict information. For example, scotomas--blind spots--arise following damage to topographically organized visual cortex; damage to nearby regions of cortex results in blind spots that are nearby in the visual field. Moreover, transcranial magnetic stimulation of nearby occipital cortical sites produces phosphenes or scotomas localized at nearby locations in the visual field. These facts testify that topographically organized areas do play a key role in vision, and that they functionally depict information. (Kosslyn, Thompson & Ganis 2002, 200)
What is "depictive"?
Let us try to be clear on what we take to be the central issue: does visual mental imagery rely (in part) on a distinct type of representation, namely, one that depicts rather than describes? By "depict" we mean that each portion of the representation is a representation of a portion of the object such that the distances among portions of the representation correspond to the distances among the corresponding portions of the object (as seen from a specific point of view; see Kosslyn 1994...) (Kosslyn, Thompson & Ganis 2002, 198)
A depictive representation is a type of picture, which specifies the locations and values of configurations of points in a space. ... In a depictive representation, each part of an object is represented by a pattern of points, and the spatial relation among these patterns in the functional space correspond to the spatial relations among the parts themselves. Depictive representations convey meaning via their resemblance to an object, with parts of the representation corresponding to parts of the object... (Kosslyn 1994, 5)
Is it true that topographically organized "feature maps" engage in depictive representation? Kosslyn says it is true "without question". Is he right?
The question turns out to engage many old and still outstanding questions about sensory processes. One oldie but goodie is simply: Are sensory processes intentional, or are they at best pseudo-intentional? The question dates back at least to Aquinas, if not to the Stoics. Is a sensation of warmth about warmth in the same way that a thought of Mary is about Mary?
Now I would suggest as a working hypothesis that our cognitive architecture employs at least as many different forms of internal representation as does, say, the switching network of the newly reborn AT&T. Many different kinds of data structures are employed, and the variables and values in them likewise range over entities and properties of the most diverse sorts. It is not the case that the only form of representation employed is the registration of propositions, and it is not the case that the variables in these data structures always and only range over material objects. We have not only propositions but arrays of one, two, or n dimensions; linked lists, binary trees, dynamic data structures, stacks, queues, and so on. A node in a dynamic data structure can record values for any entity x such that a programmer finds it useful to think of x as an entity. So these "objects"--the values of these variables--might be such "things" as the digitized packets into which a phone conversation is chopped up these days; traffic levels in the neighbors of this switch; routing retries; dropped packets; IP addresses. Somewhere in there we have to reconstitute the sequence of packets, and it would be useful to have a linked list of the sequencing. These get modified massively and on the fly; different packets of the same conversation might take very different routes through the network from point A to point B and back; but the end result is a sequence of sounds at the other end of the "line" that sounds like a person talking.
Traditionally the contrast is framed as one between sensory processes and linguistically mediated thought. The touchstone for intentionality is natural language, with its full apparatus of sortals, identity, definite description, pronominal cross reference, and indirect discourse. But if we allow that organisms might employ systems of mental representation distinct from sentences in a natural language, the contrast between "intentional" and "non-intentional" has to be recast. Instead of asking "is it intentional or not?" a more useful question is "in what ways is it intentional?" That is, there is a third variable in play here, namely: what sort of system of representation is in play. Lesser systems of representation will yield some distinctive pattern of capacities and incapacities. Sensory processes might have these hallmarks of intentionality but not those. It is not a simple yes/no question; the answer depends on what kind of representational system, with what sorts of intentional, pseudo-intentional, and non-intentional capacities one has in mind.
My plea is that we should not think of vision as being more constrained than is AT&T when it comes to the variety of data structures that it might employ, and the variety of entities over which it might be useful to quantify. This is a biological system after all, and if you spend even a second considering the varieties of lifeforms found inhabiting even the tiniest of ecosystems, you'll realize that there is good reason to expect there to be many different forms--the representational equivalents of everything from viruses, fungi, molds, slime molds, algae, bacteria, and so on, all the way up to the most complicated creatures, with all their weirdly different life styles. The representational forms found in visual systems in the wild are likely to be no less variegated than the life forms you find in your garden. There is no good reason to assume that they are all of the same kind, or that they all work in the same way.
The word "object" can of course be used as an all-encompassing ontological placeholder, meaning: any entity over which one cares to quantify. In this sense there is no disputing that visual representation is always about objects. (But in that same sense, all the data structures in AT&T's computers are about objects; and all "object oriented programming" is always and only about objects. It is universal only because it says nothing.)
A second preliminary point: it is also a truism that the entities represented by visual representations, of any sort, almost always turn out in fact to be physical objects (or at least physical phenomena) in front of the eyes. So for example I will argue that cells in layer III of V1 represent orientations of edges--one might register a short edge, at orientation theta, thereabouts. These edges usually turn out to be edges of objects in front of the eyes; that particular one, for example, might be a portion of the edge of your desk. So is that cell in layer III of V1 representing a portion of the edge of your desk? Well, yes and no. Yes in the sense that that is what it turns out to be. No in the sense that it does not represent it as an edge of a desk. All it represents is edginess, oriented like so, thereabouts. V1 does not know about desks. It does not know about edges of physical objects. At this point these "edges" are nothing more or less than discontinuities of intensity. There is a change in the luminance thereabouts. It could be a luminance edge (a shadow), a reflectance edge (a change in the surface, under constant luminance), an occlusion edge. V1 just represents them all as edges.
So in order for us to say that visual representation is always and only the representation of objects, what we mean is that always and only the entities represented are represented as objects. It is not sufficient to show that they turn out to be objects; that is admitted on all sides. The non-trivial thesis is that there is enough in the content of the representations for us to be confident that the entity represented is represented as an object.
I suggest that the "ontology" of vision is not yet known, and that it need not all be of a piece. That is, take the values of the variables over which it quantifies in all of the various systems of representation it uses at all its different levels. What are those entities?
It is clear that at some stage or level we will get variables whose values are persistent, bounded, separately movable, distinct from the background, objects. But there are lots of other stages too. And yes, one can force them all into one mold, if we allow sufficient latitude to our notion of what is an "object". So for example if the sky counts as an object, then perception of the sky can be treated as perception of an object. (But it's a pretty strange one.) But then there are plenty of other weird visual entities that we say we see. Light and shadow. Glares, glints, reflections. Haze and mists. A blur of moving fingers. Afterimages. Floaters. Light and shadow are weird enough, ontologically speaking. You see the sunlight on the trees as well as seeing the trees. You see the branch of the tree, the color of the branch, and the light on the branch, all more or less, if not exactly, in the same place. Even this simple scene seems hard to capture if all the variables must range over objects.
V1 is a portion of cortex said by neuroscientists to contain many different "feature maps". We can now consider Kosslyn's proposition: that "without question" they support depictive representation.
The core notion of a "feature map" in neuroscience is, I think, a region of cortex organized topographically. But everything hangs on how one understands the term "topographical". The simplest interpretation is anatomical. The fibers coursing into the cortical area come from some source region or regions, also within the nervous system. In a "topographical" organization, there are local regions in the source within which neighboring cells project, more or less, to neighboring cells in the destination. There might be several such local regions, between which there can be abrupt discontinuities in the projections. A prominent example is found in the retina: the left side of each retina projects to the left side of the brain, and the right to the right. So we find a topological "tear" right down the middle of the retina. But within each region, neighborhood relations are (pretty much) retained.
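The sense in which such a projection is "topographical, with tears" can be put in a toy computational form. Everything below is invented for illustration, with no anatomical detail intended:

```python
# A toy "topographic" projection with a tear: retinal positions x in [-1, 1];
# the left half (x < 0) projects to the left "cortex", the right half to the
# right, preserving neighborhood relations within each half but with an
# abrupt discontinuity at the midline.

def project(x):
    """Map a retinal position to a (side, local position) destination."""
    side = "left" if x < 0 else "right"
    return (side, abs(x))  # within a half, neighbors stay neighbors

# Within one half, nearby sources land on nearby destinations...
a, b = project(-0.50), project(-0.51)
assert a[0] == b[0] and abs(a[1] - b[1]) < 0.02

# ...but retinal neighbors that straddle the midline land in different
# regions entirely: the topological "tear".
left, right = project(-0.01), project(0.01)
assert left[0] != right[0]
```

The point of the toy is only that "topographical" is a local, piecewise notion: neighborhood preservation within regions is compatible with gross discontinuities between them.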
It should be obvious that mere topographic organization is not by itself sufficient to show that the cortical region in question employs pictorial or depictive representation. That way of organizing the fiber bundles can be better ascribed to physiological economy (fewer crossovers and shorter bundles) or neural development (easy ways to grow the things) than to features of our cognitive architecture. Furthermore, the cortical region may be representing something other than location altogether. For example, an auditory feature map can be topographically organized, respecting neighborhood relations on the basilar membrane, but this makes it a tonotopic map, of different frequencies, not different places. Mustached bats have auditory maps across which we get systematic variation in Doppler shift (see Suga 1990). It is not mapping space, but rather relative velocities.
What then is needed for these regions of cortex to be, also, maps of space? This conclusion is not automatic! A second necessary condition can be put as follows: the region must provide the wherewithal for some variety of spatial discrimination. It carries information about spatial properties and relations of its targets in such a way as to allow the organism to navigate. Without this it wouldn't contribute to what I think of as "feature-placing".
This second condition is certainly true of V1. Damage to V1 causes scotomas, or "blind" portions of the field, within which unprompted visual discrimination can become entirely impossible. A precise perimetry of the boundaries of the scotoma can help the neuropsychologist diagnose where the cortical damage took place. So it is clear that, thanks to V1, the creature can make certain spatial discriminations that it otherwise cannot make. If you doubt this, just consider what it loses in those scotomas.
There is a third and stronger condition satisfied by visual areas V1-V4. In them, and in many other sensory areas, cells in a feature map have identifiable receptive fields: regions in circumambient space within which stimuli of a specified kind can affect, and are reliably correlated with, the activation level of the given cell. Furthermore, cells that are neighbors in the cortical region in question often have receptive fields that are neighbors in circumambient space. When they do, one can see the strong analogy that inclines neuroscientists to call the thing a "map": it is a topographically organized array within the organism that seems to represent places outside and around the organism.
The problem with any analogy is that one never knows which features of it are meant to be taken literally, and which not. Is V1 a "map" in the sense in which a road map, or a weather map, is a map? Different places on such maps represent different places in the world, and there is a system of projection, a homomorphism, which enables one to move back and forth between the map and the world. Relations between some places in the world are represented by relations between tokens with distinct places on the map. Kosslyn's description of depiction is one way to characterize that set of relations; it is not a bad way to describe such useful artefacts as road maps or weather maps. I'll also call the latter depictions "maps of space".
Now we can confront the question. Are these regions of cortex "without question" depictive? If we consider V1, for example, the best possible case for calling it "depictive" gives us three premises. First, that we have an orderly projection of fiber bundles from its source (mostly LGN) to V1. Second, the region must enable some spatial discriminations. Third, neighbors in V1 typically have receptive fields that are neighbors. And it functions in accord with this principle, as Kosslyn points out.
These three premises, so far, do not imply that the map is depictive, i.e., that points and distances within V1 map homomorphically onto points and distances within the ambient optic array. For it to be literally a map of space, it would have to sustain those spatial discriminations in just one way, via a homomorphism with spatial properties. As Kosslyn puts it, it must be such that "distances among portions of the representation correspond to the distances among the corresponding portions of the object" (Kosslyn, Thompson & Ganis 2002, 198).
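Kosslyn's condition can be made concrete with a small sketch. The helper `is_depictive` below is my own invention, not anything from the vision literature; it checks whether inter-point distances within a representation correspond, up to a single uniform scale factor, to inter-point distances among the represented portions:

```python
from itertools import combinations
from math import dist

def is_depictive(rep_positions, object_positions, tol=1e-6):
    """Check Kosslyn's condition: distances among portions of the
    representation correspond (up to one uniform scale) to distances
    among the corresponding portions of the object."""
    ratios = []
    for i, j in combinations(range(len(rep_positions)), 2):
        d_obj = dist(object_positions[i], object_positions[j])
        if d_obj > 0:
            ratios.append(dist(rep_positions[i], rep_positions[j]) / d_obj)
    # One uniform scale factor across all pairs => homomorphism holds.
    return max(ratios) - min(ratios) < tol

# A photograph-like representation passes: every distance is scaled by 0.1.
obj = [(0, 0), (10, 0), (0, 20)]
pic = [(0, 0), (1, 0), (0, 2)]
assert is_depictive(pic, obj)

# A layout that scrambles distances fails the condition.
scrambled = [(0, 0), (5, 0), (0.5, 0)]
assert not is_depictive(scrambled, obj)
```

Whether the cortical arrangements at issue pass any such test is exactly what the anatomical details below bear on.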
That V1 is required for certain sorts of spatial discriminative capacities shows that information in V1 is used by the organism to improve its steerage. It does not show that the information in V1 is organized just like a map or a picture. The structure might enable spatial discriminations (of some particular sort) without itself modeling space. If you look at its finer structure, I think it's pretty clear it does not model space. In fact, probably no feature maps are maps of space in the "depictive" sense. V1 is certainly a big array of measurements, but values in adjacent cells are not invariably measurements of adjacent places.
Details of the structure of V1 make this clear. The details in question are not subtle or contentious; most of them have been known since the work of Hubel & Wiesel. In particular, the ocular dominance pattern, and the arrangement of "orientation slabs", royally messes up the neighborhood relations. See Fig 1.1.
Fig 1.1. Orientation slabs in V1. Each of the highlighted slabs registers features in the same region of visual perimetry, but they are maximally sensitive to edges at different orientations. "L" slabs receive input from the left eye, "R" from the right (From Hubel & Wiesel 1979).
In a given orientation "slab" within (layer III of) a cortical column, all the cells will fire maximally to an edge, bar, or slit of a given orientation. See Fig 1.2.
Fig 1.2. An electrode track across the cortex, showing the orientations to which cells in the given column were maximally responsive. The vertical track shows how all the cells in a given column "prefer" the same orientation. Adjacent columns are maximally sensitive to slightly different orientations. (From Hubel & Wiesel 1979).
Cells in the neighboring slab do not register the same orientation in neighboring receptive fields, but instead a different orientation (in different receptive fields). And we have a block of orientation slabs for the left eye immediately adjacent to a block for the right eye. See Fig 1.3. These are the left eye view and the right eye view of the same location in external space.
Fig 1.3. Ocular dominance pattern across the surface of V1. Columns colored black receive input from the right eye, in white from the left eye. (From Hubel & Wiesel 1979)
The critical point: if you move half a millimeter in one direction, you might not change the receptive field at all, but instead move to a region receiving input from that same receptive field, but from the other eye. Move in another direction and the receptive field will shift, but so will orientation. Move in a third direction and only the optimal orientation shifts. These distances do not map uniformly onto distances in the ambient array. Ergo, homomorphism fails. V1 is not depictive.
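A schematic sketch of the layout (the coordinates and values are invented, following the Hubel & Wiesel figures only loosely) makes the failure explicit:

```python
# A toy cortical strip: each cell carries a receptive-field label, an eye of
# origin, and a preferred orientation. Equal steps across the cortex do NOT
# correspond to equal steps across the visual field.
cells = {
    # (cortical x, y): (receptive field, eye, preferred orientation in deg)
    (0, 0): ("RF-A", "L", 0),
    (1, 0): ("RF-A", "R", 0),   # one step: same place in the world, other eye
    (0, 1): ("RF-A", "L", 20),  # one step: same place, new orientation
    (2, 0): ("RF-B", "L", 0),   # two steps: only now does the field move
}

rf1, eye1, ori1 = cells[(0, 0)]
rf2, eye2, ori2 = cells[(1, 0)]
assert rf1 == rf2 and eye1 != eye2  # a cortical step with zero visual-field step

rf3, _, ori3 = cells[(0, 1)]
assert rf1 == rf3 and ori1 != ori3  # another equal step, still zero

# Equal cortical distances correspond to unequal (here, zero and nonzero)
# visual-field distances, so the homomorphism a depiction needs fails.
```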
How then does a feature map represent? One minimal but plausible description of the content of a feature map is: it indicates the spatial incidence of features. It might do more than this, but it does at least this. That is, it registers information about discriminable features, in such a way as to sustain varieties of spatial discrimination that can serve to guide the organism. The conditions focus on downstream consumers of the information, not what causes it. Registration of information in a feature map endows the creature with some spatial discriminative capacity. If that map is used, the steerage can improve. To carry on its other business, the animal relies on the constellation of features being as therein represented.
One way to get at the spatial content of a feature map, guaranteed to work for every feature map, is to ask: what sort or sorts of spatial discrimination does this particular feature map make possible? For some cortical regions dubbed "feature maps" by neuroscientists, the answer could well be "none"--in which case the map is not a representation of the spatial incidence of features at all. (Such a map will not employ the representation form I identify below as "feature placing".) The idea: if feature map M is representing the spatial incidence of features, then it is being used as a representation of the spatial incidence. The information in it about spatial properties and relations is exploited. One way to show that it is exploited is to show that certain kinds of spatial discriminations could not be made without it; without map M working normally, the guidance system and steerage--the navigational and spatial competence of the organism--suffers some decrements.
The focus on downstream consumers is a way of showing that the registration of information is used as a representation; that it has a content that is used. To tie representations to the world, show that they improve the capacity to get around. But feature maps can do this without necessarily being pictorial or depictive; they can satisfy the condition without being, literally, maps or inner pictures.
Psychological theory right now lacks any deductive proofs, or even compelling arguments, that establish how information must be organized to endow creatures with some new spatial discriminative capacity. In particular, there is no compelling reason to think that information must be organized depictively in a feature map if that feature map enables a creature to make spatial discriminations which it otherwise could not.
What then does V1 represent? To answer this question, analyze what use downstream consumers make of the information registered in it. A first stab: these cells in layer III of V1 represent "(edginess of orientation theta) (thereabouts)". Edginess is the feature; "thereabouts" indicates its incidence. Those cells in layer III of V1 have the job of registering differences in orientations, in such a way as to allow spatial discrimination of them. If they do that job, the animal can rely upon those indicators, and thereby steer a bit more successfully than if it lacked them.
More generally, I have proposed that we call this form of representation "feature-placing" (see Clark 2000). It "indicates the incidence of features" in the space surrounding the organism. The name is partly in honor of Sir Peter Strawson's work (1954, 1974) on "feature-placing languages", which contain just a few demonstratives ("here" and "there") and non-sortal universals (feature terms, like "muddy" or "slippery".) A paradigm feature-placing sentence is "Here it is muddy, there it is slippery." Such sentences indicate the incidence of features in regions. Strawson argued that these languages could proceed without the individuation of objects. The same seems true of the representations employed in feature maps. It seems a bit much to claim that V1 "refers" to places, "identifies" regions, or "demonstrates" locales. All the latter locutions arguably invoke some portion of the apparatus of individuation. Feature-placing is prior to, and can provide the basis for, the introduction of that rather heavy machinery.
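A feature-placing representation can be sketched as nothing more than pairings of features with regions; the point of the sketch is what it leaves out, namely any apparatus of individuation. The structure below is purely illustrative:

```python
# A minimal feature-placing "scene": features are placed in regions of
# ambient space with no individuation of objects -- no identity, no sortals,
# no re-identification of a thing across regions or times.
scene = [
    ("thereabouts-1", "edginess at 20 degrees"),
    ("thereabouts-1", "reddish"),
    ("thereabouts-2", "edginess at 110 degrees"),
]

def features_at(region):
    """Answer 'what, where?': the features placed in a given region."""
    return [f for r, f in scene if r == region]

assert "reddish" in features_at("thereabouts-1")

# There is nothing here to answer "same object again?". Two features in one
# region are merely co-placed, not predicated of an individuated thing.
```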
Another way to put it is that feature maps in V1-V4 transact their business in a location-based way. A particular feature map can endow a creature with new spatial discriminative capacities without also endowing it with an ontology of objects. It can get the spatial discriminative job done without investing in that sort of machinery. A skimpy basis can suffice; the business can be run on an ontological shoe-string. It is also important to insist that the regions visually discriminated are not inner, or mental, ones. They are not inside the organism or inside the mind. If the job is to guide spatial discriminations in the world, then representing inner places will not help. Visual "thereabouts" always reside, resolutely, in the ambient array, not in the retina. The cortical feature map might be retinocentric (it uses an "eye centered" reference frame) but it is not retinotopic. It is not about the states of the retina, but instead about features in the world.
If V1 were representing places on the retina, then it should represent the blind spot as empty. But patterns are completed "across" the blind spot, as shown by Gattass and Ramachandran's experiments on scotomas and "filling in" (see Churchland & Ramachandran 1994). The filling in across the optic disk can give a veridical "perception" of the distal place, even though it would be a non-veridical representation of what is going on at the retina. V1 cells in the "Gattass condition" fire just as they would if there were a stimulus stimulating the non-existent receptors in the optic disk. If we were representing places on the retina, this would be a non-veridical representation (Churchland & Ramachandran 1994, 82).
So I think there is good reason to say that what these parts (of layer III) in V1 are representing is something akin to "(edginess of orientation theta)(thereabouts)." "Thereabouts" indicates a region of circumambient space--a region of visual perimetry, in the ambient optic array. "Edginess of orientation theta" indicates a feature discriminable in some portion of that space. The orientation is of an edge in external space, not across the eyeball. It is feature-placing, and both the features and the places are distal.
A brief look at a few other examples of feature maps will drive home the point that these maps are not generally maps of space. They are not "depictive" in the sense that Kosslyn suggests.
My favorite examples are the auditory maps used by the mustached bat to echolocate its prey (see Suga 1990). This creature emits a complicated ultrasonic screech. (See Fig 2.1). The first portion of it is of constant frequency and has four harmonics, starting at 30,000 hertz and ramping up to 120,000, labeled "CF1" to "CF4". At the end of the screech each frequency drops, and those portions are called the "frequency modulated" or "FM" portions. The sound echoes off of nearby objects, and the bat compares auditory characteristics of the echo to those of the first harmonic of the outgoing sound. The time it takes for an echo to return can most reliably be extracted from comparison of the FM portions.
If for example one could register the delay between the outgoing FM1 signal and the receipt of FM2 echoes, that information could be used to determine the distance between the bat and the object in question.
Fig 2.1. Echolocation signals and processing areas in the mustached bat. The right panel shows sound characteristics of a typical echolocating signal, starting with constant frequency components (CF) and ending with modulated (FM) portions in which the frequencies drop. The left panel shows areas of the cortex of the mustached bat corresponding to the various feature maps it employs in echolocation. The FM-FM area gives echo delay; CF-CF area registers doppler shift. (From Suga 1990)
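The arithmetic that an FM/FM delay makes available is simple round-trip ranging. The sketch below assumes the speed of sound in air at roughly room temperature; the exact figure in the bat's hunting conditions will differ:

```python
SPEED_OF_SOUND = 343.0  # m/s, in air at about 20 C (an assumption)

def target_distance(echo_delay_s):
    """Distance to the reflecting object: the pulse travels out and back,
    so the one-way distance is half the round-trip time times the speed
    of sound."""
    return SPEED_OF_SOUND * echo_delay_s / 2.0

# An echo delay of about 5.8 ms corresponds to a target roughly 1 m away.
assert abs(target_distance(0.00583) - 1.0) < 0.01
```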
Sure enough, in the cortex of the bat we find a topographically organized map of FM/FM comparisons, in which cells found rostrally have smaller FM/FM delays, and hence lesser distances; and cells found caudally are maximally active for greater FM/FM delays, and hence greater distances. See Fig 2.2.
Fig 2.2. The FM/FM maps. (A) shows their location, (B) the "topographic" organization. Cells at different places on these maps are maximally responsive to different echo delay times, as shown in (C). The caption for (B) notes distances at which the bat will search, pursue, and finally execute the terminal "scoop" maneuver. (From Suga 1990)
This is a map of echo delay times, in which travel across a meridian of the "map" corresponds to increasing or decreasing time. That time in turn corresponds to a datum of critical importance: the distance between the bat and the object that returned the echo.
Even more impressive is the use made of the constant frequency portions of the signal. Since the bat produces the CF1 signal (and also perceives it, through bone conduction), the outgoing frequency is a known quantity. Now if a moving object produces sound of a constant frequency, the frequency of the sound one hears will be shifted upwards if it is moving towards you, and downwards if it is moving away. This is the well known "doppler shift", illustrated when an ambulance goes by with its siren screeching. The same principle applies to the frequency of a sound echoing off of a moving object. If that object is moving towards you, the frequency will be shifted upwards, and if away, downwards. If it is stationary, there is no shift in the frequency. If the bat could compare CF1 with the frequencies of sounds returned from nearby objects, it could register the degree to which the frequency of the echo is doppler shifted, and hence also determine the relative velocity of the object off of which the sound echoed: whether it is coming towards the bat, or flying away.
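The computation this paragraph describes is easy to state. The version below is deliberately simplified: it compares a single emitted frequency with its echo, ignoring the harmonic relation between the outgoing CF1 and the returning CF2 or CF3, and it uses the standard low-speed approximation for the shift of an echo off a moving reflector; the numbers are illustrative:

```python
SPEED_OF_SOUND = 343.0  # m/s (an assumption; in-air figure at ~20 C)

def relative_velocity(f_emitted, f_echo):
    """Approximate closing speed from the doppler shift of an echo.
    For speeds much less than the speed of sound, an echo off a moving
    reflector is shifted by roughly delta_f = 2 * v * f / c, so
    v = c * delta_f / (2 * f). Positive: approaching; negative: receding."""
    return SPEED_OF_SOUND * (f_echo - f_emitted) / (2.0 * f_emitted)

# A 61 kHz signal whose echo returns at 61.7 kHz implies a target
# closing at roughly 2 m/s.
v = relative_velocity(61_000.0, 61_700.0)
assert 1.9 < v < 2.1

# An echo returning below the emitted frequency implies a receding target.
assert relative_velocity(61_000.0, 60_500.0) < 0
```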
So it is pleasing to discover that there are other portions of the auditory cortex of the mustached bat that "map" the frequency differences between the outgoing CF1 and the returning echoes of CF2 and CF3 respectively. See Fig 2.3.
Fig 2.3. The CF/CF maps. (A) shows the pairs of frequencies yielding the maximal response among cells at that point in the map. If there is no doppler shift, CF3 = CF1 * 3. (B) shows locations on the same map in terms of relative velocities that would produce the given doppler shift. (From Suga 1990)
These CF/CF maps are again organized "topographically", but here the different places correspond to different frequencies on the bat's basilar membrane. Along one axis we have differing frequencies for the outgoing CF1 signal; across the other differing frequencies for the returning CF3 echo. For example, one particular swath of cells located rostrally on the cortical surface consists of neurons that all respond maximally to CF1 emissions of 30.0 kHz. Across that swath, in the dorsal/ventral direction, we find a progression of cells that respond maximally to differing CF3 echo frequencies. If the object were stationary, the returning echo should be exactly the third harmonic of the CF1 pulse. To the extent that cells other than just those are firing, the cortical region will register the extent to which the frequency of the echo has been shifted. In other words, this is again a "topographically organized" map, but this time it is a map of doppler shifts. It can be used to represent relative velocity.
I think these are (spectacular examples of) feature maps, and that they are doing feature-placing, but that they are not maps of space. It is not the case that different places in the cortical region are used to represent different places in front of the bat. Instead, the FM/FM maps are maps of echo delay (differences in time), and the CF/CF maps are maps of doppler shift (differences in frequency). Different cortical regions "correspond to" different values in these mapped features; but what this means is that in those different cortical regions one finds cells that are maximally active for different values of echo delay or frequency shift. Nearby cells will be maximally active for different, but nearby values of, those features.
The feature map does not in any way use cortical position to represent the spatial position of stimuli. What "topographic organization" means here is that the cells in those different cortical regions are maximally active for different values of the mapped features. Rostrally or caudally, or dorsally or ventrally, one finds an orderly sequence (which is not continuous, and not isotropic, but orderly) of the delay times, or of frequency shifts, that will serve maximally to activate the cells there found. Of course cortical location per se does nothing; it is always the cells found at those locations. The feature to which those cells respond can be any physically detectable quantity whose detection the forces of mutation and evolution might have stumbled upon. Here the features are delay times and doppler shift.
Now it is true that from delay time one can (given the speed of sound) derive distance, and from doppler shift one can (given a known base frequency) derive relative velocity. The latter quantities are clearly critical ones for a hunting bat to get right. Discriminations of distance and velocity clearly fall within the gamut of what psychologists call "spatial" discrimination. Nevertheless, I think it only confuses matters to say that these maps are maps of "spatial" features. Immediately, immanently, they are not; they are maps of time and of shifts in frequency. A clearer way to describe their spatial import is to say that the bat uses these registrations of time and shifted frequencies as representations of distance and relative velocity. Downstream consumers treat the feature map as a representation of spatial features, and it is only because the bat can register delay times and frequency shifts that it can make the spatial discriminations it does among distances and velocities. But otherwise, there is nothing directly spatial registered in the feature map itself. Spatial features are derived from it, both by the bat and by us. It is not a map of space, and it is not (directly) a map of spatial features. Nevertheless it can be used to make possible a variety or varieties of spatial discrimination.
From an anthropological or philosophy-of-science point of view, it is interesting that these cortical regions are called "maps" at all. We do not have an orderly progression of map locations mapping onto an orderly progression of locations mapped. Instead we have a somewhat orderly progression of cortical locations mapping onto the values of delay times or Doppler shifts to which the cells at those locations happen to respond maximally. The "map" analogy is thus at least one step removed. Differing places in the cortex do not themselves map differing delay times or differing Doppler shifts; what is literally meant is something about the activation patterns of the cells found at those places. The condensation offered by the analogy is remarkable.
So if these are not literally maps of space, and the features mapped are not spatial, in what sense are they feature maps at all? In what sense are they doing feature-placing?
Here it is important to understand that the second condition described for a feature map--that it sustains some variety or varieties of spatial discrimination that would not occur without it--can be met in various ways. A "feature map" must serve somehow to aid the project of guiding the host. If it is to be something one can recognize as at least analogous to a map, the guidance in question is taken more or less literally, as "showing the way", or yielding spatial discrimination. Prototypically, it will help the organism navigate, or make its way through the spatial array of objects and properties confronting it. The question is not answered by studying the internal structure, or even the causal origin, of processes in the region, but rather by studying how those processes are used.
It is quite clear that the mustached bat uses its CF/CF and FM/FM maps when hunting. The registrations of FM/FM delay times are exploited for what they indicate about distance, and the bat's hunting behavior changes markedly depending upon that distance. (As it gets closer the ultrasonic screeches become more frequent, and of shorter duration; the timings of the CF and FM portions change; and the bat makes anticipatory adjustments to prepare itself to scoop up and grasp the prey.) Similarly, the information about relative velocity is used both to select a target to pursue (the bat pays particular attention to ones heading towards it), and then to adjust that pursuit as the moth attempts evasive maneuvers. (Moth audition is sufficiently acute that the moth too can tell the distance between itself and the bat. If it is far enough away, it will turn and fly straight away from the bat. If it is too close, it does a "beaming" maneuver to fool the Doppler radar, and drops toward the ground, trying to get lost in the clutter of returns from the terrain below.)
As far as I know, it is not yet understood exactly how the bat uses activity peaks at different points on the CF/CF maps to pick out and pursue a particularly tempting target. But it is very clear that it does do this, somehow. At any moment there will be many different activity peaks at different points on both the CF/CF and the FM/FM maps, indicating the various relative velocities and distances to potential targets. The bat picks one target, and uses activation patterns in these maps to indicate both distance and relative velocity of that one target. That it does so is clear from the agility of its maneuvering in pursuit, and from the alteration of its tactics as it closes. Furthermore, loss of one of the map types would be expected to make the bat effectively "blind" to the corresponding spatial feature. If it lost all its FM/FM maps, it could not discriminate distances to targets, and one would expect all the responses to changing distances to disappear. Its agility and success in pursuit would show marked decrements.
Such are the kinds of findings that can sustain the notion that the "feature map" makes possible some varieties of spatial discrimination even though it is not literally depictive, and it is not literally a map of space. The bat uses these registrations to pick out a moth that is near enough, and with a good relative velocity; and then it uses them to adjust its pursuit. So the feature map, in Ramsey's phrase, is used by the bat as a "map by which it steers". It can do this even though the "map" is not a map of space.
The fundamental idea of feature-placing is that a system of representation could still be informative, have contents, and have correctness conditions, even though it does not employ an apparatus of reference to objects. The "features" themselves are of course a restricted class (confined to non-sortal sensible qualities) but the more salient and controversial restrictions lie on the reference side: the idea is that this system of representation does not have the wherewithal to identify, individuate, or refer to objects. The apparatus of identity, the introduction of sortals, criteria of individuation: all are missing. Instead at best it "indicates" the incidence of features. It "places" them. The referential force of a feature-placing representation is something like the referential force of a demonstrative in a feature-placing sentence, such as "here it is slippery" or "it is misty there".
Now the question is: given the details just described, in what sense, if at all, do neural feature maps place features? How is it that what they do is worthy of the title "feature-placing"? What I would like to show has two parts. Part (a) is that if these structures satisfy the previously specified conditions, then they also satisfy conditions sufficient to do the "placing" part of feature-placing. The rival hypothesis on the table is that if these structures are to be counted as housing representations, then the reference of such representations must be a matter of identification of, and reference to, objects. Part (b) is to show that those conditions are not sufficient to sustain this rival hypothesis.
The three conditions previously described can be summarized as follows:
1. The fiber bundles coursing into the structure give it a topographical organization. That is, at the origin there are local regions within which neighboring cells project, more or less, to neighboring cells in the destination.
2. The region must enable some variety of spatial discrimination, in some way improving the organism's capacities to steer. Typically, information registered in the structure is necessary for some kind of spatial discrimination, so that if the region is destroyed, those capacities of spatial discrimination are lost.
3. We can give some explanation of how, exactly, the information registered in the structure is used to make those spatial discriminations possible. Some plausible story can be told of how different activity levels at different places in the neural map suffice for those discriminations.
As the CF/CF and FM/FM maps show, we can satisfy the first two conditions without being able to satisfy the third. We know that those are maps that enable certain kinds of spatial discrimination, but how exactly they do that (how those registrations suffice for it) is not yet clear. One might say that we know that they do the business, but not how they do it. To show the conditions are sufficient to do the "placing" part of feature-placing, we need examples in which all three conditions are met; and for this we should revert to visual examples.
For V1 we have dramatic demonstrations of condition 2: within a scotoma the subject loses certain spatial discriminative abilities (and other discriminations too). That is, without prompting, the subject cannot locate the stimulus, and can apply no spatial predicates to it at all. So unprompted discriminations of colinearity, parallelism, orientation, shape, size, and other such spatial features all drop to zero. If these feature registrations are lost, all normal visual spatial discriminative abilities in that region of visual perimetry are lost. This shows that this way of organizing information about features really is critical to sustaining the perception of spatial relations.
And for V1, condition 3 can be fleshed out a bit more fully:
3*. Cells in the neural region have receptive fields in visual perimetry. By and large (except for some wholesale but systematic topological discontinuities or "tears") the neighbors of a cell in V1 have receptive fields that neighbor (and often overlap) its receptive field.
In general, condition 3 requires us to provide some story explaining how different patterns of activation in the given cortical region could suffice for some variety of spatial discrimination; for V1, 3* is part of that story. We need to explain how the system uses the information registered in that cortical region as a map by which it steers. It is not a mere coincidence, and it is not representationally inconsequential, that neighbors of this cell have receptive fields that neighbor its receptive field. Those neighborhood relations, and the mapping of them, are part and parcel of how the thing works. They are vital in explaining how activity in this part of layer III of V1 is used to discriminate segment orientations in a particular portion of visual perimetry.
The best example of this is visual hyperacuity: visual spatial discriminations can resolve visual angles smaller than the angle between two adjacent receptors, even receptors in the fovea. For example, tests of "vernier acuity" take a vertical line, break it in the middle, and shift one portion sideways by a minute amount. The visual angle of the offset that can be reliably discriminated is smaller than that between two foveal receptors. The standard interpretation is that the system is sensitive to, and exploits, patterns of activation across relatively wide portions of the input array, and uses the pattern, not just input from one point, to interpolate position. To do this it needs to "know" which cells are neighbors, and exploit the information found across a substantial portion of the neighborhood.
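How pooled activity across a neighborhood could interpolate a position finer than the receptor spacing can be illustrated with a toy model. Nothing here is the visual system's actual algorithm; it is a minimal sketch, assuming Gaussian sensitivity profiles and a centroid read-out:

```python
import math

SIGMA = 1.0  # assumed width of each receptor's Gaussian sensitivity profile

def responses(stim_pos, receptor_positions):
    """Activation of each receptor to a point stimulus: falls off with
    the distance between the stimulus and the receptor's preferred position."""
    return [math.exp(-((stim_pos - r) ** 2) / (2 * SIGMA ** 2))
            for r in receptor_positions]

def estimate_position(receptor_positions, acts):
    """Centroid (activation-weighted mean) of receptor positions:
    the pooled estimate can fall *between* receptors."""
    return sum(r * a for r, a in zip(receptor_positions, acts)) / sum(acts)

receptors = [float(i) for i in range(-5, 6)]  # spaced 1.0 apart
true_pos = 0.3                                # lies between two receptors
est = estimate_position(receptors, responses(true_pos, receptors))
# The pooled estimate recovers the position to well under the 1.0 spacing.
```

The moral is the one in the text: the read-out works only because the system can exploit which cells are neighbors and pool across the whole neighborhood; a single receptor's output would locate the stimulus no more finely than its own spacing.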
The local differencing operations employed in edge detection and enhancement provide further examples. An edge will register as a difference in intensity between neighbors, and to find it one must do many, many subtractions between the values registered by a cell and those of its neighbors. Lateral inhibition and center-surround antagonism also have the effect of "enhancing" such edges: the discontinuities of intensity are more pronounced if activity in any cell inhibits activity in all its neighbors. So the fact that neighbors map to neighbors is functionally significant, making edge enhancement and detection much easier than it would otherwise be. If we think of cells in layer III as indicating ((edge of orientation theta)(thereabouts)), the effect of these local differencing operations is to sharpen the boundary of the region indicated by "thereabouts".
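A toy version of these local differencing operations, assuming a one-dimensional intensity array and a simple subtract-the-neighbors scheme (illustrative only, not a model of actual retinal circuitry):

```python
def neighbor_differences(intensities):
    """Edge signal: difference between each cell and its right neighbor.
    A large magnitude marks a discontinuity in intensity."""
    return [intensities[i + 1] - intensities[i]
            for i in range(len(intensities) - 1)]

def lateral_inhibition(intensities, k=0.5):
    """Center-surround sketch: each interior cell's output is its own input
    minus k times the mean of its two neighbors. Uniform regions are damped,
    while cells flanking an edge undershoot and overshoot, 'enhancing' it."""
    out = []
    for i in range(1, len(intensities) - 1):
        surround = (intensities[i - 1] + intensities[i + 1]) / 2.0
        out.append(intensities[i] - k * surround)
    return out

scene = [10, 10, 10, 10, 40, 40, 40, 40]  # a step edge between indices 3 and 4
edges = neighbor_differences(scene)       # a single peak at the step
```

Both operations are possible only because neighbors map to neighbors: each subtraction pairs a cell with cells that are adjacent both in cortex and in visual perimetry.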
It is, then, no mere coincidence that neighboring receptive fields map to neighboring cells. The system relies on information naturally contained in the patterns of overlap. The exploitation thereof shows that those patterns are representationally significant, helping to delimit the region in which features are indicated. They help focus the reference of "thereabouts" in representations of the form ((edge of orientation theta)(thereabouts)).
Now in a typical feature-placing sentence, the place indicated is indicated or demonstrated in a rather imprecise fashion. There might be focal regions that are definitely included in it; and outliers definitely excluded; but the boundary between them is not indicated determinately. The exact region to which "there" applies if one says "it is slippery there" need not be determinable by either the speaker or the hearer. The receptive fields of cells in layer III of V1, and the spatial extent of the region about which they indicate something, are indeterminate in the same way. This is one reason to say that what they are doing is feature-placing. This is part one of my two-part claim.
If a feature map does feature-placing, then the "placing" bit does not identify, name, or refer to places. It indicates them. Perhaps one could also say it "demonstrates" them; the reference is only as determinate as is the place to which the pointing finger in a demonstration points. That too we can determine (and sometimes need to determine) as precisely as our spatial discriminative capacities allow; beyond that no further precisification is warranted. (But the verb "demonstrates" can be misleading; one is apt to confuse the spatial indication of the demonstration--which helps fix the reference of the demonstrative--with the reference of the demonstrative itself. The demonstration is a gesture; the demonstrative is a term.) In any case the rival hypothesis is that if V1 is representing anything, it must be representing properties of objects. On this line the only kind of reference is the full-blooded kind found in natural language, with the concomitant auxiliaries of sortals, individuation, and numerical identity.
Could the content of this feature map be captured in a list of sentences? Each member of the list would identify a region and predicate some property of that region. I think that list would fail to capture the ways in which V1 does and does not delimit the regions about which it is indicating something. In particular, hyperacuity is very hard to understand if we have to parcel up the representational content into discrete sentences. One sentence talks of Fred, and the next of Harry; what corresponds to interpolating between them? Likewise, it is hard to see how relations between discrete members of the list could help to sharpen the boundaries to which "thereabouts" applies. Treating the thing as a list of sentences seems false to the way it actually works.
At the same time, the list of singular propositions makes the reference to each named region much more determinate than V1 could do. That list represents with the full apparatus of individuation. It can represent the distinction between numerical and qualitative identity, and distinguish "the same one again" from "a different one, but qualitatively identical". There is nothing in the operations of V1 that suggests, much less requires, that it is operating with a system of representation that powerful. (This is why it can be misleading to say a feature-placing representation "picks out" or "identifies" a region, or that it has the force of a demonstrative.)
In short, the "propositions about objects" line both fails to grasp how the spatial reference of representations in V1 is in fact constrained, and overstates the powers of that system to identify, distinguish, and re-identify its "referents". This is part two of my two-part thesis.
Feature-placing, by contrast, is just right on both scores. It is hard to describe the representational character of a feature map in natural language, but one locution that seems accurate is that the feature map "transacts its business in a location-based way". Features of its operations accord better with a location-based than an object-based business model. One such feature is the way it exploits information naturally contained in relations between neighbors. A second is the way it actually indicates the region in which discriminable features are characterized. It sustains spatial discriminations of a certain sort. Feature maps indicate the incidence of features, and that "indication" is no more or less precise than are the capacities of spatial discrimination enabled by the registrations in question.
Churchland, Patricia S. & Ramachandran, Vilayanur S. (1994). Filling in: Why Dennett is wrong. in Antti Revonsuo & Matti Kamppinen, (eds) Consciousness in Philosophy and Cognitive Neuroscience. Hillsdale New Jersey: Lawrence Erlbaum Publishers, 65-91.
Clark, Austen (2000). A Theory of Sentience. Oxford: Oxford University Press.
Hubel, David H. and Wiesel, Torsten H. (1979). Brain mechanisms in vision. Scientific American, September 1979. Reprinted in Jeremy Wolfe (ed), The Mind's Eye. New York: W.H. Freeman, 1986, 40-52.
Konishi, Masakazu (1992). The neural algorithm for sound localization in the owl. The Harvey Lectures, Series 86, 47-64.
Kosslyn, Stephen M. (1994). Image and Brain: The Resolution of the Imagery Debate. Cambridge, MA: MIT Press.
Kosslyn, Stephen M., Thompson, William L., and Ganis, Giorgio (2002). Mental imagery doesn't work like that. (Reply to Pylyshyn 2002). Behavioral and Brain Sciences 25(2), 198-200.
Strawson, P. F. (1954). Particular and general. Proceedings of the Aristotelian Society 54: 233-260.
Strawson, P. F. (1974). Subject and Predicate in Logic and Grammar. London: Methuen & Co. Ltd.
Suga, Nobuo (1990). Cortical computational maps for auditory imaging. Neural Networks 3: 3-21.
Suga, Nobuo, Olsen, J. F., and Butman, J. A. (1990). Specialized subsystems for processing biologically important complex sounds: cross-correlation analysis for ranging in the bat's brain. In The Brain: Cold Spring Harbor Symposia on Quantitative Biology 55: 585-597.