
A Simple Model for Filling-In, Contrast,
Contrast Constancy and Assimilation
Tom N. Cornsweet

INTRODUCTION

There is ample electrophysiological evidence that the receptive fields of most optic nerve fibers manifest center-surround antagonism (1). The fact that the firing rates of most cells in visual cortex are not significantly affected by the level of illumination when the illumination is uniform over their receptive field (2) implies that the strengths of the positive and negative signals within each field are approximately equal. This results in a contrast sensitivity function such that sensitivity approaches zero as spatial frequency approaches zero, and, in an ordinary scene, the expected "neural output image" will depart from a resting level only in regions containing luminance gradients or steps.

Figure 1b is the scene in Figure 1a after processing of the sort that might be expected from this kind of center-surround antagonism. If signals manifesting center-surround antagonism were monotonically related to our perception of brightness, then the scene in Figure 1a should look like the image in Figure 1b, and it clearly does not.


Figure 1a


Figure 1b

One possible reason for this discrepancy involves the difficult philosophical question of the relationship between physiological events and experience (3). I choose not to discuss that question here, but instead to accept the following as a working assumption: that there is a layer somewhere in the nervous system whose spatial arrangement is topologically related to the retina (and thus to the scene) and such that the activity level at each point is monotonically related to the apparent brightness at the corresponding point in the scene. I chose this assumption because it often seems to provide satisfying explanations for perceptual phenomena. Given this assumption, the fact that scenes do not appear as in Figure 1b is a puzzle.

It might be that fibers without center-surround antagonism provide the signals for brightness perception, center-surround fibers constituting a parallel channel for other uses, e.g. seeing edges. That possibility is refuted by a very large body of evidence about the appearance and disappearance of stabilized retinal images.

Suppose a subject views a uniform disk on a darker background. When the image of the disk is stabilized, it quickly disappears. That does not mean that the region of the disk looks black or like a hole. Instead, that region takes on the same brightness (or hue (4)) as the background. If the disk is then shifted very slightly across the retina, the entire disk will reappear (and then disappear again), even though the motion has only affected receptors near its edges. The appearance of the center of the disk depends only on what happens at unstabilized edges around the disk. This is, of course, what happens during normal unstabilized viewing of a uniform disk. Very small eye movements, that produce changes in retinal illuminance only at the edges of the image of the disk, are sufficient to determine the brightness of the whole disk. There is no way that photoreceptors under the image of the center of the disk know whether it is stabilized or not; yet the center looks entirely different depending upon whether or not it is stabilized. Somehow the brightness (and color) of any uniform region depends exclusively on events at (unstabilized) edges.

That observation is consistent with the presumption that neural activity manifesting center-surround antagonism is in fact the activity that supports the perception of brightness, and that there is some additional mechanism that fills in between edges.

The term "filling-in" has been used to apply to a variety of visual conditions, including filling-in across the blind spot (5), across retinal scotomas (6), illusory contours (7), filling-in of texture (8), etc. In this paper, it refers to a process that causes the appearances of regions of uniform luminance to be derived from their surrounding contours.

Gerrits and Vendrik (9) proposed a qualitative model for filling-in for which Grossberg and his coworkers have developed a rigorous mathematical model, consisting of a set of differential equations (10,11,12). (See, also, related work in digital image processing, e.g. 13,14). Although that model does make reference to physiological processes, the direct relationship between the equations and physiologically realistic components is not easy to discern. Here, a new model is presented that is appreciably simpler than Grossberg's, and is constructed of physiologically plausible working parts.

THE MODEL

Figure 2 shows the stages of the model. The first stage is a two-dimensional layer of photoreceptors. As an illustration, the input illuminance pattern is taken as a uniform disk on a less bright background. A cross-section of the input pattern is plotted on the right. The output of each photoreceptor feeds into a neural network that generates receptive fields with center-surround antagonism, the total excitation in each field equaling the total inhibition so that the response to uniform illumination is zero. (This stage may actually be distributed among several layers in the visual system. In fact, if the mechanism is recurrent inhibition, it must be distributed because a single stage of recurrent lateral inhibition that is strong enough to reduce the response to a uniform field to zero cannot be stable. To keep the model as simple as possible, it is represented here as a single feed-forward stage.) The output of this stage will then be as illustrated.


Figure 2

The final stage performs filling-in; its output at each point is taken to be monotonic with the apparent brightness of the scene at the corresponding point.
Figure 3 diagrams one element (pixel) in the two-dimensional array of elements. Again, the first stage is simply a photoreceptor. For simplicity in the simulations to follow, the output of each photoreceptor is assumed to be linearly related to its illuminance.


Figure 3

The receptors feed input to a standard version of lateral inhibitory interaction, each pixel inhibiting its neighbors in proportion to its excitation as a decreasing function of distance. Each receptive field in this stage has an excitatory center and an inhibitory surround, and uniform illumination produces activity at a "resting" level. For the present discussion, it will be assumed that the sizes of all receptive fields are the same. That assumption will be replaced with a more realistic one later. (The model performs in essentially the same way if this stage is assumed to manifest Intensity-Dependent Spread (15) so long as the resting level is properly handled.)
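The zero-sum property of this stage can be illustrated with a small sketch. It assumes a one-dimensional layer and the simplified kernel used in the appendix (every surround pixel weighted -1, the center weighted to balance them) rather than a surround that falls off with distance; the function name `center_surround_1d` and the test signals are hypothetical.

```python
def center_surround_1d(signal, half_width=1):
    """Apply a zero-sum center-surround kernel along a 1-D list.

    Each surround pixel has weight -1 and the center has weight
    2 * half_width, so the weights sum to zero. Pixels too close to
    the ends to hold a full kernel are left at 0 (the resting level).
    """
    n = len(signal)
    out = [0] * n
    for x in range(half_width, n - half_width):
        # Sum the whole window, then remove the center to get the surround.
        surround = sum(signal[x - half_width:x + half_width + 1]) - signal[x]
        out[x] = 2 * half_width * signal[x] - surround
    return out


uniform = [5] * 8             # uniform illumination
step = [1] * 4 + [3] * 4      # a luminance step

print(center_surround_1d(uniform))  # stays at the resting level everywhere
print(center_surround_1d(step))     # departs from rest only next to the step
```

With these inputs the uniform field produces no response at all, and the step produces a negative value on its dark side and a positive value on its bright side, with the resting level everywhere else.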

The next stage creates an off-center unit in addition to the on-center unit. (There are surely other ways to generate on- and off-center units. However, the particular way is not relevant here.)

Each element in the final stage performs the following operations. Signals from the off-center channel are subtracted from the on-center channel signals, and the result is added to the average of the outputs of the immediately neighboring output pixels. Note that this part of the system is recurrent. That is, signals are fed back from the output of each element to add to the inputs of its neighbors. ("S/N" in the figure denotes that the signal fed back from each output pixel is divided by the number of pixels feeding the pixel in question, to obtain the average.)
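The operation of one filling-in element might be sketched as follows. This is a one-dimensional illustration only, so each element has two neighbors rather than four; the names `split_on_off` and `fill_in_sweep` are hypothetical, and the elements are updated in place from left to right, as the appendix pseudocode does in two dimensions.

```python
def split_on_off(li):
    """Split a signed signal into non-negative on- and off-center channels."""
    on = [max(v, 0.0) for v in li]
    off = [max(-v, 0.0) for v in li]
    return on, off


def fill_in_sweep(on, off, out):
    """One in-place sweep over the interior elements of a 1-D filling-in layer.

    Each element takes (on - off) and adds the mean of its two neighbors'
    current outputs. The two end elements are never updated.
    """
    for x in range(1, len(out) - 1):
        drive = on[x] - off[x]
        out[x] = drive + (out[x - 1] + out[x + 1]) / 2
    return out


# Signed lateral-inhibition output for a luminance step (assumed input).
li = [0.0, 0.0, 0.0, -2.0, 2.0, 0.0, 0.0, 0.0]
on, off = split_on_off(li)

out = [0.0] * len(li)
for _ in range(20):
    fill_in_sweep(on, off, out)
print(out)
```

Because the off-center signal is subtracted from the on-center signal, the drive to each element is just the signed lateral-inhibition output; the two channels exist only so that each can be carried by a non-negative signal.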

(It will be argued in a separate paper that an additional stage must precede the filling-in stage, a stage that acts as a valve, passing the signals only for a short time after any change in input. Such a stage is necessary to account for the disappearance of stabilized images.)

RESULTS: REPRESENTATIVE INPUT PATTERNS

All of the results to be presented here are generated by simulation of the model. (Pseudocode for the simulation is presented in the appendix.)

The filling-in stage of the model provides interactions only among immediate neighbors, and the output pattern thus develops slowly and approaches equilibrium over a large number of iterations. Here are a few examples.

Figure 4 shows the action of the model for a uniformly illuminated rectangle on a darker background. At the top of the column labeled "input" is the illuminance pattern and below it is the illuminance cross-section. The image at the top of the column labeled "LI Output" is the output of the lateral inhibition stage, with its output cross-section plotted below. The image at the top of the column labeled "FI output (200)" is the output of the filling-in stage after 200 iterations, and its cross-section is plotted below. The rightmost column shows the filling-in output as it approaches equilibrium (1200 iterations).


Figure 4

Note that the output patterns and their cross-sections have been scaled to span equal ranges in this and all subsequent similar figures. Thus the lightnesses and profiles cannot be compared among the various figures and numbers of iterations.

Next, consider what has come to be called the Craik-O'Brien-Cornsweet illusion. Cornsweet (16) pointed out that the output of the lateral inhibition stage must be essentially the same for that input pattern as for a step, which explains why this pattern and the step look the same. (This is also approximately true for Craik's version (17) and approximately approximately true for O'Brien's (18).) Why both patterns look like a step, according to the model, is shown in


Figure 4

A pattern with cumulative steps is shown in Figure 5, a kind of pattern that Grossberg's model does not properly handle (but Arrington's modified version does (19)). Note that, while the output pattern is developing, all the steps except the top one are scalloped, in agreement with the actual appearance of the luminance pattern. At equilibrium, the output is simply a slightly low-pass-filtered copy of the input.


Figure 5

Figure 6 shows the results at (dynamic) equilibrium when the input pattern is that of Figure 5 and has been present for a large number of iterations, but is shifted back and forth periodically in a way that mimics the effects of small saccadic eye movements. Here the scalloping appears and then gradually fades after each movement.


Figure 6

Figure 7 is a combination of a classical simultaneous contrast pattern and one that produces Mach Bands. The output after 200 iterations illustrates both results. Simultaneous contrast results when the two small patches of equal luminance are presented against differing backgrounds and the output of the model gives the same result. The luminance transition between the two backgrounds is of a form that generates Mach Bands and the output of the filling-in stage behaves similarly.


Figure 7

At equilibrium in Figure 7, the output pattern is simply a copy of the input (slightly low-pass filtered). If the model is essentially correct, then a comparison of the outputs after 200 iterations and at equilibrium with the actual appearance of the pattern suggests that, during normal viewing, equilibrium is not normally reached. Either it is precluded by an eye movement, or if not, the (stabilized) image will disappear.

Figure 8 shows the result for the same pattern as in Figure 7 except that the outer border has been removed. In the simulation, this is equivalent to making the pattern infinitely large by extending the illuminances all around the edges to infinity in directions perpendicular to the edges. Note that although under this condition, the equilibrium results have the desirable form, the condition itself is unrealistic. In general, simulations of the model in which the pattern of interest lies on a background or border yield results at equilibrium that are simply copies of the input. (See the CSF in Figure 12, below, which is essentially flat for low and middle frequencies at equilibrium.) These results are, in general, different from the equilibrium results when the border is omitted, and the same effect might well be expected for any model of filling-in. Therefore, it is important, in testing such models, that a realistic border be included.


Figure 8

The input in Figure 9 is a sine-wave grating, and so is the output. Here is an input pattern that contains no uniform regions, but at equilibrium the model preserves the sine-wave at the output. This property of the model holds for sine-waves of all frequencies and amplitudes (within the limits imposed by digitization). Note that for a sine-wave of this particular frequency, the equilibrium shape is approached very quickly, but because the profiles in the figure are scaled to have equal ranges, the fact that the amplitude rises as iterations progress is hidden.


Figure 9

RESULTS: CONTRAST SENSITIVITY FUNCTIONS

Imagine looking at a uniformly illuminated disk on a darker background. If the center of the disk looks brighter than the background even when the disk is larger than the largest receptive field it illuminates (it does (20)), then it must be true that the response of the visual system to zero spatial frequency is not zero. If it were zero, the center would have the same brightness as the background. This seems inconsistent with the fact that the human spatial contrast sensitivity function (CSF) approaches zero or close to it as frequency approaches zero. The model described here resolves that apparent inconsistency, as explained below.

Figure 10 shows the CSF for the lateral interaction (inhibition) stage of the model. It falls to zero at zero spatial frequency. (The high-frequency end of the CSF depends on optical properties and other factors, like the width of the excitatory region, that are not relevant to the present discussion. Therefore, high-frequency behavior will be ignored here.)


Figure 10

Figure 11 shows the CSF for the filling-in stage alone (sine waves are injected directly into the filling-in stage), at equilibrium and at two earlier times in its development as iterations proceed. Note that, at equilibrium, as spatial frequency approaches zero, sensitivity approaches infinity. That is a result of the positive feedback in this stage.


Figure 11

Figure 12 shows the CSF for the complete system at equilibrium and at two earlier times. At equilibrium, as the spatial frequency approaches zero, the fall in lateral inhibitory stage output is balanced by the increasing sensitivity of the filling-in stage, resulting in a response curve that is fairly flat all the way to zero. (That the function shows finite gain at zero is evident from the fact that uniform regions of the input that are larger than the size of the receptive fields produce uniform non-zero outputs, e.g. Figure 4. At zero spatial frequency, the lateral inhibitory stage output, and thus the input to the filling-in stage, is zero; the filling-in stage multiplies zero by infinity, and the result is finite!) Prior to reaching equilibrium, the system CSF shows that it attenuates low frequencies more strongly.
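The "zero times infinity" remark can be made concrete in a simplified setting. The following is a sketch under assumptions that are not part of the model as described: a one-dimensional layer of unlimited extent (so that borders play no role), a single three-pixel kernel with weights (-1, 2, -1), a filling-in element that averages its two neighbors, and a sinusoidal input of spatial frequency ω in radians per pixel. The symbols H_LI and H_FI are introduced here only for illustration.

```latex
\begin{align*}
I(x) &= e^{i\omega x} \\
L(x) &= 2I(x) - I(x-1) - I(x+1) = (2 - 2\cos\omega)\, I(x)
  &&\Rightarrow\; H_{LI}(\omega) = 2\,(1 - \cos\omega) \\
F(x) &= L(x) + \tfrac{1}{2}\bigl[F(x-1) + F(x+1)\bigr]
  &&\Rightarrow\; H_{FI}(\omega) = \frac{1}{1 - \cos\omega} \\
H_{LI}(\omega)\, H_{FI}(\omega) &= 2 \qquad (\omega \neq 0)
\end{align*}
```

As ω approaches zero, H_LI goes to zero and H_FI grows without bound, but their product stays at 2, so in this idealized case the equilibrium response of the two stages together is flat down to the lowest frequencies.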


Figure 12

DISCUSSION

There are some important similarities between the present model and that of Grossberg and coworkers. Both models assume a stage with center-surround antagonism and both assume on- and off-center units. Grossberg (10), following Gerrits and Vendrik (9), then proposes that the signals from these units diffuse over a homogeneous two-dimensional layer. The action of the elements in the filling-in stage of the present model can also be considered to represent diffusion of the incoming signals. That is, one way to simulate diffusion through a homogeneous layer in a system that actually consists of discrete elements is to cause the output of each element to be the sum of its input and the mean of its neighbors' outputs.
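The correspondence with diffusion can be written out explicitly. The sketch below assumes that all elements are updated simultaneously from the previous iteration's outputs (the appendix pseudocode instead updates them in place) and uses four-connected neighbors; L denotes the input from the lateral inhibition stage, F the filling-in output, t the iteration number, and the discrete Laplacian symbol is introduced here for illustration.

```latex
\begin{align*}
F^{(t+1)}_{x,y} &= L_{x,y} + \tfrac{1}{4}\bigl[F^{(t)}_{x-1,y} + F^{(t)}_{x+1,y}
                   + F^{(t)}_{x,y-1} + F^{(t)}_{x,y+1}\bigr] \\
F^{(t+1)}_{x,y} - F^{(t)}_{x,y} &= L_{x,y} + \tfrac{1}{4}\,\nabla_d^{2} F^{(t)}_{x,y},
  \qquad
  \nabla_d^{2} F_{x,y} = F_{x-1,y} + F_{x+1,y} + F_{x,y-1} + F_{x,y+1} - 4F_{x,y} \\
\text{at equilibrium:}\quad \nabla_d^{2} F_{x,y} &= -4\,L_{x,y}
\end{align*}
```

The second line has the form of a discrete diffusion equation with a source term L, and the equilibrium condition is a discrete Poisson equation, which is why a band-pass input can be turned back into extended uniform regions.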

One difference between the present model and those of Grossberg and coworkers is that the present model is phrased in terms of working parts that are simply excitatory and inhibitory synapses (with neurons that connect them). A more important difference is this. Grossberg and coworkers, again based on Gerrits and Vendrik's qualitative model, assume that there is a mechanism by which diffusion is blocked by barriers at places corresponding to contours or edges in the scene. This idea appreciably complicates the equations, and it seems difficult to imagine a physiological mechanism that would generate the right kind of barriers. In the present model, no barriers are required. (The presence of barriers in Grossberg's model produces incorrect results when the scene contains a succession of luminance steps, like those in Figure 5, above. Arrington (19) solved this problem by modifying the basic Grossberg model to make the barriers selectively leaky. This further complicates the equations, but does make that model more similar to the one presented here.)

The CSF for the complete model (Figure 12) is relatively flat and thus appears inconsistent with the low frequency attenuation that is characteristic of the typical psychophysically determined human CSF. But note that the "standard" human CSF is based on measurements of threshold contrast, and no threshold was included in the model above. To account for threshold behavior, a threshold mechanism must be included in the model. If we accept the plausible assumption that the threshold for contrast is set between the lateral inhibition and the filling-in stages, then the CSF at threshold must have the characteristics of the CSF of the lateral inhibition stage (Figure 10) while the CSF measured at contrasts well above threshold will have the shape of the overall system CSF (Figure 12). The flattening of the CSF at low frequencies and above-threshold contrasts is well documented and is called "contrast constancy" (21).

This model requires that center-on and center-off responses are both present (as does Grossberg's). (If neuronal firing rates could be negative, then the generation of the two classes of receptive fields would not be required for this model, as may be clear from the discussion in the appendix.) It is interesting to speculate that complementary classes of receptive fields might have evolved because they enable filling-in.

The iteration that naturally occurs in the filling-in stage imposes strong temporal properties on its output, properties only minimally addressed here. Work on such issues as temporal CSFs, temporal masking effects, etc, could be very interesting.

CONCLUSIONS

A simple model can be constructed, consisting only of excitatory and inhibitory synapses and their connecting neurons, that manifests behavior closely mimicking many phenomena of human brightness perception. It consists of a stage manifesting strong lateral inhibition or its equivalent, followed by a stage that combines the output of the first stage with recurrent summation among neighboring elements. When it is assumed that the threshold for contrast is determined between these two stages, both the low frequency attenuation of threshold CSF measurements and the flattening of the CSF at higher contrasts (contrast constancy) result.

BUT

The two gray squares in Figure 13(a) have identical reflectances. According to classical contrast theory, the gray square on the left should appear lighter than the one on the right, since the one on the left is bordered by dark squares and the one on the right by light ones. When this pattern is presented as input to the model described above, the result is in accordance with the classical contrast prediction. However, in fact, the gray square on the left appears darker than the one on the right.


Figure 13

Similarly, the two gray bars in Figure 13(b) have identical reflectances. By classical contrast theory and in the output of the model described above, the gray bar on the left should look darker than the one on the right. The appearance of these patterns clearly demonstrates that the theory described above is wrong.

Multi-Channel Plus Filling-in (MC+FI) Theory

Gilchrist (24) explains the phenomena in these figures and many related ones in terms of what he calls "Anchoring Theory". He suggests that the visual system parses scenes into groups according to Gestalt principles, then assigns lightnesses relatively within each group, and the final lightness at each point is a weighted average of all the lightnesses computed for that point as a member of various groups.

The theory presented here is an explanation in terms of simple physiological mechanisms. There is surely no fundamental inconsistency between explanations in terms of physiological mechanisms and in terms of "Gestalt Principles". Many early Gestalt theorists looked for such physiological explanations for their principles. It may be correct to say that the theory presented here is a description of a set of plausible neural mechanisms that perform the computations required by Anchoring Theory.

It seems likely that some aspects of brightness perception are influenced by cognitive or "top down" processes. The model presented here does not seem to fit the idea of a "cognitive" process. Nor is it "top down", if that term implies that signals at some level feed back to modulate earlier levels. (There is feedback in the filling-in stage but that seems too local to be called "top down".)

The model presented earlier consists of a lateral interaction stage that performs band-pass filtering, a stage that creates plus-center and minus-center receptive fields, and a filling-in stage. The following modifications cause the model to produce correct predictions for the patterns in Figures 13(a) and (b).

First assume that there is a variety of receptive field sizes in the visual system, all of which exhibit center-surround antagonism, and all of which act upon the input image in parallel. That is, the input image is convolved with a set of masks; the algebraic sum of the weights of each mask is zero (to attenuate zero frequency to zero), and the masks vary in width. To put it another way, assume that there exists a set of parallel neural images, processed by filters with pass-bands whose peaks are spread along the frequency axis. This idea has a venerable history under the label "multiple channel theory" (25).

Next, assume that the outputs of all of these various filters are simply added, pixel by pixel to create a single summed image. This summed image is then input to the filling-in stage. This arrangement is diagrammed in Figure 14. In other words, the MC+FI model is essentially identical with the one presented above except that the input to the filling-in stage is the sum of the outputs of a set of band-pass filters with differing pass-band frequencies.
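The summation of channels can be sketched in a few lines. This is a one-dimensional illustration under assumptions: each channel uses the simplified zero-sum kernel of the appendix (surround weights of -1, a balancing center weight), channels differ only in width, and the names `center_surround_1d` and `multi_channel_sum` are hypothetical.

```python
def center_surround_1d(signal, half_width):
    """Zero-sum kernel: surround weights -1, center weight 2 * half_width."""
    n = len(signal)
    out = [0] * n
    for x in range(half_width, n - half_width):
        surround = sum(signal[x - half_width:x + half_width + 1]) - signal[x]
        out[x] = 2 * half_width * signal[x] - surround
    return out


def multi_channel_sum(signal, half_widths=(1, 2)):
    """Add the outputs of several band-pass channels, pixel by pixel."""
    total = [0] * len(signal)
    for half_width in half_widths:
        channel = center_surround_1d(signal, half_width)
        total = [t + c for t, c in zip(total, channel)]
    return total


step = [1] * 6 + [3] * 6
print(center_surround_1d(step, 1))   # narrow channel alone
print(center_surround_1d(step, 2))   # wider channel alone
print(multi_channel_sum(step))       # summed image passed to filling-in
```

Each channel still gives no response to uniform illumination, so the summed image does too; the wider channels simply spread the response to an edge over more pixels before the filling-in stage acts on it.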


Figure 14

RESULTS

Figure 15 shows a plot of the output of MC+FI for the two gray squares in the checkerboard as a function of the number of iterations of the filling-in process. (The specific parameter values used to obtain all of the results presented here are given in the appendix.) The outputs are initially those predicted by classical contrast theory, but as the iterations proceed, the lightnesses reverse, and remain reversed as the process approaches equilibrium.

Figure 16 shows output patterns for the checkerboard. The set of six images across the middle are output patterns for six successively larger receptive fields and the pattern at the bottom is the output of the MC+FI theory after 300 iterations. (As discussed in the appendix, in this particular simulation the plus centers of all the receptive fields are set at the same size. Only the negative surrounds vary. If the sizes of the centers are made to increase with the surrounds, the results are qualitatively the same.)


Figure 16

Figures 17 and 18 show that the stripe pattern of Economou et al (23) produces a similar result.


Figure 17


Figure 18

The result (Figures 19 and 20) is also similar for the classical contrast pattern, except that the reversal occurs only after a much greater number of iterations. Thus, if the instantiation of MC+FI theory with this particular set of parameters (as described in the appendix) is right, then the appearances of the gray areas in the three figures depend on the state of the system during a period of time that begins after about 100 iterations and ends after about 700 iterations.

In general, the initial behavior of the MC+FI model produces the same effects as the (single channel) model described above, as illustrated by the example in Figures 19 and 20. However, as the iterations proceed, the results change in ways and at speeds that depend upon the stimulus configuration.


Figure 19


Figure 20

Because Economou et al attribute the strength of the illusion in Figure 13(b) to the degree to which the gray stripes are perceptually grouped with the flanking stripes, they varied the number of stripes that flank the test (gray) stripe, a process they describe as "articulation variation" and correctly predicted the strength (and direction) of the illusion (23). Their result is shown in Figure 21. Figure 22 shows the corresponding result for the MC+FI model.


Figure 21

As a general rule, it appears that test patterns lying on dissimilar backgrounds yield classical contrast while patterns imbedded in repetitive patterns tend toward reversed contrast or assimilation, and the result in Figure 22 can be thought of as demonstrating the transition between the two states.


Figure 22

The pattern in Figure 23 is a classical demonstration of what is usually called "assimilation", another example of a reversal of classical contrast. The plot in Figure 23 shows the results for the MC+FI model.


Figure 23

(In these figures, the high spatial frequency attenuation that results primarily from spatial summation in the band-pass stages produces very noticeable blur. However, the prominence of the blur depends upon the relationship between the widths of the features in the patterns and the widths of the receptive fields. The receptive fields in the simulations are 3, 5, 7, 9, 11, and 13 pixels in width. In Figures 16 and 18, the stripes and checks are only about ten pixels in width. In Figure 23, the thinner stripes are only one pixel in width. If the patterns were enlarged, the stripes and checks would be wider but the blur width would remain the same, thus appearing less prominent. To try to predict, from the MC+FI model, the amount of blur apparent in a scene would require assumptions about the actual densities of cell assemblies in relation to retinal image size.)

Figure 24 shows a pattern modified from Adelson (26). The lighter squares within the darker circular region have the same luminance as the darker squares in the other regions, but they appear lighter. The model shows the same result during the first 650 iterations. (In Adelson's display, an object rests on part of the checkerboard and the darker region with blurred edges appears to be the shadow of the object. However, as is evident from the pattern in Figure 24, the object that casts the shadow is not required for the illusion, nor is the (cognitive) presumption that the darker region is in fact a shadow.)


Figure 24

In Figure 25, the input pattern is that in Figure 24 but the channel with the largest receptive field is eliminated, resulting in an increase in the number of iterations during which the illusion is maintained.


Figure 25

DISCUSSION

The patterns presented here represent only a very small sample of the essentially unlimited population of patterns that demonstrate contrast or assimilation, but one has to stop somewhere. In that regard, it should be pointed out that the model yields correct predictions for every one of the patterns tested so far.

The output of the model changes as the number of iterations of the filling-in stage increases, and, as was pointed out above, the predictions are only correct over a limited, although broad, range of numbers of iterations. Thus the model implies predictions about the development of the illusions over time. However, certain aspects of the model need to be more fully developed before such predictions are testable. For example, because the lateral interaction stages involve signals traveling laterally, the form of signals output from that layer, that is, the input to the filling-in layer, must change over time. Given that lateral spread must have a finite velocity, at the very first instant that a pattern replaces a uniform field, the output from the lateral interaction layer will not have been influenced by lateral interactions (unless we postulate something special, such as that the signals travel laterally much faster than they travel straight through). Thus the initial input to the filling-in layers will contain a strong zero-spatial frequency component, which will significantly affect the dynamics of the output of the model.

Another important aspect of the visual system that must be considered in connection with the dynamics of the MC+FI model is the disappearance of stabilized images. Suppose each iteration takes one millisecond and a pattern, if stationary, were to disappear in less than 0.7 seconds. Then, for the parameters (arbitrarily) chosen for the results displayed here, the pattern would disappear before the standard contrast pattern could reverse. In general, when the effects of continuous repetitive eye movements are introduced into the model, the output achieves a stable dynamic equilibrium that resembles the results of a stationary pattern after an intermediate number of iterations.

The general behavior of the MC+FI model is very robust in the sense that changes in the parameters, e.g. shapes or sizes of receptive fields, number of channels, etc. affect the number of iterations at which reversals occur but have little other effect. There is one exception, however. In the filling-in stage, the output of each element is the sum of its direct input and the average, that is, the sum divided by the number, of its neighbors. In the present simulations, for example, only four-connected neighbors are used and the "number" thus equals 4. If that number is increased or decreased only slightly, e.g. 4.1, the shapes of the resulting output patterns are strongly affected.
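One way to see why the divisor matters so much is to look at the equilibrium gain of the filling-in stage for a grating. The sketch below rests on assumptions not stated in the text: an unbounded four-connected layer, a sinusoidal input that varies along x only with spatial frequency `omega` (radians per pixel), and a hypothetical function name `fi_equilibrium_gain`. For such an input the four neighbors feed back a fraction (2 cos(omega) + 2) / divisor of the element's own output.

```python
import math


def fi_equilibrium_gain(omega, divisor=4.0):
    """Equilibrium gain of a four-neighbor filling-in layer for a grating.

    The grating varies along x only, so the two x-neighbors contribute
    2 * cos(omega) and the two y-neighbors contribute 2, each relative to
    the element's own output. If the fed-back fraction reaches 1 or more,
    the iteration has no finite equilibrium and infinity is returned.
    """
    feedback = (2.0 * math.cos(omega) + 2.0) / divisor
    if feedback >= 1.0:
        return math.inf
    return 1.0 / (1.0 - feedback)


for omega in (0.0, 0.1, 0.5, 1.0):
    print(omega,
          fi_equilibrium_gain(omega, 4.0),
          fi_equilibrium_gain(omega, 4.1))
```

With a divisor of exactly 4 the gain grows without bound as the frequency approaches zero, which is what cancels the zero of the lateral inhibition stage. With a divisor of 4.1 the gain at zero frequency is capped at about 41, so the lowest frequencies are no longer fully restored, while a divisor below 4 leaves the lowest frequencies with no finite equilibrium at all.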

CONCLUSIONS

1) Filling-in, that is, the restoration of low spatial frequencies, can be accomplished by a simple neural positive feedback circuit in which, at each point, the incoming band-pass filtered signal is added to the average of the outputs at neighboring points.

2) This is an iterative process. At equilibrium the output pattern mimics the input illuminance distribution, but during the build-up toward equilibrium, effects such as simultaneous contrast, the apparent scalloping of the lightness of stair-step patterns, and contrast constancy are reproduced. These same effects occur during the dynamic equilibrium that is established with small repetitive eye movements.

3) If the input to the filling-in process is taken, at each point, as the sum of the outputs of a set of band-pass filters (or receptive fields or "channels") of differing size, then the model seems to account for a wider range of brightness phenomena, including "contrast reversal" or "assimilation".

Note: I'm indebted to Julie Lindholm for her very perceptive comments on this work.

APPENDIX

The results above were all derived by simulation of the model. The following is pseudocode for simulation of the MC+FI model.

Set up four two-dimensional arrays of size xsize, ysize. Call them:
ILL( x,y) { illuminance }
LIOUT1(x,y) { output of each single channel }
LIOUT2(x,y) ( sum of outputs of multiple channels }
FIOUT( x,y) { filling-in stage output }

{ First, generate an input pattern. The pattern in Figure 4 will be illustrated here. }

for x = 1 through xsize
    for y = 1 through ysize
        ILL(x,y) = L1 { surround luminance }
    next y
next x
for x = w through xsize - w { let the border be w pixels wide }
    for y = w through ysize - w
        ILL(x,y) = L2 { center luminance }
    next y
next x

{ Now perform lateral inhibition for a series of six receptive fields of increasing size. }
{ The surround pixels always have weight -1. The single center pixel has a positive weight equal to the absolute value of the sum of the weights of the surrounding pixels. }

for side = 3 through 13 incrementing by 2 { e.g. if side = 5 then kernel is 5x5 }
    sside = (side-1)/2
    for x = sside + 1 through xsize - sside
        for y = sside + 1 through ysize - sside
            negative_sum = 0
            for xx = x - sside through x + sside { sum up the inhibitory signals }
                for yy = y - sside through y + sside
                    negative_sum = negative_sum + ILL(xx,yy)
                next yy
            next xx
            negative_sum = negative_sum - ILL(x,y) { the center pixel is excitatory. Subtract it }
            LIOUT1(x,y) = (side^2 - 1)*ILL(x,y) - negative_sum
        next y
    next x

    { now sum up the channel just done with the preceding sum }

    for x = 1 through xsize
        for y = 1 through ysize
            LIOUT2(x,y) = LIOUT1(x,y) + LIOUT2(x,y)
        next y
    next x
next side

{ the output of the lateral inhibition stage (and the input to the filling-in stage) is now complete }
{ the following code uses the four-connected neighbors. The results are the same using eight-connected neighbors }

for i = 1 through iter { number of iterations }
    for x = 2 through xsize - 1
        for y = 2 through ysize - 1
            neighbor_sum = FIOUT(x-1,y) + FIOUT(x+1,y) + FIOUT(x,y-1) + FIOUT(x,y+1)
            FIOUT(x,y) = LIOUT2(x,y) + neighbor_sum/4
        next y
    next x
next i

The resulting array, FIOUT( ), represents brightness.
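For readers who want to run the simulation, the pseudocode above can be sketched in Python roughly as follows. This is a transcription under stated assumptions rather than the code used to produce the figures: indexing is 0-based, all arrays start at zero, the border is exactly w pixels on every side, and each channel contributes only where its kernel fits entirely inside the image. The function names, the small 24 x 24 grid and the luminance values are hypothetical choices made for illustration.

```python
def make_pattern(xsize, ysize, w, l1, l2):
    """Surround of luminance l1 with a central rectangle of luminance l2."""
    ill = [[l1] * ysize for _ in range(xsize)]
    for x in range(w, xsize - w):
        for y in range(w, ysize - w):
            ill[x][y] = l2
    return ill


def lateral_inhibition(ill, sides=(3, 5, 7, 9, 11, 13)):
    """Sum of six center-surround channels (the LIOUT2 array)."""
    xsize, ysize = len(ill), len(ill[0])
    liout2 = [[0.0] * ysize for _ in range(xsize)]
    for side in sides:
        half = (side - 1) // 2
        for x in range(half, xsize - half):
            for y in range(half, ysize - half):
                total = sum(ill[xx][yy]
                            for xx in range(x - half, x + half + 1)
                            for yy in range(y - half, y + half + 1))
                surround = total - ill[x][y]  # the center pixel is excitatory
                liout2[x][y] += (side * side - 1) * ill[x][y] - surround
    return liout2


def fill_in(liout2, iterations, divisor=4.0):
    """Filling-in stage with four-connected neighbors, updated in place
    within each sweep, as in the pseudocode (the FIOUT array)."""
    xsize, ysize = len(liout2), len(liout2[0])
    fiout = [[0.0] * ysize for _ in range(xsize)]
    for _ in range(iterations):
        for x in range(1, xsize - 1):
            for y in range(1, ysize - 1):
                neighbor_sum = (fiout[x - 1][y] + fiout[x + 1][y]
                                + fiout[x][y - 1] + fiout[x][y + 1])
                fiout[x][y] = liout2[x][y] + neighbor_sum / divisor
    return fiout


# Example run on a small, hypothetical version of the Figure 4 pattern.
ill = make_pattern(24, 24, 8, 1, 2)
liout2 = lateral_inhibition(ill)
fiout = fill_in(liout2, 50)
print([round(v, 1) for v in fiout[12]])  # profile through the middle row
```

The divisor is exposed as a parameter so that the sensitivity to it, discussed in the text, can be explored directly.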

Notes:

1) The output of the lateral inhibition stage, as modeled in this code, is symmetrical around zero, that is, includes negative values. For a more literal simulation of the model, these values should be offset so that they never go below zero. In that case, the offset must be removed before the filling-in stage, and when that is done, the results are identical to those for the simulation above.

2) In the above code, the output (FIOUT) can also go negative, which is unrealistic. This can be corrected by inserting the following line immediately preceding the last "next y":

if FIOUT(x,y) < 0 then FIOUT(x,y) = 0

This will have an effect only during very early iterations.

3) The above code implicitly assumes that the lateral inhibition stage forms its output instantaneously. Because lateral interactions require neural signals to travel laterally, it must actually take time for the equilibrium pattern to develop. For meaningful simulations of the temporal behavior of the model, this effect should be included.

4) As written above, the filling-in stage is unrealistic in that the recurrent signals from immediate neighbors are treated as though they required no travel time. Simulating travel time complicates the code somewhat and has a negligible effect on the results. (This travel time, as well as the dynamics of the lateral inhibition stage, does affect the dynamics of the output under certain conditions. Those effects will be the subject of a subsequent publication.)

5) In this simulation, receptive fields of every size have a single central pixel as the positive input; only the size of the negative surround varies. Scaling the size of the center to stay in proportion with the surround changes the results in only very small ways.
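Notes 2 and 4 can be illustrated together with a single-iteration variant of the filling-in stage. The sketch below is an assumption-laden illustration: the notes do not say how travel time was simulated, so a delay of exactly one iteration is assumed here, meaning every element reads only the outputs its neighbors had at the end of the previous iteration. The function name and the clamp flag are hypothetical. The optional clamp corresponds to the extra line given in Note 2.

```python
def fill_in_step_delayed(fiout, liout2, divisor=4.0, clamp=False):
    """One filling-in iteration in which neighbor signals arrive one
    iteration late (assumed travel time). Returns a new array and
    leaves the previous outputs untouched."""
    xsize, ysize = len(fiout), len(fiout[0])
    new = [row[:] for row in fiout]
    for x in range(1, xsize - 1):
        for y in range(1, ysize - 1):
            # read only from the previous iteration's outputs
            neighbor_sum = (fiout[x - 1][y] + fiout[x + 1][y]
                            + fiout[x][y - 1] + fiout[x][y + 1])
            value = liout2[x][y] + neighbor_sum / divisor
            new[x][y] = max(value, 0.0) if clamp else value
    return new


# Example: a single positive input spreads to its neighbors one step later.
direct = [[0.0] * 5 for _ in range(5)]
direct[2][2] = 4.0
state = [[0.0] * 5 for _ in range(5)]
for _ in range(2):
    state = fill_in_step_delayed(state, direct)
print(state[2])
```

Comparing this variant with the in-place update in the pseudocode is one way to check the statement in Note 4 that travel time has a negligible effect on the results.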

REFERENCES

1) Kuffler, S.W. (1953) Discharge patterns and functional organization of mammalian retina. J. Neurophysiol. 16, 37-68.

2) Hubel, D. and Wiesel, T. (1968) Receptive fields and functional architecture in the cat's visual cortex. J. Physiol. 195, 215-243.

3) Teller, D.Y. and Pugh, E.N. Jr. (1983) Linking propositions in color vision. In Colour Vision: Physiology and Psychophysics, eds. J.D. Mollon and L.T. Sharpe. London: Academic Press.

4) Krauskopf, J. (1963) Effect of retinal image stabilization on the appearance of heterochromatic targets. J. Opt. Soc. Amer. 53, 741-744.

5) Walls, G. (1954) The filling-in process. Amer. J. Optom. 31, 329-340.

6) Gerrits, H.J.M. and Timmerman, G.J.M.E.N. (1969) The filling-in process in patients with retinal scotomata. Vision Res. 9, 439-442.

7) Kanizsa, G. (1979) Organization in Vision: Essays in Gestalt Perception. New York: Praeger Press.

8) Ramachandran, V.S., Gregory, R.L. and Aiken, W. (1992) Perceptual fading of visual texture border. Vision Res. 33, 717-721.

9) Gerrits, H.J.M. and Vendrik, A.J.H. (1970) Simultaneous contrast, filling-in process and information processing in man's visual system. Exp. Brain Res. 11, 411-430.

10) Grossberg, S. (1983) The quantized geometry of visual space: The coherent computation of depth, form, and lightness. Behavioral and Brain Sciences 6, 625-692.

11) Cohen, M.A. and Grossberg, S. (1984) Neural dynamics of brightness perception: Features, boundaries, diffusion, and resonance. Perception and Psychophysics 5, 428-456.

12) Grossberg, S. and Todorovic, D. (1988) Neural dynamics of 1-D and 2-D brightness perception: A unified model of classical and recent phenomena. Perception and Psychophysics 43, 241-277.

13) Najand, S., Blough, D., and Healey, G. (1996) Forward and inverse model for the intensity-dependent spread filter. J. Opt. Soc. Amer. 13, 1305-1314.

14) Cornsweet, T.N. and Yellott, J.Y. Jr. (1985) Intensity-dependent spatial summation. J. Opt. Soc. Amer. A 2, 1769-1786.

15) Crespo, J. and Schafer, R.W. (1997) Edge-based adaptive smoothing. Optical Eng. 36, 3081-3092.

16) Cornsweet, T.N. (1970) Visual Perception. New York: Academic Press.

17) Craik, K. (1940) Visual adaptation. Unpublished doctoral thesis, Cambridge University.

18) O'Brien, V. (1958) Contour perception, illusion, and reality. J. Opt. Soc. Amer. 48, 112-119.

19) Arrington, K.F. (1996) Directional filling-in. Neural Computation 8, 300-318.

20) Davidson, M. and Whiteside, J. (1971) Human brightness perception near sharp contours. J. Opt. Soc. Amer. 61, 530-536.

21) Georgeson, M.A. and Sullivan, G.O. (1975) Contrast constancy: deblurring in human vision by spatial frequency channels. J. Physiol. 252, 627-656.

22) DeValois, R.L. and DeValois, K.K. (1988) Spatial Vision. Oxford University Press.

23) Economou, E., Annan, V.J., and Gilchrist, A.L. (1998) Contrast depends on anchoring in perceptual groups. Abstract, Invest. Ophthal. and Vis. Science 39, S857.

24) Gilchrist, A., Kossyfidis, C., Bonato, F., Agostini, T., Cataliotti, J., Li, X., Spehar, B., and Szura, J. (1999) An anchoring theory of lightness perception. Psych. Rev. 106(4), 795-834.

25) Graham, N. (1989) Visual Pattern Analyzers. Oxford University Press.

26) Adelson, D. (1998) On his web site www-bcs.mit.edu/people/adelson

Figure Captions

Figure 1. (a) An ordinary scene. (b) The same scene after processing that attenuates low spatial frequencies. The scene was processed by convolving it with the following mask:

-1 -1 -1 -1 -1
-1 -1 -1 -1 -1
-1 -1 24 -1 -1
-1 -1 -1 -1 -1
-1 -1 -1 -1 -1
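The key property of this mask is that its weights sum to zero, so any uniform region of the scene maps to zero and only gradients or steps survive. The following is a minimal sketch of that property, not the processing actually applied to Figure 1. The function name and the small test images are hypothetical, and output pixels closer than two pixels to the image edge are simply left at zero.

```python
MASK = [[-1] * 5 for _ in range(5)]
MASK[2][2] = 24  # positive center balances the 24 surround weights of -1


def apply_mask(image, mask):
    """Apply a square mask at every pixel where it fits inside the image.
    The mask is symmetric, so correlation and convolution coincide."""
    half = len(mask) // 2
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            out[r][c] = sum(mask[i][j] * image[r - half + i][c - half + j]
                            for i in range(len(mask))
                            for j in range(len(mask)))
    return out


uniform = [[7] * 8 for _ in range(8)]              # no gradients anywhere
step = [[1] * 4 + [3] * 4 for _ in range(8)]       # vertical luminance step

print(sum(sum(row) for row in MASK))               # weights sum to zero
print(apply_mask(uniform, MASK)[3])                # uniform input gives zeros
print(apply_mask(step, MASK)[3])                   # response only near the step
```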

Figure 2. Schematic representation of the model. If the input image has an illuminance profile like that at the top right, the profiles of the outputs of the next two stages will be as plotted.

Figure 3. A more detailed representation of the model, as explained in the text. Cell bodies are represented as circles. The output of the lateral interaction layer is assumed to yield signal levels above and below a resting level. In the second stage, resting activity is represented as if cells with that activity level contribute their signals through synapses. The "S/N" labels are intended to indicate that the signals between cells in the filling-in layer are attenuated so that the total signal arriving recurrently at each cell equals the mean of the outputs of its neighbors. Linearity is assumed in the sense that excitation is taken to be additive and inhibition subtractive.

Figure 4. A simple input pattern for the model is shown at the upper left and its profile (halfway up the vertical dimension) is plotted below it. The second column shows the output of the lateral inhibitory layer. The third and fourth columns show the output of the entire model after 200 and 1200 iterations of the filling-in stage. Note that the lightness values and profiles are scaled so that all output patterns and profiles have the same range. Therefore, it is not apparent in the figure that, as iterations proceed, the absolute values generally increase. The same scaling is included in all subsequent figures of this kind.

Figure 5. Performance for an input pattern containing cumulative steps of illuminance.

Figure 6. The same input pattern as in Figure 5, but the pattern is presented for a large number of iterations and made to shift back and forth periodically, simulating small saccadic eye movements. The output pattern in the third column immediately follows a one-pixel movement to the left. The fourth column shows the result after an additional nine iterations. This set of patterns represents a dynamic equilibrium. That is, if the input pattern is shifted horizontally back and forth as the process iterates, one shift after each ten iterations, then, after an initial set of iterations (about 100 in the present simulation), the output patterns in this figure will be the same for all further iterations.

Figure 7. Simultaneous contrast and Mach Bands.

Figure 8. The input pattern in Figure 7 except that its border has been eliminated. This result looks nice but is misleading, as explained in the text.

Figure 9. A sinusoidal input produces a sinusoidal output.

Figure 10. Spatial Contrast Sensitivity Function for the lateral interaction stage. (The high frequency region has been omitted for simplicity.)

Figure 11. Spatial Contrast Sensitivity Function for the filling-in stage by itself after three different numbers of iterations. As equilibrium is approached, sensitivity to zero frequency approaches infinity.

Figure 12. Spatial Contrast Sensitivity Function for the model as a whole for three different numbers of iterations. At equilibrium, the attenuation of low frequencies by the lateral interaction stage is balanced by the increasing gain of the filling-in stage. (Note that, as is evident from the change in shape of the curves, the rate of approach to equilibrium varies with spatial frequency.)

Figure 13. Two patterns demonstrating a reversal of the contrast expected from classical theories and the model as described so far. The two gray squares in (a) and the two gray lines in (b) have equal luminances. Pattern (a) is from DeValois and DeValois (22) and (b) is a simplified version of one by Economou et al. (23).

Figure 14. Schematic representation of one unit or pixel in the MC+FI model. All of the plus center receptive fields add their signals to the cell in the filling-in layer and all the minus center fields subtract their signals (that is, inhibit).

Figure 15. The output of the MC+FI model for the pattern shown as iterations proceed. The vertical axis is the output of the model (and presumed brightness). The larger dots plot the output at the center of the gray square on the left and the smaller ones at the center of the gray square on the right. Initially the square on the left has a higher output, as would be expected from classical contrast theory, but after about 100 iterations, the relationship reverses.

Figure 16. The output of the model for the checkerboard after 300 iterations. The series of six images across the center are the images output from each of the six channels before they are summed, receptive field size increasing from left to right. The blurred image at the bottom shows the output of the filling-in stage after 300 iterations. Its profile (through the centers of the two gray squares) is plotted at its right. The dotted vertical lines indicate the positions of the two gray squares, and the output values at the centers of the gray squares are displayed in the lower right corner. See the text for a discussion of the blurring.

Figure 17. The output of the model for the gray stripes in the pattern shown. Contrast starts as expected from classical contrast theory but after about 100 iterations, it reverses.

Figure 18. The output of the model after 300 iterations for the stripe pattern.

Figure 19. The performance of the model for a classical simultaneous contrast pattern. The outputs are initially in accord with classical contrast but they reverse after about 700 iterations.

Figure 20. The output after 300 iterations.

Figure 21. The effect of changing the number of stripes that flank the gray stripes in the pattern of Figure 13(b), as measured psychophysically. The vertical axis is a measure of the apparent difference in lightnesses of the two gray stripes, and the numbers below the bars are the numbers of flanking stripes. Values above zero represent the classical contrast expectation. From Economou et al. (23).

Figure 22. The corresponding result for the MC+FI model.

Figure 23. The behavior of the model for a pattern that yields "assimilation". All gray bars have the same luminance but the gray bars separated by white lines appear lighter than those separated by dark lines and the model gives the same result after about 190 iterations. (The larger dots plot the output corresponding to the centers of the gray bars on the left, and the smaller dots, the gray bars on the right.)

Figure 24. This pattern is a simplified version of one by Adelson (26). The white squares within the darkened region have the same luminance as the dark squares outside that region, but the dark squares outside the region appear darker than the white ones within the region. The model shows the same behavior during the first 650 iterations.

Figure 25. In Figure 24, the number of channels is six. In this figure, the same pattern is input but the channel with the largest receptive field is omitted.