There is ample electrophysiological evidence
that the receptive fields of most optic nerve fibers manifest center-surround
antagonism (1). The fact that the firing rates of most cells in
visual cortex are not significantly affected by the level of illumination
when the illumination is uniform over their receptive field (2)
implies that the strengths of the positive and negative signals
within each field are approximately equal. This results in a contrast
sensitivity function such that sensitivity approaches zero as spatial
frequency approaches zero, and, in an ordinary scene, the expected
"neural output image" will depart from a resting level
only in regions containing luminance gradients or steps.
Figure 1b is the scene in Figure 1a after processing
of the sort that might be expected from this kind of center-surround
antagonism. If signals manifesting center-surround antagonism were
monotonically related to our perception of brightness, then the
scene in Figure 1a should look like the image in Figure 1b, and
it clearly does not.


One possible reason for this discrepancy involves
the difficult philosophical question of the relationship between
physiological events and experience (3). I choose not to discuss
that question here, but instead to accept the following as a working
assumption: that there is a layer somewhere in the nervous system
whose spatial arrangement is topologically related to the retina
(and thus to the scene) and such that the activity level at each
point is monotonically related to the apparent brightness at the
corresponding point in the scene. I chose this assumption because
it often seems to provide satisfying explanations for perceptual
phenomena. Given this assumption, the fact that scenes do not appear
as in Figure 1b is a puzzle.
It might be that fibers without center-surround
antagonism provide the signals for brightness perception, center-surround
fibers constituting a parallel channel for other uses, e.g. seeing
edges. That possibility is refuted by a very large body of evidence
about the appearance and disappearance of stabilized retinal images.
Suppose a subject views a uniform disk on a
darker background. When the image of the disk is stabilized, it
quickly disappears. That does not mean that the region of the disk
looks black or like a hole. Instead, that region takes on the same
brightness (or hue (4)) as the background. If the disk is then shifted
very slightly across the retina, the entire disk will reappear (and
then disappear again), even though the motion has only affected
receptors near its edges. The appearance of the center of the
disk depends only on what happens at unstabilized edges around the
disk. This is, of course, what happens during normal unstabilized
viewing of a uniform disk. Very small eye movements, which produce
changes in retinal illuminance only at the edges of the image of
the disk, are sufficient to determine the brightness of the whole
disk. There is no way that photoreceptors under the image of the
center of the disk know whether it is stabilized or not; yet the
center looks entirely different depending upon whether or not it
is stabilized. Somehow the brightness (and color) of any uniform
region depends exclusively on events at (unstabilized) edges.
That observation is consistent with the presumption
that neural activity manifesting center-surround antagonism is in
fact the activity that supports the perception of brightness, and
that there is some additional mechanism that fills in between edges.
The term "filling-in" has been used
to apply to a variety of visual conditions, including filling-in
across the blind spot (5), across retinal scotomas (6), illusory
contours (7), filling-in of texture (8), etc. In this paper, it refers
to a process that causes the appearances of regions of uniform luminance
to be derived from their surrounding contours.
Gerrits and Vendrik (9) proposed a qualitative
model for filling-in for which Grossberg and his coworkers have
developed a rigorous mathematical model, consisting of a set of
differential equations (10,11,12). (See also related work in digital
image processing, e.g. references 13 and 14.) Although that model does make reference
to physiological processes, the direct relationship between the
equations and physiologically realistic components is not easy to
discern. Here, a new model is presented that is appreciably simpler
than Grossberg's, and is constructed of physiologically plausible
working parts.
Figure 2 shows the stages of the model. The
first stage is a two-dimensional layer of photoreceptors. As an
illustration, the input illuminance pattern is taken as a uniform
disk on a less bright background. A cross-section of the input pattern
is plotted on the right. The output of each photoreceptor feeds
into a neural network that generates receptive fields with center-surround
antagonism, the total excitation in each field equaling the total
inhibition so that the response to uniform illumination is zero.
(This stage may actually be distributed among several layers in
the visual system. In fact, if the mechanism is recurrent inhibition,
it must be distributed because a single stage of recurrent lateral
inhibition that is strong enough to reduce the response to a uniform
field to zero cannot be stable. To keep the model as simple as possible,
it is represented here as a single feed-forward stage.) The output
of this stage will then be as illustrated.

The final stage performs filling-in; its output
at each point is taken to be monotonic with the apparent brightness
of the scene at the corresponding point.
Figure 3 diagrams one element (pixel) in the two-dimensional array
of elements. Again, the first stage is simply a photoreceptor. For
simplicity in the simulations to follow, the output of each photoreceptor
is assumed to be linearly related to its illuminance.

The receptors feed a standard form of lateral inhibitory
interaction, in which each pixel inhibits its neighbors by an amount
proportional to its own excitation and decreasing with distance.
Each receptive field in this stage has an excitatory center and
an inhibitory surround, and uniform illumination produces activity
at a "resting" level. For the present discussion, it will
be assumed that the sizes of all receptive fields are the same.
That assumption will be replaced with a more realistic one later.
(The model performs in essentially the same way if this stage is
assumed to manifest Intensity-Dependent Spread (15) so long as the
resting level is properly handled.)
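As a rough one-dimensional illustration of this stage, the sketch below applies a center-surround weighting in which each surround cell has weight -1 and the center has a positive weight equal to the number of surround cells, so the weights sum to zero. This mirrors the square kernels used in the appendix rather than weights that decrease with distance. The function name `lateral_inhibition_1d` and the choice to extend the end values outward are assumptions made only for illustration.

```python
def lateral_inhibition_1d(signal, half_width):
    """Zero-sum center-surround filter applied to a 1-D luminance profile.

    Each of the 2*half_width surround cells has weight -1 and the center has
    weight +2*half_width, so a uniform input produces zero output.
    Indices beyond the ends are clamped, i.e. the end values extend outward
    (an assumption made for this sketch).
    """
    n = len(signal)
    out = []
    for i in range(n):
        surround = 0.0
        for j in range(i - half_width, i + half_width + 1):
            if j != i:
                surround += signal[min(max(j, 0), n - 1)]
        out.append(2 * half_width * signal[i] - surround)
    return out


# Cross-section of a bright "disk" (luminance 3) on a darker background (1).
profile = [1.0] * 10 + [3.0] * 10 + [1.0] * 10
li = lateral_inhibition_1d(profile, half_width=2)
print(li)  # nonzero only within two cells of each luminance step
```

The output is positive just inside each edge, negative just outside, and zero over both the interior of the disk and the background, which is the "resting level" behavior described above.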
The next stage creates an off-center unit in
addition to the on-center unit. (There are surely other ways to
generate on- and off-center units. However, the particular way is
not relevant here.)
Each element in the final stage performs the
following operations. Signals from the off-center channel are subtracted
from the on-center channel signals, and the result is added to the
average of the outputs of the immediately neighboring output pixels.
Note that this part of the system is recurrent. That is, signals
are fed back from the output of each element to add to the inputs
of its neighbors. ("S/N" in the figure denotes that the
signal fed back from each output pixel is divided by the number
of pixels feeding the pixel in question, to obtain the average.)
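The element just described can be sketched in one dimension. The sketch assumes a row of elements whose two end elements are held at zero (as the border pixels are in the appendix code). It updates all elements synchronously from their neighbors' previous outputs; the appendix code updates in place, which should reach the same equilibrium. Its input is a hand-built band-pass profile for a bright disk. The names `split_on_off` and `fill_in_1d` are hypothetical.

```python
def split_on_off(band_pass):
    """Rectify a signed band-pass signal into on-center and off-center rates."""
    on = [max(v, 0.0) for v in band_pass]
    off = [max(-v, 0.0) for v in band_pass]
    return on, off


def fill_in_1d(band_pass, iterations):
    """Iterate out[i] = on[i] - off[i] + mean(out[i-1], out[i+1]).

    The two end elements are held at zero. All elements are updated
    synchronously from the previous iteration's outputs.
    """
    on, off = split_on_off(band_pass)
    n = len(band_pass)
    out = [0.0] * n
    for _ in range(iterations):
        new = [0.0] * n
        for i in range(1, n - 1):
            neighbor_mean = (out[i - 1] + out[i + 1]) / 2  # "S/N" with N = 2
            new[i] = on[i] - off[i] + neighbor_mean
        out = new
    return out


# Band-pass response to a disk of luminance 3 (cells 10..19) on a background
# of luminance 1, using weights (-1, 2, -1): nonzero only at the two edges.
drive = [0.0] * 30
drive[9], drive[10] = -2.0, 2.0
drive[19], drive[20] = 2.0, -2.0

early = fill_in_1d(drive, 20)
late = fill_in_1d(drive, 3000)
print(round(early[15], 3), round(late[15], 3), round(late[2], 3))
```

After many iterations the interior of the disk settles at a uniform positive level (4 here) while the background returns to zero. The output becomes a scaled, offset copy of the input, even though the input to this stage was zero everywhere except at the edges.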
(It will be argued in a separate paper that
an additional stage must precede the filling-in stage, a stage that
acts as a valve, passing the signals only for a short time after
any change in input. Such a stage is necessary to account for the
disappearance of stabilized images.)
All of the results to be presented here are generated
by simulation of the model. (Pseudocode for the simulation is presented
in the appendix.)
The filling-in stage of the model provides interactions
only among immediate neighbors, and the output pattern thus develops
slowly and approaches equilibrium over a large number of iterations.
Here are a few examples.
Figure 4 shows the action of the model for a uniformly
illuminated rectangle on a darker background. At the top of the
column labeled "input" is the illuminance pattern and
below it is the illuminance cross-section. The image at the top
of the column labeled "LI Output" is the output of the
lateral inhibition stage, with its output cross-section plotted
below. The image at the top of the column labeled "FI output
(200)" is the output of the filling-in stage after 200 iterations,
and its cross-section is plotted below. The rightmost column shows
the filling-in output as it approaches equilibrium (1200 iterations).
Figure 4
Note that the output patterns and their cross-sections
have been scaled to span equal ranges in this and all subsequent
similar figures. Thus the lightnesses and profiles cannot be compared
among the various figures and numbers of iterations.
Next, consider what has come to be called the Craik-O'Brien-Cornsweet
illusion. Cornsweet (16) pointed out that the output of the lateral
inhibition stage must be essentially the same for that input pattern
as for a step, which explains why this pattern and the step look
the same. (This is also approximately true for Craik's version (17)
and approximately true for O'Brien's (18).) Why both
patterns look like a step, according to the model, is shown in

A pattern with cumulative steps is shown in Figure
5, a kind of pattern that Grossberg's model does not properly handle
(but Arrington's modified version does (19)). Note that, while the
output pattern is developing, all the steps except the top one are
scalloped, in agreement with the actual appearance of the luminance
pattern. At equilibrium, the output is simply a slightly low-pass-filtered
copy of the input.

Figure 6 shows the results at (dynamic) equilibrium
when the input pattern is that of Figure 5 and has been present
for a large number of iterations, but is shifted back and forth
periodically in a way that mimics the effects of small saccadic
eye movements. Here the scalloping appears and then gradually fades
after each movement.

Figure 7 is a combination of a classical simultaneous
contrast pattern and one that produces Mach Bands. The output after
200 iterations illustrates both results. Simultaneous contrast results
when the two small patches of equal luminance are presented against
differing backgrounds and the output of the model gives the same
result. The luminance transition between the two backgrounds is
of a form that generates Mach Bands and the output of the filling-in
stage behaves similarly.

At equilibrium in Figure 7, the output pattern is
simply a copy of the input (slightly low-pass filtered). If the
model is essentially correct, then a comparison of the outputs after
200 iterations and at equilibrium with the actual appearance of
the pattern suggests that equilibrium is not reached during normal
viewing. Either it is precluded by an eye movement,
or if not, the (stabilized) image will disappear.
Figure 8 shows the result for the same pattern as
in Figure 7 except that the outer border has been removed. In the
simulation, this is equivalent to making the pattern infinitely
large by extending the illuminances all around the edges to infinity
in directions perpendicular to the edges. Note that although under
this condition, the equilibrium results have the desirable form,
the condition itself is unrealistic. In general, simulations of
the model in which the pattern of interest lies on a background
or border yield results at equilibrium that are simply copies of
the input. (See the CSF in Figure 12, below, which is essentially
flat for low and middle frequencies at equilibrium.) These results
are, in general, different from the equilibrium results when the
border is omitted, and the same effect might well be expected for
any model of filling-in. Therefore, it is important, in testing
such models, that a realistic border be included.

The input in Figure 9 is a sine-wave grating, and
so is the output. Here is an input pattern that contains no uniform
regions, but at equilibrium the model preserves the sine-wave at
the output. This property of the model holds for sine-waves of all
frequencies and amplitudes (within the limits imposed by digitization).
Note that for a sine-wave of this particular frequency, the equilibrium
shape is approached very quickly, but because the profiles in the
figure are scaled to have equal ranges, the fact that the amplitude
rises as iterations progress is hidden.
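A small sketch can check why a sine-wave grating keeps its shape. On a periodic row of elements, a sinusoid is an eigenfunction of both a zero-sum center-surround filter and the neighbor-averaging feedback, so every iteration only rescales it. The sketch assumes a one-dimensional ring, the weights (-1, 2, -1), and synchronous updates; the helper names are hypothetical. For angular frequency w, the band-pass gain is 2(1 - cos w) and the feedback multiplies the grating by cos w. After k iterations from zero, the overall gain is therefore 2(1 - cos^k w), which rises toward 2.

```python
import math


def band_pass_ring(x):
    """Weights (-1, 2, -1) on a periodic row: zero response to a uniform row."""
    n = len(x)
    return [2 * x[i] - x[i - 1] - x[(i + 1) % n] for i in range(n)]


def fill_in_ring(drive, iterations):
    """Synchronous updates of out[i] = drive[i] + mean of the two neighbors."""
    n = len(drive)
    out = [0.0] * n
    for _ in range(iterations):
        out = [drive[i] + (out[i - 1] + out[(i + 1) % n]) / 2 for i in range(n)]
    return out


n, cycles = 64, 4
w = 2 * math.pi * cycles / n          # angular frequency, radians per element
grating = [math.sin(w * i) for i in range(n)]
drive = band_pass_ring(grating)

for k in (5, 50, 2000):
    out = fill_in_ring(drive, k)
    predicted_gain = 2 * (1 - math.cos(w) ** k)
    worst = max(abs(o - predicted_gain * g) for o, g in zip(out, grating))
    print(k, round(predicted_gain, 4), worst)  # deviation is rounding error only
```

The output at every iteration is the input sinusoid times a single scalar, so only its amplitude changes.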

Imagine looking at a uniformly illuminated disk on
a darker background. If the center of the disk looks brighter than
the background even when the disk is larger than the largest receptive
field it illuminates (it does (20)), then it must be true that the
response of the visual system to zero spatial frequency is not zero.
If it were zero, the center would have the same brightness as the
background. This seems inconsistent with the fact that the human
spatial contrast sensitivity function (CSF) approaches zero or close
to it as frequency approaches zero. The model described here resolves
that apparent inconsistency, as explained below.
Figure 10 shows the CSF for the lateral interaction
(inhibition) stage of the model. It falls to zero at zero spatial
frequency. (The high-frequency end of the CSF depends on optical
properties and other factors, like the width of the excitatory region,
that are not relevant to the present discussion. Therefore, high-frequency
behavior will be ignored here.)

Figure 11 shows the CSF for the filling-in stage
alone (sine waves are injected directly into the filling-in stage),
at equilibrium and at two earlier times in its development as iterations
proceed. Note that, at equilibrium, as spatial frequency approaches
zero, sensitivity approaches infinity. That is a result of the positive
feedback in this stage.

Figure 12 shows the CSF for the complete system at
equilibrium and at two earlier times. At equilibrium, as the spatial
frequency approaches zero, the fall in lateral inhibitory stage
output is balanced by the increasing sensitivity of the filling-in
stage, resulting in a response curve that is fairly flat all the
way to zero. (That the function shows finite gain at zero is evident
from the fact that uniform regions of the input that are larger
than the size of the receptive fields produce uniform non-zero outputs,
e.g. Figure 4. At zero spatial frequency, the lateral inhibitory
stage output, and thus the input to the filling-in stage, is zero;
the filling-in stage multiplies zero by infinity, and the result
is finite!) Prior to reaching equilibrium, the system CSF shows
that it attenuates low frequencies more strongly.
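The balance between the two stages can be made concrete for a grating that varies only along x. The sketch below assumes:

- the square kernel of the appendix (surround weights -1, center weight side² - 1);
- four-connected neighbors averaged over 4 in the filling-in stage;
- synchronous iterations starting from zero.

The function names are hypothetical. For angular frequency w > 0 (radians per pixel), the lateral inhibition gain falls as w² near zero. The equilibrium filling-in gain, 1/(1 - (1 + cos w)/2) = 2/(1 - cos w), grows as 4/w², so the product stays finite. For side = 3 the product is exactly 12 at every frequency. For larger kernels it approaches 2·side·r(r+1)(2r+1)/3 as w approaches 0, with r = (side - 1)/2.

```python
import math


def li_gain(side, w):
    """Gain of a side x side zero-sum kernel (center side**2 - 1, surround -1)
    for a grating cos(w * x) that is constant along y."""
    r = (side - 1) // 2
    row_sum = sum(math.cos(w * dx) for dx in range(-r, r + 1))
    # Every row of the box sees the same row_sum, so the box total is side * row_sum.
    return side * side - side * row_sum


def fi_gain(w, iterations=None):
    """Gain of the filling-in stage (4 neighbors, divided by 4), w > 0.

    The neighbor mean multiplies the grating by a = (1 + cos w) / 2.
    iterations=None gives the equilibrium 1 / (1 - a); otherwise the partial
    sum 1 + a + ... + a**(iterations - 1) reached after that many
    synchronous iterations from zero.
    """
    a = (1 + math.cos(w)) / 2
    if iterations is None:
        return 1 / (1 - a)
    return (1 - a ** iterations) / (1 - a)


def system_gain(side, w, iterations=None):
    return li_gain(side, w) * fi_gain(w, iterations)


for w in (0.01, 0.1, 0.5, 1.0, math.pi):
    print(f"w={w:.2f}  LI={li_gain(3, w):.5f}  "
          f"system(200 it)={system_gain(3, w, 200):.3f}  "
          f"system(eq)={system_gain(3, w):.3f}")
```

The lateral inhibition gain alone falls toward zero at low w. The equilibrium product stays at 12, while the 200-iteration product is still strongly attenuated at the lowest frequency. This is the "zero times infinity is finite" behavior described above.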

There are some important similarities between the
present model and that of Grossberg and coworkers. Both models assume
a stage with center-surround antagonism and both assume on- and
off-center units. Grossberg (10), following Gerrits and Vendrik
(9), then proposes that the signals from these units diffuse over
a homogeneous two-dimensional layer. The action of the elements
in the filling-in stage of the present model can also be considered
to represent diffusion of the incoming signals. That is, one way
to simulate diffusion through a homogeneous layer in a system that
actually consists of discrete elements is to cause the output of
each element to be the sum of its input and the mean of its neighbors'
outputs.
One difference between the present model and those
of Grossberg and coworkers is that the present model is phrased
in terms of working parts that are simply excitatory and inhibitory
synapses (with neurons that connect them). A more important difference
is this. Grossberg and coworkers, again based on Gerrits and Vendrik's
qualitative model, assume that there is a mechanism by which diffusion
is blocked by barriers at places corresponding to contours or edges
in the scene. This idea appreciably complicates the equations, and
it seems difficult to imagine a physiological mechanism that would
generate the right kind of barriers. In the present model, no barriers
are required. (The presence of barriers in Grossberg's model produces
incorrect results when the scene contains a succession of luminance
steps, like those in Figure 5. Arrington (19) solved this
problem by modifying the basic Grossberg model to make the barriers
selectively leaky. This further complicates the equations, but does
make that model more similar to the one presented here.)
The CSF for the complete model (Figure 12) is relatively
flat and thus appears inconsistent with the low frequency attenuation
that is characteristic of the typical psychophysically determined
human CSF. But note that the "standard" human CSF is based
on measurements of threshold contrast, and no threshold was included
in the model above. To account for threshold behavior, a threshold
mechanism must be included in the model. If we accept the plausible
assumption that the threshold for contrast is set between the lateral
inhibition and the filling-in stages, then the CSF at threshold
must have the characteristics of the CSF of the lateral inhibition
stage (Figure 10), while the CSF measured at contrasts well above
threshold will have the shape of the overall system CSF (Figure
12). The flattening of the CSF at low frequencies and above-threshold
contrasts is well documented and is called "contrast constancy"
(21).
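The threshold argument can be sketched numerically. The sketch assumes:

- a grating of unit mean luminance;
- the appendix's 3 x 3 kernel;
- the equilibrium filling-in gain;
- detection that occurs when the lateral inhibition output reaches a fixed threshold. The value `THRESHOLD` is hypothetical and only sets the scale.

Threshold sensitivity then follows the lateral inhibition CSF. The suprathreshold output follows the flat system CSF.

```python
import math

THRESHOLD = 1.0   # hypothetical detection threshold on the lateral-inhibition output


def li_gain(w):
    """Gain of the 3 x 3 appendix-style kernel for a grating along x."""
    return 9 - 3 * (1 + 2 * math.cos(w))


def fi_gain(w):
    """Equilibrium filling-in gain (4 neighbors, divided by 4), w > 0."""
    return 2 / (1 - math.cos(w))


def threshold_sensitivity(w):
    """1 / threshold contrast, if detection is decided on the LI output."""
    return li_gain(w) / THRESHOLD


def suprathreshold_response(w, contrast):
    """Filling-in output amplitude for a grating of the given contrast."""
    return contrast * li_gain(w) * fi_gain(w)


for w in (0.05, 0.2, 0.5, 1.0):
    print(f"w={w:.2f}  threshold sensitivity={threshold_sensitivity(w):.4f}  "
          f"response at contrast 0.5={suprathreshold_response(w, 0.5):.3f}")
```

Sensitivity at threshold drops steeply toward low frequencies, while the response to a fixed, clearly visible contrast is the same at every frequency, a toy version of contrast constancy.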
This model requires that center-on and center-off
responses are both present (as does Grossberg's). (If neuronal firing
rates could be negative, then the generation of the two classes
of receptive fields would not be required for this model, as may
be clear from the discussion in the appendix.) It is interesting
to speculate that complementary classes of receptive fields might
have evolved because they enable filling-in.
The iteration that naturally occurs in the filling-in
stage imposes strong temporal properties on its output, properties
only minimally addressed here. Work on such issues as temporal CSFs,
temporal masking effects, etc., could be very interesting.
A simple model can be constructed, consisting only
of excitatory and inhibitory synapses and their connecting neurons,
that manifests behavior closely mimicking many phenomena of human
brightness perception. It consists of a stage manifesting strong
lateral inhibition or its equivalent, followed by a stage that combines
the output of the first stage with recurrent summation among neighboring
elements. When it is assumed that the threshold for contrast is
determined between these two stages, both the low frequency attenuation
of threshold CSF measurements and the flattening of the CSF at higher
contrasts (contrast constancy) result.
BUT
The two gray squares in Figure 13(a) have identical
reflectances. According to classical contrast theory, the gray square
on the left should appear lighter than the one on the right, since
the one on the left is bordered by dark squares and the one on the
right by light ones. When this pattern is presented as input to
the model described above, the result is in accordance with the
classical contrast prediction. However, in fact, the gray square
on the left appears darker than the one on the right.

Similarly, the two gray bars in Figure 13(b) have
identical reflectances. By classical contrast theory and in the
output of the model described above, the gray bar on the left should
look darker than the one on the right. The appearance of these patterns
clearly demonstrates that the theory described above is wrong.
Gilchrist (24) explains the phenomena in these figures
and many related ones in terms of what he calls "Anchoring Theory".
He suggests that the visual system parses scenes into groups according
to Gestalt principles, then assigns relative lightnesses within
each group, and the final lightness at each point is a weighted
average of all the lightnesses computed for that point as a member
of various groups.
The theory presented here is an explanation in terms
of simple physiological mechanisms. There is surely no fundamental
inconsistency between explanations in terms of physiological mechanisms
and in terms of "Gestalt Principles". Many early Gestalt
theorists looked for such physiological explanations for their principles.
It may be correct to say that the theory presented here is a description
of a set of plausible neural mechanisms that perform the computations
required by Anchoring Theory.
It seems likely that some aspects of brightness perception
are influenced by cognitive or "top down" processes. The
model presented here does not seem to fit the idea of a "cognitive"
process. Nor is it "top down", if that term implies that
signals at some level feed back to modulate earlier levels. (There
is feedback in the filling-in stage but that seems too local to
be called "top down".)
The model presented earlier consists of a lateral
interaction stage that performs band-pass filtering, a stage that
creates plus-center and minus-center receptive fields, and a filling-in
stage. The following modifications cause the model to produce correct
predictions for the patterns in Figures 13(a) and (b).
First assume that there is a variety of receptive
field sizes in the visual system, all of which exhibit center-surround
antagonism, and all of which act upon the input image in parallel.
That is, the input image is convolved with a set of masks; the algebraic
sum of the weights of each mask is zero (to attenuate zero frequency
to zero), and the masks vary in width. To put it another way, assume
that there exists a set of parallel neural images, processed by
filters with pass-bands whose peaks are spread along the frequency
axis. This idea has a venerable history under the label "multiple
channel theory" (25).
Next, assume that the outputs of all of these various
filters are simply added, pixel by pixel, to create a single summed
image. This summed image is then input to the filling-in stage.
This arrangement is diagrammed in Figure 14. In other words, the
MC+FI (multiple-channel plus filling-in) model is essentially identical to the one presented above
except that the input to the filling-in stage is the sum of the
outputs of a set of band-pass filters with differing pass-band frequencies.
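The channels act in parallel and are simply added pixel by pixel. Away from the image border, their sum is therefore equivalent to convolving the input once with a single summed kernel. The sketch below builds that kernel for the six square channels used in the appendix (sides 3 to 13, surround weights -1, center weight side² - 1); the helper names are hypothetical. The summed kernel still has zero total weight, so it still attenuates zero frequency to zero. Its surround weights, however, taper from -6 next to the center to -1 at the outer edge.

```python
def channel_kernel(side, size=13):
    """side x side appendix-style kernel, centered in a size x size grid of zeros."""
    r, c = (side - 1) // 2, size // 2
    kernel = [[0] * size for _ in range(size)]
    for y in range(c - r, c + r + 1):
        for x in range(c - r, c + r + 1):
            kernel[y][x] = -1
    kernel[c][c] = side * side - 1
    return kernel


def summed_kernel(sides=(3, 5, 7, 9, 11, 13), size=13):
    """Pixel-by-pixel sum of the channel kernels."""
    total = [[0] * size for _ in range(size)]
    for side in sides:
        k = channel_kernel(side, size)
        for y in range(size):
            for x in range(size):
                total[y][x] += k[y][x]
    return total


k = summed_kernel()
print(sum(map(sum, k)))   # 0: total weight of the combined receptive field
print(k[6][6:])           # [448, -6, -5, -4, -3, -2, -1]: center, then surround outward
```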

Figure 15 shows a plot of the output of MC+FI for
the two gray squares in the checkerboard as a function of the number
of iterations of the filling-in process. (The specific parameter
values used to obtain all of the results presented here are given
in the appendix.) The outputs are initially those predicted by classical
contrast theory, but as the iterations proceed, the lightnesses
reverse, and remain reversed as the process approaches equilibrium.

Figure 16 shows output patterns for the checkerboard.
The set of six images across the middle are output patterns for
six successively larger receptive fields and the pattern at the
bottom is the output of the MC+FI theory after 300 iterations. (As
discussed in the appendix, in this particular simulation the plus
centers of all the receptive fields are set at the same size. Only
the negative surrounds vary. If the sizes of the centers are made
to increase with the surrounds, the results are qualitatively the
same.)

Figures 17 and 18 show that the stripe pattern of
Economou et al (23) produces a similar result.


The result (Figures 19 and 20) is also similar for
the classical contrast pattern, except that the reversal occurs
only after a much greater number of iterations. Thus, if the instantiation
of MC+FI theory with this particular set of parameters (as described
in the appendix) is right, then the appearances of the gray areas
in the three figures depend on the state of the system during a
period of time that begins after about 100 iterations and ends after
about 700 iterations.
In general, the initial behavior of the MC+FI model
produces the same effects as the (single channel) model described
above, as illustrated by the example in Figures 19 and 20. However,
as the iterations proceed, the results change in ways and at speeds
that depend upon the stimulus configuration.


Because Economou et al attribute the strength of
the illusion in Figure 13(b) to the degree to which the gray stripes
are perceptually grouped with the flanking stripes, they varied
the number of stripes that flank the test (gray) stripe, a manipulation
they describe as "articulation variation", and correctly
predicted the strength (and direction) of the illusion (23). Their
result is shown in Figure 21. Figure 22 shows the corresponding
result for the MC+FI model.

As a general rule, it appears that test patterns
lying on dissimilar backgrounds yield classical contrast while patterns
embedded in repetitive patterns tend toward reversed contrast or
assimilation, and the result in Figure 22 can be thought of as demonstrating
the transition between the two states.

The pattern in Figure 23 is a classical demonstration
of what is usually called "assimilation", another example
of a reversal of classical contrast. The plot in Figure 23 shows
the results for the MC+FI model.

(In these figures, the high spatial frequency attenuation
that results primarily from spatial summation in the band-pass stages
produces very noticeable blur. However, the prominence of the blur
depends upon the relationship between the widths of the features
in the patterns and the widths of the receptive fields. The receptive
fields in the simulations are 3, 5, 7, 9, 11, and 13 pixels in width.
In Figures 16 and 18, the stripes and checks are only about ten
pixels in width. In Figure 23, the thinner stripes are only one
pixel in width. If the patterns were enlarged, the stripes and checks
would be wider but the blur width would remain the same, thus appearing
less prominent. To try to predict, from the MC+FI model, the amount
of blur apparent in a scene would require assumptions about the
actual densities of cell assemblies in relation to retinal image
size.)
Figure 24 shows a pattern modified from Adelson (26).
The lighter squares within the darker circular region have the same
luminance as the darker squares in the other regions, but they appear
lighter. The model shows the same result during the first 650 iterations.
(In Adelson's display, an object rests on part of the checkerboard
and the darker region with blurred edges appears to be the shadow
of the object. However, as is evident from the pattern in Figure
24, the object that casts the shadow is not required for the illusion,
nor is the (cognitive) presumption that the darker region is in
fact a shadow.)

In Figure 25, the input pattern is that in Figure
24 but the channel with the largest receptive field is eliminated,
resulting in an increase in the number of iterations during which
the illusion is maintained.

The patterns presented here represent only a very
small sample of the essentially unlimited population of patterns
that demonstrate contrast or assimilation, but one has to stop somewhere.
In that regard, it should be pointed out that the model yields correct
predictions for every one of the patterns tested so far.
The output of the model changes as the number of
iterations of the filling-in stage increases, and, as was pointed
out above, the predictions are only correct over a limited, although
broad, range of numbers of iterations. Thus the model implies predictions
about the development of the illusions over time. However, certain
aspects of the model need to be more fully developed before such
predictions are testable. For example, because the lateral interaction
stages involve signals traveling laterally, the form of signals
output from that layer, that is, the input to the filling-in layer,
must change over time. Given that lateral spread must have a finite
velocity, at the very first instant that a pattern replaces a uniform
field, the output from the lateral interaction layer will not have
been influenced by lateral interactions (unless we postulate something
special, such as that the signals travel laterally much faster than
they travel straight through). Thus the initial input to the filling-in
layers will contain a strong zero-spatial frequency component, which
will significantly affect the dynamics of the output of the model.
Another important aspect of the visual system that
must be considered in connection with the dynamics of the MC+FI
model is the disappearance of stabilized images. Suppose each iteration
takes one millisecond and a pattern, if stationary, were to disappear
in less than 0.7 seconds. Then, for the parameters (arbitrarily)
chosen for the results displayed here, the pattern would disappear
before the standard contrast pattern could reverse. In general,
when the effects of continuous repetitive eye movements are introduced
into the model, the output achieves a stable dynamic equilibrium
that resembles the results of a stationary pattern after an intermediate
number of iterations.
The general behavior of the MC+FI model is very robust
in the sense that changes in the parameters, e.g. shapes or sizes
of receptive fields, number of channels, etc. affect the number
of iterations at which reversals occur but have little other effect.
There is one exception, however. In the filling-in stage, the output
of each element is the sum of its direct input and the average,
that is, the sum divided by the number, of its neighbors. In the
present simulations, for example, only four-connected neighbors
are used and the "number" thus equals 4. If that number is
increased or decreased only slightly, e.g. 4.1, the shapes of the
resulting output patterns are strongly affected.
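A frequency-domain sketch suggests why this divisor is so critical. It assumes a grating along x, the appendix's 3 x 3 kernel, and an equilibrium analysis; the function name is hypothetical. The feedback multiplies the grating by (2 + 2 cos w)/N, where N is the divisor.

- With N = 4, this loop gain reaches 1 only at zero frequency, where it exactly offsets the zero response of the lateral inhibition stage.
- With N = 4.1, the loop gain never exceeds 4/4.1, so low frequencies are no longer restored and the output reverts toward the band-pass input.
- With N = 3.9, the loop gain exceeds 1 at low frequencies, so on a sufficiently large array those components grow without bound.

```python
import math


def equilibrium_gain(w, divisor, side=3):
    """Equilibrium gain of lateral inhibition followed by filling-in whose
    neighbor sum is divided by `divisor`, for a grating cos(w * x), w > 0."""
    r = (side - 1) // 2
    li = side * side - side * sum(math.cos(w * dx) for dx in range(-r, r + 1))
    loop = (2 + 2 * math.cos(w)) / divisor   # feedback gain for this grating
    if loop >= 1:
        return math.inf                      # iteration diverges at this frequency
    return li / (1 - loop)


for divisor in (3.9, 4.0, 4.1):
    gains = [equilibrium_gain(w, divisor) for w in (0.01, 0.1, 0.5, 1.0)]
    print(divisor, [round(g, 3) for g in gains])
```

At middle frequencies the three divisors give similar gains. The differences are concentrated at low frequencies, which carry the overall shape of the output pattern.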
1) Filling-in, that is, the restoration of low spatial
frequencies, can be accomplished by a simple neural positive feedback
circuit in which, at each point, the incoming band-pass filtered
signal is added to the average of the outputs at neighboring points.
2) This is an iterative process. At equilibrium the
output pattern mimics the input illuminance distribution, but during
the build-up toward equilibrium, effects such as simultaneous contrast,
the apparent scalloping of the lightness of stair-step patterns,
and contrast constancy are reproduced. These same effects occur
during the dynamic equilibrium that is established with small repetitive
eye movements.
3) If the input to the filling-in process is taken,
at each point, as the sum of the outputs of a set of band-pass filters
(or receptive fields or "channels") of differing size,
then the model seems to account for a wider range of brightness
phenomena, including "contrast reversal" or "assimilation".
Note: I'm indebted to Julie Lindholm for her very
perceptive comments on this work.
The results above were all derived by simulation
of the model. The following is pseudocode for simulation of the
MC+FI model.
Set up four two-dimensional arrays of size xsize,
ysize, with every element initialized to zero. Call them:
ILL(x,y) { illuminance }
LIOUT1(x,y) { output of each single channel }
LIOUT2(x,y) { sum of outputs of multiple channels }
FIOUT(x,y) { filling-in stage output }
{ First, generate an input pattern. The pattern in
Figure 4 will be illustrated here. }
for x = 1 through xsize
for y = 1 through ysize
ILL(x,y) = L1 { surround luminance }
next y
next x
for x = w + 1 through xsize - w { let the border be w pixels wide }
for y = w + 1 through ysize - w
ILL(x,y) = L2 { center luminance }
next y
next x
{ Now perform lateral inhibition for a series of
six receptive fields of increasing size }
{ the surround pixels always have weight -1. The single center has
a plus weight equal to the absolute value of the sum of the weights
of the surrounding pixels}
for side = 3 through 13 incrementing by 2 { e.g. if side = 5 then kernel is 5x5 }
sside = (side-1)/2
for x = sside + 1 through xsize - sside
for y = sside + 1 through ysize - sside
negative_sum = 0
for xx = x - sside through x + sside { sum up the inhibitory signals }
for yy = y - sside through y + sside
negative_sum = negative_sum + ILL(xx,yy)
next yy
next xx
negative_sum = negative_sum - ILL(x,y) { the center pixel is excitatory. Subtract it }
LIOUT1(x,y) = (side^2 - 1)*ILL(x,y) - negative_sum
next y
next x
{ now add the channel just done to the preceding sum, over the same
range of pixels for which it was computed }
for x = sside + 1 through xsize - sside
for y = sside + 1 through ysize - sside
LIOUT2(x,y) = LIOUT1(x,y) + LIOUT2(x,y)
next y
next x
next side
{ the output of the lateral inhibition stage (and
the input to the filling-in stage) is now complete }
{ the following code uses the four-connected neighbors. The results
are the same using eight-connected neighbors }
for i = 1 through iter { number of iterations }
for x = 2 through xsize -1
for y = 2 through ysize - 1
neighbor_sum = FIOUT(x-1,y) + FIOUT(x+1,y) + FIOUT(x,y-1) + FIOUT(x,y+1)
FIOUT(x,y) = LIOUT2(x,y) + neighbor_sum/4
next y
next x
next i
The resulting array, FIOUT( ), represents brightness.
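For readers who want to run the simulation, the pseudocode translates almost line for line into the Python sketch below. It is an illustration, not the author's code: it uses 0-based indices and nested lists indexed [x][y], the function names `make_pattern`, `lateral_inhibition` and `fill_in` and the pattern sizes in the usage lines are hypothetical choices, each channel contributes nothing where its window does not fit, and the optional `clamp` flag applies Note 2.

```python
# Python sketch of the MC+FI pseudocode (illustrative, not the author's code).
# 0-based indices; arrays are nested lists indexed [x][y].


def make_pattern(xsize, ysize, w, l1, l2):
    """Square of luminance l2 surrounded by a border w pixels wide of luminance l1."""
    ill = [[l1] * ysize for _ in range(xsize)]
    for x in range(w, xsize - w):
        for y in range(w, ysize - w):
            ill[x][y] = l2
    return ill


def lateral_inhibition(ill, sides=range(3, 15, 2)):
    """Sum of the outputs of six center-surround channels (sides 3, 5, ..., 13).

    In each side x side window the center weighs side**2 - 1 and every other
    pixel weighs -1. A channel contributes nothing where its window does not fit.
    """
    xsize, ysize = len(ill), len(ill[0])
    total = [[0.0] * ysize for _ in range(xsize)]
    for side in sides:
        h = (side - 1) // 2
        for x in range(h, xsize - h):
            for y in range(h, ysize - h):
                window = sum(ill[xx][yy]
                             for xx in range(x - h, x + h + 1)
                             for yy in range(y - h, y + h + 1))
                surround = window - ill[x][y]  # inhibitory signals only
                total[x][y] += (side * side - 1) * ill[x][y] - surround
    return total


def fill_in(li, iterations, clamp=False):
    """Filling-in stage: each interior cell becomes its input plus the mean
    of its four neighbors. Cells are updated in place, as in the
    pseudocode; edge cells stay at 0. clamp=True applies Note 2."""
    xsize, ysize = len(li), len(li[0])
    out = [[0.0] * ysize for _ in range(xsize)]
    for _ in range(iterations):
        for x in range(1, xsize - 1):
            for y in range(1, ysize - 1):
                neighbor_sum = (out[x - 1][y] + out[x + 1][y]
                                + out[x][y - 1] + out[x][y + 1])
                out[x][y] = li[x][y] + neighbor_sum / 4
                if clamp and out[x][y] < 0:
                    out[x][y] = 0.0
    return out


# Illustrative run: a 10 x 10 square (luminance 20) on a surround of 10.
ill = make_pattern(30, 30, w=10, l1=10.0, l2=20.0)
li = lateral_inhibition(ill)
bright = fill_in(li, iterations=200)
# Profile halfway up the vertical dimension, as plotted in Figure 4.
print([round(bright[x][15], 1) for x in range(30)])
```

Because `fill_in` updates cells in place, each sweep uses neighbors already updated earlier in the same sweep, as the pseudocode does. A version that computes every sweep from the previous one reaches the same equilibrium, but the number of iterations needed to approach it differs.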
Notes:
1) The output of the lateral inhibition stage, as
modeled in this code, is symmetrical around zero, that is, includes
negative values. For a more literal simulation of the model, these
values should be offset so that they never go below zero. In that
case, the offset must be removed before the filling-in stage, and
when that is done, the results are identical to those for the simulation
above.
2) In the above code, the output (FIOUT) can also
go negative, which is unrealistic. This can be corrected by inserting
the following line immediately preceding the last "next y":
if FIOUT(x,y) < 0 then FIOUT(x,y) = 0
This will have an effect only during very early iterations.
3) The above code implicitly assumes that the lateral
inhibition stage forms its output instantaneously. Because lateral
interactions require neural signals to travel laterally, it must
actually take time for the equilibrium pattern to develop. For meaningful
simulations of the temporal behavior of the model, this effect should
be included.
4) As written above, the filling-in stage is unrealistic
in that the recurrent signals from immediate neighbors are treated
as though they required no travel time. Simulating travel time complicates
the code somewhat and has a negligible effect on the results. (This
travel time, as well as the dynamics of the lateral inhibition stage,
does affect the dynamics of the output under certain conditions.
Those effects will be the subject of a subsequent publication.)
5) In this simulation, the receptive fields of all sizes
have a single central pixel as the positive input; only the size of
the negative surround varies. Scaling the size of the center to
stay in proportion with the surround changes the results in only
very small ways.
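The captions of Figures 10 to 12 describe a balance between the two stages: the lateral interaction stage has zero gain at zero frequency, the filling-in stage's gain grows without bound there, and at equilibrium the two offset each other. The Python sketch below checks this with an idealized frequency-domain calculation. It assumes an infinite grid (array edges ignored) and takes the equilibrium of the update FIOUT = LIOUT2 + neighbor_sum/4, whose gain at spatial frequency (wx, wy) is 1/(sin²(wx/2) + sin²(wy/2)); each channel of side s has gain s² - D_s(wx)·D_s(wy), where D_s is the transform of an s-pixel box. The function names are hypothetical, and the limiting value printed at the end comes from a Taylor expansion of these idealized formulas, not from the simulations reported here.

```python
import math


def box_transform(side, w):
    """Fourier transform, along one axis, of a centered row of `side` ones."""
    m = (side - 1) // 2
    return sum(math.cos(k * w) for k in range(-m, m + 1))


def lateral_gain(wx, wy, sides=range(3, 15, 2)):
    """Summed channels; each is side**2 times a unit impulse minus a box."""
    return sum(s * s - box_transform(s, wx) * box_transform(s, wy) for s in sides)


def filling_in_gain(wx, wy):
    """Equilibrium gain of F = L + (mean of 4 neighbors of F); infinite at 0."""
    return 1.0 / (math.sin(wx / 2) ** 2 + math.sin(wy / 2) ** 2)


def model_gain(wx, wy=0.0):
    """Gain of the whole model at equilibrium: product of the two stages."""
    return lateral_gain(wx, wy) * filling_in_gain(wx, wy)


for w in (1.0, 0.1, 0.01, 0.001):  # radians per pixel, gratings varying along x
    print(f"w={w:<6} lateral={lateral_gain(w, 0.0):.4g} "
          f"filling-in={filling_in_gain(w, 0.0):.4g} model={model_gain(w):.6g}")

# Low-frequency limit from a Taylor expansion: sum of s^2 (s^2 - 1) / 6.
print(sum(s * s * (s * s - 1) / 6 for s in range(3, 15, 2)))  # 8736.0
```

As w shrinks, the lateral gain falls roughly as w² and the filling-in gain rises roughly as 4/w², so in this idealization their product levels off near a finite value, consistent with the balance described in the Figure 12 caption. The diagonal direction gives the same limit, so the low-frequency balance in this sketch does not depend on orientation.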
1) Kuffler, S.W. (1953) Discharge patterns and functional organization of mammalian retina. J. Neurophysiol. 16, 37-68.
2) Hubel, D. and Wiesel, T. (1968) Receptive fields and functional architecture in the cat's visual cortex. J. Physiol. 195, 215-243.
3) Teller, D.Y. and Pugh, E.N., Jr. (1983) Linking propositions in color vision. In Colour Vision: Physiology and Psychophysics, eds. J.D. Mollon and L.T. Sharpe. London: Academic Press.
4) Krauskopf, J. (1963) Effect of retinal image stabilization on the appearance of heterochromatic targets. J. Opt. Soc. Amer. 53, 741-744.
5) Walls, G. (1954) The filling-in process. Amer. J. Optom. 31, 329-340.
6) Gerrits, H.J.M. and Timmerman, G.J.M.E.N. (1969) The filling-in process in patients with retinal scotomata. Vision Res. 9, 439-442.
7) Kanizsa, G. (1979) Organization in Vision: Essays in Gestalt Perception. New York: Praeger Press.
8) Ramachandran, V.S., Gregory, R.L. and Aiken, W. (1992) Perceptual fading of visual texture border. Vision Res. 33, 717-721.
9) Gerrits, H.J.M. and Vendrik, A.J.H. (1970) Simultaneous contrast, filling-in process and information processing in man's visual system. Exp. Brain Res. 11, 411-430.
10) Grossberg, S. (1983) The quantized geometry of visual space: The coherent computation of depth, form, and lightness. Behavioral and Brain Sciences 6, 625-692.
11) Cohen, M.A. and Grossberg, S. (1984) Neural dynamics of brightness perception: Features, boundaries, diffusion, and resonance. Perception and Psychophysics 36, 428-456.
12) Grossberg, S. and Todorovic, D. (1988) Neural dynamics of 1-D and 2-D brightness perception: A unified model of classical and recent phenomena. Perception and Psychophysics 43, 241-277.
13) Najand, S., Blough, D., and Healey, G. (1996) Forward and inverse model for the intensity-dependent spread filter. J. Opt. Soc. Amer. 13, 1305-1314.
14) Cornsweet, T.N. and Yellott, J.Y., Jr. (1985) Intensity-dependent spatial summation. J. Opt. Soc. Amer. A 2, 1769-1786.
15) Crespo, J. and Schafer, R.W. (1997) Edge-based adaptive smoothing. Optical Eng. 36, 3081-3092.
16) Cornsweet, T.N. (1970) Visual Perception. New York: Academic Press.
17) Craik, K. (1940) Visual adaptation. Unpublished doctoral thesis, Cambridge University.
18) O'Brien, V. (1958) Contour perception, illusion, and reality. J. Opt. Soc. Amer. 48, 112-119.
19) Arrington, K.F. (1996) Directional filling-in. Neural Computation 8, 300-318.
20) Davidson, M. and Whiteside, J. (1971) Human brightness perception near sharp contours. J. Opt. Soc. Amer. 61, 530-536.
21) Georgeson, M.A. and Sullivan, G.O. (1975) Contrast constancy: deblurring in human vision by spatial frequency channels. J. Physiol. 252, 627-656.
22) DeValois, R.L. and DeValois, K.K. (1988) Spatial Vision. Oxford University Press.
23) Economou, E., Annan, V.J., and Gilchrist, A.L. (1998) Contrast depends on anchoring in perceptual groups. Abstract, Invest. Ophthal. and Vis. Science 39, S857.
24) Gilchrist, A., Kossyfidis, C., Bonato, F., Agostini, T., Cataliotti, J., Li, X., Spehar, B., and Szura, J. (1999) An anchoring theory of lightness perception. Psych. Rev. 106(4), 795-834.
25) Graham, N. (1989) Visual Pattern Analyzers. Oxford University Press.
26) Adelson, E.H. (1998) On his web site, www-bcs.mit.edu/people/adelson
Figure Captions
Figure 1. (a) An ordinary scene. (b) The same scene
after processing that attenuates low spatial frequencies. The scene
was processed by convolving it with the following mask:
-1 -1 -1 -1 -1
-1 -1 -1 -1 -1
-1 -1 24 -1 -1
-1 -1 -1 -1 -1
-1 -1 -1 -1 -1
Figure 2. Schematic representation of the model.
If the input image has an illuminance profile like that at the top
right, the profiles of the outputs of the next two stages will be
as plotted.
Figure 3. A more detailed representation of the model,
as explained in the text. Cell bodies are represented as circles.
The output of the lateral interaction layer is assumed to yield
signal levels above and below a resting level. In the second stage,
resting activity is represented as if cells with that activity level
contribute their signals through synapses. The "S/N" labels
are intended to indicate that the signals between cells in the filling-in
layer are attenuated so that the total signal arriving recurrently
at each cell equals the mean of the outputs of its neighbors. Linearity
is assumed in the sense that excitation is taken to be additive
and inhibition subtractive.
Figure 4. A simple input pattern for the model is
shown at the upper left and its profile (halfway up the vertical
dimension) is plotted below it. The second column shows the output
of the lateral inhibitory layer. The third and fourth columns show
the output of the entire model after 200 and 1200 iterations of
the filling-in stage. Note that the lightness values and profiles
are scaled so that all output patterns and profiles have the same
range. Therefore, it is not apparent in the figure that, as iterations
proceed, the absolute values generally increase. The same scaling
is included in all subsequent figures of this kind.
Figure 5. Performance for an input pattern containing
cumulative steps of illuminance.
Figure 6. The same input pattern as in Figure 5,
but the pattern is presented for a large number of iterations and
made to shift back and forth periodically, simulating small saccadic
eye movements. The output pattern in the third column immediately
follows a one-pixel movement to the left. The fourth
column shows the result after an additional nine iterations. This
set of patterns represents a dynamic equilibrium. That is, if, while
the process is iterating, this input pattern is shifted horizontally
back and forth, one shift after every ten iterations, then, after
an initial set of iterations (about 100 in the present simulation),
the output patterns in this figure will be the same for all further
iterations.
Figure 7. Simultaneous contrast and Mach bands.
Figure 8. The input pattern of Figure 7 with its border
eliminated. This result looks nice but is misleading,
as explained in the text.
Figure 9. A sinusoidal input produces a sinusoidal
output.
Figure 10. Spatial Contrast Sensitivity Function
for the lateral interaction stage. (The high frequency region has
been omitted for simplicity.)
Figure 11. Spatial Contrast Sensitivity Function
for the filling-in stage by itself after three different numbers
of iterations. As equilibrium is approached, sensitivity to zero
frequency approaches infinity.
Figure 12. Spatial Contrast Sensitivity Function
for the model as a whole for three different numbers of iterations.
At equilibrium, the attenuation of low frequencies by the lateral
interaction stage is balanced by the increasing gain of the filling-in
stage. (Note that, as is evident from the change in shape of the
curves, the rate of approach to equilibrium varies with spatial
frequency.)
Figure 13. Two patterns demonstrating a reversal
of the contrast expected from classical theories and the model as
described so far. The two gray squares in (a) and the two gray lines
in (b) have equal luminances. Pattern (a) is from DeValois and DeValois
(22) and (b) is a simplified version of one by Economou et al. (23).
Figure 14. Schematic representation of one unit or
pixel in the MC+FI model. All of the plus center receptive fields
add their signals to the cell in the filling-in layer and all the
minus center fields subtract their signals (that is, inhibit).
Figure 15. The output of the MC+FI model for the pattern
shown as iterations proceed. The vertical axis is the output of
the model (and presumed brightness). The larger dots plot the output
at the center of the gray square on the left and the smaller ones
at the center of the gray square on the right. Initially the square
on the left has a higher output, as would be expected from classical
contrast theory, but after about 100 iterations, the relationship
reverses.
Figure 16. The output of the model for the checkerboard
after 300 iterations. The series of six images across the center
are the images output from each of the six channels before they
are summed, receptive field size increasing from left to right.
The blurred image at the bottom shows the output of the filling-in
stage after 300 iterations. Its profile (through the centers of
the two gray squares) is plotted at its right. The dotted vertical
lines indicate the positions of the two gray squares, and the output
values at the centers of the gray squares are displayed in the lower
right corner. See the text for a discussion of the blurring.
Figure 17. The output of the model for the gray stripes
in the pattern shown. Contrast starts as expected from classical
contrast theory but after about 100 iterations, it reverses.
Figure 18. The output of the model after 300 iterations
for the stripe pattern.
Figure 19. The performance of the model for a classical
simultaneous contrast pattern. The outputs are initially in accord
with classical contrast but they reverse after about 700 iterations.
Figure 20. The output after 300 iterations.
Figure 21. The effect of changing the number of stripes
that flank the gray stripes in the pattern of Figure 13(b), as measured
psychophysically. The vertical axis is a measure of the apparent
difference in lightnesses of the two gray stripes, and the numbers
below the bars are the numbers of flanking stripes. Values above
zero represent the classical contrast expectation. From Economou
et al. (23).
Figure 22. The corresponding result for the MC+FI
model.
Figure 23. The behavior of the model for a pattern
that yields "assimilation". All gray bars have the same
luminance but the gray bars separated by white lines appear lighter
than those separated by dark lines and the model gives the same
result after about 190 iterations. (The larger dots plot the output
corresponding to the centers of the gray bars on the left, and the
smaller dots, the gray bars on the right.)
Figure 24. This pattern is a simplified version of
one by Adelson (26). The white squares within the darkened region
have the same luminance as the dark squares outside that region,
but the dark squares outside the region appear darker than the white
ones within the region. The model shows the same behavior during
the first 650 iterations.
Figure 25. In Figure 24, the number of channels is
six. In this figure, the same pattern is input but the channel with
the largest receptive field is omitted.