WO1990010274A1 - Neuronal data processing network - Google Patents

Neuronal data processing network

Info

Publication number
WO1990010274A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
nodes
type
module
activation
Prior art date
Application number
PCT/NL1990/000018
Other languages
French (fr)
Inventor
Gezinus Wolters
Jacob Marinus Jan Murre
Rutger Hans Phaf
Original Assignee
Rijksuniversiteit Te Leiden
Priority date
Filing date
Publication date
Application filed by Rijksuniversiteit Te Leiden filed Critical Rijksuniversiteit Te Leiden
Publication of WO1990010274A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • Categorization is effected by associating a given stimulus presented to the R nodes of a CALM module with a single R node in the module, which node is then said to represent the stimulus.
  • This categorization is an autonomous processing step by the CALM module according to the present invention, which means that the CALM module is capable of unsupervised learning, while supervised learning is possible as well. Learning takes place during and subsequent to the categorization process, thereby preserving the association between the stimulus and the R node by adjusting the I weighting factors to the R node.
  • The CALM module continues to represent newly presented patterns of stimuli that are to be discriminated by new R nodes.
  • The CALM module is therefore capable of discriminating patterns of stimuli, i.e. representing comparatively strongly differing patterns by different R nodes, and is also capable of generalization, i.e. representing relatively strongly resembling patterns by the same R node. It will be explained on the basis of Fig. 5 and Table I how the three mechanisms described above cooperate to categorize a stimulus pattern and how this process affects the value of the I weighting factor to the winning R node.
  • Fig. 5 shows a CALM module with two R nodes 3, 4, two V nodes 5, 6, an A node 7 and an E node 8.
  • An input stimulus is presented to the module from two nodes 1 and 2.
  • These nodes may be the R nodes of a CALM module at a different level, but also two external input sources producing a pattern that is representative of an externally received stimulus. It is assumed for the example that node 1 has an activation of 1.0 and node 2 an activation of 0.0.
  • The activation of the various nodes is shown diagrammatically in Figs. 5a-h by the degree of blackening of a node; an entirely black node has an activation of 1.0.
  • The Table shows the cycles 1-20 traversed by the CALM module at successive times t. Because it is not illustrative to represent all cycles in the manner of Figs. 5a-h, these figures only show cycles 0, 1, 2, 3, 4, 10, 12 and 20 in the respective Figs. 5a-5h.
  • The Table gives the values of the activations of nodes 1-8 by means of the symbols a(1)-a(8), while moreover the right-hand side of the Table indicates for each cycle the value of the learning parameter μ and the variable weighting factors w31, w32, w41 and w42, which weighting factors are also shown in Fig. 5. Cycle 0 is not shown in the Table, since this only indicates the initial state.
  • Fig. 5a shows the initial state, wherein only node 1 has an activation of 1.000 and the other nodes an activation of 0.000.
  • The learning factor μ has its low initial value and the weighting factors all equal 1.000, i.e. a value intermediate between the limiting values of these weighting factors, 0 and 2.
  • Fig. 5b: The activation has reached the row of R nodes 3 and 4 in the CALM module and has been distributed uniformly over these nodes.
  • the weighting factors have not changed, since nodes 3, 4 were not activated yet.
  • Fig. 5c: The wave of activations has now also reached V nodes 5 and 6, while A node 7 has also received activation from nodes 5, 6.
  • The weighting factors have been changed slightly. The reason that the changes are small is that the learning parameter still has its low quiescent value.
  • the weighting factors from the activated node 1 have slightly increased, while the weighting factors from the non-activated node 2 have decreased. This increase and decrease are a direct result of the learning rule employed.
  • nodes 3 and 4 receive random activations ranging between 0.000 and the activation of node E multiplied by the weighting factor in the connection from the E node to nodes 3, 4. This weighting factor is 0.5.
  • The learning parameter μ will increase; μ is linearly dependent upon the activation of the E node.
  • The increase in μ leads to a greater change in the weighting factors, which means that learning takes place more quickly.
  • the V nodes have an inhibitory effect on the activation of the A node.
  • The weighting factors in the CALM module have been chosen so that when a V node and the R node paired with it (which has an excitatory connection with the A node) have the same activation, the net contribution of these two nodes to the activation of the A node is negative, so that this activation decreases.
  • Fig. 5e: The activations of nodes 3 and 4 now differ slightly from one another due to the random activations these nodes receive from the E node.
  • The learning parameter μ has increased from 0.005 to 0.016.
  • the activation of the A node has decreased as a result of the inhibitory effect of the V nodes.
  • the weighting factors have again increased further, while the weighting factors to the respective nodes 3 and 4 may now also assume different values, because the activations of nodes 3, 4 differ from one another. These differences in weighting factors are initially minute, however, and not yet visible in the Table.
  • Fig. 5f: The activation of node 3 has suddenly decreased strongly, again due to the random fluctuations in the activation which nodes 3, 4 receive from the E node.
  • the weighting factors now start to differ strongly from each other.
  • the weighting factor from node 1 to node 3 is the higher, which may mean that eventually node 3 will "win” after all.
  • the weighting factor from the non-activated node 2 to node 3 strongly decreases on the other hand.
  • Fig. 5g: The battle has now almost been decided.
  • The activation of R node 4 and the associated V node 6 has become very low and the weighting factor w31 is clearly the higher.
  • Node 3 definitively forms the representation for the input pattern [1,0]; the activation of nodes 4, 6 has decreased to zero, while the activation of the A node has also decreased strongly. The value of μ decreases, but this decrease takes place considerably more slowly; the activation of the E node decreases only slowly, too. From now on, an equilibrium condition will slowly be established in the CALM module. Repeated presentation of the pattern [1,0], after all activations have been reset to zero, now results in node 3 being rapidly activated more strongly. The presentation of the orthogonal pattern [0,1] will probably lead, after about 20 cycles, to a representation by node 4.
  • The learning rule is so composed that, after the pattern [1,0] has been presented a few times, the total of the weighting factors to that pattern will always decrease.
  • Fig. 6 shows within the dotted lines a so-called output module, which, in fact, consists of a combination of a single CALM module designated in the figure by reference numeral 1 and comprising N R-V node pairs, and a modified CALM module designated by numeral 2.
  • the modified CALM module only contains N pairs of R-V nodes and no A or E node.
  • N-1 R-V node pairs in module 1 are coupled with N-1 R-V node pairs in module 2. Both module 1 and module 2 therefore have one "free" pair of R-V nodes.
  • Such an output module makes it possible for a plurality of simultaneous parallel activations within a CALM module to be converted into a series of responses.
  • An output module, contrary to the impression its name gives, need not necessarily be provided at the output end of the network.
  • The output module forms a kind of parallel-to-serial converter, wherein the probability of a production in the series depends upon the activations that can be produced in the CALM module in response to a specific presented pattern.
  • Module 2 shown in the figure consists of pairs of R-V nodes. These pairs, however, can always be combined into a single node without impeding the function. Each V node again inhibits the other V nodes and the R nodes not paired with it.
  • the free R-V node pair in module 2 is connected through an excitatory connection to the A node of CALM module 1.
  • Each R-V node pair of CALM module 1 excites only one R-V node pair in module 2.
  • All R nodes of coupled R-V node pairs of module 2 excite the "free" R-V node pair in module 1. So long as the competition has not yet been solved, the A node excites the "free" R-V node pair in module 2, which blocks further transmission of activations by module 2.
  • Activations in the other node pairs in module 2 can only be produced after the activation of the A node has disappeared. As explained above, this is not the case until the competition in CALM module 1 has been solved. In that case, only one activation is therefore transmitted from module 1 to module 2, while the activated R node can provide an output signal. In its turn, module 2 again activates the "free" R-V node pair in CALM module 1. This pair only produces inhibition and does not receive inhibition itself. CALM module 1 is thereby reset, because the activations in all coupled R-V node pairs are reduced to zero. After termination of the activation of the free node pair in CALM module 1, new activations can be built up in CALM module 1 on the basis of the weighting factors and the data supplied. On the basis of the new activations in CALM module 1, a new response can be generated in module 2.
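The gating behaviour just described can be caricatured in a few lines of Python (a toy sketch of our own, not the patent's circuit; the threshold value is an arbitrary assumption):

    def output_module_step(aR_module1, aA_module1, threshold=0.05):
        """Toy parallel-to-serial gate in the spirit of Fig. 6.

        While the A node of module 1 is active (competition unresolved), the
        free R-V pair of module 2 blocks all transmission. Once the A node
        falls silent, the single winning R node is passed on and module 1 is
        reset through the free pair it excites in return."""
        if aA_module1 > threshold:
            return None, aR_module1              # blocked; module 1 keeps competing
        winner = max(range(len(aR_module1)), key=lambda i: aR_module1[i])
        return winner, [0.0] * len(aR_module1)   # transmit winner, reset module 1

    # Example: competition solved (A node silent), R node 0 is transmitted.
    out, state = output_module_step([0.4, 0.0], aA_module1=0.0)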

Abstract

A data processing module or neural network, comprising a plurality of nodes, as well as connections between said nodes, through which connections data weighted by a weighting factor can be transmitted between the nodes, said nodes having an activation value representative of the data received. There are provided at least three types of nodes: a first type (R) adapted to receive external data, at least two of which are present; a second type (V), one of which is always paired with a node of the first type; and a third type (A, E), at least one of which is present. Each node of the first type (R) is connected through a connection with a preferably positive weighting factor (up, low) to the associated node of the second type (V) and to the node of the third type (A, E). Each node of the second type (V) is connected through a connection with a preferably negative weighting factor (flat; cross; high) to the other nodes of the second type (V), to the nodes of the first type (R) not paired therewith, and to the node of the third type (A, E). The node of the third type (A, E) is connected through connections with a preferably positive weighting factor (ER) to the nodes of the first type.

Description

Neuronal data processing network
This invention relates to a data processing module, and a data processing network comprising a plurality of such modules.
Recently, large-scale research has been carried out into data processing networks that can function, i.e. learn and remember, similarly to the human brain. Such networks are sometimes referred to as neuronal networks. For a further explanation of the data processing and storage in the human brain, reference can be made to F.H.C. Crick and C. Asanuma, "Certain aspects of the anatomy and physiology of the cerebral cortex", in J.L. McClelland and D.E. Rumelhart (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 2; Cambridge, MA: MIT Press, 1986. Most of the hitherto known learning networks have the major drawback that a pattern has to be presented simultaneously at the input and at the output end of the network so that the network can learn relations between these patterns. Such networks are only capable of so-called supervised learning. However, supervised learning networks have only limited possibilities; so-called unsupervised learning networks have considerably more extensive possibilities. The human brain, too, is governed by the unsupervised learning principle and is capable itself of discriminating between presented patterns and of further organizing and retaining these without the learning process requiring external supervision. Hitherto, however, only a few algorithms are known that are capable of unsupervised learning. The major algorithm is the "Adaptive Resonance Theory" (ART) described in: Carpenter, G.A. & Grossberg, S. (1987), A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37, 54-115. This algorithm, however, has the drawback of being highly complex and inaccessible, while moreover it has not been fully tested. Besides, the learning process according to this algorithm deviates strongly from the human learning process, so that its learning possibilities are limited.
It is an object of the present invention to provide a network capable of both unsupervised and supervised learning and of discriminating, generalizing and retaining input stimuli consisting of patterns in a manner which is highly similar to the human learning process and the structure of the human brain. A further object is to provide such a network which is composed of a multiplicity of relatively small, substantially identical modules, each in turn having a simple structure, so that the network can have a relatively simple structure.
It is observed in this respect that a modular, supervised learning network is known per se from Chapters 2 and 5 of Vol. 1 of the above mentioned book by J.L. McClelland and D.E. Rumelhart. This known network, however, is not capable of unsupervised learning, nor does it reveal the insight that extreme modularity is highly favourable for the realization of very powerful neural networks.
To achieve the object set, the present invention provides a data processing module comprising a plurality of nodes, as well as connections between said nodes, through which connections data weighted by a weighting factor can be transmitted among the nodes and which nodes have an activation value representative of the data received, there being provided at least three types of nodes; a first type adapted to receive external data and of which at least two are present; a second type, of which there is always one which is paired with a node of the first type; and a third type, of which at least one is present, each node of the first type being connected through a connection with a weighting factor from a first class to the associated node of the second type and to the node of the third type, each node of the second type being connected through a connection with a weighting factor from a second class to the other nodes of the second type, to the nodes of the first type not paired therewith and to the node of the third type; the node of the third type being connected through connections with a weighting factor from the first class to the nodes of the first type.
Preferably, the weighting factors from the first class are all positive and the weighting factors from the second class are all negative.
The present invention also provides a data processing network comprising at least two modules according to the invention, wherein the modules are present at different levels in the network and only modules at different levels can be interconnected, while, if modules are interconnected, at least each node of the first type of one module is connected to all nodes of the first type of the other module.
Some embodiments of the present invention will now be described, by way of example, with reference to the accompanying drawings, in which:
Fig. 1a is a diagrammatic view of the structure of a module according to the present invention;
Fig. 1b shows the module of Fig. 1a with the names of the various connections;
Fig. 2 is a graphic view of the activation of a node as a function of the total weighted activation presented; Fig. 3 shows three modules according to the present invention with intermodular connections;
Fig. 4 is a diagrammatic explanation of the interactions within a module;
Figs. 5a-h are diagrammatic views of the response of a module to a presented input stimulus during a number of cycles; and
Fig. 6 is a diagrammatic view of an output module.
Fig. 1a diagrammatically shows the structure of a module according to the present invention, to be referred to hereinafter as a CALM module (categorizing and learning module). A CALM module consists of a plurality of nodes, indicated in the figures with circles. As will be explained below, there are a number of different types of nodes which, however, all have a number of properties in common. Thus each node is a simple data processing element which processes a one-dimensional variable (a voltage value), called the activation, according to a small number of invariable rules. The activation of a node may assume any value between 0 and a maximum value M, and the activations of the various nodes are exchanged through connections between the nodes, while means may be incorporated in the connections for influencing the transfer of the activations by means of a weighting factor. Such means can be implemented in a very simple manner by means of resistors. The structure of the nodes can be simple, too; e.g. they may have the structure described in J.J. Hopfield and D.W. Tank: "Computing with Neural Circuits", in Science, 233, pp. 625-633 (1986).
The effective input signal for a node i in Fig. 1, the excitation ei, is determined by the weighted sum of the activations of all separate nodes connected to the input of node i. The activation of node i at time t+1, designated by ai(t+1), is a function of the activation at time t, designated by ai(t), and the input excitation ei. The activation of node i is given in a discrete time representation by formula 1:
ai(t+1) = (1-k)·ai(t) + [ei/(1+ei)]·[M - (1-k)·ai(t)]   for ei ≥ 0

ai(t+1) = (1-k)·ai(t) + [ei/(1-ei)]·(1-k)·ai(t)   for ei < 0

where ei = Σ (j=1..N) wij·aj(t)   (1)
In this formula, wij indicates the weighting factor of the connection between a node j and node i, while k is a constant with 0 < k < 1. It is assumed that there are N nodes that are connected to node i.
In formula 1, three components can be distinguished, each having a different function.
The first component, (1-k)·ai(t), gives the autonomous decrease in the activation of a node. When there is no excitation, i.e. ei = 0, the activation of node i decreases to zero at a rate determined by the magnitude of k. In a continuous time representation, this decrease is exponential, which indicates why 0 < k < 1. The second component, ei/(1+ei) for ei ≥ 0 and ei/(1-ei) for ei < 0, restricts the excitation at the input to a value between 0 and +1 or between 0 and -1, respectively. The third component of formula 1, which in the case ei ≥ 0 equals [M - (1-k)·ai(t)], ensures that the increase in activation resulting from the excitation decreases when the activation approaches the maximum activation ai(max). As a result, the activation ai asymptotically approaches the value ai(max), which is shown diagrammatically in Fig. 2.
In the event of ei < 0, the third component, (1-k)·ai(t), ensures that the negative excitation, i.e. the inhibition, asymptotically approaches the minimum activation value ai(min). It is indicated above that the activation of all nodes in a CALM module is determined by formula 1. However, there are also other expressions defining a variation in activation in the manner shown in Fig. 2. Such a formula is:

ai(t+1) = (1-k)·ai(t) + (1 - e^(-ei))·[M - (1-k)·ai(t)]   for ei ≥ 0

ai(t+1) = (1-k)·ai(t) - (1 - e^(ei))·(1-k)·ai(t)   for ei < 0   (2)
In this formula, the various symbols have the same meaning as in formula 1. A formula for the activation which enables pattern recognition with the module according to the present invention at a higher rate, in which formula the same symbols are again used for the same variables as in formula 1, is:

ai(t+1) = (1-k)·ai(t) + [ei²/(1+ei²)]·[M - (1-k)·ai(t)]   for ei ≥ 0

ai(t+1) = (1-k)·ai(t) + [ei/(1-ei)]·(1-k)·ai(t)   for ei < 0

where ei = Σ (j=1..N) wij·aj(t)   (3)

The variation of the activation on the basis of formula 3 is shown by a dotted line in Fig. 2, it being pointed out that, as compared with formula 1, the activation is not changed for negative values of ei but, as a result of formula 3, exhibits a clearly greater non-linearity for positive values of ei.
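By way of illustration only (this transcription is ours, not part of the disclosure), formula 1 can be written out in Python; since ei/(1+ei) for ei ≥ 0 and ei/(1-ei) for ei < 0 coincide with ei/(1+|ei|), a single expression covers both branches:

    import numpy as np

    def activation_step(a, W, k=0.05, M=1.0):
        """One discrete-time update of all activations according to formula 1.

        a is the vector of activations ai(t); W[i, j] = wij, the weighting
        factor of the connection from node j to node i; the values of the
        decay constant k and the ceiling M are assumptions."""
        e = W @ a                        # input excitation ei = sum of wij * aj(t)
        decay = (1.0 - k) * a            # first component: autonomous exponential decay
        squash = e / (1.0 + np.abs(e))   # second component, restricted to (-1, +1)
        # Third component: excitation saturates toward M, inhibition toward 0.
        room = np.where(e >= 0.0, M - decay, decay)
        return decay + squash * room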
On the basis of the manner in which nodes are connected to other nodes, which is decisive for the function of a node, a distinction can be made between four different types of nodes. These categories I-IV will be explained hereinafter.
I. This node type can form connections with nodes in other modules, which mostly are of the same type. The nodes have excitatory (positively activating) output connections. Because the activation of this type of node may correspond with the presence of a specific pattern of input activation signals presented to the module, these nodes are called representation nodes, or R nodes for short.
II. A node of this second type is always coupled with a node of the first type to form a pair. These nodes only have inhibitory (negatively activating) output connections and suppress the activation of all nodes in the CALM, although the extent of suppression need not be the same for all nodes. This type of nodes will be called a V node for short, in which V stands for Veto.
III. The third type is a node of which only one need be present per CALM module. This node is excited by all R nodes and inhibited by all V nodes. For the sake of brevity, this type of node is called the A (arousal) node. Because the A node is connected, in the manner described, to all V and R nodes, the activation of the A node in the module is a positive function of the extent of competition between the V nodes. In the CALM module, this competition is fiercest with new input patterns, so that the activation of the A node is an indirect measure of the newness of an input pattern.
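As a numerical aside (our arithmetic, using the example weighting factors RA = 0.4 and VA = -0.6 specified further on), the net input of the A node illustrates this: a single settled R-V pair contributes 0.4·a - 0.6·a < 0, whereas several active R nodes with still-weak V nodes leave the sum positive:

    def a_node_input(aR, aV, RA=0.4, VA=-0.6):
        """Net input excitation of the A node: excited by all R nodes,
        inhibited by all V nodes (example weighting factors assumed)."""
        return RA * sum(aR) + VA * sum(aV)

    print(a_node_input(aR=[0.5], aV=[0.5]))            # settled pair: -0.1, A is suppressed
    print(a_node_input(aR=[0.5, 0.5], aV=[0.2, 0.2]))  # ongoing competition: +0.16, A rises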
IV. Of the fourth type of node, too, only one need be present per CALM module. This node type is called the E (external) node; it receives an input signal exclusively from the A node and randomly transmits activation pulses to all R nodes in the CALM module. These activation pulses are distributed uniformly over the range of values 0 to aE(t), wherein aE(t) indicates the activation of the E node at time t. As will be explained more fully hereinafter, the E node is also important for the learning process in a CALM module.
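The pulse distribution can be stated in one line (an illustrative sketch; the generator and seed are our choice):

    import numpy as np

    rng = np.random.default_rng(0)  # arbitrary seed

    def e_node_pulses(a_E, n_r_nodes):
        """Random pulses from the E node to the R nodes, drawn uniformly
        from the range 0 to aE(t)."""
        return rng.uniform(0.0, a_E, size=n_r_nodes)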
Fig. 1a shows a CALM module 1, in which the above-described four types of nodes with their interconnections are designated by the letters likewise indicated above. There are always equal numbers of R nodes and V nodes in a module. Fig. 1a shows only three nodes of each type, but in principle their number is unlimited. In actual practice, however, the number of R and V nodes will seldom be higher than about 100, since networks composed of a large number of relatively small modules can in principle operate more quickly than networks consisting of a few very large modules. Although the A node and the E node are named and shown separately because they have clearly different functions, these nodes may be combined into a single node, if desired, because the E node receives activation exclusively from the A node. Such a single node should then combine all properties of the A and the E node. As further shown in Fig. 1a, each V node in a module is connected through inhibitory connections to all R nodes not paired therewith, to all other V nodes and to the A node. In this example, each V node is also connected through an inhibitory connection to the paired R node, but this is not necessary.
Each R node is connected through an excitatory connection to its associated V node, as well as to the A node.
The A node is connected through an excitatory connection to the E node and finally the E node is connected through excitatory connections to all R nodes.
Although not shown in the embodiment, it is possible in principle for types of nodes other than the R nodes to have connections to external nodes. Fig. 1b names the various types of excitatory and inhibitory connections. The weighting factors incorporated in all connections present in a CALM module are pre-determined and invariable. The values of the intramodular weighting factors are not limited to a given range; in an experimental set-up, values in the range of -10 to +3 were used. It will be clear that a weighting factor of +1 means, in fact, a through-connection, and a weighting factor of -1 a through-connection to the inverted output of a node. It is also observed that it is in principle possible to replace all positive weighting factors by negative ones of the same value and all negative weighting factors by positive ones of the same value without affecting the essence of the operation of the module. The general influence of the types of weighting factors shown in Fig. 1b will be briefly explained in the following.
Up (RV; activatory): connects an R node to the V node paired therewith. Down (VR↓; inhibitory): the reciprocal of the up weighting factor. An up weighting factor and a down weighting factor together provide for an R-V node pair to exhibit a differentiating characteristic.
Cross (VR→; inhibitory): controls the lateral inhibition between V nodes and R nodes not paired together.
These weighting factors are usually strongly negative, so that a V node can suppress all R nodes not paired with it.
Flat (W; inhibitory): controls the competition between the V nodes. Low (RA; activatory): controls the activation of the A node by the R nodes. Many active R nodes provide for a higher activation of the A node.
High (VA; inhibitory): controls the inhibition of the A node by the V nodes. Many active V nodes provide for a low degree of activation in the A node.
AE (activatory): controls the influence of the activation of the A node on the activation of the E node.
ER (activatory): controls the influence of the random activations of the E node on the R nodes. In the following Table we specify, by way of example, the weighting factors that were implemented in the various connections within a CALM module, with which the following example, given on the basis of Fig. 4 and Table 1, has been realized.
RV = 0.5; VR→ = -10; W = -1; VA = -0.6; RA = 0.4; AE = 1.0; ER = 0.5 and VR↓ = -1.

A number of limiting conditions can be formulated for the various weighting factors on the basis of the contemplated behaviour of the CALM module. These limiting conditions will be explained below, with the abbreviation of the respective weighting factors representing the value thereof. The value of weighting factor RV is therefore indicated by RV, while the modulus of that value is indicated by |RV|. In order that the A node may perform the function contemplated we should have:

RA < |VA|   (4)

To enable the CALM module to learn also if the A node is not activated requires:

RV/RA > |VA|·k·M/(k+1)   (5)

Furthermore, to enable a complete solution of the competition, we need:

RV > M·(|VR→| - 2·|W| - k)   (6)

while to prevent prolonged oscillations of R-V node pairs, there is required:

RV < M/|VR↓|   (7)

The weighting factor from a veto node to an R node not paired with it should in general be quite large for the contemplated operation of the veto node. This requires:

|VR→| ≫ |VR↓|   (8)

and

|VR→| ≫ |W|   (9)

while moreover formula 6 should be met.
The weighting factor between the V nodes themselves determines how quickly the competition between the V nodes can be solved. To render the inhibition so strong that only one V node remains, and the activation of the others decreases to zero, the minimum requirement is that

|W| > k   (10)

On the other hand, the inhibition must not be so heavy that the activations of the veto nodes continue to oscillate, so that formula 6 should be met as well.
The inhibitory weighting factor between a veto node and the A node should be so high that under the proper conditions the activation of the A node can be fully reduced to zero. To that end, this weighting factor should comply with formula 5. The weighting factor from the R nodes to the A node should also meet formulae 4 and 5.
The weighting factor between the A node and the E node is mostly chosen to equal one.
The excitatory weighting factor from the E node to the R nodes determines, together with the AE weighting factor, the extent of change in the learning parameter (to be explained hereinafter) and the random activations of the R nodes. There is an interaction between these two parameters, so that when the AE weighting factor is kept at the value 1.0, the random activation and the learning parameter are determined by the ER weighting factor.
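A sketch of how the fixed intramodular weighting factors could be tabulated and sanity-checked; the values are those of the example above, and the checks reflect our reading of conditions (4) and (8)-(10), which are partly garbled in the source:

    # Example fixed intramodular weighting factors, one value per connection type.
    WEIGHTS = {
        "RV": 0.5,          # up: R node to its paired V node (excitatory)
        "VR_down": -1.0,    # down: V node to its paired R node (inhibitory)
        "VR_cross": -10.0,  # cross: V node to non-paired R nodes (inhibitory)
        "W": -1.0,          # flat: V node to the other V nodes (inhibitory)
        "RA": 0.4,          # low: R nodes to the A node (excitatory)
        "VA": -0.6,         # high: V nodes to the A node (inhibitory)
        "AE": 1.0,          # A node to E node (excitatory)
        "ER": 0.5,          # E node to R nodes (excitatory)
    }

    def check_limits(w, k=0.05):
        """Verify a few of the stated limiting conditions (value of k assumed)."""
        assert w["RA"] < abs(w["VA"])                  # (4): one settled pair suppresses A
        assert abs(w["VR_cross"]) > abs(w["VR_down"])  # (8): strong veto effect
        assert abs(w["VR_cross"]) > abs(w["W"])        # (9)
        assert abs(w["W"]) > k                         # (10): competition can be resolved

    check_limits(WEIGHTS)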
As explained above, all weighting factors of the connections within a CALM module are pre-determined to establish the properties of this module. Likewise, all weighting factors of one given type within a module are in principle identical. It will be clear that the object of the present invention, the provision of a modular learning network, can only be achieved when also variable weighting factors are present. These variable weighting factors are included in the intermodular connections between R nodes and in the connections between input sources and the R nodes in a module.
In general, a complete connection pattern is assumed to exist between CALM modules, i.e. if there is a connection from CALM 1 to CALM 2, each R node in CALM 2 receives an input signal from all R nodes in CALM 1. However, not all CALM modules need be connected to all other CALM modules in a network. The CALM modules at one level in the network are not connected to the other CALM modules at the same level but only to the CALM modules at higher levels and, possibly, connections may also exist to CALM modules at lower levels. However, as stated above, if there are connections between two CALM modules there is a complete connection pattern.
The weighting factors in the connections between the modules are called I (inter) weighting factors. These I weighting factors are all variable within a given range of values, e.g. the range of 0-2. In general, the initial value of this weighting factor is chosen to equal the mean between the limiting values, in the example given therefore the value 1.0. If the initial values are chosen too close to the minimum or maximum values, the module will not function properly. The adjustment of the I weighting factor is effected according to a variant of the Grossberg learning rule, as described in the above mentioned adaptive resonance theory. This Grossberg rule forms an extension of the Hebb rule. According to the latter rule, an increase in a weighting factor in a connection between two nodes depends upon the correlation between the activation in the node at the beginning and in the node at the end of the connection. The Grossberg rule in addition takes into account the total additional input excitation caused by the "neighbours" of the node situated at the beginning of the connection, which also have a connection with the node situated at the end of the connection.
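A minimal sketch of the intermodular wiring described above: a complete R-to-R connection pattern between two modules, every I weighting factor starting at the mean of its limiting values (the function name and shapes are ours):

    import numpy as np

    def init_inter_weights(n_receiving, n_sending, w_min=0.0, w_max=2.0):
        """Complete connection pattern: each R node of the receiving module
        gets an input from every R node of the sending module; all I
        weighting factors start at the mean of the limiting values."""
        return np.full((n_receiving, n_sending), (w_min + w_max) / 2.0)

    W_I = init_inter_weights(3, 3)  # e.g. the three-R-node modules of Fig. 3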
Fig. 3 diagrammatically shows three CALM modules, two of which are at a level I and one at a higher level II. For the sake of clarity, only the R nodes of these modules are shown, while, likewise for the sake of clarity, it is assumed that each CALM module has only three R nodes. However, it will be clear that the following also applies if the CALM module at level II is connected to more CALM modules at level I, while each of the CALM modules may also contain many more than three R nodes. Finally, once again for clarity's sake, only the connections of the R nodes of the CALM modules at level I to an R node i of the CALM module at level II are shown, although, in connection with the above described condition of a complete connection pattern, the other two R nodes of the CALM module at level II are also each connected to all R nodes of the CALM modules at level I.
The following applies to the change Δwij, between time t and time t+1, of the I weighting factor wij between R node i of the CALM module at level II and R node j of a CALM module at level I:

Δwij(t+1) = μ(t)·ai(t)·[(P - wij(t))·aj(t) - L·wij(t)·Σ (f=1..N, f≠j) wif(t)·af(t)]

where μ(t) = d + wμE·aE(t)   (11)
It is assumed that all other R nodes of a CALM module at level I forming the "neighbours" of R node j are designated by f, with f = [1,N] and f ≠ j, and that at level I, N R nodes are present.
In formula 11, the various symbols have the following meanings: ai(t) = activation of the receiving R node i at time t; aj(t) = activation of the "transmitting" R node j at time t;
P = a constant > 0, determinative of the maximum value that wij can assume; wij(t) = the inter weighting factor at time t, which is always larger than zero, because there are only excitatory connections between modules;
L = a constant > 0; wif(t) = the inter weighting factor at time t with each "neighbour" R node of R node j at level I which is also connected to R node i at level II; μ(t) = the so-called Hebb parameter regulating the learning rate of the CALM module; d = the minimum value of the learning parameter, which is relatively low but higher than zero; wμE = a pre-established, relatively low weighting factor limiting the maximum value of μ(t); and aE = the activation of the E node of the CALM module at level II, where 0 < aE ≤ 1. Because the inter weighting factor wij according to the example is limited to the range [0, 2], we also have

0 ≤ wij(t) + Δwij(t+1) ≤ 2   (12)
It is observed that the generic form of the learning rule according to formula 11 is Hebbian, because

Δwij = μ·f(ai)·g(aj)   (13)

The first part of the component between square brackets in formula 11 represents the Hebbian part of the learning rule, where the activation value of the node at the beginning of a connection contributes to an increase in the weighting factor. A difference from the Hebb rule is that high weighting factors tend to limit the increase in the weighting factor. The second bracketed term forms an extension of the Hebb rule, used for the first time by Grossberg. This term is responsible for a decrease in the changes in the weighting factor. In the event of high excitation due to the "neighbour" R nodes, a decrease in the weighting factor is highly probable, in particular when wij(t) is also high. The effect of this component is therefore the application of an adaptive reducing scale factor to the weighting factor when the total input excitation to a node becomes too high. This may happen for instance when many modules are connected to a single module, or when high input activations are present, or when the weighting factor is too high due to prolonged learning. Besides, this component provides for the important property that the module is capable of having non-orthogonal patterns represented by different R nodes.
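Formula 11, with the range limit of formula 12, can be transcribed as follows (an illustrative sketch; the default values of P, L and wμE are assumptions, while d = 0.005 matches the quiescent value of μ in the example of the Table):

    import numpy as np

    def update_inter_weights(W, a_recv, a_send, a_E,
                             P=2.0, L=1.0, d=0.005, w_muE=0.5):
        """One application of the learning rule of formula 11.

        W[i, j]  : I weighting factor wij from sending R node j to receiving R node i
        a_recv[i]: activation ai(t) of the receiving R nodes (level II)
        a_send[j]: activation aj(t) of the sending R nodes (level I)
        a_E      : activation aE(t) of the E node of the receiving module
        """
        mu = d + w_muE * a_E                 # learning parameter mu(t)
        total = W @ a_send                   # sum over all f of wif(t) * af(t)
        neigh = total[:, None] - W * a_send[None, :]   # excludes the f = j term
        dW = mu * a_recv[:, None] * ((P - W) * a_send[None, :] - L * W * neigh)
        return np.clip(W + dW, 0.0, 2.0)     # formula 12: keep wij within [0, 2]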
To understand the dynamic behaviour of a CALM module, it is useful to discriminate between the three processes that define its operation: the excitatory process, the inhibitory process and the arousal process. Although these processes are interdependent, it is possible to analyse their function separately. In fact, the module can best be understood on the basis of the interactions between these processes. Fig. 4 shows a diagram indicating these interactions.
The first mechanism, the excitatory system, formed by the R nodes in a module, is directly activated by the stimulations presented to the module. Only the R nodes are connected to nodes outside the module. Therefore, the excitatory process is stimulated either from other modules, or from the E node, or from receptor nodes which convert physical stimulation (e.g. light, sound, displacement and the like) into activations. In the beginning, when nothing has been learned yet by the module, all variable weighting factors in the connections to the R nodes have equal values. When a stimulus is presented, all R nodes are activated equally. Each R node will activate its associated V node, resulting in competition between the V nodes due to the mutually inhibitory connections between these nodes. The R nodes also excite the A node and hence feed the arousal mechanism. In short, therefore, the major function of the excitatory mechanism is the activation of the two other systems. The inhibitory process, composed of the V nodes, is controlled exclusively by the excitatory system. As soon as a V node becomes active, it starts to inhibit the A node, the R nodes and the other V nodes. The mutual inhibition of the V nodes provides for the competition between these nodes. The activation of the V nodes will have an oscillatory pattern so long as competition continues. With respect to the inhibition of the R nodes by the V nodes, a distinction should be made between the inhibition of the paired R node and the inhibition of other, non-paired R nodes. Only the latter inhibition contributes to the competition by undermining the activation sources of the competing V nodes. This is essentially a recurrent lateral inhibition mechanism depending upon the values of the VR→ weighting factors. The lower the negative VR→ weighting factors, the stronger the veto effect. Together with the VR weighting factors, the W weighting factors regulate the fierceness of competition and the rate at which the CALM module provides a solution. When one R node receives just slightly more input excitation than the others, the competition is usually solved after a number of cycles. The inhibition of this paired R-V pair will be smaller than the inhibition distributed over the other pairs. The competition is completely solved as soon as only one R-V pair is still being activated, while all other R-V pairs no longer have activation.
The arousal process, consisting of the A node and the E node, has at least two functions. The first is to force the module to solve the competition when several R nodes receive the same quantity of input excitation, so that the competition cannot otherwise be resolved. By transmitting random activation pulses to the R nodes, the equilibrium is disturbed, so that one of the R-V node pairs can win the competition. In effect, a stochastic search for possible representations takes place among the activated R nodes, the probability that a given R node is selected as a representation being a function of the value of its activation relative to the activations of the other R nodes and of the random fluctuations in the data received from the E node. The second function of the arousal system is to control the learning process. As described above, the learning rate is proportional to the activation of the E node.
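The selection probabilities arising from this stochastic search can be illustrated numerically: when each R node's excitation is perturbed by a random pulse from the E node, the node with the higher base excitation wins more often, but not always. The base excitations and the noise ceiling below are arbitrary illustrative values, not values taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

base = np.array([0.52, 0.50, 0.48])   # steady excitations of three R nodes (illustrative)
ceiling = 0.10                        # E-node pulses drawn uniformly from [0, ceiling]

trials = 100_000
winners = (base + rng.uniform(0.0, ceiling, (trials, 3))).argmax(axis=1)
print(np.bincount(winners, minlength=3) / trials)
# The node with the highest base excitation is selected most often,
# but the other nodes retain a non-zero probability of winning.
```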
Roughly, the influence of the arousal mechanism on the learning process is the following. The ratio between the VA weighting factors and the RA weighting factors is selected so that, when a single R-V node pair is active, the total input activation for the A node is negative and the A node is suppressed by this pair. When, however, more than one pair is active at the same moment, the activation of the A node will increase and random activations will be distributed over the R nodes, which helps to solve the competition. In general, the R nodes must be activated more strongly than the V nodes for the arousal system to be activated. Because mutual inhibition is present between the V nodes themselves, and not between the R nodes, the activation of the R nodes will indeed be stronger than that of the V nodes when more than one R-V pair has been activated. This will be the case when a new stimulus is presented to the module (assuming that, prior to the presentation, the activations were practically zero). A wave of activations then reaches various R nodes, producing a high initial activation of the A node and hence of the E node. Owing to the differentiating properties of the R-V node pairs and the competition between the V nodes, the activation of the pairs will oscillate violently until the competition has been solved. During the oscillations, the amplitude of the winning R-V node pair increases gradually. Finally, the oscillation stops and the winning pair preserves its activation at a low, stationary level, while all other R-V node pairs no longer have any activation. Because both the A node and the E node integrate their input signals, the activations of these nodes can attain high levels as a result of an oscillating input signal. The result is that the presentation of stimuli producing substantial competition in the module is accompanied by more learning than the presentation of stimuli that can activate a single R node in a straightforward manner. Stimuli that were presented earlier therefore activate the arousal mechanism to a substantially lesser extent and for a shorter period of time. In this manner, the CALM module can discriminate between old and new stimuli. Moreover, the CALM module adjusts its learning rate, so that new stimuli, which require much learning, are learned much more quickly than old stimuli, which do not. This mechanism also prevents representations from being disturbed by the prolonged presentation of old stimuli. The module thus has two forms of learning: first, the active search for representations of newly presented stimulus patterns, which is accompanied by much learning; and second, the passive activation of representations of stimulus patterns presented earlier, which requires little learning and reinforces the representations only slightly.
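A compact sketch of this arousal control is given below. The constants are assumptions; only their proportions follow the description above (the VA/RA ratio makes a single active R-V pair suppress the A node, and μ depends linearly on the E-node activation).

```python
import numpy as np

def arousal_step(aR, aV, aA, aE, rng, w_RA=0.4, w_VA=-0.6, w_ER=0.5,
                 d=0.005, w_mu=0.05):
    """Arousal update (sketch): the A node integrates R excitation and V
    inhibition, the E node integrates A, random pulses go to the R nodes,
    and the learning parameter mu follows the E-node activation linearly."""
    aA = max(0.0, aA + w_RA * aR.sum() + w_VA * aV.sum())
    aE = max(0.0, aE + aA)                               # both A and E integrate their input
    pulses = rng.uniform(0.0, w_ER * aE, size=aR.shape)  # random activations for the R nodes
    mu = d + w_mu * aE                                   # high arousal -> fast learning
    return aA, aE, pulses, mu

rng = np.random.default_rng(0)
aA, aE, pulses, mu = arousal_step(np.array([0.8, 0.7]), np.array([0.4, 0.3]),
                                  0.0, 0.0, rng)
```

With a single active pair (aR equal to aV for one pair, zero elsewhere), the net contribution w_RA + w_VA is negative and the A node is suppressed, matching the behaviour described above.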
Apart from learning, the major property of a CALM module is its capability of categorizing stimuli.
Categorization is effected by associating a given stimulus presented to the R nodes with a single R node in the module, which node is then said to represent the stimulus. This categorization is an autonomous processing step, which means that the CALM module according to the invention is capable of unsupervised learning, while supervised learning is possible as well. Learning takes place during and subsequent to the categorization process, the association between the stimulus and the R node being preserved by adjusting the weighting factors to that R node. The CALM module continues to represent newly presented stimulus patterns that are to be discriminated in as yet unused R nodes. However, when all R nodes have been used to represent such patterns, subsequent patterns will be represented by the R node representing the nearest pattern already presented earlier. The CALM module is therefore capable of discriminating stimulus patterns, by representing comparatively strongly differing patterns by different R nodes, and is also capable of generalization, by representing relatively strongly resembling patterns by the same R node. It will be explained on the basis of Fig. 5 and Table 1 how the three mechanisms described above cooperate to categorize a stimulus pattern and how this process affects the value of the weighting factors to the winning R node. Figs. 5a - h and Table 1 give an example of the manner in which a very simple input pattern is stored by a CALM module according to the present invention. To that end, Fig. 5 shows a CALM module with two R nodes 3, 4, two V nodes 5, 6, an A node 7 and an E node 8. An input stimulus is presented to the module from two nodes 1 and 2. These nodes may be the R nodes of a CALM module at a different level, but may also be two external input sources producing a pattern that is representative of an externally received stimulus. It is assumed for the example that node 1 has an activation of 1.0 and node 2 an activation of 0.0. The activation of the various nodes is shown diagrammatically in Figs. 5a - h by the extent to which a node is blacked; an entirely black node has an activation of 1.0.
The Table shows the cycles 1 - 20 traversed by the CALM module at successive times t. Because it would not be illustrative to show all cycles in the manner of Figs. 5a - h, these figures show only cycles 0, 1, 2, 3, 4, 10, 12 and 20, in Figs. 5a - 5h respectively.
The Table gives the values of the activations of nodes 1 - 8 by means of the symbols a(1) - a(8); moreover, the right-hand side of the Table indicates for each cycle the value of the learning parameter μ and of the variable weighting factors W31, W32, W41 and W42, which weighting factors are also shown in Fig. 5. Cycle 0 is not shown in the Table, since it merely represents the initial state.
The operation of the CALM module after reception of the input pattern [1,0] will now be described with reference to Figs. 5a - h.
Fig. 5a shows the initial state, wherein only node 1 has an activation of 1.000 and the other nodes an activation of 0.000. The learning factor μ has its low initial value and the weighting factors all equal 1.000, i.e. a value midway between the limiting values of these weighting factors, 0 and 2.
Fig. 5b. The activation has reached the row of R nodes 3 and 4 in the CALM module and has been distributed uniformly over these nodes. The weighting factors have not changed, since nodes 3 and 4 had not yet been activated.

Fig. 5c. The wave of activations has now also reached V nodes 5 and 6, while A node 7 has also been activated. The weighting factors have changed slightly. The reason that the changes are small is that the learning parameter still has its low quiescent value. The weighting factors from the activated node 1 have slightly increased, while the weighting factors from the non-activated node 2 have decreased. This increase and decrease are a direct result of the learning rule employed.
Fig. 5d. The E node has now also been activated, which has the following two important results:
1) From now on, nodes 3 and 4 receive random activations ranging between 0.000 and the activation of node E multiplied by the weighting factor in the connection from the E node to nodes 3, 4. This weighting factor is 0.5.
2) From now on, the learning parameter μ will increase; μ is linearly dependent upon the activation of the E node. The increase in μ leads to greater changes in the weighting factors, which means that learning takes place more quickly. The V nodes have an inhibitory effect on the activation of the A node. The weighting factors in the CALM module have been chosen so that, when a V node and the R node paired with it (the latter having an excitatory connection with the A node) have the same activation, the net contribution of these two nodes to the activation of the A node is negative, so that this activation decreases.
Fig. 5e. The activations of nodes 3 and 4 now differ slightly from one another due to the random activations these nodes receive from the E node. The learning parameter μ has increased from 0.005 to 0.016. The activation of the A node has decreased as a result of the inhibitory effect of the V nodes. The weighting factors have again increased further, while the weighting factors to the respective nodes 3 and 4 may now also assume different values, because the activations of nodes 3, 4 differ from one another. These differences in weighting factors are initially minute, however, and not yet visible in the Table.
Cycles 5 - 9. The Table shows that the competition between the V nodes causes the activation of the A node to oscillate. The value of the learning parameter remains virtually constant, while a difference grows between the weighting factors to the respective nodes 3 and 4. The activation of node 3 seems to become dominant; if this actually turns out to be the case, it is due exclusively to the cumulative effect of the random activations from the E node.
Fig. 5f. The activation of node 3 has suddenly decreased strongly, again due to the random fluctuations in the activation which nodes 3 and 4 receive from the E node. The weighting factors now start to differ strongly from one another. The weighting factor from node 1 to node 3 is the higher of the two, which may mean that node 3 will eventually "win" after all. The weighting factor from the non-activated node 2 to node 3, on the other hand, decreases strongly.

Fig. 5g. The battle has now almost been decided. The activation of R node 4 and the associated V node 6 has become very low, and the weighting factor W31 is clearly the higher.
Fig. 5h. Node 3 now definitively forms the representation for the input pattern [1,0]; the activation of nodes 4 and 6 has decreased to zero, while the activation of the A node has also decreased strongly. The value of μ decreases, but this decrease takes place considerably more slowly; the activation of the E node, too, decreases only slowly. From now on, an equilibrium condition will slowly be established in the CALM module. Repeated presentation of the pattern [1,0], after all activations have been reset to zero, now results in node 3 rapidly being activated more strongly. The presentation of the orthogonal pattern [0,1] will probably lead, after about 20 cycles, to a representation by node 4. When the non-orthogonal pattern [1,1] is presented, the excitations e(i) supplied to nodes 3 and 4 will, given the values now assumed by the weighting factors, be:

e(3) = e(1) × w31 + e(2) × w32 = 0.5 × 1.166 + 0.5 × 0.822 = 0.994
e(4) = e(1) × w41 + e(2) × w42 = 0.5 × 1.037 + 0.5 × 0.962 = 0.9995

Node 4 will therefore receive slightly more excitation in the first cycle. The learning rule is so composed that, after the pattern [1,0] has been presented a few times, the total of the weighting factors to the node representing that pattern will always decrease.
To that end, it is necessary that an increase in one weighting factor (in this case from node 1 to node 3) is compensated by a greater decrease in another weighting factor (here from node 2 to node 3). In this case it is, of course, uncertain whether the lead of 0.005 will result in node 4 winning the competition; the chance of this happening is greater than 0.5, however. Repeated presentation of the pattern [1,0] will enlarge the lead of node 4 in this respect.
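The excitation figures quoted above are easy to verify from the weighting factors given in the text; the snippet below is merely a check of that arithmetic.

```python
# Check of the excitations quoted above for the non-orthogonal pattern [1,1]
e1 = e2 = 0.5                      # effective input activations used in the text
w31, w32 = 1.166, 0.822            # weighting factors to node 3 after learning [1,0]
w41, w42 = 1.037, 0.962            # weighting factors to node 4

e3 = e1 * w31 + e2 * w32
e4 = e1 * w41 + e2 * w42
print(round(e3, 4), round(e4, 4), round(e4 - e3, 4))   # 0.994 0.9995 0.0055
```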
Fig. 6 shows within the dotted lines a so-called output module, which in fact consists of a combination of a single CALM module, designated in the figure by reference numeral 1 and comprising N R-V node pairs, and a modified CALM module designated by numeral 2. The modified CALM module contains only N pairs of R-V nodes and no A or E node. N-1 R-V node pairs in module 1 are coupled with N-1 R-V node pairs in module 2. Both module 1 and module 2 therefore have one "free" pair of R-V nodes.
Such an output module makes it possible for a plurality of simultaneous parallel activations within a CALM module to be converted into a series of responses. An output module, contrary to what its name suggests, need not necessarily be provided at the output end of the network. The output module forms a kind of parallel-serial converter, wherein the probability of a given production in the series depends upon the activations that can be produced in the CALM module in response to a specific presented pattern.
Module 2 shown in the figure consists of pairs of R-V nodes. These pairs, however, can always be combined into a single node without impairing the function. Each V node again inhibits the other V nodes and the R nodes not paired with it. The free R-V node pair in module 2 is connected through an excitatory connection to the A node of CALM module 1. Each R-V node pair of CALM module 1 excites only one R-V node pair in module 2. All R nodes of coupled R-V node pairs of module 2 excite the "free" R-V node pair in module 1. So long as the competition has not been solved, the A node excites the "free" R-V node pair in module 2, which blocks further transmission of activations by module 2. Activations in the other node pairs in module 2 can only be produced after the activation of the A node has disappeared. As explained above, this is not the case until the competition in CALM module 1 has been solved. In that case, only one activation is transmitted from module 1 to module 2, and the activated R node can provide an output signal. In its turn, module 2 activates the "free" R-V node pair in CALM module 1. This pair only produces inhibition and does not receive inhibition itself. CALM module 1 is thereby reset, because the activations in all coupled R-V node pairs are reduced to zero. After termination of the activation of the free node pair in CALM module 1, new activations can be built up in CALM module 1 on the basis of the weighting factors and the data supplied, and on the basis of these new activations a new response can be generated in module 2.
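The gating behaviour of the output module can be summarized in a few lines of code. This is a sketch only: the function, the threshold and the list representation are assumptions, and the actual mechanism operates through the weighted connections described above rather than through an explicit test.

```python
def output_gate(aR1, aA1, threshold=1e-3):
    """Sketch of the output module gating CALM module 1 (parallel-to-serial).

    aR1 -- R-node activations of module 1
    aA1 -- activation of module 1's A node
    Returns the index of the transmitted representation (None while the
    competition is unresolved) together with the reset R-node activations.
    """
    if aA1 > threshold:                   # free pair in module 2 blocks transmission
        return None, aR1
    winner = max(range(len(aR1)), key=aR1.__getitem__)
    return winner, [0.0] * len(aR1)       # free pair in module 1 resets the module
```

Called once per cycle, this yields at most one response per solved competition, which is the parallel-to-serial conversion described above.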

Claims

1. A data processing module comprising a plurality of nodes, as well as connections between said nodes, through which connections data weighted by a weighting factor can be transmitted among the nodes and which nodes have an activation value representative of the data received, there being provided at least three types of nodes; a first type adapted to receive external data and of which at least two are present; a second type, of which there is always one which is paired with a node of the first type; and a third type, of which at least one is present, each node of the first type being connected through a connection with a weighting factor from a first class to the associated node of the second type and to the node of the third type, each node of the second type being connected through a connection with a weighting factor from a second class to the other nodes of the second type, to the nodes of the first type not paired therewith and to the node of the third type; the node of the third type being connected through connections with a weighting factor from the first class to the nodes of the first type.
2. A data processing module as claimed in claim 1, characterized in that each node of the second type is connected through a connection with a weighting factor from the second class to the node of the first type paired with it.
3. A data processing module as claimed in claim 1 or 2, characterized in that the weighting factors from the first class are all positive and that the weighting factors from the second class are all negative.
4. A data processing module as claimed in claim 3, characterized in that the node of the third type comprises two subnodes, the first of which receives the weighted data from the nodes of the first and the second type and is connected through a connection with a positive weighting factor to the second subnode, which is connected through connections with a positive weighting factor to the nodes of the first type, the second subnode being arranged to transmit to the nodes of the first type at any point of time and in a random manner, data representative of a random activation value in a range of activation values ranging from 0.0 to the maximum activation value of the second subnode at such point of time.
5. A data processing module as claimed in any of the preceding claims, characterized in that the activation of the nodes at any point of time is determined by the weighted data received by each node and by the activation already present prior to that point of time.
6. A data processing module as claimed in claim 5, characterized in that each node has predetermined maximum and minimum activation values and that in the event of an increase in the absolute value of the sum of the weighted data which a node receives, the activation of that node approaches the maximum and minimum activation values asymptotically.
7. A data processing module as claimed in claim 5, characterized in that each node has predetermined maximum and minimum activation values and that in the event of an increase in the sum of the weighted data which a node receives, the activation of that node first approaches the maximum activation value through a curve exhibiting at least a local maximum value and subsequently decreases again, while in the event of a negative increase in the sum of the weighted data which a node receives, the activation of that node approaches the minimum activation value asymptotically.
8. A data processing module as claimed in any of the preceding claims, characterized in that the weighting factors in all connections between all nodes within a module are predetermined.
9. A data processing module as claimed in claim 8, characterized in that the weighting factors have been chosen in such a manner that with a jointly activated single pair of nodes of the first type and the second type, the joint influence of the data transmitted from this pair of nodes diminishes the activation of the third node.
10. A data processing module as claimed in any one of claims 1 - 9, characterized in that the module includes one pair of nodes of the first and the second type which is connected only through output connections with a weighting factor from the second class to other nodes of the first and second type in the module and to the node of the third type; that this module is coupled with a submodule having exclusively pairs of nodes of the first and the second type, the number of which is at least equal to the number of pairs of nodes in the module, each pair of nodes of the module, with the exception of said one pair, being coupled through connections with a weighting factor from the first class, with a pair of nodes of the submodule, said one pair in the module receiving data from all pairs of the submodule through connections with a weighting factor from the first class, and wherein one pair of nodes in the submodule not connected to a pair of nodes in the module receives data from the node of the third type in the module, through a connection with a positive weighting factor.
11. A data processing network comprising at least two modules as claimed in any of claims 1 - 9, characterized in that the modules are situated at different levels in the network and that only modules at different levels can be interconnected and that, if modules are interconnected, always at least each node of the first type of one module is connected to all nodes of the first type of the other module.
12. A data processing network as claimed in claim 11, characterized in that the data is transmitted among modules in weighted fashion by means of a variable weighting factor w, and that the change Δ in a weighting factor wij between a node j of the first type of one module and a node i of the other module between a point of time t+1 and a point of time t satisfies the following equation:

Δwij(t+1) = μ(t) · ai(t) · [ (P − wij(t)) · aj(t) − L · wij(t) · Σf≠j wif(t) · af(t) ]

where μ(t) = d + wμ · aE(t), and wherein d, P and L are constants, ai, af and aj are the activations of nodes i, f and j, respectively, at the point of time t, and Σf≠j wif(t) · af(t) is the sum of the weighted data of all other nodes f ≠ j of the first type which are connected to node i.