TITLE
PARALLEL DISTRIBUTED PROCESSING NETWORK CHARACTERIZED
BY AN INFORMATION STORAGE MATRIX
FIELD OF THE INVENTION
The present invention relates to a parallel distributed
processing network wherein the connection weights are defined by an [N x N] information storage matrix [A] that satisfies the matrix
equation
[A] [T] = [T] [Λ]   (1)
where [Λ] is an [N x N] diagonal matrix the elements of which are the eigenvalues of the matrix [A], and [T] is an [N x N] similarity
transformation matrix whose columns are formed of some
predetermined number M of target vectors (where M <= N) and whose remaining columns are formed of some predetermined number Q of slack vectors (where Q = N - M), both of which together comprise the eigenvectors of [A].
BACKGROUND OF THE INVENTION
Parallel distributed processing networks (also popularly known by the term "neural networks") have been shown to be useful for solving large classes of complex problems in analog fashion. They are a class of highly parallel computational circuits comprising a plurality of linear and nonlinear amplifiers, having transfer functions that define input-output relations, arranged in a network that connects the output of each amplifier to the inputs of some or all of the amplifiers. Such networks may be implemented in hardware (either in discrete or integrated form) or by simulation using a traditional von Neumann architecture digital computer.
Such networks are believed to be more suitable for certain types of problems than a traditional von Neumann architecture digital computer. Exemplary of the classes of problems with which parallel distributed processing networks have been used are associative memory, classification applications, feature extraction, pattern recognition, and logic circuit realization. These applications are often found in systems designed to perform process control, and signal and/or data processing. For example, copending application Serial Number 07/316,717, filed February 28, 1989 (ED-0373) and assigned to the assignee of the present invention, discloses an apparatus and method for controlling a process using a trained parallel distributed processing network. There are numerous examples of parallel distributed processing networks described in the prior art used to solve problems in the areas listed above. Two of the frequently used parallel distributed processing network architectures are described by the Hopfield algorithm (J.J. Hopfield, "Neurons with graded response have collective computational properties like those of two-state neurons," Proceedings of the National Academy of Sciences, USA, Vol. 81, pages 3088-3092, May 1984, Biophysics) and the back propagation algorithm (for example, see Rumelhart, Hinton, and Williams, "Learning Internal Representations by Error Propagation," Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume I,
Foundations, Rumelhart and McClelland editors, MIT Press,
Cambridge, Massachusetts (1986)).
It has been found convenient to conceptualize a parallel distributed processing network in terms of an N-dimensional vector space having a topology comprising one or more localized energy minima, or equilibrium points, surrounded by basins, to which the network operation will gravitate when presented with an unknown input. Moreover, since matrix mathematics has been demonstrated to accurately predict the characteristics of N-dimensional vectors in other areas, it has also been found convenient to characterize such
parallel distributed processing networks and to analyze their behavior using traditional matrix techniques.
Both the Hopfield and the back propagation networks are designed using algorithms that share the following principles:
(1) Based on the desired output quantities (often referred to as desired output, or target, vectors), network operation is such that for some (or any) input code (input vector) the network will produce one of the target vectors; (2) The network may be characterized by a linear operator [A], which is a matrix with constant coefficients, and a nonlinear thresholding device denoted by υ(.). The coefficients of the matrix [A] determine the connection weights between the amplifiers in the network, and υ(.) represents a synaptic action at the output or input of each amplifier.
The essential problem in designing such a parallel distributed processing network is to find a linear operator [A] such that for some (or any) input vector Xin, [A] and υ(.) will produce some desired output Xo, that is:
[A] Xin -> Xo   (2)
where the operation by υ(.) is implicitly assumed.
-o-O-o-
The Hopfield and back propagation models derive the matrix operator [A] by using different techniques. In the Hopfield algorithm the operator [A] essentially results from the sum of matrices created through the outer product operation on desired output, or target, vectors. That is, if Xo1, Xo2, . . ., Xon are the desired target vectors, then
[A] = Xo1 Xo1^t + Xo2 Xo2^t + . . . + Xon Xon^t   (3)
where Xoi^t is the transpose of Xoi, and Xoi Xoi^t denotes an outer product for i = 1, . . ., n. The operator [A] is then modified by placing zeroes along the diagonal. Once such a matrix operator [A] is created, then for an input vector Xin the iterative procedure can be used to obtain a desired output Xoi, i.e.,
[A] Xin = X1
[A] X1 = X2 (4)
[A] Xk = Xoi
where again the operation by υ(.) is implicitly assumed. Unfortunately, this algorithm, because of the way it is structured, may converge to a result that is not a desired target vector. An additional limitation of this algorithm is that when the network behaves as an associative memory, for a given input it will converge to a stable state only when that input is close (in Hamming distance) to the stable state. This is one of the serious limitations of the Hopfield net. Furthermore, there is very poor control over the speed and the way in which results converge. This is because the coefficients in the matrix [A] (the connection weights between the amplifiers in the parallel distributed processing network) are "firmly" determined by the output vectors and the outer product operation, i.e., there is no flexibility in restructuring [A]. Note that the outer product operation always makes [A] symmetric.
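The outer-product construction of Equation (3) and the iterative recall of Equation (4) can be sketched in a few lines. This is a minimal Python illustration (not the Fortran listing of the Appendix), with a hard sign threshold standing in for υ(.); the four-bit target vectors are made up for the example.

```python
def outer_product_sum(targets):
    # Equation (3): [A] is the sum of outer products of the target vectors.
    n = len(targets[0])
    A = [[0.0] * n for _ in range(n)]
    for x in targets:
        for i in range(n):
            for j in range(n):
                A[i][j] += x[i] * x[j]
    for i in range(n):
        A[i][i] = 0.0          # zero the diagonal, per the Hopfield construction
    return A

def threshold(v):
    # Hard sign nonlinearity standing in for the synaptic action v(.)
    return [1 if s >= 0 else -1 for s in v]

def recall(A, x, iters=10):
    # Iterative procedure of Equation (4): x <- v([A] x), repeated.
    n = len(x)
    for _ in range(iters):
        x = threshold([sum(A[i][j] * x[j] for j in range(n)) for i in range(n)])
    return x

targets = [[1, -1, 1, -1], [-1, -1, 1, 1]]   # made-up target vectors
A = outer_product_sum(targets)
print(recall(A, targets[0]) == targets[0])   # True: stored targets are fixed points
```

Consistent with the limitation discussed above, an input far (in Hamming distance) from any stored pattern need not converge to a target at all.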
The associative memory network disclosed in Hopfield, United States Patent 4,660,166 (Hopfield), uses an interconnection scheme that connects each amplifier output to the input of all other amplifiers except itself. In the Hopfield network as disclosed in this last mentioned patent, the connectivity matrix characterizing the
connection weights has to be symmetric, and the diagonal elements need to be equal to zero. (Note that Figure 2 of the Hopfield paper referenced above is believed to contain an error in which the output of
an amplifier is connected back to its input. Figure 2 of the last-referenced patent is believed to correctly depict the interconnection scheme of the Hopfield paper.) The Hopfield network has provided the basis for various
applications. For example, see United States Patent 4,719,591
(Hopfield and Tank), where the network is applied to the problem of decomposition of signals into component signals. United States
Patents 4,731,747 and 4,737,929 (both to Denker), improve the Hopfield network by adjusting the time constants of the amplifiers to control the speed of convergence, by using negative gain amplifiers that possess a single output, and by using a clipped connection matrix having only two values, which permits the construction of the network with fewer leads.
United States Patent 4,752,906 (Kleinfeld), overcomes the deficiency of the Hopfield network of not being able to provide temporal association by using delay elements in the output which are fed back to an input interconnection network. United States Patent 4,755,963 (Denker, Howard, and Jackel) extends the range of problems solvable by the Hopfield network.
The back propagation algorithm results in a multilayer feedforward network that uses a performance criterion in order to evaluate [A] (minimizing error at the output by adjusting the coefficients in [A]). This technique produces good results but, unfortunately, is computationally intensive. This implies a long time for learning to converge. The back propagation network requires considerable time for training, or learning, the information to be stored. Many techniques have been developed to reduce the training time. See, for example, copending application Serial Number 07/285,534, filed December 16, 1988 (ED-0367) and assigned to the assignee of the present invention, which relates to the use of stiff differential equations in training the back propagation network.
SUMMARY OF THE INVENTION
The present invention relates to a parallel distributed
processing network comprising a plurality of amplifiers, or nodes, connected in a single layer, with each amplifier having an input and an output. The output of each of the nodes is connected to the inputs of some or of all of the other nodes in the network (including being fed back into itself) by a respective line having a predetermined
connection weight. The connection weights are defined by an [N x N] matrix [A], termed the "information storage matrix", wherein the element Aij of the information storage matrix [A] is the connection weight between the j-th input node and the i-th output node. In accordance with the present invention the information storage matrix [A] satisfies the matrix equation
[A] [T] = [T] [Λ]   (1). The matrix [T] is an [N x N] matrix, termed the "similarity transformation matrix", the columns of which are formed from a predetermined number (M) of [N x 1] target vectors plus a
predetermined number (Q) of [N x 1] arbitrary, or "slack", vectors. Each target vector represents one of the outputs of the parallel distributed processing network. The value of M can be less than or equal to N, and Q = (N - M). Preferably, each of the vectors in the similarity
transformation matrix is linearly independent of all other of the vectors in that matrix. Each of the vectors in the similarity
transformation matrix may or may not be orthogonal to all other of the vectors in that matrix.
If the matrix [T] is nonsingular, so that the matrix [T]^-1 exists, the information storage matrix [A] is defined as the matrix product [A] = [T] [Λ] [T]^-1   (5).
The matrix [Λ] is an [N x N] diagonal matrix, each element along the diagonal corresponding to a predetermined one of the target or the slack vectors. The relative value of each element along the diagonal of the [Λ] matrix corresponds to the rate of convergence of the outputs of the parallel distributed processing network toward the corresponding target vector. In general, the values of the elements of the diagonal matrix corresponding to the target vectors are preferably larger than the values of the elements of the diagonal matrix
corresponding to the slack vectors. More specifically, the elements of the diagonal matrix corresponding to the target vectors have an absolute value greater than one, while the values of the elements of the diagonal matrix corresponding to the slack vectors have an absolute value less than one. The network in accordance with the present invention provides some advantages over the networks discussed hereinbefore.
The information storage matrix [A] is more general, i.e., it does not have to be symmetric, or closely symmetric, and it does not require the diagonal elements to be equal to zero as in the Hopfield network. This means that the hardware realization is also more general. The cognitive behavior of the information storage matrix is more easily understood than that of the prior art. When an input vector is presented to the network and the network converges to a solution which is not a desired, or target, vector, a cognitive solution has been reached, which is, in general, a linear combination of target vectors.
The inclusion of the arbitrary vectors in the formation of the similarity transformation matrix allows flexibility in molding basins for the target vectors. The presence of such slack vectors is not found in the Hopfield algorithm.
The inclusion of the matrix [Λ] as one of the factors in forming the information storage matrix is also a feature not present in the Hopfield network. The speed of convergence to a target solution is controllable by the selection of the values of the [Λ] matrix.
In addition, the computation of the components of the
information storage matrix is faster and more efficient than
computation of the connectivity matrix using the back propagation algorithm. This is because the back propagation algorithm utilizes a generalized delta-rule for determining the connectivity matrix. This rule, however, is at least an order of magnitude computationally more intensive than the numerical techniques used for the information storage matrix.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be more fully understood from the following detailed description thereof, taken in connection with the
accompanying drawings, which form a part of this application and in which:
Figure 1 is a generalized schematic diagram of a portion of a parallel distributed processing network the connection weights of which are characterized by the components of an information storage matrix in accordance with the present invention;
Figure 2A is a schematic diagram of a given amplifier, including the feedback and biasing resistors, corresponding to an element in the information storage matrix having a value greater than zero;
Figure 2B is a schematic diagram of a given amplifier, including the feedback and biasing resistors, corresponding to an element in the information storage matrix having a value less than zero;
Figure 3 is a schematic diagram of a nonlinear thresholding amplifier implementing the synaptic action of the function υ(.); and
Figure 4 is a schematic diagram of a parallel distributed
processing network in accordance with the present invention used in Example II herein.
The Appendix, which forms part of this application, comprises pages A-1 through A-6 and is a listing, in Fortran language, of a program for implementing a parallel distributed processing network in accordance with the present invention on a traditional von
Neumann architecture digital computer. The listing implements the network used in Example II herein.
DETAILED DESCRIPTION OF THE INVENTION
Throughout the following detailed description similar reference numerals refer to similar elements in all Figures of the drawings.
The parallel distributed processing network in accordance with the present invention will first be discussed in terms of its underlying theory and mathematical basis, after which schematic diagrams of various implementations thereof will be presented. Thereafter, several examples of the operation of the parallel distributed processing network in accordance with the present invention will be given.
-o-O-o-
As noted earlier, it has been found convenient to conceptualize a parallel distributed processing network in terms of an N-dimensional vector space. Such a space has a topology comprising one or more localized equilibrium points, each surrounded by a basin to which the network operation will gravitate when presented with an unknown input. The input is usually presented to the network in the form of a digital code comprised of N binary digits, usually with values of 1 and -1. The N-dimensional space would
accommodate 2^N possible input codes.
The network in accordance with the present invention may be characterized using an [N X N] matrix, hereafter termed the
"information storage matrix" that specifies the connection weights between the amplifiers implementing the parallel distributed
processing network. Because of the operational symmetry inherent when using the information storage matrix, only one-half of the 2^N possible input codes are distinct. The other codes (2^(N-1) in number) are complementary.
In general, the information storage matrix [A] is the [N x N] matrix that satisfies the matrix equation:
[A] [T] = [T] [Λ]   (1)
Equation (1) defines an eigenvalue problem in which each λ, that is, each element in the [Λ] matrix, is an eigenvalue and each column vector in the similarity transformation matrix [T] is the associated
eigenvector. Equation (1) can have up to N distinct solution pairs.
This matrix equation may be solved using Gaussian elimination techniques or the Delta rule. When [T]^-1 exists the information storage matrix [A] is formed by the matrix product:
[A] = [T] [Λ] [T]^-1   (5). The matrix [T] is termed the "similarity transformation matrix" and is an [N x N] matrix the columns of which are formed from a predetermined number (M) of [N x 1] target vectors. Each target vector takes the form of one of the 2^N possible codes able to be accommodated by the N-dimensional space representing the network. Each target vector represents one of the desired outputs, or targets, of the parallel distributed processing network. Each target vector contains information that is desired to be stored in some fashion and retrieved at some time in the future, and thus the set
{X1, X2, . . ., XM} of M <= N target vectors forms the information basis of the network. Preferably each target vector in the set is linearly independent of the other target vectors, and any vector Xi in N-dimensional space can thus be expressed as a linear combination of the set of target vectors. In this event, the inverse of the similarity transformation matrix [T] exists. Some or all of the M target vectors may or may not be orthogonal to each other, if desired.
The number M of target vectors may be less than the number N, the dimension of the information storage matrix [A]. If fewer than N target vectors are specified (that is, M < N), the remainder of the similarity transformation matrix [T] is completed by a predetermined number Q of [N x 1] arbitrary, or slack, vectors (where Q = N - M).
The slack vectors are fictitious from the storage point of view since they do not require the data format characteristic of target vectors. However, it turns out that in most applications the slack vectors are important. The elements of the slack vectors should be selected such that the slack vectors do not describe one of the possible codes of the network. For example, if in a typical case the target vectors are each represented as a digital string N bits long (that is, composed of the binary digits 1 and -1, e.g., [1 -1 1 . . . -1 -1 1]), forming a slack vector from the same binary digits would suppress the corresponding code. Accordingly, a slack vector should be formed of digits that clearly distinguish it from any of the 2^N possible codes. In general, a slack vector may be formed with one (or more) of its elements having a fractional value, a zero value, and/or positive or negative integer values.
The slack vectors are important in that they assist in contouring the topology and shaping the basins of the N-dimensional space corresponding to the network.
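Because the target and slack vectors together must form the eigenvectors of [A], a practical check is that the assembled similarity transformation matrix [T] is nonsingular, so that [T]^-1 exists and Equation (5) can be formed. The following is a minimal pure-Python sketch (not the patent's Fortran listing); the three-dimensional vectors are illustrative assumptions.

```python
def det(M):
    # Determinant by Gaussian elimination with partial pivoting.
    n = len(M)
    A = [row[:] for row in M]
    d = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            return 0.0                      # singular: columns are dependent
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            d = -d                          # a row swap flips the sign
        d *= A[col][col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            A[r] = [v - f * w for v, w in zip(A[r], A[col])]
    return d

targets = [[1.0, -1.0, 1.0], [1.0, 1.0, -1.0]]      # M = 2 target vectors
slacks = [[0.0, 0.0, 1.0]]                          # Q = N - M = 1 slack vector
T = [list(col) for col in zip(*(targets + slacks))] # vectors become columns of [T]
print(det(T) != 0)   # True: [T] is nonsingular, so [T]^-1 exists
```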
In sum, dependent upon the value of the number M, the target vectors may form all, or part, of the information storage spectrum of the matrix [A]. If fewer than N target vectors are specified, then the remaining vectors in the transformation matrix are arbitrary, or slack, vectors. In each instance the vectors in the similarity transformation matrix [T] form the geometric spectrum of the information storage matrix [A] (i.e., they are the eigenvectors of [A]). The [Λ] matrix is an [N x N] diagonal matrix that represents the collection of all eigenvalues of the information storage matrix [A] and is
known as the algebraic spectrum of [A]. Each element of the [Λ] matrix corresponds to a respective one of the target or slack vectors. The values assigned to the elements of the [Λ] matrix determine the convergence properties of the network. The freedom in selecting the values of the [Λ] matrix implies that the speed of the network can be controlled. Thus, the time required for the network to reach a decision or to converge to a target after initialization can be controlled by the appropriate selection of the values of the [Λ] matrix. The values assigned to the elements of the [Λ] matrix have an impact in the network of the present invention. If a preassigned λi > 1, then the corresponding eigenvector Ti (which contains desired output information) will determine an asymptote in the N-dimensional information space that will motivate the occurrence of the desired event. If a preassigned λi < 1, then the corresponding eigenvector Ti will determine an asymptote in the N-dimensional information space that will suppress the occurrence of the event. If a preassigned λi >> 1, then the network will converge quickly to the corresponding target vector, approximating the feed-forward action of a back propagation network.
Preferably, the values assigned to the elements of the [Λ] matrix corresponding to the target vectors are greater than the values of the elements of the [Λ] matrix corresponding to the slack vectors. More specifically, the elements of the diagonal matrix corresponding to the target vectors have an absolute value greater than one, while the values of the elements of the diagonal matrix corresponding to the slack vectors have an absolute value less than one.
In sum, assigning the λi's provides control over the speed of network convergence, while selecting the slack vectors provides flexibility in shaping the basins associated with the fixed equilibrium points.
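With the target vectors, slack vectors, and eigenvalues chosen, the construction of Equation (5), [A] = [T][Λ][T]^-1, can be sketched numerically. The following is a minimal pure-Python illustration (the patent's own listing is in Fortran); the three-dimensional basis vectors and the eigenvalues 2, 2, and 0.5 are illustrative assumptions, not values from the patent.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_inv(M):
    # Gauss-Jordan elimination with partial pivoting.
    n = len(M)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Two target vectors and one slack vector form the columns of [T] (M = 2, Q = 1).
X1, X2, Z3 = [1.0, -1.0, 1.0], [1.0, 1.0, -1.0], [0.0, 0.0, 1.0]
T = [list(col) for col in zip(X1, X2, Z3)]
# Diagonal [Lambda]: |lambda| > 1 for the targets, |lambda| < 1 for the slack.
L = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.5]]
A = mat_mul(mat_mul(T, L), mat_inv(T))   # Equation (5): [A] = [T][Lambda][T]^-1

# Verify the eigen-relation [A] X1 = lambda_1 X1 for a target vector:
Ax = [sum(A[i][j] * X1[j] for j in range(3)) for i in range(3)]
print([round(v, 6) for v in Ax])   # [2.0, -2.0, 2.0], i.e. 2 * X1
```

Note that the resulting [A] is not symmetric, consistent with the generality claimed for the information storage matrix.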
Two methods can be used to evaluate the information storage matrix [A]: the Gaussian elimination method and the Delta-rule method.
The evaluation of the information storage matrix [A] using the
Gaussian elimination method will be first addressed. Let X1, X2, . . .,
XM, ZM+1, . . ., ZN be the basis in R^N (i.e., the Xi's represent target vectors and the Zi's are slack vectors), where R^N represents an N-dimensional real vector space. Using this basis, construct the similarity
transformation matrix [T] = [X1, X2, . . ., XM, ZM+1, . . ., ZN]. To it associate the diagonal matrix
[Λ] = diag(λ1, λ2, . . ., λN)
that contains the eigenvalue predetermined for each element in the basis. Next, form the matrix equation
[A] [T] = [T] [Λ]   (1)
and solve it for [A] using the Gaussian elimination method. It is more convenient to solve the problem
[T]^t [A]^t = [Λ] [T]^t   (6)
since in this transposed version of Equation (1) the matrix coefficients fall out in the natural form with respect to the coefficients of [A]. Note that Equation (1) or Equation (6) produces N^2 linearly coupled equations which determine the N^2 coefficients of [A].
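The Gaussian-elimination route of Equation (6) can be sketched as follows: rather than inverting [T], solve [T]^t [A]^t = [Λ][T]^t one column of [A]^t at a time, reusing the same coefficient matrix [T]^t for each solve. This is a minimal pure-Python illustration; the basis vectors and eigenvalues are illustrative assumptions, not values from the patent.

```python
def solve(M, b):
    # Gaussian elimination with partial pivoting, then back substitution.
    n = len(M)
    aug = [M[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j]
                                for j in range(i + 1, n))) / aug[i][i]
    return x

X1, X2, Z3 = [1.0, -1.0, 1.0], [1.0, 1.0, -1.0], [0.0, 0.0, 1.0]
lams = [2.0, 2.0, 0.5]
Tt = [X1, X2, Z3]            # the rows of [T]^t are the basis vectors
A = []
for i in range(3):
    c = [lams[r] * Tt[r][i] for r in range(3)]   # column i of [Lambda][T]^t
    A.append(solve(Tt, c))   # column i of [A]^t, which is row i of [A]
print(A[2])   # third row of [A]: [0.0, -1.5, 0.5]
```

The result agrees with forming the matrix product of Equation (5) directly, but avoids computing [T]^-1 explicitly.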
A second method is the Delta-rule method. Here a set of linear equations is formed:
[A] X1 = λ1 X1
. . .
[A] XM = λM XM   (7)
[A] ZM+1 = λM+1 ZM+1
. . .
[A] ZN = λN ZN
in which the λi's are predetermined eigenvalues. Now, applying the Delta rule in an iterative fashion to the system of linear Equations (7), [A] can be evaluated. The Delta Rule is discussed in W. P. Jones and J. Hoskins, "Back Propagation," Byte, pages 155-162, October 1987.
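The Delta-rule evaluation can be sketched as an iterative nudge of [A] toward satisfying each equation of the system (7). The following is a minimal pure-Python illustration; the basis vectors, eigenvalues, learning rate, and pass count are all illustrative assumptions, not values from the patent.

```python
def delta_rule(basis, lams, eta=0.1, passes=2000):
    # Iteratively adjust [A] so that [A] Xi = lambda_i Xi for every
    # vector in the basis (Equations (7)), using the error-driven
    # update A <- A + eta * (lambda*x - A x) x^t.
    n = len(basis[0])
    A = [[0.0] * n for _ in range(n)]
    for _ in range(passes):
        for x, lam in zip(basis, lams):
            Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
            err = [lam * x[i] - Ax[i] for i in range(n)]   # desired minus actual
            for i in range(n):
                for j in range(n):
                    A[i][j] += eta * err[i] * x[j]
    return A

basis = [[1.0, -1.0, 1.0], [1.0, 1.0, -1.0], [0.0, 0.0, 1.0]]
A = delta_rule(basis, [2.0, 2.0, 0.5])
print([round(v, 3) for v in A[0]])   # approaches the exact first row [2, 0, 0]
```

The many passes required here illustrate why this route is more computationally intensive than a direct Gaussian-elimination solve.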
Comparing the two methods, it is found that the Gaussian elimination technique is faster than the Delta rule. If the inverse of the [T] matrix exists, the information storage matrix may be found by forming the matrix product of Equation (5).
-o-O-o-
Figure 1 is a generalized schematic diagram of a portion of a parallel distributed processing network in accordance with the present invention. The network, generally indicated by the reference character 10, includes a plurality of amplifiers, or nodes, 12
connected in a single layer 14. The network 10 includes N amplifiers 12-1 through 12-N, where N corresponds to the dimension of the information storage matrix [A] derived as discussed earlier. In Figure 1 only four of the amplifiers 12 are shown, namely the first amplifier 12-1, the i-th and the j-th amplifiers 12-i and 12-j, respectively, and the last amplifier 12-N. The interconnection of the other of the N amplifiers comprising the network 10 is readily apparent from the drawing of Figure 1. By way of further explanation, Figure 4 is a schematic diagram that illustrates a specific parallel distributed processing network 10 used in Example II to follow, where N is equal to 4, and is provided only to illustrate a fully interconnected network 10. The specific values of the resistors used in the network shown in Figure 4 are also shown.
Each amplifier 12 has an inverting input port 16, a noninverting input port 18, and an output port 20. The output port 20 of each amplifier 12 is connected to the inverting input port 16 thereof by a
line containing a feedback resistor 22. In addition, the output of each amplifier 12 is applied to a squasher 26, which implements the thresholding nonlinearity, or synaptic squashing, υ(.) discussed earlier. The detailed diagram of the squasher 26 is shown in Figure 3.
The signal at the output port 20 of each of the N amplifiers 12, after the same has been operated upon by the squasher 26, is
connected to either the inverting input port 16 or the noninverting input port 18, as will be explained, of some or all of the other amplifiers 12 in the network 10 (including itself) by a connection line 30. The interconnection of the output of any given amplifier to the input of another amplifier is determined by the value of the
corresponding element in the information storage matrix [A], with an element value of zero indicating no connection, a positive element value indicating connection to the noninverting input, and a negative element value indicating connection to the inverting input.
Each connection line 30 contains a connectivity resistor 34, which is also subscripted by the same variables i, j, denoting that the given subscripted connectivity resistor 34 is connected in the line 30 that connects the j-th input to the i-th output amplifier. The
connectivity resistor 34 defines the connection weight of the line 30 between the j-th input to the i-th output amplifier. The value of the connectivity resistor 34 is related to the corresponding subscripted variable in the information storage matrix, as will be understood from the discussion that follows.
Each of the lines 30 also includes a delay element 38 which has a predetermined signal delay time associated therewith, which is provided to permit the time sequencing of the iterations needed to implement the iterative action (mathematically defined in Equation (4)) by which the output state of the given amplifier 12 is reached. The same subscripted variable scheme as applied to the connection lines and their resistors applies to the delay lines. As noted earlier, the values assigned to the eigenvalues λ in the [Λ] matrix correspond to the time (or the number of iterations) required for the network 10 to settle to a decision.
The manner in which the values of the connectivity resistors 34 are realized, as derived from the corresponding elements of the information storage matrix [A], and the gains of the amplifiers 12 in the network 10 may now be discussed with reference to Figures 2A and 2B.
An input vector applied to the network 10 takes the form:
X = [x1, x2, . . ., xN]^t
The information storage matrix [A], when evaluated in the manner earlier discussed, takes the form of an [N x N] array of elements Aij, where each element Aij of the information storage matrix [A] is either a positive or a negative real constant, or zero.
When an element Aij of the information storage matrix [A] is positive, the relationship between the value of that element Aij and the values of the feedback resistor 22 (RF) and the connectivity resistor 34 (Rij) may be understood from Figure 2A. The line 30 containing the resistor 34 (Rij) is connected to the noninverting input port 18 of the amplifier 12, and the gain of the amplifier 12 is given by the following: eo / Xj = [1 + (RF / Rij)] = |Aij|   (8) where eo is the voltage of the output signal at the output port 20 of the amplifier 12. Typically the values of the feedback resistors 22 (RF) for the entire network are fixed at a predetermined constant value, and the values of the connectivity resistors Rij may be readily determined from Equation (8).
When an element Aij of the information storage matrix [A] is negative, the relationship between the value of that element Aij and the values of the feedback resistor 22 (RF) and the connectivity resistor 34 (Rij) may be understood from Figure 2B. The line 30 containing the resistor 34 (Rij) is, in this case, connected to the inverting input port 16 of the amplifier 12, and the gain of the amplifier 12 is given by the following: eo / Xj = - (RF / Rij) = - |Aij|   (9) where eo is the voltage of the output signal at the output port 20 of the amplifier 12. Again, with the values of the feedback resistors 22
(RF) for the entire network fixed at the predetermined constant value, the values of the connectivity resistors Rij may be readily determined from Equation (9). It should be noted that if the element Aij of the information storage matrix is between zero and 1, then one can use hardware or software techniques to eliminate difficulties in its realization. For example, a software technique would require adjusting coefficients so that the value of Aij becomes greater than 1. A hardware technique would cascade two inverting amplifiers to provide a positive value in the region specified.
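The resistor selection implied by Equations (8) and (9) can be sketched as a small routine: with the feedback resistor RF fixed network-wide, each connectivity resistor Rij follows from the matrix element Aij. The RF value of 10 kΩ is an illustrative assumption, not a value from the patent.

```python
def connectivity_resistor(a_ij, rf=10_000.0):
    # Returns Rij in ohms for matrix element Aij, given feedback resistor RF.
    if a_ij == 0:
        return None                 # Aij = 0: no connection at all
    if a_ij > 0:
        # Noninverting case, Equation (8): |Aij| = 1 + RF/Rij, so the
        # realizable gain must exceed 1; 0 < Aij <= 1 needs the special
        # handling (coefficient rescaling or cascaded inverting stages)
        # noted in the text.
        if a_ij <= 1:
            raise ValueError("0 < Aij <= 1 requires rescaling or cascading")
        return rf / (a_ij - 1.0)
    # Inverting case, Equation (9): |Aij| = RF/Rij.
    return rf / abs(a_ij)

print(connectivity_resistor(3.0))    # 5000.0 ohms: 1 + 10000/5000 = 3
print(connectivity_resistor(-2.0))   # 5000.0 ohms: 10000/5000 = 2
```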
When an element Aij of the information storage matrix [A] is zero, there is no connection between the j-th input node and the i-th output node.
Figure 3 shows a schematic diagram of the nonlinear
thresholding device, or squasher, that generates the function υ(.). The squasher 26 defines a network that limits the value of the output of the node 12 to a range defined between a predetermined upper limit and a predetermined lower limit. The upper and the lower limits are, typically, +1 and -1, respectively.
It should be understood that the network 10 illustrated in the schematic diagrams of Figures 1 to 4 may be physically implemented in any convenient format. That is to say, it lies within the
contemplation of this invention that the network 10 be realized in an electronic hardware implementation, an optical hardware
implementation, or a combination of both. The electronic hardware implementation may be effected by interconnecting the components thereof using discrete analog devices and/or amplifiers, resistors, delay elements such as capacitors or RC networks; or by the
interconnection of integrated circuit elements; or by integrating the entire network using any integrated circuit fabrication technology on a suitable substrate diagrammatically indicated in the Figures by the character S. In addition the network may be realized using a general purpose digital computer, such as a Hewlett Packard Vectra, a Digital Equipment VAX or a Cray X-MP, operating in accordance with a program. In this regard the Appendix contains a listing, in Fortran language, whereby the network 10 may be realized on a Digital
Equipment VAX. The listing implements the network used in
Example II herein.
EXAMPLES
The operation of the parallel distributed processing network of the present invention will now be discussed in connection with the following Examples I and II.
Example I
Example I is an example of the use of the parallel distributed processing network as a classifier. A large corporation has a need to collect and process a personal data file of its constituents. The data collected reflects the following personal profile:
and is entered into computer files every time a new member joins the corporation. For a new, married, not previously divorced, college graduate, male member of the corporation without children the entry would take the form:
Name: John Doe
Member:
Male/Female:
Single/Married:
Not divorced/Divorced:
No kids/Kids:
College/No college:
Thus, each member has a 6-bit code that describes the personal profile associated with her/his name. The name and code are entered jointly into the data file. The "member" entry is included to account for the symmetric operation of the network 10 characterized by the information storage matrix.
This corporation has thousands of constituents and requires a fast parallel distributed processing network that will classify members according to the information given in a profile code.
Suppose the corporation wants to know the names of all members that fall within the following interest groups: (1) male and single;
(2) female and single; and (3) female and married. These three interest groups are reflected in the following table:
CLASSIFICATION 1 : MALE AND SINGLE
where "dc" represents "don't care"
In addition, one may also desire the classifications presented below:
CLASSIFICATION 2:
These four classifications can now be used to generate target vectors. By examining the MALE status in all four classifications, a first target vector is obtained:

X1:
- member
- male
- single
- divorced
- no kids
- with college
Similarly, two more target vectors can be derived by examining the FEMALE status in all four classifications:

X2:
- member
- female
- single
- divorced
- no kids
- no college

X3:
- member
- female
- married
- not divorced
- kids
- college
Clearly there are three target vectors and the information dimension is six. Thus three more slack vectors are needed. Let e_i, the vector with a 1 in the i-th position and 0 elsewhere, denote the i-th member of the standard basis set in R^n.
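For concreteness (an illustrative sketch, not in the original), a standard basis vector e_i can be generated as:

```python
def e(i, n):
    # Standard basis vector e_i of R^n: 1 in the i-th position, 0 elsewhere.
    v = [0.0] * n
    v[i - 1] = 1.0
    return v

print(e(4, 6))  # e4 in R^6: [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```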
Now select Z4=e4, Z5=e5, and Z6=e6 to be the three additional slack vectors and form the similarity transformation matrix
T1= [X1, X2, X3, Z4, Z5, Z6]
Also select the following diagonal matrix [Λ].
T1 and [Λ] will produce the information storage matrix. This matrix, when executed against all possible codes (i.e., the 32 distinct elements of the code considered), will produce the four basins illustrated in Table 1. The first basin in Table 1 shows all elements of the code that converge to target X1. Similarly, the third and fourth basins are responsible for X2 and X3. Each code falling in these target basins will increment a suitable counter. However, the second basin is responsible for the cognitive solution C = [1, 1, -1, 1, -1, 1]t that is not of interest (i.e., C gives the classification for member, male, married, not divorced, with kids, with college).
Whenever the code associated with a member's name converges to X1, the member's name is entered into the class of male and single. Similarly, if the code converges to X2 or X3, the name is entered into female and single or female and married, respectively. But when the code converges to C, the name is ignored. The arrows in Table 1 indicate common information within each basin. Thus T1 and [Λ] are used to design an information storage matrix for a parallel distributed processing network that executes the function of CLASSIFICATION 1.
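The counting procedure just described, iterate each member's code to a fixed point, look the result up among the targets, and either increment a class counter or ignore the code, can be sketched as follows (a Python sketch using the 4-bit network of Example II for concreteness, since Example I's numeric target values are not reproduced here; the class names and counter layout are illustrative, not the patent's):

```python
import numpy as np

# Build the Example II network: targets X1, X2 and slacks e1, e2 as columns of T.
T = np.array([[ 1.0,  1.0, 1.0, 0.0],
              [ 1.0, -1.0, 0.0, 1.0],
              [-1.0,  1.0, 0.0, 0.0],
              [-1.0, -1.0, 0.0, 0.0]])
A = T @ np.diag([2.0, 2.0, -0.5, -0.5]) @ np.linalg.inv(T)

targets = {(1, 1, -1, -1): "class X1", (1, -1, 1, -1): "class X2"}
counts = {"class X1": 0, "class X2": 0, "ignored": 0}

for bits in range(8):  # the 2**(n-1) = 8 elements of the code
    x = np.array([1.0] + [1.0 if bits >> k & 1 else -1.0 for k in range(3)])
    for _ in range(10):  # iterate A and squash to a fixed point
        x = np.clip(A @ x, -1.0, 1.0)
    label = targets.get(tuple(int(v) for v in x), "ignored")
    counts[label] += 1

print(counts)  # two codes per target basin; the other four are ignored
```

Example I works the same way: six dimensions, three target counters, and codes converging to the cognitive solution C are ignored.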
Furthermore, if [Λ] is used with the following similarity transformations:

T2 = [X1, X2, X3, Z4, Z5, Z6], where Z4 = e3, Z5 = e5 and Z6 = e6
T3 = [X1, X2, X3, Z4, Z5, Z6], where Z4 = e3, Z5 = e4 and Z6 = e6
T4 = [X1, X2, X3, Z4, Z5, Z6], where Z4 = e3, Z5 = e4 and Z6 = e5

then Tables 2, 3, and 4 are obtained. These tables, respectively, produce CLASSIFICATIONS 2, 3, and 4.
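The pattern shared by these similarity transformations, target vectors in the leading columns and selected standard basis vectors as slack columns, can be captured in a small helper (an illustrative sketch assuming numpy; `make_T` is our name, not the patent's):

```python
import numpy as np

def make_T(targets, slack_indices):
    # Similarity transformation whose leading columns are the target vectors
    # and whose remaining columns are the chosen standard basis (slack) vectors.
    n = len(targets[0])
    slacks = [np.eye(n)[:, i - 1] for i in slack_indices]
    return np.column_stack(list(targets) + slacks)

# Example II's T = [X1, X2, Z3, Z4] with Z3 = e1, Z4 = e2:
X1 = [1.0, 1.0, -1.0, -1.0]
X2 = [1.0, -1.0, 1.0, -1.0]
T = make_T([X1, X2], [1, 2])
print(T)
```

Each choice of slack columns must keep T invertible so that T⁻¹, and hence the information storage matrix, exists; the different admissible choices are what produce the different classifications.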
Next suppose that one needs to obtain the information illustrated in CLASSIFICATIONS 5, 6, and 7 below:
CLASSIFICATION 5
then the same target vectors and [Λ] can be used. The new similarity transformations are:

T5 = [X1, X2, X3, Z4, Z5, Z6], where Z4 = e2, Z5 = e4 and Z6 = e5
T6 = [X1, X2, X3, Z4, Z5, Z6], where Z4 = e2, Z5 = e3 and Z6 = e5
T7 = [X1, X2, X3, Z4, Z5, Z6], where Z4 = e2, Z5 = e3 and Z6 = e4

The basin results are illustrated in Tables 5, 6, and 7.
Example II
Logic circuit realization: Consider a 4-bit symmetric complementary code such that:
Next, we want to design a logic circuit in which whenever C = D, the state X1 = [1, 1, -1, -1]t is executed, and whenever C does not equal D, the state X2 = [1, -1, 1, -1]t is obtained. To do this consider a similarity transformation

T = [X1, X2, Z3, Z4], where Z3 = e1, Z4 = e2 and
These matrices produce a connectivity pattern given by the following information storage matrix
which when iterated and "squashed", as prescribed in the Figures, will produce basins for targets X1 and X2 given below:

basin for X1                    basin for X2
Thus the logic is performed and realized through the analog circuit. Observe that in this network there are no cognitive solutions.
The schematic diagram for a parallel distributed processing network used in this Example II is shown in Figure 4. Thus
[Y] = [A] [X]

so  Y1 = -0.5 X1 - 2.5 X4
    Y2 = -0.5 X2 - 2.5 X3
    Y3 = 2 X3
    Y4 = 2 X4
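As a quick consistency check (an illustrative sketch assuming numpy), the four component equations above can be verified against the matrix form [Y] = [A][X], using the [A] printed in the Appendix output:

```python
import numpy as np

# [A] as printed in the Appendix output for Example II
A = np.array([[-0.5,  0.0,  0.0, -2.5],
              [ 0.0, -0.5, -2.5,  0.0],
              [ 0.0,  0.0,  2.0,  0.0],
              [ 0.0,  0.0,  0.0,  2.0]])

X = np.array([1.0, -1.0, 1.0, -1.0])  # any input vector will do
Y = A @ X

# The four component equations of the text, checked against the matrix form:
assert Y[0] == -0.5 * X[0] - 2.5 * X[3]  # Y1
assert Y[1] == -0.5 * X[1] - 2.5 * X[2]  # Y2
assert Y[2] == 2.0 * X[2]                # Y3
assert Y[3] == 2.0 * X[3]                # Y4
print(Y)  # [ 2. -2.  2. -2.]
```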
The values of the resistors (assuming a 1K feedback resistor) are derived using Equations (8) and (9). The Appendix, containing pages A-1 through A-6, is a Fortran listing implementing the network shown in Figure 4 on a Digital Equipment VAX computer.
Those skilled in the art, having the benefit of the teachings of the present invention may impart numerous modifications thereto. It should be understood, however, that such modifications lie within the contemplation of the present invention, as defined by the appended claims.
c
c------------------------------------------------------------
c     A fortran program that implements the Information
c     Storage Matrix (ISM) algorithm.
c
c     n - the dimension of the Network, N of the text
c
c     Array declarations:
c
c     Array a(n,n) corresponds to matrix A of the text.
c     Array t(n,n) corresponds to matrix T of the text.
c     Array tinv(n,n) is the inverse matrix of T.
c     Array lamb(n,n) corresponds to matrix Lambda of the text.
c     Array xcode(n) corresponds to matrix X of the text.
c     Array xout(n) corresponds to matrix Y of the text.
c
c     Array temp(n,n) is a temporary storage array that is used
c     in the program but has no corresponding matrix in the text.
c
c------------------------------------------------------------
c
      integer n
c
c------------------------------------------------------------
c     n - the dimension of the Network, N of the text
c------------------------------------------------------------
c
      parameter (n=4)
      real a(n,n)
      real t(n,n)
      real tinv(n,n)
      real lamb(n,n)
      real xcode(n)
      real temp(n,n)
      real xout(n)
      real nflt
      external mrrrr
c
c------------------------------------------------------------
c     Read in the diagonal eigenvalue matrix, lamb.
c     Write out the eigenvalues to unit 4.
c------------------------------------------------------------
c
      read (3,*) (lamb(i,i),i=1,n)
      write (4,1000) (lamb(i,i),i=1,n)
 1000 format(' Lambda =',4f7.2)
      write (4,1)
    1 format(/)
c
c------------------------------------------------------------
c     Read in target and slack vectors
c     and store as columns of matrix t.
c     Write out the target and slack vectors to unit 4.
c------------------------------------------------------------
c
      do 20 j=1,n
        read (3,*) (t(i,j),i=1,n)
   20 continue
      write (4,21) (t(i,1),i=1,n)
      write (4,22) (t(i,2),i=1,n)
      write (4,23) (t(i,3),i=1,n)
      write (4,24) (t(i,4),i=1,n)
   21 format(' Target vector 1: ',4f7.2)
   22 format(' Target vector 2: ',4f7.2)
   23 format(' Slack vector 1:  ',4f7.2)
   24 format(' Slack vector 2:  ',4f7.2)
      write (4,1)
      do 40 i=1,n
        write (4,401) (t(i,j),j=1,n)
  401 format(' T: ',4f7.2)
   40 continue
      write (4,1)
c
c------------------------------------------------------------
c     Calculate inverse of t using IMSL routine, linrg.
c     Store inverse in tinv.
c     Compute A = t.lamb.tinv using the IMSL matrix multiplier
c     routine, mrrrr.
c     Write out A to unit 4.
c------------------------------------------------------------
c
      call linrg(n,t,n,tinv,n)
      call mrrrr(n,n,lamb,n,
     1           n,n,tinv,n,
     2           n,n,temp,n)
      call mrrrr(n,n,t,n,
     1           n,n,temp,n,
     2           n,n,a,n)
      write (4,*) 'A = T.Lambda.T-1'
      write (4,1)
      do 50 i=1,n
        write (4,2000) (a(i,j),j=1,n)
 2000 format(' A: ',4f7.2)
   50 continue
      write (4,1)
c
c------------------------------------------------------------
c     Assign ncode, the number of members of the
c     complementary code, Cn.
c     Read in an element of the code.
c     Write out code element to unit 4.
c------------------------------------------------------------
c
      ncode = 2**(n-1)
      do 100 i=0,ncode-1
        read (3,*) (xcode(k),k=1,n)
        write (4,*) 'Code element:'
        write (4,3000) (xcode(jj),jj=1,n)
 3000 format(4f5.1)
c
c------------------------------------------------------------
c       Apply matrix A to the code element.
c------------------------------------------------------------
c
        do 150 j=1,n
          xsum = 0.0
          do 160 k1=1,n
            xsum = xsum + a(j,k1)*xcode(k1)
  160     continue
c
c------------------------------------------------------------
c         Apply the squashing function.
c------------------------------------------------------------
c
          if (xsum.gt.1.0) xsum = 1.0
          if (xsum.lt.-1.0) xsum = -1.0
          xout(j) = xsum
  150   continue
        write (4,3000) (xout(jj),jj=1,n)
        write (4,1)
  100 continue
      stop
      end
INPUT FILE

 2.0  2.0 -0.5 -0.5
 1.0  1.0 -1.0 -1.0
 1.0 -1.0  1.0 -1.0
 1.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0
 1.0 -1.0 -1.0 -1.0
1.0 -1.0 -1.0 1.0
1.0 -1.0 1.0 -1.0
1.0 -1.0 1.0 1.0
1.0 1.0 -1.0 -1.0
1.0 1.0 -1.0 1.0
1.0 1.0 1.0 -1.0
1.0 1.0 1.0 1.0
OUTPUT FILE

Lambda =  2.00  2.00 -0.50 -0.50
Target vector 1:  1.00  1.00 -1.00 -1.00
Target vector 2:  1.00 -1.00  1.00 -1.00
Slack vector 1: 1.00 0.00 0.00 0.00
Slack vector 2: 0.00 1.00 0.00 0.00
T: 1.00 1.00 1.00 0.00
T: 1.00 -1.00 0.00 1.00
T: -1.00 1.00 0.00 0.00
T: -1.00 -1.00 0.00 0.00
A = T.Lambda.T-1
A: -0.50 0.00 0.00 -2.50
A: 0.00 -0.50 -2.50 0.00
A: 0.00 0.00 2.00 0.00
A: 0.00 0.00 0.00 2.00
Code element :
1.0 -1.0 -1.0 -1.0
1.0 1.0 -1.0 -1.0 Basin of Target 1
Code element :
1.0 -1.0 -1.0 1.0
-1.0 1.0 -1.0 1.0 Basin of Complement of Target 2
Code element :
1.0 -1.0 1.0 -1.0
1.0 -1.0 1.0 -1.0 Basin of Target 2
Code element:
1.0 -1.0 1.0 1.0
-1.0 -1.0 1.0 1.0 Basin of Complement of Target 1
Code element :
1.0 1.0 -1.0 -1.0
1.0 1.0 -1.0 -1.0 Basin of Target 1
Code element :
1.0 1.0 -1.0 1.0
-1.0 1.0 -1.0 1.0 Basin of Complement of Target 2
Code element :
1.0 1.0 1.0 -1.0
1.0 -1.0 1.0 -1.0 Basin of Target 2
Code element:
1.0 1.0 1.0 1.0
-1.0 -1.0 1.0 1.0 Basin of Complement of Target 1