CA2008902C

CA2008902C - Reconfigurable signal processor

Info

Publication number: CA2008902C
Application number: CA002008902A
Authority: CA
Inventors: Allen Louis Gorin; Patrick Anthony Makofsky; Nancy Morton; Neal Conrad Oliver; Richard Robert Shively; Christopher Anthony Stanziola
Original assignee: American Telephone and Telegraph Co Inc
Current assignee: AT&T Corp
Priority date: 1989-03-31
Filing date: 1990-01-30
Publication date: 1994-05-31
Anticipated expiration: 2010-01-30
Also published as: GB2262174A; CA2008902A1; GB9301712D0; GB2262173A; US5020059A; GB2231985A; JP2647227B2; GB9301713D0; GB2262175A; JPH02287668A; GB9301714D0; GB2231985B; GB9006712D0; GB2262173B

Abstract

RECONFIGURABLE SIGNAL PROCESSOR
Abstract An interconnection scheme among the processing elements ("PEs") of a multiprocessor computing architecture realizes, through PE reconfiguration, bothfault tolerance and a wide variety of different processing topologies including binary trees and linear systolic arrays. By using a novel variant on a tree expansion scheme, the invention also allows for arbitrary up-sizing of the PE count to build virtually any size of tree network, with each size exhibiting the same high degree of fault tolerance and reconfigurability.
The invention may be practiced with 4-port PEs arrayed in a module comprising a 4X4 board-mounted PE lattice. Each PE has four physical ports, which connect to the similar ports of its lattice neighbors. Each PE has an internal capability to be configured to route signals to or from ally of its neighbors. Thus, for tree topologies, any of the four neighbors of a given PE may be selected as the parent of the given PE; and any or all of the remaining three neighboring PEs may be selected as the child(ren) PEs.
The PE ports are configured under the control of a remote Host, which establishes an initial desired PE topology. The operability of the PEs is tested, and information on faulty PEs or communications paths is used to enable or disable nodes as necessary by revising the PE port configurations. The nodes thus are reorganized and can run or continue running, on a degraded basis.

Description

2~

RECONFIGU3RABLlE SIGNAL PROCESSVR

Field o~ the InYention This invention relates to concurrent computer architectures; and particularly to realizing a generic capability for a fault-tolerant and reconfigurable 5 multiprocessor computer scalable to thousands of processor elements.
` Back~round of the ~vention Concurrent computer architectures are configurations of processors under a common control interconnected to achieve parallel processing of information. Processors arrayed in linear strings, sometimes termed "systolic"
10 architectures, are an increasingly important example of concurrent architectures.
Another such architecture is the binary tree, in which the nodes are arranged in levels ~, beginnin~ with a single root and extend to two, four, eight, etc. computing nodes at '~ successive levels.
Pattern recognition is one class of problem to which parallel processing 15 is especially applicable. Pattern recognition is the comparison of an unknown signal pattern to a set of reference patterns to find a best match. Applications include speech recognition, speaker recognition, sh~pe recognition of imaged objects, and identification of sonar or radar sources.
One requirement of multiprocessor architectures important to the 20 solution of pattern recognition and other problems, is scalabili~ of the hardware and the programming environment. Scalability refers to use of the same individual PEs, board-level modules, operating system and programming methodology even as machine sizes grow to tens of thousands of nodes.
Although scalability has been achieved in machines adapted to pattern 25 recognition, its practical realization especially in larger machines has been limited by a lack of tolerance to faults exhibited by the relatively fixed PE lattice structures il heretofore used. Schemes in the prior art which supply fault tolerance by adding redundant processing elements and elaborate switching details to disconnect a failed PE and substitute a spare, are expensive and take up space.
3û If fault tolerance and scalability can bei achieved, however, parallel processing offers real-time execution speed even as the problem size increases. For example, a GigaFLOP (one billion flnating point operations per second) or more of processing can be required to achieve real-time execution of large-vocabulary speech recognition apparatus. Pattern recognition for future speech recognition algoritbms will easily require 100 to 1000 times greater throughput. In general,,.

2~

pattern recognition for higher bandwidth signals, such as imagery, will require a TeraFLOP (one trillion floating point operations per second). Fault-tolerant, scalable, pi~allel computing machines having hundreds or thousands of PEs, offer a potentially attractive choice of solution.
S A proper~ related to scale, is fast execution of communications between a Host computer and the PE array. PE configurations assembled as a binary tree, for example, have ~he advantageous property that if the nurnber of PEs in the tree array are doubled, the layers through which communications must pass, increase only byone. This property, known as logalithmic communications radius, is des;rable for10 large-scale PE arrays, since it adds the least additional process time for initiating and synchronizing communications between the Host and the PEs. Scalability is servedby devising a single, basic PE port configuration as well as a basic module of board-mounted PEs, to realize any arbitrary number of PEs in an array. This feature iscritical to controlling the manufacturing cost and to systematically increasing the 15 capacity of small parallel processing machines. Prior art aIrangements of high count PE configurations have not met this need, however, and further, have tended to increase the installation size and pin-ou~ count for backplane connections.
TeraFLOP capacities requiring many thousands of PEs in a single system, also currently are prohibitively expensive if realiæd in the inflexible and 20 permanent ha~d-wired topologies of the current art. Additionally, fault tolerance in conventional hard-wired PE arrays has been limited heretofore, because the PE
interconnecdon relationships are relatively determined by the wiring. For this same reason, hiard-wired PE arrays are not generally reconfigurable.
Ob.jects o~ the Invention Accordingly one object of the invention is to increase the fault tolerance of concu~ent computer architectures.
Another object of the invention is to pennit the reconfiguration of a concurrent computer architecture into any of a multiplicity of processing node topologies.
A further object of the invention ;s to achieve the foregoing objects without appreciably adding to backplane bus connections.
A further object of the invention is to provide in a concurrent computer architecture an intercomlected lattice array of processing elements which under software conhol can be reconfigured to utilize virtually every functional m~de despite 35 faults in multiple other nodes.

39~ `
. - 3-' A further object of the invention is to achieve greater scalability in parallel processor architectures.
Summary of th~ Invention This invention contemplates the use of a unique interconnection scheme S among the PEs of a multiprocessor computing architecture; and means utilizing the unique interconnections for realizing, tlhrough PE reconfiguration, both fault toleirance and a wide variety of dtfferent overall topologies including binary trees and linear systolic arrays. The reconfigurability realizable pursuant to this aspect of the invention, allows many alternative PE network topologies to be grown or s 10 embedded in a PE lattice having identified PE or inter-PE connection faults. Further, under the control of the same fault- identification and route around routine, the invention detects and compensates for faults occurring during operation.
J In a specific illustrative embodiment, the invention is realiæd through use of 4-p~rt PlEs arrayed in a square 4X4 rectangular lattice which constitutes a 15 basic 16-PE module. Each PE has four physical ports, which connect to the similar ports of its respcctive neighbors. For tree topologies, any of the four neighbors of a given PE may be selected as the parent of the given PE; and any or all of the remaining three neighboring PEs may be selected as the child(ren) PEs.
Typically, three of the four ports of a given PE are assigned to connect 20 to adjacent PEs, these being the parent and two children of the given PE. Theaggregate of unused fourth ports of each PE allow the PE lattice to be reconfigured to effect a large number of changes in parent- child relationships. Reconfiguration bypasses identified faults in a given topology. Reconfiguration also creates different computing node topologies.
The functionality of ~he ports of each PE, which define the neighbor relations, may be controlled by instruc~ions from an exterior source, such as a Host computer. The process for routing among ports within each PE may be software-defined.
By using a novel valiant on a tree expansion scheme, the invention 30 allows for virtually arbitrary up-sizing of the PE count to build virtually any siæ of tree network, with each size exhibiting the same high degree of fault tolerance and reconfigurability.
In a particular aspect, the invention provides predeterrnined node fault correction rowtines to specific node faults. This inventive feature is particularly 35 applicable to single-pr~gram-multiple-data binary tree (in which a given PE
communicates only with its patent and has from zero to two children), and single-; . " . , , , ., ,, ,. . .. , ,",, . " , . . , ~ . . .

2~89~2 ` threaded linear systolic arrays. For these topologies, the invention teaches mapping or imprinting a software-de~lned and controlled topology onto the fixed hardware and avoiding identified faults, without an expensive commitment of software overhead and run-time loss.
The same software controlled routing and the same physical interconnection pattern of the sub-networks as described so far, can be used to realize fault-tolerant "systolic" networks. These networks in their simplest form are realized with a linear string of PEs, with each PE requiring a single input and a single output port. The 4-port PE lattice provides extensive flexibility in realizing the linear systolic structure in any of a number of serpentine patterns, by routing around defective PEs.
The 4-port PE, together with the llexibility in routing, also allows the systolic structure to be expanded at particular stages of processing when the processing tasks require.
Advantageously, by building the modules with redundant backplane busses, and interconnecting each of the redundant backplane busses ~rom one module to a different module, a network of modules is created that enables routing around an entire module which has failed.
In accordance with one aspect of the invention there is provided a process for synthesizing a desired node interconnection topology in an assembly of processing elements under control of a Host, in which each element comprises plural signal communication ports, said elements are physically arrayed in X-Y matrices on one or more mounting means, each element of each said matrix other than elementslocated at the matrrx corners is interconnected to four neighbor PEs, each corner element is connected to its respective two neighbor elements and has two external connection paths, and said assembly comprises means for effecting signal routing within each element between its said processing capability and any of said plural ports, said process comprising: defining in said ~Iost a desired processor element intra-board port interconnection topology for each board, testing under Host control said elements for faults, determining in said Host alternate processor element port interconnections which route signals around elements identified as faulted; and reconfigurating selected ~nes of said ports based on said alternate processor element port interconnections.

-4a- 2008902 In accordance with another aspect of the invention there is provided - apparatus for expanding a tree multiprocessor topology while maintaining a constant .; number of root connection paths to said topology, and a constant number of expansion ;~ nodes, comprising: first and second arrays of substantially identical processor elements, '~3 5 each element having plural ports, means for selectively connecting ports of adjacent c ones of all but two of said elements in each said array, to form in each array a two-'~3 root subtree of processor elements, said two elements not used in said subtrees each comprising three-port expansion nodes, said two roots and said three-port expansion ~, nodes thereby furnishing eight connection paths to each said array, means for connecting the subtree and a first expansion node in said first array to the -3~ corresponding parts of said second array, thereby forming a further two-root subtree, -~ the second said expansion node of each said array being available to replace elements 3 in its respective array, and said two roots of said further subtree and said last-named ,3 nodes thereby comprising a total of eight connection paths to the combined J 15 assemblages of said first and second processor element arrays.
The invention and its further objects, features, and advantages will be '~ further elucidated in the detained description to follow and in the DRAWING, in which:
`~ Brief Description of the Drawing FIG. 1 is a block diagram of a conventional process for signal pattern reCognition;
3 FIG. 2 is a block diagram of a conver,tional expansion scheme for a tree machine;
FIG. 3 is a block diagram of a ~ault-tolerant tree expansion scheme;
FIG. 4 is a block diagram of a PE array of 16 elements, FIG. 5 is a block diagram of 16-element board module showing external and internal connections;
FIG. 6 is a schematic diagram of a PE board of the invention, with a specific set of external leads;
FIG. 7 is a schematic diagram showing options for interconnecting 16-element PE boards;

. 5 ~89~Z

FIG. 8 is a block diagram showing functionalities of each PE node;
FM. 9 is a block diagram of particular PPs on a board, interconnected in a tree struct~re with exemplary external linkages;
FIG. 10 is a high-level block diagram of a multiprocessor system ' 5 connected to a Host;
lFIG. 11 is a schematic diagrarn illustrating the data paths resulting from a particular path configuration at a node;
FIG. 12 is a bloek diagram illustrating the bussing for routing node configuration commands from a Host to multiple boards;
~; 10 FIG. 13 is a flow chart depicting the process ~or initializing the PE
` arrays; and FIG. 16, consisting of FIGs. 14 and 15, portrays in block diagr~n form, the creating of PE lattices within a PE board by configuring ports.
13etailed De~scription of an Illustrative Embodiment The present invention's applications will be more readily appreciated by first considering certain prior art. Accordingly, FIG. 1 depicts a simplified conventional process for signal pa$tern recognidon, in which an individual unknown pattern 1 and a specific known pattern 2 from a library or set of known patterns are compared according to some instruction set in unit 3, a similarity function generator.
20 Unit 3 develops a cornparison measure, for example, a distance or probability scvre, which is transmitted to a filter 4. A decision rule is implemented by filter 4, for example selecting that reference pattern which is of rninimum distance to the unknown, the result of which is a pattern classification output 5.
The unit 3 rnay, for example, be a binary tree machine. These vary in 25 si~e or "depth", depending on the problem complexity.One method for expanding the size of a binary tree machine, known as a "Leiserson Expansion", involves repeated use of identical 4-lead modules, as depicted in FIG.2. Two such modules, denotedlOa, lOb, illostrate the method. Each consist of a subtree comprising a root 1 la, 1 lb which are "parents" respectively of "children" 12a, 13a in module lOa; and 3 30 "children" 12b, 13b in module lOb. These children in turn are parents to further PF,s.
Included in each module lOa, lOb is an expansion PE denoted 14a, 14b, each having three ports 15a, 16a, 17a; and lSb, 16b, 17b respectively. The ports 18a, 18b respectively leading from the root Pli lla, llb, of each subtree, constitute the fourth port oP each idendcal module.

39~;~

Two such modules may be interconnected in such a way that the four-port cs)nvention ~or the resulting combination of the modules lOa, lOb, is maintained. As shown in F~G. 2, ehe subtree root port 18b of module lOb is connected to po~t 17b of the expansion PE 14b; and the subtree root port 18a of '`3 5 module lOa is connected to the port 15b of the exparlsion PE 14b. The resultant ~wo-board system has a 15-PE tree with root PE 14b and an expansion PE 14a. Thiscombination can now be interconnected to further identical modules through portslSa, 16a, 17a of expansion PE 14a of module lOa; and ~e port 16b of expansion PEof module lOb, the latter port becoming the "new root" for module lOa. The . 10 resultant network again comprises a subtree of PEs, plus one PE--namely PE 14a of module lûa, which is available for expansion.
Since the resultant network a~ter the interconneceion is equivalent in number of ports to the network in the individual modules lOa, lOb prior to interconnection, the illustrated interconnection scheme may be applied iteratively 15 with ever-increasing module sizes.
3 The preceding scheme is not, however, operationally practical for machines comprising thousands of PEs because it is not sufflciently fault-tolerant.
For example, if the subtree PE 11 b were to fail, all childrsn would be disconnected.
Thus, failure of a single component could disproportionately reduce the available 20 number of PEs.
A Fault Tolerant Expansion Scheme The present invention, inter alia, provides a mechanism for achieving bina~y tree expansion, which retains the requisite constant number of basic module ports while alss) providing a substantial degree of tolerance to PE ~aults.
The basic fault-toleran~ PE module is depicted in FIG. 3. Two such modules, 20a, 20b illustrate the principle. Each cont~uns a subtree 21a, 21b respec~fully, with each of the la~ter served by two busses 22a, 23a; and 22b, 23b. The subtrees 21a, 21b may each consist of a multiplicity of PE modules, for example,realized by the configuration shown in FIG. 4. Either of the two busses serving each 30 subtree may be selected as the root bus of that subtree.
In accordance with the invention, each of the modules 20a, 20b include two expans;on PEs, denoted 24a, 25a; and 24b, 25b respectively. Each expansion PE
has three ports labeled as follows: for PE 24a- ports 26a, 27a, 28a; for PE 24b- ports 26b, 27b, 28b; for PE 25a- ports 29a, 3ûa, 31a; and for PE 25b- ports 29b, 30b, 3 lb.

, . ". . .. . .... ~, , ~.. ,, . , , " . .,, j, . . . ~ . .. . . .

7 2~ 2 Each of the expansis)n PEs thus has three off-board ports available ~or expansion. The connection paths to/from eacll module 20a, 20b, therefore total eight, compared to four in the non-fault- eolerant scheme of the prior art.
In accordance with the invention, the two modules 20a, 20b may be S interconnected in a way that retains at eight the number of extemal connection paths.
, One illustrative way, shown in FIG. 3, is to effect the following connections of f subtree busses and PE ports: bus æa to PE port 28b; bus 23a to PE port 26a; bus 22b ~; to PE port 28a; and bus 23b to PE port 26b. The result is that the combination of the two 8- port rnodules 20a, 20b retains the eight-lead connection convention. Ports 27a 10 and 27b of the PEs 24a, 24b, become the "new roots"; and the ports 29a, 30a, 31a of ' spare PE 25a eogether with the ports 29b, 30b, 31b of spare PE 25b, constitute the ` eight external interconnections.
The resultant composite network is a subtree with ei~her of two busses 27a, 27b selectable ~or use as the root bus, and with hVO PEs 25a, 2~b available ~or 15 expansion. A failure of hardware associa~ed with either of the candidate root busses '3 27a, 27b may be overcome simply by selecting the alternative root as the operational rootbus.
It will be apparent hereina~ter in connection with FIG.S, that by selectively configuring ports of PEs in an afray such as an x-y matri~c, spare PEs can 20 be created and then integrated into either subtree PE topology.
Fault-Tolerant Lattice of PEs As shown in PIG. 5, the PE lattice at the board level, denoted 40, :! advantageously (although not necessa~ly) is an extended four-by-four rectangular ray of 4-port PEs, totaling sixteen for each board. Each PE is denoted by row and column call-outs 1.1.. 4.4. Each of the four intemal PEs is connected to each of its I four neighbors. Thus, P~ 2.2 is connected to PEs 1.2, 2.3, 3.2, and 2.1. Each of the `i four PEs 1.1, 1.4, 4.1, and 4.4 at the corners of the array 40, are connected to ~heir ' respective two neighboring PEs, such as PEs 2.1 and 1.2 in ~he case of corner PE 1.1.
,~ :Further, each of the four corner PEs have two ports through which dle `~ 30 board 40 can be connected to additional modules or to a Host. These external ports are denoted a, b for PE 4.1; c, d for PE 1.1; e, f for PE 1.4; and g, h fi)r PE 4.4.
The remaining eight PEs are pa;red by four interconnection paths, denoted j, k, l, and m, respectively to interconnect the P~ pairs 1.2 with 4.3, 1.3 with 4.2, 2.1 with 3.4, and 3.1 with 2.4.

' .

~ -8-- I The expansion PEs 25a and 25b illustrated in FIG. 3, colrespond tO ~he r, PEs 1.1 and 1.4 in FIG. 5. The roots of the subtree coTrespond to PEs 4.1 and 4.4; so that busses 22a and 23a of module 20a in FIG. 3 correspond to busses a and h of . FIG. 5.
Growing Minimum-Depth Binary Tree in a Two-Fault Lafflce Using the concepts illustrated, a variety of alternate routing options between PEs are available in each motlule. A basic 4-by-4 PE boar(l- level array is depicted schematically in FIG. 5, where the PlEs are given the call-outs 1.1....4.4.
The depicted a~ray is similar to that of FIG. 4, where in addition, nomenclature is ~l 10 within each PE symbol to represent the l'depths", or layer distances, of various PEs from the root PE of the array.
In the example illus~rated in FIG. 4, the two PEs denoted 2.3 and 3.1, are assumed to have faults; and the requirement is to "grow" a tree using the remaining PEs of the a~ray. Farther, however, it is desired to create a binary tree 15 having a minimum depth between the PE chosen as the root and the rnost remote~-~ "leaf".
~i First, a boundary PE is selected as the root; in this illustration, PE 4.1.
Then, in a manner described in detail hereinafter, a message is issued to each of the PE leaves, instructing each leaf to send a message to the (at most three) non-parent 20 neighboring PEs, interrogating whether they are ava~lable as potential children.
If a given neighbor PE already has a parent, a code is generated causing the PE vvhich sent the interrogation to not adopt that PE as a child. If a givenneighkor PE is detected as one that has a fault, or does noe respond to the interrogation, a code is generated causing the PE which sent the interrogation to not 25 adopt that PE as a child.
If a given neighbor PE has received only one request for adoption, a code is generated causing the sending P~ to adopt tha~ PE as a child. Finally, if a given neighbor PE has received more than one request for adoption, then a code is generated causing that PE to randomly select one of the parent candidates as its30 parent.
The resalting set of leaves are those most recent adoptees, plus any previous leaves that in the a~ove process were not able to adop~ any children. The tree thus conligured consists of: PE~ 4.1 as the root and first level; second- level children of root PE 4.1 consisting of: ~E 4.2; third level children of the second-level 35 leaves consisting of: PEs 3.2 and 4.3; and fourth-level child~en consisting of: PEs 2.2, 3.3, and 4.4, etc. The stl~cture extends to six levels, as shown in FIG. 4; and .

~0~8~2 g :..
bypasses the faulty PEs 3.1 and 2.3.
The process for initializing the PE alTay9 which includes orienting the ports of each PE to serve as parent/child or child/parent paths, is also depicted in '~ FIG. 13.
;~ 5 This arrangement's tolerance to PE ~aults is highly advantageous. First, the arrangememt is tolerant to a double fault of the type illostrated. It is also tolerant to a double fault if one of the fault-pair is "internal"; or if the pair complises a root and an expansion PE. Even if both roots fail, one ~f the expansion PEs can be configured as a root, still providing fault tolerance.
Refer~ing again to FIG. 3, if both of the expansion PEs 25a, 25b fail, a degree of fault tolerance is still pr~vided: ~here will always be one board whose ;'l expansion PE~ is not utilized. If, for example, the ~ault-pair comprises bo~h expansion PEs on one board such as PEs 24b, 25b, a fault avoidance strategy is to choose that board whose expansion PE is not required.
Of course, if there are no fault~, then the expansion P~s can all be subsumed into the subtree modules 21a, 21b, thereby utilizing all available i processors. The process depicted in FIG. 13 also illustrates this feature.
3 l he virtual tree machine shown in FIG. 4 is skewed, in that the leaves are of variable depths. Also, the struchlre is n~ purely binary, because the branching 20 factor at eaeh leaf varies between one and ~hree. The impact of these conditions on toeal execution time is, however, negligible. Lrnportantly, the machine retains a logarithmic communications radius, and uses identical and scale-inviarian~ modules to ~grow~. Once realized in hardware, the machine a1so offeIs the necessary small iand constant pin-out from the PEs i~d submodules.
Fault-Tolerance for Systolic Arrays A conventional systolic topology can be configured in a variety of ways in the 16-element array of FIG. 5. An exempL~y sequence is to enter the array at PE
1.1, then proceed through PE~s 1.2, 1.3, 1.4, 2.4, 2.3, 2.2, 2.1, 3.1, 3.2, 3.3~ 3.4, 4.4, 4.3 and 4.2, ex;ting at PE 4.1.
A systolic topology can also be implanted within the interconnected PE
configuration of FIG. 5, despite a single fault condi~on at any one of the 16 PEs. For exiample, given a fault at the c~rner PE 1.1, ia systolic array may be g~own by comlecting the remaining 15 PEs in the ~ollowing serial sequence: entering the array at PE 4.1, then pr~ceeding through P~s 4.2, 4.3, 4.4, 3.4, 3.3, 3.2, 3.1, 2.1, 2.2, 1.2, 1.3, 2.3 and 2.4, and exiting at PE 1.4.
' .

- l o -An inspection of FIG. S w;ll also reveal that a 15-element systolic :~ configuration essentially the mirror image of the just-described systolic configuration, can be grown if the single fault occurs at any of the other corner PEs 4.1, 4.4, or 1.4. A fault at PE 4.1, for exarnple is removed by entering the array at PE

5 1.1, then proceeding through PEs 1.2, 2.2, 2.1, 3.1, 3.2, 4.2, 4.3, 3.3, 2.3, 1.3 ,1.4, ~ 2.4, 3.4, and exiting at PE 4.4. The geo m etries of the Pl~ interconnection paths which 3 compensate for a single fault at any of the corner PEs, are congruent.
Consider next a fault at PE 1.2, which along with PEs 1.3, 4.2 and 4.3 are interior to the corner PEs and located on the north and south sides of the array 10 periphery. The ~ault- compensating path is selected by entering at PE 1.1, then prcNceeding through PEs 2.1, 2.2, 2.3, 1.3, 1.4, 2.4, 3.4, 3.3, 3.2, 3.1, 4.1, 4.2, 4.3, and e:citing at PE 4.4. Again, an inspection of FIG. S will demonstrate that a path of this same geometry will also compensate for a single fault at any of the PEs 1.3, 4.2 and 4.3.
Consider now a fault at PE 2.1, which along with PEs 3.1, 2.4 and 3.4 are also interior to the corner PEi and located on the east and west sides of the array periphery. The fault-compensating path is selected by entering at PE 1.1, then ` -connecting through PEs 1.2, 2.2, 3.2, 3.1, 4.1, 4.2, 4.3, 3.3, 2.3, 1.3, 1.4, 2.4, 3.4, and exiting at PE 4.4. A path of this same geometry will also compensate for a single fault at any of the PEs 3.1, 2.4, and 3.4.
Finally, consider a fault at the interio~ PE 2.2, one of four PEs including P ~s 2.3, 3.2 and 3.3 which in the 4 X 4 aTray of the illustratiYe embodiment are not on the periphery of the alTay. The fault-compensating path is selected by entering at PE 4.1, then proceeding through PEs 4.2, 4.3, 4.4, 3.4, 3.3, 3.2, 3.1, 2.1, 1.1, 1.2, 1.3, ~.3, 2.4, and exiting at P~ 1.4. Again, a path of this same geomet~y will also compensate for a single fault at any of the P13s 2.3, 3.2 and 3.3.
The significance of the symmetries just described, is that for systolic ~rchitectures, only four one-fault PE reconfiguration patterns are needed to cover the possible 16 cases of single faults. Each yields full usei of the 15 functioning PEs.
Other specific reconfiguration paths of serially connected P~s may be created besides the examples elucidated above to correct for a single PE fault in a systolic array.
It may be generalized that the ~ault tolerance of the uTay depicted in FIG. 5 is realized în a 4X4 L~ray of PE~s by providing two separate accesses (for 35 entering or exiting the array) at each of the four comer Pl~s. One means f~r realizing this capabili~ in a physical embodiment such as depicted in FIG. 5, is to tie the metallic leads together in the backplane in the manner shown in FIG. 6. With leads b and c tied to common lead y, as shown, a path through lead y is affo~ded to bothcorner PEs 1.1 and 4.1. With leads a and d tied to common lead x, as shown, a path through lead x is affor~d to the same two corner PEs 1.1 and 4.1.
S These redundant paths into and out of a PE board also make possible the bypassing of an entire PE board in a multi- board arrangement. FIG. 7 illustrates boards 40, 40a, 40b and 40c. Adjacent boards are connected by tying the x-leads created as described with respect to FIG. 6. Boards once removed from each otherare connected by tying the y-leads. A catastrophic failure of entire board 40a, for 10 example, is bypassed by the connection denoted y-prime, which enables board 40 to gain access to board 40b through either the latter's PE 1.1 or 4.1. These are the same 3 PEs which were connected to failed board 40a.
As seen in FIG. 5, buffers 45, 45a, respectively control the direction of s signals between lead b and the connection path for PEs 1.1,1.2; and, similarly, the dLrection of si~gnals between lead g and the connection path for PEs 1.47 2.4. Each buffer 45, 45a is a conventional tri-stable, bi-directional device available commercially. Under the direction of an external control means such as a Host 3 computer, buf~er 45 can assume a first state in which lead b furnishes only an input signal to the module 40. In a second state, the lead b furnishes only an output signal 20 from module 40. In its third state, the circuit is open and no signals pass to or from lead b. The operation of buffer 45a is identical to that of buffer 45. The function provided by the buffers 45, 45a, is to add connection opdons to and from module 40. -If, for example, PE 1.1 is Imable to pass signals fromJto either leads c or d, access ~o module 40 which bypasses PE 1.1 is afforded by lead b through switch 45a, placed25 in one of its transmission states. Buf~ers 45, 45a are included in this realization to provide the required variety of access ports to the lattice while limiting the number of connections to the backplane. The buffers provide more altemative topologies in routin~g around defective processor elements, and also lead to efficiencies in backplane connections.

30 General Hlardware and Control ~or Fault lRecovery and lFteconliguration A variety of control means can be envisioned to configure and reconfigure ~e PE~s in accordance with the invention. An illustrative description of the structure of each node, which enables the process con~ol set forth in FIG. 13 to be exercised, will first be presented.

2~0~39 Each processing node, denoted 50 in FIG. 8, consists of a digital signal processor 51, a memory 52 external to processor 51, and a configuration network 53.
~; Processor S l may advantageously comprise a di.gital signal processor chip available from American Telephone and Telegraph Company under the name "DSP32C".
S Memory 52 may comprise eight 64K X 4 static RAM chips, each being a CY7C196~ device available from Cypress Semiconduc~or Corporation. Network 53 may - com.prise, for example, an application-specific integrated circuit chip m.anufactured ~,1, by AT&T as a semi-custom .9 micron twin-tub CMOS device disposed in a 224-pin grid array.
-`~ 10 The functionalities provided by Network 53 include a crosspoint switch . array 54, an input FIFO 74, an output FIFO 75, and four neighbor-ports denoted - ~ north, south9 east, west. Data signals comrnunicated to or from DSP51 during the run of .~. application program are ~outed to the neighbor-ports through the input -'. FIFO 74 and output FIFO 75.
Tk.e four ports of each node 50, are designated N, E, S, and W. Each of the four ports is provided with a node-parallel interface ("NPI"), labeled 61, 62, 63, and 64 respectively, as seen in FIG. 8. Data and header information between neighboring nodes, that is, the nodes 1.1 - 4.4 shown in FIG. 5, axe comm.unicated through t}2.e NPIs. Part cular data/header comm.unication linkages between nod.es are 20 set up to establish a desired topology. FIG. 9 illustrates a boald 40 consisting (unlike the nodes of FIG. 4) of all functioning nodes 1.1 - 4.4. These have been configl2red -~ into a tree topology. Th.e solid lines connote the paths between the nodes which are utilized in the tree struct~re; dotted lines connote paths which are not used. Typically :1 if a port N is designated as a parent, port S is either an unused or an adopted child;
25 port E is the parent node's left-child; and port W is the parent nod.e's right-child.
FI~. 9 also shows the eight exterior connections ~or the illustrated board 40. These .~; are denoted with their "~irection" labels, and with the associated PE label: S4.1, -;3 W4.1, Wl.l, Nl.l, Nl.4, E.1.4, E4.4, S4.4. In the following section, the role of the ports in initial configuration will be described.
,.,~
Initial Tree Corlflguration of NPls at Nodes Por each P~ 50, a configuration message determines: (a) which of the ;~, PE ports N,S,E,W are participating, i.e., included, in the tree either as parent or children; (b) which port connects to the current PE's parent; (c) whether ~he paren~
port is the input port and the children ports are the output ports (as in the case of 35 "BROADCAST" mode), or vice-versa (as in the case of "REPORT" mode); and (d) , )2 , . .

., which ports are currently active, i.e., being used to read or write to a child P~; and which ports are currently passive.
; FIG. lO broadly illustrates an overall parallel multiprocessor system which utilizes tne invention in, for example a ~ee topology as in FIG. 9. FIG. lO
shows a Host computer 60, which may be an AT&T PC 6300~ unit, and only one PE
board consisting of 16 PEs of which only three are shown for ease of reading.
However, the assembly can and usually will consist of a large multiplicity of PEboards of the type illustrated in PI13. 5 The initialization process now to be further described, is substantially lO set forth in FIG. 13. Coded messages are formed in tne Host 60 in conventional ~1 fashion. A header message is for ned to include fields to specify a destination ID, -.~ indicating the ID of the PE to which it is to be sent. The header also contains a PASS
field to indicate whether, even after reaching its destination, the message should ~, continue to be propagated "downstream".
~' l5 Initial node port configuration control messages from Host 60 to the PEs are sent via parallel communications control bus 56 to each PE 50 of each board 40.
These messages instruct each node as to how to establish its ports. FIG. l l illustrates the configuration of one of the PEs of FIG. 9, arbitrarily PE 2.1. P~ 2.1, which is one of two second-level children of board root PE l . l is shown with its NPIs configured 20 in the "Broadcast" mode. NPI N2. 1 is prepared to receive data messages from NPI
S. l . l of the parent PE. A path denoted 76 is configured from NPI N2. l ~o the DSP
51 of PE 2.1 through the input ~lFO 74. to support the data processing opelations of J the PE 2.1. Two data signal routing paths, denoted 77, 78,are provided within PE
2. l, by structures described earlier in connection with FIG. 8. These support the data ?~ 25 processing of the lower level P~s. As seen in FIG. l l, input FIFO 74 is not connected when bypass paths 77, 78 are designated.
The tree "leaf ' levels and the associated parent-child relationships, of the tree topology illustrated in FIG. 9, achieved in accordance with the described process, are: root P~-l.l; second level -PEs 2.1 and 1.2; third level - PEs 1.3, 4.3, 3() 3. l and 3.4; fourth level - PEs l .4, 2.3, 3.3, ~.4, 3.2, 4.4, 4. l, and 4.2; and a fifth level PE 2.2 ;s provided.
A listing of the process steps and commands in "C" language for a basic operating system at each node is contained in Appendix I. The pr~ess as depictedwill perform all inter-node communication, will read and write data to/from a node's 35 memory and to the Host 60, will accept instructions and data from the Host 60, and will execute application code, debugging code, etc.

g~

. i The invention is adapted to the detection of and adjustment for faults during the running of an application. Pursuant to this aspect of the invention, the PEs run periodic test routines; and if a failure is detected, the PE genera$es a signal for an intelTupt to the Host. Interruption of the PE is achieved by conventional writing to S the system interrupt bit to invoke the control network of FIG. 10.Interrupts in this direction may be perfonned via bus 59. The fault route-around instructions generated by Host 60 also adjust for faulted PEs which must be bypassed during the configuration process.
i Three~Board Computer Iliustrating Inter-Board Path Options, Ss 10 Host Connection Options, and Tree Conffguration Avoiding a Faulted PE
To review again, each PE of each 16-PE board has four ports. Two of ? the ports in each of the corner PEs in the latdce are available to effect ~, communications external to the board. Further, each PE port communicates with a one of the ports in the nearest neighbor PE.
The flexibility afforded to "grow" P~ lattices within a PE board by the capability to confijgure the ports, is now further illustrated by reference to FIGs. 14 and 15 (together, FIG. 16). Three PE boards, denoted 1, 2, and 3, are shown with the port-to-port PE connections for a tree latitice structure. The PEs are shown not in their fixed lattice positions, but rather in the actual tree geomet~y for data flow, 20 which is created when the P!E ports are configured as described above.
It was stated earlier in the description of the basic board module of FIG.
5, that eight external connections (a through b) are provided; and that this number carries out the Leiserson expansion scheme for a binary tree machine. The advantages of these capabilities in practice are several, as will be seen in the example 25 of FIG. 16.
First, the single data flow connection of the thleie-board arrangement to Host 60, is through the west port of PE 4.1, which corresponds to the connectiondenoted b in FI~}. 5. However, any one of the other seven external connections available at the corner PEs, rnight also be used as the connection to Host 60, by 30 control inst~uctions from the Host 60 sent during the initialization.
Second, two data paths link the bo~rds 2 and 3 to board 1, providing a redundancy in case one path fails. l'heise paths are determined also by configunng the por~s. Board 1 çonnects to board 3 through the interface compr;ising port S of PE
4.1 and port N of PE 1.1 on board 3. Additionally, Board 1 connects to board 3 via 35 board 2, through the path between po~t W of PE 1.1, Board 1 and port W of PE 1.1 ~ 200~902 of board 2; thence, by any one on several possible routings inte~nal to boartl 2, to port S of PE 4.1; thence to port W of PE 4.1 of Board 3. ~dditionally, Board 2 and 3 connect by the route between port N of PE 1.1 of Board 2, and port W of PE 1.1 of Board 3.
S Third, the PE tree configuration of Board 1, it will be noted, omits use of PE 3.2, thus illustrating the fault "route-around" capability of the invention in conjunction with furnishing redundant board-to-board paths and a multiplicity ofoptions for connecting the three board unit to the Host.
Persons skilled in the art will readily realize that various topologies, various lattice mappings and multiple Host and board interconnections are feasible through the teachings of the invention.

Librar~ of Reconfiguration Patterrls Persons skilled in the art of real-time computing that computer architecture set-up processes cannot be allowed to interfere with or impede the execution of any real-time calculations. Since, in the multiprocessor topologies of the present invention, a typical node count is from eight to two hundred fifty-six, there could easily be thousands of reconfiguration scenarios to accommodate to one or more node faults. Real-time calculation cannot be affordçd on these operations. Therefore, an off-line data base resident in memory ~0 of FIG. 10 is supplied with hundreds of thousands of reroute configurations. Each reroute conrlguration is predetermined to be optimal (i.e., minimum depth of tree leafs) arrangement for the particular number of P~s, the desired topology, and any particular PE faults identified through the initial or on-going tests. The reroute configuration set would include, for example, information on the routing shown in FIG. 9, and in FIG. S; but also many more.
Additionally, memory gO is advantageously supplied with specific optimized re-routing of communications paths, to accommodate to backplane connection faults. 'Ihe latter account for a large fraction of faults experienced in multiprocessor operations.
The present invention can be practiced in the signal pattern recognition computing environ~ent depicted in FIG. 1. Signal pattern recognition problems ~or which the invention is particularly useful, include: speech recognition, speakerrecognition1 recogn;tion of imaged objects, and identification of sonar or radar sourcçs.

'~4., ' 2~8902 An existing parallel processing configuration on which this invention builds, isdescribed in Canadian Patent No. 1,293,063 which issued on December 10, 1991 to A.L. Gorin et al, which describes the BROADCAST RESOLVE and "REPORT"
operations of a speech pattern recognition application.
:, 5 In summary, the invention is adapted to:
,:, `.~ (a) configure, under the control of a remote command module, the interconnection ` paths of each PE board in the custom backplane in order to synthesize a desired one .~. of many node interconnection topologies including: linear systolic, binary/nonbinary , tree, and many forms of hybrid; and (b) enable or disable nodes as necessary by revising the communication paths.
Then, pursuant of the invention, the basic strategy for gaining a ; measure of tolerance to faults without prohibitive expense, requires:

(c) adding steps to an application program to convey an idealized or nominal system ' configuration;

(d) p}oviding a means for determining the robustness or health o~ the PEs of a given configuration; and (e) reorganizating the nodes/PEs to run, or to continue running, on a degraded basis.

' .

- 17 - 2ao~02 APPE~X 1 Processor Element Node Operating Syst-~m Punction~
. ~lncludo ~d~cr~-J~.h~
. ~d~in~ TO~AL SLO~S 12 Jd-a~in~ MAx sLo~5 11 ~ TOTA~ SLO~S~
t^ ~ - m~do up .~ddr~9~ for 3p~cial proco~lng ~unctlons '/
~d-f~ne halt ~hd Ox3~00 ~ ~define u~pend c~d Ox3010 : ~define prnt auspend cmd Ox3910 ~define ir~l ldentlfi~r OxffeOOO /' add~ of ~NTREQl - 3tart ~/
~ o~ Intarrupt Vactor t~blo ~/
i ~hort ine ~hadowcn, ~h~dowat, ahado~np;
;~ /~ ------------------- aN~REQl lnte~rupt handler tablo to load addroaae~ into ~/
s~ int eor_handler~ nd o~ reCQptlon ~/
int eot ~ndl~r~ /~ end o~ t~an~ml~alon ~/
- int ha handler~ /7 hoador avs~lablo ~ -~ int of9 handler: /~ output fl~o ~mpty ~/
;;.' int ohf handler; /~ output fl~o half full ~/
. ~nt ift handler; /~ intput ~ifo threahold ~/
int intl handlerl; /~ res~rvod for aslc ti~erl ~/
int intl handlar2; /~ reaervod fo~ aslc at.~re o~ m~sago ~/
_-_---------------- INTREQ2 lnt~rrupt handler tabl~ to lo~d ~ddreaa~a ~nto ~/
int loa handler; /Y npl 10s9 oi synchronlz~tlon ~/
:. lnt 51 handler; /~ ao~t~aro intarrupt ~/
in~ int2 handl2rl; /~ r~aerved for ~lc - npl parlt~ arror ~/
ne int2 handler2; /~ reaer~ed for a3~c - npl abort ~/
int int2 handler3; /~ res~rvod for a~ic - nml p~rity ~rror t/
~nt int2 handler4: t~ res~rv~d for aalc - n~l parlty error ~/ -int int2 handl~r5; /~ ro~orved fo~ a31c - 3~ lo~ of ~ync ~/ -. ine int2_handler6: /~ rea~rved for asic - tL~r 2 h/
/
int active;
:, int end of data;
int STOREat;
char ID:
int byteY recd: ~.
short int output_~ifo_empty - O;

6 ~hort ine end o~_trans D O;
short int if byte~_left ~ 0:
~hort $nt hd~ errcode;
J short $ne w errcod~;
~hort int u~r errcod~;
3 /~ tre~ confisuratlon variabl~
ahozt $nt l_ch~ lds ~, ahort lnt r_chlld;
~hort ine ~ child;
short int parent_ld;
I short int ~m -oots `, /~ identi~y ~hlch, l any, stand~rd con~igu~tion i3 b~lng u~ed ~/
; ~hort int qtd tr~ - O;
~ ~hort int ~td rlng o O; -.1 short int ~td ~ray ~ 0;
parameter~ assoc$aèed ~it~ tho conmund b~inq executed ~ro '/
~ ~ loaded lnto th~o v~riablo~ '/
3 ahort int cmd_param~;
:l ~hort int c~d_param2;
short int cmd_pazam3;
~. short int cmd_par~m4;

7 t ~ c~ll u~er ~unctlona ~/
j int S~u~o~ wait_pg~) ();
~ lnt l~u~er ~p_pgm~
. . .

- 18 - 2 ~ ~ ~ 9 0 2 void (~tmp~unc~
t~tua flaqs : ~hor- :.nt h~lt~d ~ O;
O'- _3t ~u~pendod ~ Os 3A~ nt uaer~p ~ O~
. ~hort int next ~lot ~ O: /o index u~ed to g~t next comman~ to execut~ ~J
~ort int fill alot ~ O: '/- index u~ed to qet n~xt lot to flll /
/~ ~tructur~ deflnlng command queue .
. ~truct cqueu~ I
~nt ~ubrtn;
~hort lnt p~raml:
.' ahort lnt param2:
short int para~3:
~hort ~nt par~m4 ) m~ tMAX SLOTSj:
~truct rqueue ~next alotptr, fill alotptr;
/~ command queue ~cel~al_slot ~/
: int ne~ subrtn:
short ~n~ new_p~raml;
ahort lnt ne~_param2 short lnt new~par~m3 ~hort ~nt new_par~m4;
tore cmd parameter~ ~h1le executing ~/
int sparaml; /~ ~pecial proce~Ying function3 r/
` int ~p~ra~2:
i int ~param3;
: int ~param4;
, / /
/~ U~ed to determins wher~ loa error occurred /
: int wfif - O; /~ ~itin~ fro~ input fifo ~/
ine wtof - O; /~ vritin~ to o~tput fifo ~/
$
i /~ control register maak~ and defini~ion i ~ define .~rset ~en Ox~OOO
t ~ define ien maak Ox7ff~
4 define ofeen maak Ox6ff~
defiAe ohfen_muak Oxdff~
. ~ defin~ haen_ma~ Oxeff define eoten maak Oxf7f ~ ~ define eoren muak Oxfdf~
: ~ define ~ften maYk Oxf0ff ; seruct acn l /~ la~t 7 are mu~ks, interrupt unma~k~d ~f ble ~
un~gned ln~ nod~ ld : 7; /- nod~ id~ntific~eion number /
~ un~gned int bcen : 1, /' bcaat enabl~ reapond /
i unaigned int ~r~ re~ot pld /
un~ign~d ln~ lften ~ inpu~ ifo dat~ ~/
` unslgn~d $nS eoren : l; /- end of ~ceptio~ r/
i unalgnod lnt eoton : 1: /- end o~ ~r~n~m;~3ion /
. un~gncd ln~ h~o~ ~ e~der avall~bl0 ~/
~ unslgn~d lnt o~fen : 1: /- output ~ifo h~ ull /
i un~igned ~nt of oen : 1: / outpuf ~i~o empey /
un~gn~d int len ~ sdak~ ift,of~,~h~,eot,eor,ha ~/
J shadwcn~
/~ ~tatu~ ~gister m~a~ and definition .
define iwrd m~k OxfcOO
de~in~ eor maak Ox76~
def~ne cot m~ Oxf7f~
~ d~fln~ h~ m~qk Ox~f~
t deflne ~hf musk Oxdfff defin~ ofa m~k Ox6f~
~ de~in~ lo~ ma3k Ox7ff~
struct ~at I /~ laat ~ g~nerate inter~upt~ i~ unm~kad ln cn reg~stor f ~n~lgnod ln~ 1~ w~d cnt : lO: /~ lnput fi~o byta coun~ /

19 - 20089~2 un~gn~d lnt ~or ~ end o~ r~ceptlon ~d~t ) ^/
un~l~ned lnt ~ot ~ nd of tr~na~l3~Lon ~aourco~ ~/
unaigned ~nt ha ~ h~dd~r Avallabl~ ~/
unalgnod lnt oh : l; /- output ~l~o hal~ ~ull ~/
un~lgnod ~nt o~o : 1: /- outpu~ fl~o ~mpty ~/
untl~ned lnt los : l; /' lo~s of qync (hd~e ~allur~
ahad~tstJ
` /~ npi reglator ma k~ ~nd deflnition ~ define nio_qel~ct . Oxfre define ~io select Oxff~d define ~io_a~lect Oxfffb ` ~ define wio_~elect Oxff~7 : ~ define ~rc if_maak Oxffcf /~ 2 blt~ /
; ~ definq if_enabled_ma~k Ox~6 :~ ~ deflne ift lvl Ox4f7 defin~ nport_cma~k Oxfcff /~ 2 blt3 ~/
define ~port_c~k Oxf3~ 2 bit~ ~/
define eport_cma~ Oxc~ff /~ 2 blt~ ~/
~ def~ne wport-cmaa~ Ox3ff~ /~ 2 bit3 Y/
struce snpil6 1 /~ npl control reg~tor ~/
~` un~gncd lnt nlo : 1; /~ 13t 4 deflno port ~5 lnput/output ~/
un~igned int ~$o : 1; /~ O ~ lnput . /
unaigned lnt elo : 1; / 1 ~ output c un~igned ~nt wio : l;
un3igned ~nt ~rG ~: 2; /- OO(n), Ol(a), 10(~), ll~u) ~/
unalgned ~nt ~a~k_3rc_1f : 1; /' i O, Ingoro Yalu~ ln arc lf ~/
un~igned int lf~ level : l; /- ~f l,ift ~han 1~ i~ 1/2 ~ull /
0, when l/mor~ byt~
f port 13 output, port ~ i~ port connectod eo ~ aourc~ ~/
/~ lf 11, output/header~ are 30U~Ce to output ~/
~ qelactlon lgnored if port i~ ~ourced to lnput/header ~ifo~ /
,A unaigned lnt n npconfi~ : 2: /~ 00~3).~ 0~ 0(~)~ ll(o/h~
; un~igned ~nt 9 npconfl~ : 2; /7 OOin), Ol(e) lO(~), ll(o/hf) /
:~ un~igned int ~ npcon~l~ : 2: /' OO(n), Ol~ w), ll~o/hf~ ~/
3 unsigned ~nt ~ npcon~i~ : 2: /' OOln), Ol(a) 10~, ll(o/hf) /
`~ ) qhadwnp;
~` struct Qnpi24~ /~ 32-bit npi regi~ter for 24-bie write~ to oen /
struct ~npil6 tmpl6:
unsigned lnt npi2q unu~edl : 7:
unaigned int ofen : 1:
un3igned lnt npl2~_unuqed2 : 8:
, ` ) ~npl:
/~ npl ~egiater m~9~ and de~inition t define np~_acti~o Oxeff~
define npa_actlYo Oxd~
define nper ctivo Ox6ff~
J defino npw actl~o Ox7~f~
struct 3n31 ( /Q ns~ ~and npi ~k ~ control regi~t~s ~/
un3ign~d lnt ns~ unu~d : 12;
un~igned lnt n np unuaed ~ th~ bit la a~ to one, I/
unsigned ~nt a np unuaed ~ lgnore npi control regi~e~r /
un3ign~d ine o np unu~od : ~ es~ing~ ior tha~ p~rticula~ /
un~ign~d int w np unu~0d: 1: /~ po~t .
9hadwna ~
char de~t id; /~ or ea~o o2 uso, de~in~ id aa 8 bit~, ~ctu~lly 7 ~/
~truct ~h~ I /o he~der ~ifo ~not actually a fi~o) t char src ~d: /^ need ~t~ucturo to creat~ 2~-bit ~/
~hort int not~inql; /~ ~ritn o~ upp~r 3 bita o~ b~ to set /
ahadwhf; /~ the HV b~
eruct 3h~21 . /~ 24-bit ints ar~ ~eor~d a~ 32 blta by DsP32C / :
~tru~t ~h~;
unaiqned i~t ~fi2~_unuaedl : 7;
un~ignod lnt ~h ~ign : 1;

.~ ;;, ,{,o~

- 2~89~2 - ~ h~2~1;
/
u~ion I un~igned lnt ~ctu~l; un~l~n~d ch~r ~hdr~[2~; ~truct ~3t tmp~t; ~addr, cpy, c:~
/
1- beginnlng ~ubroutl~ ll b~ loaded at addre~s Ox~
beginninq~
put PLD intv 3 ~ait atat~ mod~ ^/
a~m~nr2 - 9~):
asm(~pcw ~ r2n), /~ initiallze command queu~e pointer~ 't next_t~lotpt~ ~ ~mq[O~.tttubrtn:
fill ~lotptr ~ ~mq[O~ ubrtn;
a~m~nr22e ~ *irql_ident1l2rU); 1~ addr of ~tart of inter~upt vector tabla 1/
~ / . /
NODE INAC~IYE ~/
oxeucted when no command~ hav~ been received yet, or ~t / are in the middl~ o~ p~oqram execution and command queu~ emptie~ q/
,. / /
nod~ wait(~
~ uait_loop:
., active - O;
if ~mq~next ~lot].~ubrtn - OxO~ /^ no co~mand-~ $n queu~ ~/
if ~uaer ~ait_pgm !~ OxO) ~ (*user_wait_pgm) 1):
~ elt~o ~ - goto wait loop;
't , el~
act~vato_nodet1:
(~tmpfunc~ execute next commund /
.,, ) , ) :, / /
ACTIVA~E NODE ~/
/R read comn~nd from queuo, reinitializo 910t, increm~nt ~lot pointer ~/
t activ~te node~1 I
~ void ~tmpl;
`(, aceiv~ thorn ia anothcr comm~nd in queua ~
i tmpl ~ next_slotptr-~qu~rtn: / 9 COpy addre~ of aubroutin~ to b~
/~ execut~d to tmpl - will not ~
tmpunc ~ tmp1; /~ compilc i copy directly to tmpfunc ~/
cmd_par~ml - next_slotptr->p~raml:
cmd_param2 - next alotptr-~plram2:
cmd_param3 ~ n~xt_alotpt~->param3: ~ copy all p~r~m~tor~ to ~/
i cmd p~ram~ - ~ext i~lotpt~->p~ram~; J- varlable~ u3er can acce~
3 /~ --- reinitiali2~ oo~n~nd ~ueu~ ~lot /
next ~lotptr-~ubrt~ - OxO:
' n,sxt ~lo~ptr-~p~raml in OXO;
j next_~lotptr->par~2 ~ OxO:
nex~ ~lotptr->param3 ~ OxO:
next ~loeptr-~param~ Y OxO:
! /~ ---- ------~- ------------- ~ncrement next al~e pointor /
if ~next sloc -- MAX_SLOTS) n~xt :~lot ~ 0:
next ~llotp~r - ~mqlOj.~ubr~Q:

.,~;, .
.'~. _.!.

~ 21 2ao8~02 el l~
: /~ n~xt ~lot~+; ~/
: nex~ ~lotptr - Lm~+next_~lot~.aubrtn:
.: / J
', /2 Spocl~l Proc~3~1n~ Functlona ~/
Mov~ parAm~t~r~ to ~ork ar~A ~nd reinitlllzo cll varlablea Ln 'I
/~ the ~e~mand ~u~u~ recalval ~lot ~t :~ /~ ____________________________ /
pmc~lott) ~. I
. ~paraml ~ ne~_paraml:
-x aparam2 - n~w_pa~am~;
.. 3param3 - new_param3:
param4 - ne~_param~;
~:; /~ ------------------- reinitl~llzo receival ~lot Yari~ble~
: ne~ aubrtn ~ OxO;
. new_par~ml - OxO;
::.` new_psrnm2 - OxOs ne~_param3 ~ OxO;
~' ne~_param4 ~ OxO:
.i ~
/
TREQ2 Interrupt Idontific~t~on ~rogr~ ~/
/~ interrupta avallabl~
103 of iync - if u~e de ault w, c~n identi~y whether error '/
occured wh~lc readin~ input/writing output fifo '/
/~error 20 - unidsnsifiabl~i error ~/
21 - input fifo error ~/
/~22 - output fifo error ~/
/~aoftwaro interrupt - next command receivod ~rom RT~ ~/
~ oftwar~ inte~rupe placed fir~t io t~t i~ you recei~o another ~/
1 /~command ~h1l~ in ~not~Qr intorrupt ~andlor, i~
, /~ procea~ed immediately ~/
i2ident() ~ I
maakint():
l if (ne~ subrtn !~ OxO) awint~);
~ elac :1 if ( (~hadow~t h loi ma3k) ~ if SqhAd~.loe~
-`I ~ .
~ if ~wfif) /- error ~ re~ding from ~npue i~o ~
~ hdw2 errcodo ~ 21: 1 ~toro error cod~
,i a~mS~rl - OxlS~ writo er~ ~ ~o re~
asm~plr ~ rl~ wri~ p~r~ l int. r~q. ~/
~ ) t- can o~ly ~rit* pis from reg.
I olao 1 if ~wto~ error ~hLl~ ~riting to output fi~o ~/ :
' ( 1 hdwe ~rrcod~ - '2:
a~m~rl - 0~16~ rit~ ors ~ to reg. ~/
asm(~pl~ ~ rl~ ur1t~ par~ l int. reg. ~/
n only ~rito p~r ~rQ~ r~g.~/
~ e :3 ( /- u~sbl~ to ld~ntify ~rror /
hdw~ errcod~ ~ 20: /- ator~ ~rror cod~ ~/
a~m(~ Oxl~ rlt~ er~ ~ to ro~
~sml~p~ r rln~ /~ wrieo p~ra11O1 ~nt. req. /
e~n only wr~to pis ~rom reg.-/
C~ ine ~);
a~ml~return~), , ~ .

~ 22 ~ 2~89~2 /- INT~XQ2 Softu~r~ ~ntorrup~ tSl) Handl3r ~t __________ ~
. ~wint () t~ Joftwaro lnterrupt hsndler '/
: l oid ~t~p2;
if (! activ~i ~ activ~ if wa~ in~ctlve, actlv~te lt ~t :~ i ~ tne~ subrtn ---au~p~nd cmd) ll - Inew_subrtn -~ prnt su~pend cmd) ) .. auspended ~ 1:
. el~e if (new ~ubrtn ~ halt cmd) halted - l;
else if (new aubrtn ~- usor_ap_pgm~
. . ( .
; uaerap - ls ; tmp2 - u~or ~p_pgm:
~ tmpfunc ~ tmp2;
.'~ 3pmcalot();
*tmpfunc) ()s ~, el~e I if ~fill ~lotptr-~aubrtn - OxO) /~ next slot ls empty v/
v fill next comm~nd quouo ~lot ~t fill ~lotptr~>3ubrtn - ne~ subrtn;
fill alotptr-~paraml - ne~_paraml:
fill 310eptr->param2 - ~e~_oar~m2:
: fill alotptr->para~3 - new_p~ram3:
'l fill alotptr-~p~ram~ ~ ne~_param4;
. if Iflll alot -- ~AX_SLOTS) /^ increm~nt nex~ alot po~nte .~ ~
I f~ lot - 0:
~ flll slotpt~ ~ ~mq~Oj.~ubrtn: -j' elae tl~ ~ill Ylot+~; I't f fill alotpt~ ~ ~mqi++flll ~lot]~ubrtn: ~-1 new subrtn - OxO: / a r~initi~lize rec~iv~l alo~ /
;~' new_par~1 - OxOt ' ne~_param2 - OxO;
i ne~_param3 - OxO;
n~_param4 ~ OxO;
else ueuo ~ull, fatal erro~ ~/
a~m(nrl ~ Ox18~1s t- ~rite 2q to ~cratch reg~er ~/
a~m(~pii~ - Ll~); t~ write 2~ to parallal lnt~rrupt regiater a/
¦ t~ to gene~to an lnterr~pt ~o the R~
5i loop: t- ~atal error, w~it ~ero or ~H to do ~omeehing /
goto ~i_loop;

whil~ (su3pended) i (new_aub~tn !~ OxO) tmp2 - new lubrtn;
tmp~unc ~- tmp2;
_------------------ move param~ter~, rein~t~li2~ ~loe ~pmc~lot ~);

~ .

_ 2~ ~ 2 0 ~ 8 9 ~ 2 .
: ~.
: ('tmpfu~c) ();
,: ) r ~ ' e (halted) do nothln~ all m~mory loc:ationa remain exactly a~ rhey wer~ ~/
/~ when tho h~lt commancl was is~ued ~/
.,. ) ~j I
/~ INTREQl Interrupt Identif~c~tion Proqram ~/
interrupta ~llabl~ ccur3 at ) ~/
f- eor end of receipt~on ~de tinat~on node) ~/
eot ~nd of tran~mi-~lon (~ourc* node) ~/
/~ ha heDd~r a~allablo (des~inatlon node~ a/
: /~ oh~ output flfo half full ~ource node) ~/
t~ ofe output flfo empty Isource nod~
/~ ift ~nput ~ifo thre~hold ~/
i' /~ error~ ~hich may be ~enerat~d: ~/
i /~ error 30 - unideneifi~ble interrupt ~/
/ ~ 31 - ohf timeou~ ~/
t _lide~tl~
~hort int errcode;
~oid ~tmp3;
m~s~int~);
' /~ i lahadYst.~or ~ shadwcn.eoren) ~/
! if l( Ishadow~t 6 eor ma~k) -- lt ~ t ~hado~cn ~ eoren ma~k) /~ c~ll eor interrupt handl~
t~p3 ~ ~or h~ndler:
tmpfunc ~ tmp3:
l~tmpfunc1 1);
el~e /~ ~P l~had~s~.eot ~ shsdwon.eoten) ~/
if ~( (ah3do~at ~ eot mask) -- l1 Gh I ~ahado~cn ~ eoten m~ak) { /~ c~ll eot in~errupt h~ndler ~/
tmp3 - ~ot handler;
tmpfunc ~ t~p3;
i~tmpf~nc) ();
el~
/~ if ~ahad~at.ha 6S ~hadwGn.haen~
if (( ~3hadow-~t 6 h~ mask) ~ 6 ~ (~hado~cn ~ h~en mAak) ~ l)3 ~ /~ c~ll ha interrupt handler r/
tnp3 - ha handler;
~mpfunc ~ t~p3:
I~tmp~u~c) ~):
el~e /~ if Ishad~.ohf 6~ ~hadwcn.ohfent ~/
if 1~ ~3hadow3t ~ oh n~ak~ Y- l~ 6~ ( (ah3do~cn ~ ohen m~k~ -- a)~
~ /~ c~ll oh~ interrupt h~ndle ~/
tmp3 ~ ohf_h~ndl~r:
tmp~unc ~ tmp3 ~mpfunc) 1): -el o /~ if l~had~t.o~e L6 ~hadwcn.oeen~ ~/
i~ II l~h~do~t & o~o m~ak~ S I 13h~dowcn ~ ofeen m~3k) I /~ call o~e inter~upt ~andler /
tmp3 ~ ofo hAndle~:
tmp~unc ~ tmp3;
~*tmp~unc~
) - 2~ -2~8992 el3e ~ f ll~ ~nt ~ 0 ~ ahadwcn~l~ten1 fA/
:: ~f l( ~hado~t ~ lwrd m~sk) ~ 0) ~ ( (ahadowcn ~ lfton maa~
~ c-ll lft int~rrupt handl~r :~ tmp3 ~ l~t h~ndler:
` tmpfunc - t~p3:
: ~t~pfunc) (~;
.~ el~e /- unidentl~$abl~ Interrupt ~/
~ a~m(~rl ~ 0xlS~ rite err ~ to rag. ~/
: a~m~pir ~ rl~ rit~ 30 to p~r~llel int. re~
'. ~/R can only write p~r rom reg.~/
. cle~rintl~:
asm(~ireturn~);
:,. /
. EOT de~ult Interxupt Handler ~/
~ f eot ~ ofe, clear of en (releaae xpa~ */
.; / /
:,~ def eo~ handler(~
' (/n at regi3ter copicd eo ahadowst in maakint ~/
`~' reqiater int ~rtl:
regi~ter int ~rt2:
regi~ter int rt3:
reqi ter int ~rt4:
. end of trans ~set thls for mult. hdr only 3ends ~/
i~ ( (shado~at ~ ofe mas~
rt~ ~ np:/~ load np reg. addr. in~o regi3ter ~/
,. output fifo empty - l:
i rt3 ~ ~hado~np:/~ lo~d contents of shado~p ~nto regiater ~t rt4 - rt3;/~ 16 bit ~rite eo np to clear OFEN ~/
,1 /Q --------------------------------- clear interrupt ~/
~ r~ t;/~ rtl ~ addr~s~ of t regi~ter ~/
`~ rt2 ~ ~hadow~t; /~ r~2 ~ addresa of ~h~do~ copy of ~t regiater ~/
I hadow~t - shado~st^eot m~s~ cle~, ~ot interrupt ~/ :
! ~rtl - ~t2;/R at ~ ahadowat ~/
/
- EOR defaul~ Int~rrupt ~andler ~/
/
de~ eor handlerl~ -~t regiseer c~,pied to ~h~do~t in m~kint ^/
register int ~rtl;
regiater int ~rt2:
end of dat~
if bytes le~t ~ ~hadwat.i~ wrd cnt; /- get ~ byt~a~ l~f~ft to read ~/ -/~ --------------------------------- clea~ ,~n~errupt ~/
rtl ~ ~t: ~^ rtl - addrea~ o~ cn rey~ter ~/
r:2 - ~hadw~t: /~ rt2 ~ addres~ o ahado~ copy o~ ~t r~gl3t~
~hadow3t ~ ~hadow~lt^~eor m~ cl~a~ ~or lnt~rrupt ~/
rtl - ~rt2; ~ sh~do~t ~/

HA d~~ult Intc rupt H~ndl~r ~/

I~ ________________ /
def_ha handler~) reqis~r int trtl;
register int rt2:
rtl ~ cn; /~ rtl ~ ~ddroJ~ o~ cn r~glate~
ahadowcn ~ s~adowcn+haen maak~ C h~ m~k ~ n~s~ h~ nto~rupt ~/ :
~' '','""" ;' ''` " "'i.' '' ;' '' ''` .'; ;:```". .'' '' ;` ' '' "'''' ,',,' , ~ - 25 ~ 2~9~2 /~ kno~ haon ~a~n't O ~or~ bec~u~e 90~ lnterrupt ~/
rt2 ~ sh~dowcn: /~ rt2 ~ ~hado~ Copy o~ Cn r~ql~t~z ~/
.. ~rtl - rt2; /- cn - ahadowcn ~/
~' I
/
/. O~E def~ult ~nt~rrupt Handl~r ~/
de~ ofe handler() I /~ at re9i~tgr copied to ahadowat in maskint /
reqi~ter int ~r~l:
regiater int ~rt2;
~ /. __________________-__------------ cle~r ~nt2rrupt ~/
,.~?,, rtl ~ st; /' rtl - addre~a of cn roglat~r ~/
-~ rt2 - ~shad~3t: /~ rt2 - addr~ of ~h~dow copy of at reqi~te- ~
hadow~t ~ ahadow~tAofe maak: /^ clear ofe ~nterrupt '/
rtl ~ ~rt2; J~ ~t ~ ahado~t - can do ME~-MEM ~/
J- ur~te only ~n C cod~ q/
:. / /
IFT def~ult ~ntorrupt Handler ~/
/
~ def ift handler5) .. I t~ ~t regi~ter copled to 3hadowat ln ma3~in~ ~/
.~ . regi~ter int rtl;
, reqister int ~rt2:
~-~' regiater int rt3;
.' , /t _____________--_----_------------ clear eor interrupt - ~eed eor to ~/
... J~ get ~inal lf byte count ~/
:' shado~3t - ~hadowst^eor mask: /~ cle~r ~or lnterrupt ~/
~I rtl - ~e:/^ rtl ~ addre~s o ~t regi3t~r ~/
'~, rt2 - ~ahadw3~ rt2 - ~ddre~ of ahddo~ copy of ~t regi3ter !/
rt2 ~ ~rtl; copy ~t regi~t~r to ~hado~ copy ~/
rt3 ~ ~rt2; /~ copy _~adow copy to regiator ~/ rtl - rt3:/~ at - ~hado~3t ~/
if bytes left ~ 3hadw~t.if_wrd_cnt: /~ qet ~ byte~ left ts read .~ byte~ recd - O;
movein(~move data from i~ to memory ^/
.~ ) .
/
Ma~k dll IN~REOl Ineerrupt~ ~/
ien - 1 upon entry to th~s aubroutine ~/
if ien ~ O interrupt~ uill be ma~ked ~/
/
maskint() , regi~ter int ^rtl:
;~ :egi3ter int rt2;
-~ ~ regiseer int trt3;
~ rtl - cn;/~ rtl - ~ddre~ o~ cn regi~ter ~/
.~ shado~cn - ah~do~cn~araet ien; /- clear ien-maa~i all ~N~l int~rrupta n/
:~ rt2 ~ ah~dowcn; t~ rt2 ~ ahadow copy og cn regi~t~r a/
rt3 - ~hado~ats /~ rt3 - shadow copy of ~t re~iatar /
~rtl ~ re2: /~ cn ~ ~hado~cn ~t rtl - at: /- rel ~ ~ddr~as o~ 3t regiater ~/
i ~re3 ~ ~r~ copy ~t ~eqi~te~ to ~hadow copy /
addr.actuDl - ahadowat; /' copy int ah~do~ copy eo unlon ~J
shadw~t ~ creg.empat; /- copy union to ~truct ~/.
t t~ ~nmaak IN~REQ1 2ncerrup~s C/
t~ ~en ~ O upon entry ~o this ~ubroutin~
^/
t~ if ien ~ 1 intol.rupt~ wlll bo generac~d o/
. /
! c~earint~

-26- 2ao8902 -e^:~ter int rtl;
regl~ter int ~rt2;
`~ rtl ~ cn; /- rtl ~ addroas o~ cn regl~ter ~/
. ~hadowcn - ~hado~cn^~r~et ien; /- ~et i~n-unma~ ~ll IN~l lnterrupta ~/rt2 - Cohadwcn; /~ rt2 ~ addreaa o ~h~dou copy of cn reglster ~/
rtl - rt2; ~' cn - ahado~cn ~/
.: ) /^ MOVEIN Subroutln~ ~/
hie ~ubroutine move3 dsta from the input FIFO to ~ memory loca~on, ~/
/~ the addre3~ of which 1~ ~tored in the variable S~OREat ~/
/- NOTE: ~hls Qubroutlno ~111 not work for 24-blt lntegera ~/
t~ aecau~e input FIFO 19 only 8 blt~, alway~ readln~ the ~l~ part of /~ an int or float from the ifo ~/
/- Take ~nap~hot of byto count when ift occura. Decrement that count ~/
/- to 0: if not eor, thero'~ at{ 11 more bytea to copy. Ta~e /
/t another snap~hot o~ byt~ count and decrement. Do thls until ~/
/- eor (end o~ d~ta /~ ------------------------------------------------------______________ n movein() s regi~eer int mcl;
regi~ter int ~rtl;
regi3ter int ~rt2:
wfif ~ l;
. aam~rle ~ iflfo~); / rl ~ addr of input fio ~/
a~m(~r3e ~ STOREat~ r3 - adds to ator~ d~ta ~t ~/
keep readi~g:
`~ mcl - if bytes left; /~ lo d byte3 eo mo~o into reg~ter ~/
-~ byte~ recd ~ byte~ ecd~ byt~ lefe; /~ sum total ~byte~ ln packet /
, ~hile ( mcl ~ O) .~ ~
aam("r21 ~ ~rln); /t r21 ~ b of 16 bit data ~/
a~m(~3~nop"~ pld r~g~. ar2 3 ~ait atate /
a~ml~r3 - r21~ move data to memory ^/
a~m(~3~nop~ pld regs. are 3 w~it atate t/
a~m(~r3 ~ r3~ incr. per Co next B bit memory location ~/
mcl D ~Cl ~ decr. count of byte~ to move ~/
. if (! end of_data) /~ end of d~ta Yhen get eor interrupt ~/
rtl ~ st; / rtl ~ addrea~ o~ 9t regiJter ~/
rt2 - rtl: copy st reqi~ter to sh~do~ copy 1~/
hado~t ~ ~rtl; /^ copy ~t r~giat~r to shadoY copy ^/
~i add~.sctua1 ~ ~hado~t; /~ copy int sh2~dv~- copy to union ~/
hadw t ~ creq.t~o~t; /- ^opy union to ~truct ~/
if byte~ left - ~h~d~t.$~ wrd_cnt: /~ get ~I bytes le~t to read . ~ goto keep_r~dlnq;

end of dat~ - O; /- cle~r end o~ da~a ~/
f i ~ ~ O;
.; / /
/^ MOVEI2~ Subroutin- ~/
This aubroutine mo~re~ 24 b$t $nteg~ra f rom th~ input ~IFO to a ^/
memory locatlon, thc address of ~hich i~ ~tor~d $~ th~
~ /^ var~blo STOREat /
7 /- ^/
-I /^ NOTE: Th~ ~ubrout$n~ ~ll not ~ork ~o~ 16 or 32-bit dat~ ~/
/~' Becaua~ input FIFO 13 only e bit~, a1~ay~ reading th~ "1" part of ~/' ; .

` ~ 27 - 2a~902 an ine fro~ tho fl~o, ~/
24-b:~ dat~ actunlly allocatcd 32 bit. ln memory. Ater randlng ~/
- /~ laat 8 bits of data, must lncrem~nt pointer 2 byte~ to point b/
/- to 3tartlng 3ddse~s o~ next 24-biC lnteger R/
/- R/
: /- ~ake 3napahot o~ byte count ~h~n ~ft occurs. Decremænt that count ~/
- ~o O: if not eor, ther-'3 atill more byte~ to copy. Tako /
~- another ~nap~hot of byte count and decrement. Do thia until 8/
eor (end o dat~
. ______________________~_________________________________~___________ .ove:24l~
: ~ .
reg~er in~ mcl;
~`- resi~ter int ~rtl;
~egi~ter int rt2;
w ~
'-~ a~m~"rle - ifif On): /~ rl ~ addr of input ~ifo */
asm('-r3e - STOREat~ r3 ~ ~ddr to atore data ~t */
:;.
- rea~ ~24:
-~ mcl - i byte~ left: /~ lond ~bytea to move into regi~ter ~/
r bytea recd - bytea recd+l_byte~_1eft; /~ aum totnl ibytea in packet ~/
while ~ mcl > O) ` a~m(~r21 ~ ~rl"); /~ r21 8 l~b of 24 bit daea ~/
aam~n3~nop~ pld reg~. are 3 ~alt atate '/
aam~"Rr3 r21~); /^ move data to ~mory ~/
a~m(n3~nopn); /- pld reg~. aro 3 ~a~t ~ta~
: a~m~ r3 ~ r3+1~ incr. ptr to next a bit m~mory location ~/
a~m(~r21 ~ rln); /~ r21 ~ next ~ bi~a of 2~ bit dat- t/
a~m~n3~nOpn); /^ pld reg~. are 3 ~ait ~t~te ~/
~- a~m~n~r3 - r21"); /- mo~e d~ta to 0emory ~/
~?' a~m~n3Rnopn); /- pld rog~. ar~ 3 wait 3tate */
atml"r3 ~ r3~1n); /~ incr. ptr to next 8 b~t ~emory loc~tion R/
~! a~m~nr21 - *rl~); IR r21 - 8 m~b o 24 blt data /
a~m~n3~nopn): /~ pld reg~. are 3 v~it ~t~to ~/
a~m~ntr3 - r21n); /~ mov~ data to memory R/
a~m~"3~nopn); /- pld reg~. are 3 ~ait ~tat0 ~/
a~m~nr3 - r312~ incr. ptr to next 2~ ~it memory location /
mcl ~ mcl - 3; /- decr. count - movo 3 8-bit quantities ~/
, if ~! end of data~ /~ end o data ~hen get eor interrupt /
1 rtl - ~t; /~ rtl - addre3s of ~t regi~ter ~/
;~ /R ~rt2 ~ ~rtl: copy ~t regi~Cer to ~hado~ copy /hadowYt ~ ~rtl; /c copy ~t regi~t~r to ~h~dow copy ~/
addr.actual ~ Yhado~st; /- copy int ~ado~ copy to union ~/
~hadwat ~ creg.tmpat; ~' copy union to ~truct /
Af bytea left - shadwst.if_~rd_cn~: /- get bytes left to read /
-. qoto read_124;
) er.d_of data ~ O; /^ clea; end o~ dat2 ~/
wf_f - O;
;~ t ____________________~______ _______ _________________~_____________ ~
.; /~ SE~DPKT~ Subroutin~ ~/
/~ This 3ubrsutine c~lculate3 the number of 8-bit dat~ tr~n ~ers ar~ /
. /~ requi~ed ~o~ 16 bit ~nteqe~ a~d call~ ~endpkt ~/
.
se..dpkti(de~t,mem,bite3) ~; ohar dest;
n: mem, bite~;
bite~ ~ bite~ ~ 2;

- 28 - 2~089~2 sendpkt~de~t,m~m,b~te~);

. /. SENDPK~ Subroutlne ~/
~^ S~ ubroutlne calculato3 the number o~ ~-bit date tr~n~fer3 are ~/
/^ required ~or 32 blt ~loat~ and call3 ~endpkt ~/
/
. sendpict~ ~de3t,mem,biee~) : char de~t;
ir.~ ~em, bit~;
.: ~
:. bitea - blte~
~endpkt(deat,mem,bite~);

S~NDPKS Subroutin~ ~/
/~ Thi~ ubroutln~ createa a~d load~ a ~data plu~ he~der pack~t 1nto /
/~ the output FIFO ~/
/~ Output FIFO mu~t be empty before loading next data pac~et - lf ~e ~/
not empty, return error code 14 to application program ~/
sendpkt(de~t,mem,blte~) ` char de~t;
lnt mem, bite~:
int t~pctr;
wtof - l;
a~m("rle ~ 9t"): /~ rl - addr o~ ~t regiater /
a~m(nr2 - ~rl~ copy ~t reg. to dap32 reg. /
a~m~"3~nop~ pld reg~. are 3 wait ~tate ~/
a m~"~qhado~t - r2~ copy ~t reg. to sh~dow t ~t if ( (~hadow~t ~ ofe ma3k) !~ ofifo not empty - error ~/
return(14):
outpu~ fi~o empty - 0: /~ e~ 9 ~or loading dat~ to of ~/
end of trana ~ 0; /' ~t flag - noe end of t~an~ ~/
asm("rle ~ ofifoU); /- rl - addr of output ~ifo ~/
asm("r3e ~ S~OREat~ r3 - addr to move data fro~ /
asm("r4e ~ hf~ r4 ~ addr of header ~ifo ~/
asm(nrlSe - npn); /~ rlS - addr of np regi~ter /
snpi.tmpl6 - ~hadwnp;
asm("rl6e - npi24~ rl6 ~ 2~ bie np ~g. ~/
de~t id - de~t; /- fill de~t id ~debugging? ) ~/
~hadwhf.~rc id ID; /- creat~ h~ad~r ~ rc id ~/
i~ (b~tea ~ 512) ritc a11 dat~ except la~t ~ bita _____________--------- writc la~t 8 bit~ as 2~ bit~ to ~et EOM ~/
/~ -_------..------------ write h~ ~ np to atart data tranami3~ion /
for ~tmpctr ~ 0; tmpctr~ c bite~
rito a11 ~ut la~t byt~ to o~ ~/
a~m~r21 ~ ~r3~ r21 - 0 bita o data Y/
a3m5n~nop~): /- pld reg~. ar~ 3 w~it ~t~t~ /
aam~rl ~ r21"): /' move dat~ to outpu~ fi~o ~/
aam~" 3 ~ ~311"): / incr. ptr to n~xt 8 bit memory ~/
1 /^ loc~tion -/
a~m(~r2l - ~r3n): /~ r2l ~ la~ byte of dat~ /
asm(ntrl - .r3¢~ 2q bit wriee ~o of~fo to ~et eom bit J
a~m(~rl7 - ~dea~ write l~b ~o regi~ter m~n3~nopn~ pld r~g~. are 3 wait at~t~
a3m5n*~4 ~ rl7~ write lab to ~fifo ~
a~m~3dnopA); /~ pld r2g~. are 3 uait 3tate ~/
a3m(~rl7 ~ ~shadwhfn); /~ 24-bit write m~b to r~gilter ~/
~JQ~ :

. - 2a _ - 2~89~2 ai~m(~3'nop~ pld roga. ~r- 3 ~lt i~tat~ ~/
m( r4 - rl7en); /~ 24-blt wrlte m~b to hrito to ~t ~V blt ~/
~; ai~m(~rl5 - rl6eU); /~ 2q-blt ~rlt~ to np to ~ctlvtt~t~ of XPA b/
~ ',A, ) '~, el3l~
;~ l . ____________---------- wrlt~ chunk o~ 8 blt data to o~ t ~ /- ---------------------- ~rite hf ~ np to teart data tranaml~alon ~/
`~ /^ ---- ----------------- ~rite rc~t o~ data except laat 8 blt~ ~/
____________-____----- write laat ~ bitii Di~ 2~ bita to i~et EOM ~/
~or ~tmpctr - 0; tmpctrl~ < 512;) I /~ write data to hal th~ of ~/
~a~(~r21 - ~r3ni; /a r21 - 8 blti~ of d2t~ ~/
-~ ai~m(~3~nopn) /~ pld regs. ~r~ 3 ~it 3tat~ ~/
` ~3m(~rl ~ r21~ mo~o data to outp~t fifo ~/
~am(~r3 - r3~1n): /~ incr. ptr to next 8 bit memo y ~/
) /^ locatio~ ~/
~ ai~m~nrl7 - ~dest"): /~ write 13b to reg$i~ter ~/
v a~m( 3~nop~ pld rc~3. are 3 whie atate ~/
~i~m(~r4 - rl7"); /~ ~rito l~b to hfi~o C/
ai~mt~3~nopn); /~ pld regs. arc 3 walt i~t~t~ ~/
ai~m("rl7e ~ ~shDdwhn)s /~ 24-blt ~rito m3b to ~eg~ter 7/
a~m~n3~nop~ pld reg3. aire 3 ~ait i~eat8 ~/
ai~m!"-r4 - rl~an); /~ 24-bit write m3b to h1fo to et HV blt ~/
a3m(~rl5 - rl6e"); /- 2~-bit ~rite to ~p to ~ctlvltate of XP~ ~/
for 1: tmpct~+ < bite~
rit~ data to b~lf th~ o~ ~/
~ aam~r21 - ~r3"): /- r21 - ~ blta of data ~/
i 2am~n3~nop~); t- pld roga. ~r~ 3 ~ 3tAte ~/
a~ml~rl - r21~ mo~o dat~ to output ifo ~/
~: ~3~nr3 ~ r3~ incr. ptr to noxt 8 bit ~Qmory ~/
a~m~r21 ~ ~3~)s /~ r21 ~ laat byt~ o~ daea ~/
a3m(n-rl ~ ~3en); /~ 2~ bit ~rit~ eo ofifo to Jet eom blt /
) ~tof - O;
IN~ERRUPS YECTOR TA~LE -~
/- list of goto3 o~ t~e de~ault interrupt bandl~r~ - t~e order i~ mandatory! ~/
/- The follo~ing canno~ be alte~ed by applic~tion programm~a ~/
/~ ilident - INSREQ2 lnterrupt lden~ c~tion proqru~
t~ i2ident - INTREQl ~nte~upt idontific~tlon progr -- ----------------______________________ ~
t~ ------------------------- ex~orn~l inte~rupt 1 ^/
asm~S goto ilident~
a Sm S nopn ) -t- ------- ----------------- PIO bufer ~ull ~/
asm~goto pio_ull~);
a~mS"nopnj:
~ ------- PIO bu~er empty ~/
asmt"goeo pio_empty~)~
asm~"nopn):
/- ------------- ----------- SIO input buffer ull ~/
asm( go~o 910_~ull~);
asmS"nop");
/- ------------------------- SIO output buffer emp~y ^/
a~m~ngoto sio emptyn);
a i~m ( nnop" );
/~ -------- ---------------- ext~rn~l interrupt 2 IIJ
asmS~goto 121dent~)s a sm S ~nop" t /~ ------------------------- d~p32C re~e~ved 1 /
as~l"goto d~p3r~
asms~nopu) /~ ~~~~~~~=~-----------.----- d~p32C re~rved 2 ~t asm("goeo dsp3r~2")J

~ 3 ~ 2 ~ 2 c~ q_ ~ nopn );
------- ond ~N~PRUP~ V~C~OR ~ABL~ ------- ~/
pio ~ull O
: a~m(~lreturna~:
. p o ~m~ty~) a~ml"ireturnw);
, I .
sio full~
.` aamlnireturn"~;
io emptyl) .. ~ .
a~ml"ireturn"~;
hould never get to d~p3re~1 or dap3re~2 - tho9e will b~ ~atal error~ ~/
/- and a~signed error codea to ~enerate (in addition to ireturn~ ~/
. dsp3re ll) -~ aamt~ireturn~):
h dap3re~2~
; a~ml~ireturn~:
, .

. .

` .:
. .
~, }~

Claims

1. A process for synthesizing a desired node interconnection topology in an assembly of processing elements under control of a Host, in which each element comprises plural signal communication ports, said elements are physically arrayed in X-Y matrices on one or more mounting means, each element of each said matrix other than elements located at the matrix corners is interconnected to four neighbor PEs, each corner element is connected to its respective two neighbor elements and has two external connection paths, and said assembly comprises means for effecting signal routing within each element between its said processing capability and any of said plural ports, said process comprising:
defining in said Host a desired processor element intra-board port interconnection topology for each board, testing under Host control said elements for faults, determining in said Host alternate processor element port interconnections which route signals around elements identified as faulted; and reconfigurating selected ones of said ports based on said alternate processor element port interconnections.

2. The process of claim 1, wherein said desired topology is a tree, and said process comprises the further step of:
determining in said Host the minimum depth of tree leaves in said matrix, using the available unfaulted processor elements and communications paths.

3. Apparatus for expanding a tree multiprocessor topology while maintaining a constant number of root connection paths to said topology, and a constant number of expansion nodes, comprising:
first and second arrays of substantially identical processor elements, each element having plural ports, means for selectively connecting ports of adjacent ones of all but two of said elements in each said array, to form in each array a two-root subtree of processor elements, said two elements not used in said subtrees each comprising three-port expansion nodes, said two roots and said three-port expansion nodes thereby furnishing eight connection paths to each said array, means for connecting the subtree and a first expansion node in said first array to the corresponding parts of said second array, thereby forming a further two-root subtree, the second said expansion node of each said array being available to replace elements in its respective array, and said two roots of said further subtree and said last-named nodes thereby comprising a total of eight connection paths to the combined assemblages of said first and second processor element arrays.

4. Apparatus in accordance with claim 3, where each said processor element comprises four ports.

5. Apparatus in accordance with claim 4, wherein said first and second arrays ofprocessor elements comprise elements disposed in an X-Y matrix, with respective processor elements connected to their immediate neighbors.

6. In a system for performing concurrent computational processes in a controlledassembly of processing elements interconnected as processing nodes, means for embedding a desired topology of nodes into a fixed lattice, comprising:
a remote command Host, a plurality of processor elements arrayed in one or more matrices of nodes, each said element having plural exterior ports accessing the processing capability of said element, and means for effecting signal routing within each said processor element between its said processing capability and any of said plural ports, and for blocking signal routing at selected ports, means for connecting selected ports of the elements in each said matrix to selected ports of neighbor elements, and for connecting selected ports of designated elements either to selected element ports in a further matrix of processor elements or to said Host, and means in said Host for conditioning said element ports to direct signals to and from only selected ones each element's neighboring processor elements, said conditioning means achieving a desired interconnection topology for the nodes of said system.

7. In a system for performing concurrent computational processes in an assembly of processing elements fixedly interconnected as processing node, means for synthesizing a desired node interconnecting topology comprising:
a remote command Host, a plurality of processor elements arrayed in one or more rectangular matrices of nodes, each said element having for exterior ports accessing the processing capability of said element, means for defining a desired node interconnection topology, means for detecting inoperative processor elements, means for determining a processor element port-to-port connection arrangement for the given said assembly which maximizes use of processor elements found to be operating, and means in each said element under control of said Host for enabling signal routing within said operating processor elements between their said processing capability and any of their said four ports, and for blocking signal routing at selected ports.

8. A system in accordance with claim 7, wherein said desired topology is a tree, and said system further comprises:
means for modifying said port-to-port connection arrangement to minimize tree depth.

9. A system in accordance with claim 8, wherein said enabling means further comprises:
means responsive to indicia of the location in said matrices of detected inoperative processor elements for reconfiguring said port-to-port connection arrangement to utilize only operating ones of said elements.

10. A system pursuant to claim 9, further comprising:
means for connecting selected ports of said elements to selected processor element ports in a further matrix of processor elements or to said Host.

11. A process pursuant to claim 2, comprising the further step of:
operating said enabled ports to serve as parent/child or child/parent paths.

12. A system in accordance with claim 9, further comprising:
means for orienting selected ones of said ports of each said element, to serve as parent/child or child/parent paths.