DE4335690A1

DE4335690A1 - Architecture of and method for configuring a parallel computer

Info

Publication number: DE4335690A1
Application number: DE4335690A
Authority: DE
Inventors: John Simpson Mccaskill
Original assignee: Max Planck Gesellschaft zur Foerderung der Wissenschaften eV
Current assignee: Max Planck Gesellschaft zur Foerderung der Wissenschaften eV
Priority date: 1993-01-28
Filing date: 1993-10-20
Publication date: 1994-08-04

Description

The invention relates to an architecture for a parallel computer and to a method of configuring a parallel computer.

The present patent application describes a parallel computer architecture and a method for building and reconfiguring a parallel computer dedicated to a predetermined problem class. The problems in this class consist of a large number of coupled, discretizable processes, and these processes may differ from problem to problem within the class. The particular computer architecture allows end users, by means of a new method, to repeatedly define individual discrete processes themselves as small digital circuits and then to configure the entire computer with these coupled processes, with an effort similar to that of compilation. Because the problem is then computed directly in parallel "in hardware", without the intermediate layer of an instruction set, problems can be computed up to a million times faster than on the fastest workstations available today. Commercially available chips are used for the configuration, which allows end users to specify a high-performance parallel computer tailored to their own problems.

A novel computer architecture and a prototype of a hardware-programmable parallel (genetic) processor for investigating interactive molecular evolution are proposed. The processor is designed to simulate interactive molecular genetic processing. The genetic processor reaches the 10⁹-generation mark within several hours of computing time for a population of 10⁴ interacting sequences of variable length with an average length of 64 bits. The computer is hardware-programmable, using an array of field-programmable gate array (FPGA) chips and the special properties of the distributed memory associated with them. In particular, devices of the XILINX 4000 family can be used. Shifting simulations from high-level computer languages down to the hardware level of the computer is advantageous for both conceptual and practical reasons.
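The figures above imply a sustained throughput that can be estimated with simple arithmetic. The sketch below assumes that each generation touches every bit of every sequence once (an assumption; the text gives no operation count) and treats "several hours" as a free parameter `hours`, since no exact run time is stated.

```python
# Back-of-envelope estimate of the throughput implied by the quoted figures.
# Assumption: one "bit update" per sequence bit per generation.
GENERATIONS = 10**9      # generation mark reached by the genetic processor
POPULATION = 10**4       # number of interacting sequences
AVG_LENGTH_BITS = 64     # average sequence length in bits


def bit_updates_per_second(hours: float) -> float:
    """Sequence-bit updates per second if the whole run takes `hours` hours."""
    total_bit_updates = GENERATIONS * POPULATION * AVG_LENGTH_BITS
    return total_bit_updates / (hours * 3600)


if __name__ == "__main__":
    # "Several hours" is not quantified in the text, so show a range.
    for hours in (2, 4, 8):
        print(f"{hours} h -> {bit_updates_per_second(hours):.2e} bit updates/s")
```

Under these assumptions the run comprises 6.4 × 10¹⁴ bit updates, i.e. on the order of 10¹⁰ to 10¹¹ bit updates per second for run times of a few hours.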

Problem

Many fine-grained, parallelizable problems in today's science, medicine and business cannot be handled with the capacity of present-day computers. A typical example is combinatorial optimization, as required, for instance, in evolutionary biotechnology for proteins. A second example is the simulation of many-body systems, such as the study of congestion in traffic systems. The conventional serial von Neumann machine, as implemented in many modern computers such as RISC workstations, can only emulate such problems sequentially by means of a universal instruction set. This emulation is limited by the CPU performance of the processor, which is in turn limited by the finite packing density on silicon chips (e.g. 1 µm mask spacing). It is also inefficient, because the universal instruction set of these machines cannot be optimally adapted to every application problem.

The difficulty in using parallel computers lies in matching the architecture and basic operations of the specific problem to those of the computer: different problems require different parallel computers. In addition, even coupling several von Neumann machines raises problems of synchronization and data management that are considerable both in building the computers and in programming them. Described here is a method that makes it possible to build, with reasonable effort, a "dedicated" parallel computer for each of these problem classes. What is offered is a new method of computer manufacture, a new computer architecture and a new method of computer programming. The objects of the invention are as follows:

1. To design a family of parallel computers that end users can adapt in hardware to their own problem class.
2. To provide a method of producing such an adapted computer without time-consuming chip design, i.e. with chips that are already commercially available, and without difficult synchronization tasks.
3. To provide a method of reconfiguring such a computer. Programming the computer for a specific problem is to take place directly at the level of configuring the hardware, not in software by means of machine language.

State of the art

1. Available parallel computer architectures

The following discussion is restricted to fine-grained parallel computers, since only these have enough processors to achieve large performance gains over serial computers. This excludes a whole series of computers such as transputer networks, the Intel Hypercube, the Kendall Square (virtual shared memory) computer, etc. Although such computers, with up to 1000 processors, can achieve speedups of up to about a factor of 100, they go no further, and their development times and costs are extremely high. This leaves three fine-grained computer architectures that are most comparable to the subject matter of the invention.

a) Connection Machines (Thinking Machines Inc.)
Originally developed as SIMD (Single Instruction Multiple Data) machines, the Connection Machines allow synchronous processing of up to 65,000 data elements. A MIMD (Multiple Instruction Multiple Data) Connection Machine has also recently come onto the market. In both cases, and in contrast to the architecture proposed here, the architecture is based on a classical instruction set. Although the Connection Machines are particularly well suited to some fine-grained arithmetic computations, their architecture cannot be influenced by the user.
b) Cellular Automata
A cellular automaton is a collection of identical automata, each with a finite number of states, which are linked regularly to their neighbors on a grid. A relatively general two-dimensional cellular automaton that can be programmed directly in hardware could also be built with the design method described here. In the new architecture, direct nearest-neighbor coupling is generalized by connecting channels that permit slippage. This, together with locally distributed RAM, makes it possible to process character strings. This is described in more detail in the basic document.
c) Furthermore, configurable cell arrangements are known, as described e.g. in WO 90/11648. Each cell arrangement is housed on a chip, and several such chips are arranged in a regular (matrix) form and connected to one another. The cell layout within each chip is also matrix-shaped. Such an arrangement of configurable chips is used, e.g., to solve user-specific problems, with each cell, group of cells or chip serving to solve a different sub-problem of the task to be computed.
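The cellular automaton described under b) can be illustrated with a minimal software model. The sketch assumes a two-dimensional torus, binary cell states and a four-cell (von Neumann) neighborhood; the example rule `parity_rule` is purely illustrative and not taken from the patent. All cells read the old grid before any cell is written, which mirrors the synchronous update of identical hardware cells.

```python
from typing import Callable, List

Grid = List[List[int]]
# Local rule: (own state, states of the four nearest neighbors) -> next state
Rule = Callable[[int, List[int]], int]


def ca_step(grid: Grid, rule: Rule) -> Grid:
    """One synchronous update of a 2-D cellular automaton on a torus.

    Every cell applies the same rule to its own state and the states of its
    four nearest neighbors. The new grid is built only from the old grid,
    so the result does not depend on the order in which cells are visited.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbors = [
                grid[(r - 1) % rows][c],   # north (wraps around)
                grid[(r + 1) % rows][c],   # south
                grid[r][(c - 1) % cols],   # west
                grid[r][(c + 1) % cols],   # east
            ]
            new[r][c] = rule(grid[r][c], neighbors)
    return new


def parity_rule(state: int, neighbors: List[int]) -> int:
    """Illustrative rule (an assumption): XOR of the four neighbor states."""
    return neighbors[0] ^ neighbors[1] ^ neighbors[2] ^ neighbors[3]
```

Starting from a single active cell, one step of `parity_rule` activates exactly its four neighbors and clears the cell itself.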

2. Method of manufacturing the parallel computer

Chip design still involves considerable effort. A production-test-modification cycle usually takes several months. Programmable devices shorten this cycle, but because of their lower packing density and the hitherto rigid architecture of computers they have not yet been used to build an entire processor. In the method claimed here, by contrast, the chip design problem for parallel computers is solved modularly. An architecture that is modular across chip boundaries speeds up the design process and is flexible enough that programmers themselves can implement, within this framework and in a few minutes, a design tailored to their problem. Programmable devices, the definition of a modular design principle, and the generation of problem-specific chip designs by means of a conventional programming language make this new approach possible.

3. Reconfiguration procedure

A method for reconfiguring FPGAs is offered, for example, by XILINX. Design changes are made locally and interactively with a block-diagram editor. Automatic partition, place and route (PPR) programs determine how the logic is embedded in the FPGA. However, this method has two drawbacks that should be avoided. For parallel computers, where the block diagram has a repetitive structure, the repetitive editing operations require the power of a programming language in order to make parallel changes efficiently.
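The point about repetitive editing can be sketched in a few lines: when a design is generated by a program from one cell template, a single programmatic edit reaches every instance at once. The `CellInstance` record, its 16-bit `lut_init` field and the naming scheme below are hypothetical; this is an in-memory illustration, not the XILINX tool format.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class CellInstance:
    """Hypothetical description of one placed copy of a processor cell."""
    name: str
    row: int
    col: int
    lut_init: int  # hypothetical 16-bit truth table of the cell's logic


def generate_array(rows: int, cols: int, lut_init: int) -> List[CellInstance]:
    """Instantiate the same cell template at every grid position."""
    return [
        CellInstance(f"cell_{r}_{c}", r, c, lut_init)
        for r in range(rows)
        for c in range(cols)
    ]


def retarget(design: List[CellInstance], new_init: int) -> List[CellInstance]:
    """Apply one edit to every instance instead of editing each one by hand."""
    return [replace(cell, lut_init=new_init) for cell in design]


if __name__ == "__main__":
    design = generate_array(16, 16, lut_init=0x6996)   # 256 identical cells
    edited = retarget(design, new_init=0x8000)          # one parallel change
    print(len(edited), hex(edited[0].lut_init))
```

The same edit made interactively in a block-diagram editor would have to be repeated once per instance, 256 times in this example.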

Current research in the field of genetics-based functional self-organization requires a considerable amount of data processing in order to simulate populations of interacting string-encoded automata. Independent molecular selection is well understood as a logical skeleton, and this has in particular made possible the development of evolutionary optimization algorithms. By contrast, the consequences and possibilities that interacting functionality opens up for higher organization have so far remained largely unexplored. Numerous attempts going beyond simple duplication have been made to identify the essential logic of functional interaction using populations of molecular automata. Here we point out both practical and conceptual reasons for departing from the software implementations conventional in this field and for moving to configurations realized in hardware with field-programmable gate arrays (FPGAs). This requires rethinking design conception and methodology, but it also holds the prospect of speed improvements by a factor of 10⁶ for populations of up to 10,000 strings, while retaining part of the flexibility of a software implementation and enabling conceptual simplifications in the design. A simple prototype design is described below to illustrate the usefulness of this approach and the nature of the conceptual changes it motivates.

The simplest evolutionary models dealt with homogeneous, well-mixed populations of constant size, as can be realized chemically in a continuous stirred flow reactor (CSTR). Selection arises from proportional dilution, which compensates for the excess of molecular production. A further theoretical model attempted to reduce the elements required for evolution to what is locally and physically essential and to incorporate the functional effects of molecular interaction. The role of chance as the source of the diversity on which selection acts in evolution was progressively put into perspective, and the significance of spatial effects in evolution was reassessed. Both of these developments are necessary before a step toward FPGA technology can be credible.
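Selection by proportional dilution can be shown with a minimal discrete-time sketch. It assumes that each species multiplies by a fixed replication factor per generation (the values used are illustrative) and that the reactor outflow then dilutes all species by the same factor so that the total population stays constant.

```python
from typing import Dict


def dilution_step(pop: Dict[str, float], fitness: Dict[str, float],
                  total: float) -> Dict[str, float]:
    """One generation of replication followed by proportional dilution.

    Each species grows by its replication factor; the whole population is then
    scaled by a common factor so that its size returns to `total`, as the
    outflow of a flow reactor would do.
    """
    grown = {s: n * fitness[s] for s, n in pop.items()}
    scale = total / sum(grown.values())      # same dilution for every species
    return {s: n * scale for s, n in grown.items()}


if __name__ == "__main__":
    pop = {"A": 500.0, "B": 500.0}
    fitness = {"A": 1.1, "B": 1.0}           # illustrative replication factors
    for _ in range(100):
        pop = dilution_step(pop, fitness, total=1000.0)
    print({s: round(n, 2) for s, n in pop.items()})
```

No species is ever removed explicitly: the faster replicator displaces the slower one only because the common dilution outpaces the slower one's growth.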

In a non-interacting Darwinian evolution model there are two points at which chance normally plays a role: mutation at selected positions in the molecular sequences, and the selection of the molecules that are to persist in the population. In interacting models there is, in addition, the random choice of successive interaction partners (a bimolecular process, which is generally sufficient to describe biochemical reactions). In the latter case it has proven possible, with string-encoded functionality, to obtain variation implicitly through the different processes of bimolecular interaction. A dynamic mixing process in an open vessel is a sufficient physical basis for the selection of molecules and their interactions. It is now well known in physics that simple deterministic nonlinear systems can produce chaotic dynamics; in particular, a hard-sphere model would suffice to represent local and selective randomization of the attached strings. A purely local, deterministic dynamical rule would therefore suffice to describe the evolutionary process, and it would also allow spatially heterogeneous phenomena to arise. Indeed, for a broad class of homogeneous, evolving, replicating populations it has been found that all cooperative functional interactions are unstable against parasitic exploitation, so the question of spatial heterogeneity is critical for any model study.

Previous modeling work has produced a large number of small automata interacting in groups, written in computer languages such as C. The interrelated requirements on suitable population sizes (<1000) and numbers of generations (<1000) needed to reveal generic behavior make such simulations time-consuming on conventional computer equipment. Typically, simulating 1000 strings for 1000 generations takes several hours of computing time on a RISC workstation, and computer resources are used inefficiently. First, as discussed above, floating-point operations do not appear to be strictly necessary for simulating evolution. Second, processors are optimized for segmented operations on byte or word boundaries, not for strings of variable length. Third, there are software and financial barriers to the effective use of parallel machines. Fourth, high-level programming languages for both serial and parallel machines are limited in that they admit a much wider range of interaction schemes than can be relevant for genetic processing.

The object of the invention is to provide a device for computing fine-grained parallelizable problems, and to specify a method for configuring such a device, such that the user can configure the device for the problem to be solved within a short development time and the configured device computes the problem in an extremely short computing time.

To achieve this object, the invention proposes a device having the features of claim 1 or claim 2, and a method having the steps of claim 3. Advantageous embodiments are set out in the dependent claims. Preferred uses are the subject of the use claims.

In essence, the invention is based on the insight that a computer can be partly configured or preconfigured in hardware in such a way that the user can quickly and easily configure it optimally for a particular problem from a set of problems (a problem class). The computer is thus designed in hardware to be able to solve all problems of a given problem class. The problems to be solved with the computer or device are fine-grained and parallelizable, and they can be described as a large number of coupled, discretizable processes.

The individual logic circuits are preferably field-programmable gate arrays (FPGAs), which consist of individual logic cells arranged in an array. These logic cells are freely programmable, so the entire array is freely configurable. Several logic cells are grouped into logic circuit blocks, the so-called configurable logic blocks (CLBs).
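A common way such freely programmable logic cells are realized is as look-up tables; in the XILINX 4000 family, for example, each CLB contains four-input function generators of this kind. The sketch below is a software model under that assumption: a 16-bit configuration word (`init`, a name chosen here) stores the truth table, and "programming" the cell means choosing that word.

```python
from typing import Callable


def lut4(init: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate a 4-input look-up table.

    `init` is a 16-bit truth table; the four inputs form the index of the
    bit that is returned (a is the least significant address bit).
    """
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (init >> index) & 1


def truth_table(func: Callable[[int, int, int, int], int]) -> int:
    """Build the 16-bit configuration word for any 4-input Boolean function."""
    init = 0
    for index in range(16):
        bits = [(index >> k) & 1 for k in range(4)]   # a, b, c, d
        if func(*bits):
            init |= 1 << index
    return init


if __name__ == "__main__":
    xor4 = truth_table(lambda a, b, c, d: a ^ b ^ c ^ d)
    and4 = truth_table(lambda a, b, c, d: a & b & c & d)
    print(hex(xor4), hex(and4))    # the same cell, two configurations
```

The same cell computes four-input parity with `0x6996` or a four-input AND with `0x8000`; only the configuration word changes.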

Furthermore, according to the invention, all inputs of all logic cells of all logic circuit blocks are clocked synchronously. This solves the timing problems that sometimes occur in large processor networks. The FPGAs have internal interconnect structures that allow the clock signal applied at the FPGA input to reach all logic cells simultaneously.
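Why a common clock edge matters can be illustrated with a four-stage shift register. The sketch contrasts a synchronous update, where every stage samples the pre-edge values, with stages updated one after another. The sequential loop is a software analogy for a race caused by clock skew, not an electrical model.

```python
from typing import List


def clock_synchronous(reg: List[int], data_in: int) -> List[int]:
    """All flip-flops sample their inputs on the same clock edge.

    The new contents are built entirely from the pre-edge values in `reg`.
    """
    return [data_in] + reg[:-1]


def clock_skewed(reg: List[int], data_in: int) -> List[int]:
    """Stages updated one after another, first stage first (race analogy)."""
    new = list(reg)
    new[0] = data_in
    for i in range(1, len(new)):
        new[i] = new[i - 1]       # reads a value that has already changed
    return new


if __name__ == "__main__":
    reg = [0, 0, 0, 0]
    print(clock_synchronous(reg, 1))   # the new bit moves one stage
    print(clock_skewed(reg, 1))        # the new bit races through all stages
```

With synchronous clocking the incoming bit advances exactly one stage per clock; in the skewed version it propagates through the whole register in a single step.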

A further feature of the invention is that the connection patterns between neighboring CLBs of a logic circuit (an FPGA) are regular and identical to one another. If several FPGAs are used, the pattern connecting neighboring CLBs continues across the FPGA boundaries; that is, CLBs of neighboring FPGAs are connected in exactly the same way as CLBs within one and the same FPGA.
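Because the connection pattern continues unchanged across FPGA boundaries, the whole board can be addressed as one global CLB grid. The sketch below assumes a 20 × 20 CLB array per FPGA, a 2 × 4 matrix of FPGAs and wrap-around at the outer edges; all three are illustrative choices, not values from the text. The neighbor rule is identical whether or not a link crosses a chip boundary.

```python
from typing import Tuple

# Illustrative dimensions (assumptions): CLBs per FPGA and FPGA tiling.
CLB_ROWS, CLB_COLS = 20, 20
CHIP_ROWS, CHIP_COLS = 2, 4

GLOBAL_ROWS = CLB_ROWS * CHIP_ROWS
GLOBAL_COLS = CLB_COLS * CHIP_COLS


def locate(row: int, col: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Map a global CLB coordinate to (chip coordinate, local CLB coordinate)."""
    return (row // CLB_ROWS, col // CLB_COLS), (row % CLB_ROWS, col % CLB_COLS)


def east_neighbor(row: int, col: int) -> Tuple[int, int]:
    """Global coordinate of the eastern neighbor.

    The rule does not depend on chip boundaries; at the outer edge of the
    board it wraps around (an assumption made for this sketch).
    """
    return row, (col + 1) % GLOBAL_COLS


if __name__ == "__main__":
    here = (0, 19)                       # last column of chip (0, 0)
    there = east_neighbor(*here)         # first column of chip (0, 1)
    print(locate(*here), "->", locate(*there))
```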

Among other things, the invention is based on the insight that distributed decorrelation in user-configurable logic circuits (namely FPGAs) can be used for large-scale simulation of processes that require random elements. Distributed decorrelation means using uncorrelated bits from functionally separate but neighboring digital circuits on a chip to generate quasi-random signals at every location, in order to simulate stochastic processes. Because these signals are reconfigurable, their selection can be optimized to eliminate statistical bias.
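The text does not say how the tapped bits are combined. One standard option, assumed here, is to XOR several independent taps: if tap i has bias εᵢ = |pᵢ − ½|, the XOR of n independent taps has bias 2ⁿ⁻¹·ε₁·…·εₙ, so combining weakly biased bits reduces bias quickly. The sketch computes this exactly rather than by sampling.

```python
from functools import reduce
from typing import Iterable


def xor_one_probability(p: float, q: float) -> float:
    """P(x ^ y == 1) for independent bits with P(x = 1) = p and P(y = 1) = q."""
    return p * (1 - q) + q * (1 - p)


def xor_many(probs: Iterable[float]) -> float:
    """P(XOR of independent taps == 1); starts from a constant-0 bit."""
    return reduce(xor_one_probability, probs, 0.0)


def bias(p: float) -> float:
    """Deviation of a bit's 1-probability from the ideal value 1/2."""
    return abs(p - 0.5)


if __name__ == "__main__":
    # Neighboring taps that are each 1 with probability 0.6 (bias 0.1):
    for n in (1, 2, 3):
        print(n, "taps -> bias", round(bias(xor_many([0.6] * n)), 6))
```

Two such taps give a bias of 0.02 and three give 0.004; choosing and combining taps is one way the "selection to eliminate statistical bias" mentioned above could be carried out.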

Furthermore, according to the invention, distributed RAMs with parallel addressing are used as shift registers between processors, in order to allow the simulation of larger collections of objects. Using shorter RAMs with heterogeneous addressing allows local insertions and deletions in string processing. The use of tightly coupled synchronous input/output data streams makes it possible to apply a "once-off design principle" to all application-specific uses of memory. The memories can be distributed on the FPGAs themselves and/or on dedicated memory chips between the FPGAs.
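A RAM can act as a shift register (delay line) if, on every clock, the word at the current address is read out and replaced by the incoming word before a circular address counter advances. The sketch models several lanes sharing one address counter, as a stand-in for parallel addressing; the class name and interface are hypothetical, and local insertions and deletions via heterogeneous addressing are not modeled.

```python
from typing import List


class ParallelRamDelay:
    """Delay line built from a RAM and one shared circular address counter.

    Each word written re-emerges exactly `depth` clocks later.
    """

    def __init__(self, depth: int, lanes: int) -> None:
        self.mem: List[List[int]] = [[0] * lanes for _ in range(depth)]
        self.addr = 0  # one address counter shared by all lanes

    def clock(self, words: List[int]) -> List[int]:
        out = list(self.mem[self.addr])      # read before write, same address
        self.mem[self.addr] = list(words)
        self.addr = (self.addr + 1) % len(self.mem)
        return out


if __name__ == "__main__":
    delay = ParallelRamDelay(depth=3, lanes=2)
    for i in range(1, 6):
        print(delay.clock([i, 10 * i]))      # input reappears 3 clocks later
```

Only one address counter is needed per RAM regardless of the delay length, which is what makes RAM cheaper than a chain of flip-flops for long shift registers.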

Finally, all data and control bits can be connected into shift registers for (bus-wide) serial, non-destructive readout of the current state of a computation or simulation. For readout, the data flow can be reduced to a simple, regular shifting process by switching the processors to a default preset. The state of the processors themselves, which requires only a few bits, can be latched in local registers before readout, so that it remains available for later storage or for inclusion in flip-flop chains for readout and restoration. These powerful readout capabilities are made possible by the design rules of the synchronous input and output channels.
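Non-destructive serial readout can be pictured as closing the flip-flop chain into a ring: after as many shifts as there are flip-flops, every bit has been emitted once and the chain holds its original contents again. The sketch below models a single chain; a bus-wide readout would run one such chain per bus bit in parallel. It is an illustrative model, not the prototype's circuit.

```python
def ring_readout(state):
    """Serially read a chain of flip-flops whose output is fed back to its input.

    Returns the emitted bits (last flip-flop first) and the chain contents
    after a full revolution, which equal the original state.
    """
    chain = list(state)
    emitted = []
    for _ in range(len(chain)):
        bit = chain[-1]              # bit leaving the end of the chain
        emitted.append(bit)
        chain = [bit] + chain[:-1]   # shift by one and feed the bit back in
    return emitted, chain


emitted, after = ring_readout([1, 0, 0, 1, 1])
print(emitted, after)  # [1, 1, 0, 0, 1] [1, 0, 0, 1, 1]
```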

Finally, the invention can also be used for local rerouting during the design process for modular designs, so that design implementation cycles can be run far more often, each taking about as long as a conventional software cycle (edit, compile, link, run).

The invention avoids the interaction bottleneck by using a novel hardware design with continuously and rapidly moving data and purely local interaction. As an illustration, a two-dimensional space and binary strings are used, in which the genetic sequences naturally encode both a nearly random local mixing functionality and a nearly deterministic interactive copying functionality, without introducing stochastic processes. In essence, molecular strings collide at an array of binary switching points, which control their passage according to the current nearest neighbor in the string.

The device and the method according to the invention are described in more detail below.

1. Computer architecture

The computer has no instruction set. Instead, it is divided into a large number of small processors which, because they are built from hardware-programmable components, are themselves completely reconfigurable. Each processor is connected to neighboring processors by a limited, but otherwise freely selectable, number of bit-wide lines. Initially, the neighborhood is that of a two-dimensional grid; depending on the problem area, however, other geometries are also provided for. Each processor synchronously processes the bit streams on its input lines and continuously generates synchronous bit streams on its output lines. Inside each processor, every input bit stream is buffered before further processing in special local shift registers that also allow random access. The processing can therefore depend on a relatively long stretch of the incoming bit streams.
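The sketch below illustrates this synchronous, instruction-free style of computation, reduced to a one-dimensional ring for brevity (a two-dimensional grid simply adds further input lines). Each hypothetical `Processor` buffers its input stream in a local shift register with random access and outputs one tapped bit. `clock_ring` samples every neighbour's previous output before any output changes, as clocked hardware would. The per-cell logic is an assumption chosen to keep the example small.

```python
from collections import deque


class Processor:
    """A minimal grid cell: buffers its input bit stream and emits one tap of it."""

    def __init__(self, depth, tap):
        # Local shift register with random access; index 0 holds the newest bit.
        self.history = deque([0] * depth, maxlen=depth)
        self.tap = tap
        self.out = 0  # output line, updated only on the clock edge

    def compute(self, bit_in):
        self.history.appendleft(bit_in)  # the oldest bit falls off the far end
        return self.history[self.tap]    # random access into the buffered stream


def clock_ring(procs):
    """One synchronous clock for a ring of processors (west-to-east links)."""
    # Every cell samples its neighbour's output from the previous clock first...
    inputs = [procs[i - 1].out for i in range(len(procs))]
    new_outputs = [p.compute(b) for p, b in zip(procs, inputs)]
    # ...and only then do all outputs change, as in clocked hardware.
    for p, o in zip(procs, new_outputs):
        p.out = o


ring = [Processor(depth=3, tap=0) for _ in range(4)]
ring[0].out = 1
clock_ring(ring)
print([p.out for p in ring])  # [0, 1, 0, 0]
```

With `tap=0` a single 1 travels one processor per clock; a larger `tap` lets each output depend on older parts of the incoming stream.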

The connection structure of the input and output lines between processors is likewise not fixed in advance; only a bounding superset of the connections required for the problem area is fixed. The connection structure between processors within a chip continues between processors located at adjacent edges of different chips. No reconfigurable lines need to be laid to connect processors that are not on the same chip. The existing fixed lines must, however, permit every connection pattern between processors that is to be realized within the problem area. This is possible with programmable components as long as a superset of the desired connections is hard-wired.
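Whether a particular problem fits the fixed wiring reduces to a subset test: every processor-to-processor link the problem needs must already be among the hard-wired links. The sketch assumes, purely for illustration, a hard-wired superset of all eight neighbours on a small toroidal grid; `torus_links` and `realizable` are hypothetical helper names.

```python
def torus_links(width, height, offsets):
    """Directed processor-to-processor links on a width x height torus."""
    return {
        ((x, y), ((x + dx) % width, (y + dy) % height))
        for x in range(width)
        for y in range(height)
        for dx, dy in offsets
    }


def realizable(required, hardwired):
    """A problem geometry fits if every link it needs is already wired."""
    return required <= hardwired


# Hypothetical hard-wired superset: all eight neighbours on a 4 x 4 torus.
EIGHT = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
HARDWIRED = torus_links(4, 4, EIGHT)

print(realizable(torus_links(4, 4, [(1, 0), (0, 1)]), HARDWIRED))  # True
print(realizable(torus_links(4, 4, [(2, 0)]), HARDWIRED))          # False
```

Orthogonal and diagonal neighbourhoods fit, while a geometry that needs links two cells away does not and would require a larger hard-wired superset.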

A fraction of all inputs and outputs is additionally routed to a data bus (e.g. 32 or 64 bits wide) of a serial or only coarse-grained parallel host computer. The clock frequency of this host computer's DMA (Direct Memory Access) read/write cycle determines the frequency of the synchronously clocked parallel computer. A clock frequency of around 25 MHz is readily achievable with current technology. Coupling to graphics processors for direct visualization is also possible.
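A rough upper bound on host I/O follows directly from these figures. If one full bus word moves on every DMA cycle (an assumption that ignores any bus overhead), the bandwidth is the clock rate times the bus width.

```python
def host_bandwidth_bits(clock_hz, bus_bits):
    """Bits per second if one full bus word is transferred on every DMA cycle."""
    return clock_hz * bus_bits


for width in (32, 64):
    bps = host_bandwidth_bits(25_000_000, width)
    print(f"{width}-bit bus: {bps / 1e6:.0f} Mbit/s = {bps / 8e6:.0f} MB/s")
```

At 25 MHz this gives 800 Mbit/s (100 MB/s) for a 32-bit bus and 1600 Mbit/s (200 MB/s) for a 64-bit bus, i.e. 25 Mbit/s per boundary bit stream.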

In the field-programmable gate arrays (FPGAs) used for the prototype, the architecture must allow the initial state of the distributed memory to be set up. For this purpose, a default circuit for the processors is activated by a separate reset switch; it converts the processors into simple shift registers. The current state of the computer can be read out in the same way. The states of all internal flip-flops (registers) can be set at the start of a computation and can also be read out during it.
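In the default (reset) mode, the processors behave as parts of one long shift register, so setting up an initial state amounts to shifting the right bitstream in. The sketch below, a simplified model with hypothetical names, shows that the bits must be fed in reverse order, because the first bit shifted in ends up at the far end of the chain.

```python
def load_chain(length, bitstream):
    """Shift `bitstream` into a chain of `length` flip-flops, one bit per clock."""
    chain = [0] * length
    for bit in bitstream:
        chain = [bit] + chain[:-1]  # the new bit enters at the head, the tail bit drops out
    return chain


def bitstream_for(target):
    """Order in which bits must be shifted in so that the chain ends up holding `target`."""
    return list(reversed(target))


target = [1, 0, 1, 1, 0]
print(load_chain(len(target), bitstream_for(target)))  # [1, 0, 1, 1, 0]
```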

To demonstrate the suitability of this new architecture, the base document gives the exact specification of a single-chip prototype computer built for the field of evolution. Plans for extending this computer to a hundred chips are also contained in the base document. Various modular partitionings of a chip into modules of 2 to 12 configurable logic blocks (CLBs) are also shown there, to demonstrate the general validity of the architecture.

2. Procedure for building a parallel computer dedicated to a predetermined problem area, i.e. adapted to it in hardware

a) For one problem instance, identify the data streams that couple the individual processes. These then form a certain number of bit-wide inputs and outputs. Compare different problem instances from one problem area and examine whether an upper bound, or superset, of the processes required to solve the problems exists for this problem area. If not, either abandon the procedure, or look for simpler processes within the problems or for smaller problem areas.
b) Determine the geometry of the coupling between the individual processes; typical examples are one-, two- or three-dimensional grids or the hypercube. If no geometry exists that contains all the specific problem geometries as sub-geometries obtained by deleting connections, this procedure cannot deliver a single parallel computer for the problem area.
c) Using a block-diagram editor, design a circuit that describes the local processes for a specific, simpler problem instance. Convert this design (with an automatic "place and route" program) into an FPGA sub-configuration and pack it into the most compact possible form shown in Figure N of the base document.
d) Choose an optimal packing of these processors. Depending on processor and chip size, between 16 and 1024 processors per chip are possible. Count the inputs and outputs needed between the different chips to implement this processor connection structure across chip boundaries.
e) Lay conductors on printed circuit boards to realize the superset of connections determined in step 2d) for the different problems of the problem area. In addition, lay some clock and control signals used to configure the chips.
f) Connect DMA access to a data bus of the host computer. The selected inputs should load the chips (see above), and the selected outputs should read out the distributed memory.
g) Write interface software that converts problem changes into commands for the FPGA design editor. Macros that invoke equivalent changes on all processors are especially useful here. The chip boundaries can also be made transparent, so that the connection structure is easy to change in software.
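The count in step d) can be estimated before any routing is done. The sketch below assumes a square k × k tile of processors per chip, links only to the four orthogonal neighbours, and a fixed number of lines per link in each direction (all assumptions for illustration). It shows why larger tiles are attractive: the number of processors grows with k², while the number of lines crossing the chip boundary grows only with k.

```python
def boundary_lines(procs_per_side, lines_per_link=1):
    """Chip I/O needed to continue a 2D nearest-neighbour grid across chip edges.

    Assumes a square k x k tile of processors, links to the four orthogonal
    neighbours only, and `lines_per_link` lines in each direction per link.
    """
    edge_processors = 4 * procs_per_side         # one row or column per chip edge
    return edge_processors * 2 * lines_per_link  # inputs plus outputs


for k in (4, 8, 16, 32):  # 16 ... 1024 processors per chip
    print(k * k, "processors ->", boundary_lines(k), "I/O lines")
```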

Explanation of the procedure

FPGAs (field-programmable gate arrays) are the building blocks of this new design method. These chips, known per se, are repeatedly configurable and consist, among other things, of a square grid of combinational logic blocks (each containing several gates and storage elements) and a connecting network of programmable lines (an interconnect level with programmable switch matrices). The design concept presented here uses a regular grid of about 100 of these chips. However, producing a working design for even a single such chip normally involves complex problems, including timing problems. Not even the equivalent of an efficient automatic compiler for serial problems is yet conceivable. Furthermore, communication between processes running on different chips requires separate and difficult treatment every time.

According to the invention, both the design problem and the communication problem are solved by the method presented here. The solution is based on small circuits that are easily designed by hardware programming and are as small as the smallest process units of the problems in the problem area. Many such units are packed symmetrically into each chip. Communication takes place by repeating a local pattern of connection lines between these units. The architecture of the fixed inter-chip wiring reflects a superset of this local structure, so as to admit different problems within the same area. All units are clocked synchronously and built so that they can continuously record all inputs from their neighbors in a local memory. In this way, the otherwise difficult communication problems are eliminated.

The solution offered here comes very close to the ideal solution of implementing each problem directly in silicon, which would be the obvious choice if multi-chip design were easy. The inventor has already built a prototype of such a computer for evolution simulations.

3. Reconfiguration procedure

According to the invention, reconfiguring parallel computers with the architecture described in point 1 is comparable to a classic programming task.

a) Edit
With the help of a high-level programming language such as C, systematic names are assigned to nets (the counterpart of variables in conventional programming). Systematic changes, either in the modules themselves or in their connection structure, can then be made efficiently with program-generated design modification commands, which can be collected in macro files.
b) Compile
This command file and the original design file can then be passed to an optimizing router (supplied by the FPGA manufacturer), which generates a new design file for each FPGA. The user only needs to specify the local changes in the connection structure or logic of a module to create a new overall design. Loadable binary files for the FPGAs are then generated automatically from the design files with the FPGA manufacturer's software.
c) Load
These binary files are sent to the parallel computer, in parallel or serially, via the host bus structure described above.
d) Read
The results can be read either directly from the distributed memory (using the readback function provided by the FPGA manufacturer) or dynamically via data lines that tap the computer structure at crucial points.
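The systematic naming in step a) can be sketched with a small generator script. The text mentions C; Python is used here only to keep all examples in one language. The net-name format and the `CONNECT` command are invented for illustration and are not the syntax of any XILINX tool. The point is that a single loop produces the equivalent change for every processor, and its output can be saved as a macro file.

```python
def net_name(x, y, port):
    """Hypothetical systematic net name for `port` of the processor at (x, y)."""
    return f"P{x:02d}_{y:02d}_{port}"


def connect_commands(width, height, out_port, in_port, dx, dy):
    """One hypothetical CONNECT line per processor on a width x height torus,
    linking `out_port` at (x, y) to `in_port` at (x + dx, y + dy)."""
    lines = []
    for y in range(height):
        for x in range(width):
            nx, ny = (x + dx) % width, (y + dy) % height
            lines.append(f"CONNECT {net_name(x, y, out_port)} {net_name(nx, ny, in_port)}")
    return lines


# Rewire every processor's O1 output to the eastern neighbour's I1 input.
print("\n".join(connect_commands(2, 2, "O1", "I1", 1, 0)))
```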

The invention is explained in more detail below with reference to the figures, using an exemplary embodiment of a genetics processor.

Brief description of the figures

Fig. 1 Programmable interconnect resources.

The illustration shows a corner of a XILINX 4000 LCA (Logic Cell Array) chip and the programmable connections available between the CLBs (Configurable Logic Blocks) and the IO blocks (input/output blocks), decoders and buffers. The high density of possible connections in the switch matrices, which form a regular grid between the CLBs, is particularly striking; the numerous dots, however, also mark programmable interconnection points (PIPs).

Fig. 2 CLB configuration as logic or memory.

This figure is reproduced from the XILINX XC4000 series data book. The configurable logic blocks of the XILINX 4000 series have two programmable function generators, each implementing an arbitrary Boolean function of 4 variables. This is done with a table-lookup scheme that is programmable not only at configuration time but also during operation, so that the function generators can also be used as RAM. The CLBs also have configuration-programmable multiplexers that allow various connections between the direct outputs, the registered (flip-flop) outputs and the function generators. A third function generator gives the CLB further combinational capability, or further capacity as combined RAM. The details of the fast carry logic, which allows the CLBs to be used efficiently as building blocks for counters, adders and the like, are not shown.
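The table-lookup scheme means that a 4-input function generator is simply a 16-bit memory addressed by its four inputs. The sketch below (hypothetical helper names; the bit ordering is chosen arbitrarily) builds such a table for a given Boolean function, evaluates it, and rewrites one entry as RAM mode would.

```python
def make_lut(func):
    """16-bit truth table for a 4-input Boolean function (bit n = output for address n)."""
    table = 0
    for addr in range(16):
        a, b, c, d = ((addr >> i) & 1 for i in range(4))
        if func(a, b, c, d):
            table |= 1 << addr
    return table


def lut_read(table, a, b, c, d):
    """Evaluate the function generator: the four inputs form a 4-bit address."""
    addr = a | (b << 1) | (c << 2) | (d << 3)
    return (table >> addr) & 1


def lut_write(table, addr, bit):
    """RAM mode: overwrite one table entry during operation."""
    return (table & ~(1 << addr)) | (bit << addr)


parity = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
print(hex(parity))  # 0x6996
```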

Fig. 3 Unit structure and I/O for the genetics processor.

This illustration shows the local connection structure for 4 × 4 units of the genetics processor (64 CLBs are shown). Spatial isotropy is preserved by combining 4 rotated units (x1, x2, x3, x4) into a higher-level repeat unit, shown by the dashed rectangles. This design fits well into the XILINX 4003 LCA chip, while the 4010 LCA chip can hold 100 = 5 × 5 × 4 units (400 CLBs). The diagonal connections show the prototype interaction scheme for switch control.

Fig. 4 Design of the prototype genetics processor unit.

The figure shows the design of a 4-CLB prototype unit with 4 inputs and 2 outputs. The design uses a two-string memory (FIFO) with genetic control of the information transfer from i1, i2 to o1, o2 via the switch SW1. The switch has four positions and is controlled both internally and externally when it switches. Addressing is generated by 4 such blocks from a single global cyclic address sequence 0000100110101111, using one flip-flop in the control block (CTL) of each block.
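The address sequence has a property that can be checked directly: it is a binary de Bruijn sequence of order 4, so every 4-bit pattern occurs exactly once as a cyclic window of four consecutive bits. If, as assumed in the sketch below, the CTL flip-flops of the four blocks form a four-stage shift chain fed by the global 1-bit sequence, the chain steps through all 16 addresses of a 16-bit function-generator RAM once per period. The shift-chain interpretation and the helper name are assumptions, not details stated in the text.

```python
ADDRESS_SEQUENCE = "0000100110101111"  # global cyclic sequence from Fig. 4


def addresses(sequence, width=4):
    """Addresses held by a chain of `width` flip-flops fed one bit per clock.

    At clock t the chain contains the last `width` bits of the cyclic
    sequence, newest bit first; each such window is read as a binary number.
    """
    n = len(sequence)
    return [
        int("".join(sequence[(t - k) % n] for k in range(width)), 2)
        for t in range(n)
    ]


print(sorted(addresses(ADDRESS_SEQUENCE)) == list(range(16)))  # True
```

A sequence without this property, such as an alternating 0101..., would reach only two distinct addresses.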

Figs. 5A, 5B, 5C and 5D CLB designs for the genetics processor unit.

The unit of Fig. 4 consists of the CLBs RAM, MEM, CTL and SW1. The first two CLBs, RAM and MEM (Figs. 5A and 5B), serve as two shift registers; the third CLB, CTL (Fig. 5C), controls the function of the fourth CLB, SW1 (Fig. 5D), which is a genetic switch for the two bit streams.
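The text states only that SW1 has four positions. One natural reading, assumed here and not confirmed by the source, is that each of the two outputs independently selects one of the two inputs. That gives exactly four combinations: pass, cross, and copying either input to both outputs. The position numbering and helper names are hypothetical.

```python
# Hypothetical position encoding: (input feeding o1, input feeding o2).
SW1_POSITIONS = {
    0: (0, 1),  # pass:  o1 <- i1, o2 <- i2
    1: (1, 0),  # cross: o1 <- i2, o2 <- i1
    2: (0, 0),  # copy i1 onto both outputs
    3: (1, 1),  # copy i2 onto both outputs
}


def sw1(position, i1, i2):
    """Route one bit from each input stream to the two outputs."""
    inputs = (i1, i2)
    sel1, sel2 = SW1_POSITIONS[position]
    return inputs[sel1], inputs[sel2]


def switch_streams(positions, s1, s2):
    """Apply a per-clock sequence of switch positions to two bit streams."""
    pairs = [sw1(p, a, b) for p, a, b in zip(positions, s1, s2)]
    return [o1 for o1, _ in pairs], [o2 for _, o2 in pairs]


print(switch_streams([0, 1, 2, 3], [1, 1, 0, 0], [0, 1, 1, 0]))
```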

Fig. 6 LCA routing for the genetics processor prototype.

The illustration shows the "wiring" of the LCA prototype design at the time of writing. Compare this wiring with Fig. 3. The 4 CLBs shown in Fig. 5, which form the basic functional unit, are arranged as in Fig. 4. This connection scheme is generated automatically by the XILINX optimizing router, following a semi-automatic procedure for CLB placement and net naming that uses the basic 4-CLB macro. In this demonstration design, the two middle rows and columns of CLBs are used to generate the global 1-bit address sequence, the bit streams with which the units are initially loaded, the logic for disconnecting from this loading source, and some other, less important display-connection tasks that are not required in a final design.

Fig. 7 Genetics processor on a XILINX demonstration board.

The illustration shows the pin connections and their meaning when the 4003 LCA chip is loaded with the prototype design of the genetics processor unit. DIP switches on the demonstration board control the disconnect process and the switch reset/hold process, while LEDs and segmented displays show the binary states of various representative signals, including two colliding bit streams and the state of the genetic switch at one point of the LCA.

Fig. 8 Interface to the host memory as a flow reactor.

The figure schematically shows an interface obtained by opening the boundary bit-stream connections to a host bus. Further details are given in the description below. The figure also shows the loading bit-stream paths through the grid of 4005 chips.

Fig. 9 Unit sizes and packing on a 14 × 14 CLB LCA (4005).

The unit patterns assume that every LCA has an identical format. The (2 × 2) repetition is therefore needed to allow alternating I/O directions, which ensures isotropy. For the most complex designs (3 and 5), 4 variants of the basic unit must be routed.

3: 4 × 4 × (2 × 2) = 64 3-CLB units (64 × 3 = 192 CLBs)
4: 3 × 3 × (2 × 2) = 36 4-CLB units (36 × 4 = 144 CLBs)
5: 3 × 3 × (2 × 2) = 36 5-CLB units (36 × 5 = 180 CLBs)
9: 2 × 2 × (2 × 2) = 16 9-CLB units (16 × 9 = 144 CLBs)
12: 2 × 2 × (2 × 2) = 16 12-CLB units (16 × 12 = 192 CLBs)

Fig. 10 Local address generation with insertion and deletion.

This two-CLB configuration generates a local address for separate write positioning and can also delay or advance this address by two within a single clock cycle, which makes insertions and deletions in the outgoing bit stream possible.
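The effect of shifting a local address on a bit stream can be sketched as follows. For simplicity the model moves a read pointer through a buffer, whereas the text uses the address for write positioning. It also reads "delay" as holding the address for one clock and "advance" as moving it by two in one clock (skipping a position). Both are interpretations, and `stream_with_edits` is a hypothetical name.

```python
def stream_with_edits(buffer, steps):
    """Emit one bit per clock from `buffer` at a local address.

    steps[t] is how far the address moves after clock t:
    1 = normal, 0 = hold (the same bit is emitted again: an insertion),
    2 = skip (one bit is never emitted: a deletion).
    """
    out, addr = [], 0
    for step in steps:
        out.append(buffer[addr % len(buffer)])
        addr += step
    return out


buffer = [1, 0, 1, 1, 0, 0]
print(stream_with_edits(buffer, [1, 0, 1, 1, 1, 1]))  # [1, 0, 0, 1, 1, 0]
print(stream_with_edits(buffer, [1, 2, 1, 1, 1]))     # [1, 0, 1, 0, 0]
```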

The following description is organized as follows. Section 1 describes the nature of a family of currently available FPGAs, namely the XILINX 4000 series of Logic Cell Arrays (LCAs). Section 2 describes the prototype design used. Section 3 discusses the development of a sufficiently flexible parallel programming environment for building models with a fixed number of LCAs. Section 4 presents a generic genetic processor that uses this methodology. Finally, Section 5 discusses the kind of simulation experiments that can be carried out with this type of computer, and the possibility of restructuring to trade off population size against computing speed.

1. The XILINX LCA 4000 series

The XILINX Logic Cell Arrays are user-programmable logic arrays based on CMOS VLSI technology. The term "programmable" refers to the fact that the lowest-level circuitry is built in a multipotent way and is specified further by configuring numerous internal memory cells (e.g. 178,096 in the largest LCA currently available, the XC4010). These cells specify both a large number of logic elements and their connections: primarily the arrays of configurable logic blocks (CLBs) and switch matrices. In addition, input/output blocks (IOBs) and wide decoders allow programmable communication with the chip's external pins, and programmable interconnection points (PIPs) and three-state buffers (TBUFs) specify the kind of connection between these elements, using a variety of additional local and global interconnect lines (single-length lines, double-length lines, horizontal and vertical long lines, and global lines). The corners of the chip contain a number of additional special structures, some of which are used in particular to control serial programming, start-up and the readback function of the chip. The resources available in part of an LCA are shown in Fig. 1. A serial configuration program (which, as set out in Section 4, is generated by a process akin to compiling) can be loaded from a conventional PROM (programmable read-only memory) chip, or from disk via a standard PC or workstation.

A CLB can be programmed as logic, as RAM (random-access memory), or as a combination of both, as shown in Fig. 2. As a logic element, the CLB contains two independent 4-input function generators (F and G) and a third, 3-input function generator (H) whose inputs are the outputs of F and G and a ninth external input. All three function generators can be programmed to implement any Boolean function of their inputs. Any of these three outputs, or a tenth input, can be connected to the two direct outputs of the CLB or routed through one of the CLB's two flip-flops (single-bit storage), whose clock, set/reset and enable controls can be connected to further external CLB inputs. In addition, the CLB provides programmable fast carry logic, which allows fast counters and similar circuits to span several CLBs. In memory mode, the H function generator can be used to control whether F and G serve as two separate 16-bit memories (with their 4 inputs acting as address lines) or as one combined 32-bit RAM. (When used as two separate memories, however, write access to both is controlled by a single write-enable signal.) As before, the bits read out can be output either directly or through the flip-flops. This feature is a significant advance of the 4000 series, since it makes programmable, dense, parallel data processing with distributed memory possible. The readback function delivers a serial bit sequence containing the current states of all flip-flops, including those in the IOBs, which were not described above. These states are captured when readback is initiated, so normal LCA processing can continue in parallel with readback. However, the contents of CLBs configured as RAM are not captured, so the user must synchronously pull the write enable low for every RAM whose contents are needed during readback.
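The function generators described above behave as look-up tables: a 4-input generator stores a 16-entry truth table and its inputs act as the address, which is why the same resource can also serve as a 16 × 1-bit RAM. The Python sketch below models only this behavior; the names (`ClbModel`, `make_lut`, `lut_read`) are hypothetical, and carry logic, the flip-flops, the shared write enable and the combined 32-bit mode are deliberately left out.

```python
from typing import Callable, List, Sequence, Tuple


def make_lut(func: Callable[..., int], n_inputs: int) -> List[int]:
    """Tabulate a Boolean function as a truth table with 2**n_inputs entries."""
    table = []
    for addr in range(2 ** n_inputs):
        bits = [(addr >> k) & 1 for k in range(n_inputs)]  # input k = address bit k
        table.append(func(*bits) & 1)
    return table


def lut_read(table: List[int], *bits: int) -> int:
    """Evaluate a look-up table: the input bits simply form the memory address."""
    return table[sum(b << k for k, b in enumerate(bits))]


class ClbModel:
    """Simplified CLB: 4-input generators F and G, plus a 3-input generator H
    whose inputs are the F output, the G output and one extra external input."""

    def __init__(self, f: Callable[..., int], g: Callable[..., int], h: Callable[..., int]):
        self.F = make_lut(f, 4)
        self.G = make_lut(g, 4)
        self.H = make_lut(h, 3)

    def evaluate(self, f_in: Sequence[int], g_in: Sequence[int], h1: int) -> Tuple[int, int, int]:
        f_out = lut_read(self.F, *f_in)
        g_out = lut_read(self.G, *g_in)
        return f_out, g_out, lut_read(self.H, f_out, g_out, h1)

    def ram_write(self, which: str, addr_bits: Sequence[int], value: int) -> None:
        """Memory mode: F or G used as a 16 x 1-bit RAM addressed by its 4 inputs."""
        table = self.F if which == "F" else self.G
        table[sum(b << k for k, b in enumerate(addr_bits))] = value & 1


# Example: F = 4-input XOR, G = 4-input AND, H = majority of (F, G, h1).
clb = ClbModel(
    lambda a, b, c, d: a ^ b ^ c ^ d,
    lambda a, b, c, d: a & b & c & d,
    lambda f, g, h: (f & g) | (f & h) | (g & h),
)
print(clb.evaluate((1, 0, 0, 0), (1, 1, 1, 1), 0))  # (1, 1, 1)
```

Writing into `clb.F` with `ram_write` and then evaluating it shows the point of the 4000-series design: logic and storage are the same table, viewed either as a function or as memory.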

Routing of the user's connections, i.e. the use of the switch matrices and the interconnect resources described above, is largely automatic and is performed by optimization code that includes simulated annealing. However, the distribution of certain high-fan-out signals, including the clock signals and the write enable mentioned above, must be restricted to the global lines to avoid the large delays associated with excessive loading. This routing step can also be regarded as a compiler-like function. Note that a single LCA chip can be reprogrammed by the user with a serial bit sequence to implement many different logic circuits. The bit sequence is the result of compiling a design file, which in turn results from routing the logic and connection specifications entered by the user. No additional hardware modifications are needed, provided the design places no new demands on external resources such as wire connections to the chip's IO pins. Further details can be found in the XILINX part specifications.
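The text states only that the automatic routing optimizer includes simulated annealing. As an illustration of the technique (not of the XILINX tools), here is a minimal annealing sketch for the closely related placement step: cells on a grid are moved or swapped at random, a cost increase Δ is accepted with probability exp(-Δ/T), T is lowered geometrically, and the cost is the half-perimeter wirelength of the nets. All names and parameters (`anneal_placement`, `t0`, `alpha`) are assumptions.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Pos = Tuple[int, int]


def wirelength(pos: Dict[str, Pos], nets: Sequence[Sequence[str]]) -> int:
    """Half-perimeter bounding-box wirelength summed over all nets."""
    total = 0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal_placement(cells: List[str], nets, grid: int, steps: int = 20000,
                     t0: float = 5.0, alpha: float = 0.9995, seed: int = 0):
    """Place cells on a grid x grid array by simulated annealing; returns (best placement, cost)."""
    rng = random.Random(seed)
    sites = [(x, y) for x in range(grid) for y in range(grid)]
    rng.shuffle(sites)
    pos = {c: sites[i] for i, c in enumerate(cells)}   # random initial placement
    occ = {p: c for c, p in pos.items()}                # site -> cell
    cost = wirelength(pos, nets)
    best_pos, best_cost = dict(pos), cost
    t = t0
    for _ in range(steps):
        a = rng.choice(cells)
        target = rng.choice(sites)
        b = occ.get(target)            # cell already at the target site, if any
        old = pos[a]
        pos[a] = target                # move a; swap with b if the site is occupied
        if b is not None:
            pos[b] = old
        delta = wirelength(pos, nets) - cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost += delta              # accept: update the occupancy map
            occ.pop(old, None)
            occ[target] = a
            if b is not None:
                occ[old] = b
            if cost < best_cost:
                best_pos, best_cost = dict(pos), cost
        else:
            pos[a] = old               # reject: undo the move
            if b is not None:
                pos[b] = target
        t *= alpha                     # geometric cooling schedule
    return best_pos, best_cost


cells = [f"c{i}" for i in range(16)]                  # a 16-stage chain of cells
nets = [(cells[i], cells[i + 1]) for i in range(15)]  # each stage feeds the next
placement, cost = anneal_placement(cells, nets, grid=4)
print(cost)
```

Accepting some uphill moves while the temperature is high lets the search escape poor local arrangements; as T falls it degenerates into greedy improvement.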

2. Design prototype for genetics data processing

Genetic molecular sequences are modeled as binary chains of variable length. The subsequence 00 is used as a separator marking the gaps between chains. The chains reside in shift registers that are laid out in a regular grid across the whole LCA and are connected in pairs at an arrangement of junctions called genetic switches. The chains flow through the shift registers and switches at a constant rate set by the clock (up to 3 × 10⁷ bits/s). The switches simply control how the bits arriving from two converging shift registers are distributed to the two diverging shift registers. The data-flow pattern in the default switch state is shown in Fig. 3; it is chosen so that the shift registers can be loaded initially and so that spatial symmetry is preserved during free switching. In the simple prototype version, either one or both of the incoming bit streams reach the two outgoing streams (copy or swap), and the switch also controls which of the two input streams reaches the outputs. The switch therefore has 4 states, held in two flip-flops. The switch state changes only when the clock enable of the switch flip-flops is asserted. In the prototype design, the enable is active when both input streams show a gap between chains (a 00 subsequence); the new switch state is then taken from two bits of a third, neighboring bit stream (the data inputs of the flip-flops). The switch is also enabled when the current 3 bits of one input stream match those of the third neighbor. This second condition allows switching within chains and hence a wide range of recombination and mutation events, including splitting and joining. Note that the prototype design does not allow single insertions into, or deletions from, the interior of chains. The advantage of this design is that it provides a uniform, simple framework for mixing, amplification and local selection.
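A behavioral sketch of one genetic switch, under stated assumptions: the state (q1, q0) uses the 00/01/10/11 encoding of SWAP1, SWAP2, COPY1 and COPY2 listed later in this section; the two bits of the third stream that form the new state are taken to be its last two bits (the text does not say which); and, as with a clocked flip-flop, a new state takes effect from the next bit. `run_switch` and `ROUTES` are hypothetical names.

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]

# (q1, q0) -> which input feeds (o1, o2), following the switch-state list in this section.
ROUTES: Dict[State, Tuple[str, str]] = {
    (0, 0): ("i1", "i2"),  # SWAP1
    (0, 1): ("i2", "i1"),  # SWAP2
    (1, 0): ("i1", "i1"),  # COPY1
    (1, 1): ("i2", "i2"),  # COPY2
}


def run_switch(i1: List[int], i2: List[int], i3: List[int], state: State = (0, 0)):
    """Clock one genetic switch over equal-length bit streams i1, i2 (inputs) and
    i3 (third neighbouring stream). Returns (o1, o2, state used at each clock)."""
    o1, o2, used = [], [], []
    for t in range(len(i1)):
        src1, src2 = ROUTES[state]
        bits = {"i1": i1[t], "i2": i2[t]}
        o1.append(bits[src1])
        o2.append(bits[src2])
        used.append(state)
        # Enable: a 00 gap in both inputs, or a 3-bit match between i1 and i3.
        gap = t >= 1 and i1[t - 1:t + 1] == [0, 0] and i2[t - 1:t + 1] == [0, 0]
        match3 = t >= 2 and i1[t - 2:t + 1] == i3[t - 2:t + 1]
        if gap or match3:
            # ASSUMPTION: new state = last two bits of i3, applied from the next clock.
            state = (i3[t - 1], i3[t])
    return o1, o2, used


print(run_switch([0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]))
# ([0, 0, 0, 1], [0, 0, 0, 1], [(0, 0), (0, 0), (1, 1), (1, 1)])
```

In the example the common gap at the start of both inputs enables the switch, the third stream supplies state 11 (COPY2), and from then on both outputs carry a copy of the second input.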

A square local unit of 4 CLBs suffices to implement two 34-bit inter-switch chain segments, the genetic switch and its control. The block design of this unit is shown in Fig. 4. Two CLBs (RAM and MEM), shown in Figs. 5A and 5B, implement two adjacent 2 × 17-bit shift registers that use the F and G function generators as memory and the flip-flops to store the 17th bit. The 16-bit memories are addressed in a cycle of 16 clock steps, one bit being read and written per step. A 4-bit counter would normally be used for this, but a cycle through all 16 different 4-bit addresses can also be obtained from consecutive, overlapping 4-bit subsequences of a single bit stream, which reduces the number of address signals that must be distributed globally across the chip from 4 to 1. The sequence used was 0000100110101111; it was distributed from the corner of the chip through a global buffer to each group of 4 local units, with one flip-flop from each of the 4 units used to reconstruct the current 4-bit address. Once this principle is understood, similar addressing variants are easy to develop. The 4 local units that make up the actual LCA repeat unit shown in Fig. 3 (8 chains and 4 switches) differ only in the orientation of their inputs and outputs.
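The addressing trick works because 0000100110101111 is a binary de Bruijn sequence of order 4: read cyclically, every 4-bit pattern appears exactly once as a window of consecutive bits, so a 4-bit shift register fed by this single stream (here, one flip-flop in each of the 4 units) steps through all 16 addresses. The sketch below checks this and shows one standard way, the Fredricksen-Kessler-Maiorana construction, to generate such sequences for other address widths. Function names and the bit order within each address are assumptions.

```python
from typing import List

ADDRESS_SEQUENCE = "0000100110101111"  # the sequence used in the prototype


def addresses_from_stream(seq: str, width: int = 4) -> List[int]:
    """Addresses held by a width-bit shift register fed cyclically by `seq`
    (oldest bit taken as the most significant bit, an arbitrary choice)."""
    n = len(seq)
    return [int("".join(seq[(t - k) % n] for k in reversed(range(width))), 2)
            for t in range(n)]


def de_bruijn(k: int, n: int) -> str:
    """Fredricksen-Kessler-Maiorana construction of a de Bruijn sequence B(k, n)."""
    a = [0] * (k * n)
    out: List[int] = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(map(str, out))


print(sorted(addresses_from_stream(ADDRESS_SEQUENCE)) == list(range(16)))  # True
print(de_bruijn(2, 4))  # 0000100110101111
```

The construction reproduces the prototype's sequence exactly, and `de_bruijn(2, 5)` would give a 32-bit stream for 32-entry memories.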

The switching CLB (SWI) is shown in Fig. 5D. Its two flip-flops have a reset line, which holds the switch in its default position during initialization, a write enable supplied by the control CLB (CTL), and data inputs supplied by the third neighboring bit stream described above. One of these data streams must pass through the H function generator to reach its flip-flop. The flip-flops encode the 4 possible states of the switch:
1. SWAP1: 00 i1-o1 and i2-o2
2. SWAP2: 01 i1-o2 and i2-o1
3. COPY1: 10 i1-o1 and i1-o2
4. COPY2: 11 i2-o1 and i2-o2.

The rest of the CLB (the F and G function generators) implements the switch itself, each function generator producing one output. In both cases the four function-generator inputs are the same: the two input bit streams and the two switch-state outputs of the flip-flops.
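Because each output of the switch is a Boolean function of the same four signals (the two input bits and the two state bits), it fits in one 4-input function generator. This sketch derives the two 16-entry truth tables from the state list above; the assignment of signals to address bits is a hypothetical choice, so the resulting constants depend on it.

```python
from itertools import product
from typing import Dict, Tuple

# (q1, q0) -> (source of o1, source of o2), with 0 = input i1 and 1 = input i2
SOURCES: Dict[Tuple[int, int], Tuple[int, int]] = {
    (0, 0): (0, 1),  # SWAP1: i1-o1, i2-o2
    (0, 1): (1, 0),  # SWAP2: i1-o2, i2-o1
    (1, 0): (0, 0),  # COPY1: i1-o1, i1-o2
    (1, 1): (1, 1),  # COPY2: i2-o1, i2-o2
}


def address(i1: int, i2: int, q1: int, q0: int) -> int:
    """Hypothetical assignment of the four signals to the generator's address bits."""
    return i1 | (i2 << 1) | (q0 << 2) | (q1 << 3)


def output_table(which: int) -> int:
    """16-bit truth table for output o1 (which=0) or o2 (which=1)."""
    table = 0
    for i1, i2, q1, q0 in product((0, 1), repeat=4):
        src = SOURCES[(q1, q0)][which]
        table |= (i1, i2)[src] << address(i1, i2, q1, q0)
    return table


def lut_eval(table: int, i1: int, i2: int, q1: int, q0: int) -> int:
    """Read one bit of a truth table, exactly as the function generator would."""
    return (table >> address(i1, i2, q1, q0)) & 1


print(hex(output_table(0)), hex(output_table(1)))  # 0xcaca 0xcaac
```

With this ordering the tables also make the logic visible: o1 depends only on q0 (i1 when q0 = 0, i2 when q0 = 1), while o2 takes i1 exactly when q1 ≠ q0.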

Finally, the control CLB (CTL) of Fig. 5C uses one function generator to test for a double gap in the two incoming bit streams (00 and 01) and the other to test for a two-bit match between the first incoming stream and the neighboring third stream. A flip-flop records the previous value of this match, which is then fed back, together with the outputs of F and G, into the H logic for the final decision: the write enable, which allows the switch state to change, is asserted either on two consecutive two-bit matches (i.e. a 3-bit match) between the first and third bit streams, or on a double gap in streams 1 and 2. The remaining flip-flop is used for the address decoding described above.
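The control CLB never compares 3 bits at once: G checks a 2-bit match, a flip-flop remembers the previous G, and H combines them. The sketch below models that clock-by-clock structure and checks it against a direct statement of the rule (00 in both streams 1 and 2, or a 3-bit match between streams 1 and 3). It assumes the previous bit of each stream is available alongside the current one; both function names are hypothetical.

```python
import random
from typing import List


def ctl_write_enable(s1: List[int], s2: List[int], s3: List[int]) -> List[int]:
    """Clock-by-clock model of the control CLB: F detects a double gap,
    G a two-bit match between streams 1 and 3, one flip-flop stores the
    previous G, and H combines F, G and that flip-flop."""
    we = []
    p1 = p2 = p3 = None   # bits seen in the previous clock cycle
    match_ff = 0          # flip-flop holding last cycle's G
    for a, b, c in zip(s1, s2, s3):
        if p1 is None:
            f = g = 0     # no history yet after reset
        else:
            f = int((p1, a, p2, b) == (0, 0, 0, 0))  # F: 00 in stream 1 and in stream 2
            g = int((p1, a) == (p3, c))              # G: two-bit match, stream 1 vs stream 3
        we.append(int(f or (g and match_ff)))        # H: gap, or two consecutive 2-bit matches
        match_ff = g
        p1, p2, p3 = a, b, c
    return we


def reference_enable(s1: List[int], s2: List[int], s3: List[int]) -> List[int]:
    """The rule stated directly: 00 in both streams 1 and 2, or a 3-bit match of 1 and 3."""
    out = []
    for t in range(len(s1)):
        gap = t >= 1 and s1[t - 1:t + 1] == [0, 0] and s2[t - 1:t + 1] == [0, 0]
        m3 = t >= 2 and s1[t - 2:t + 1] == s3[t - 2:t + 1]
        out.append(int(gap or m3))
    return out


rng = random.Random(1)
streams = [[rng.randint(0, 1) for _ in range(200)] for _ in range(3)]
print(ctl_write_enable(*streams) == reference_enable(*streams))  # True
```

Splitting the 3-bit comparison into a registered 2-bit comparison is what lets the whole decision fit into one CLB's F, G, H and a single flip-flop.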

The prototype design was implemented on the smallest chip of the 4000 series (XC4003), for which a XILINX demonstration board was available. This chip has a 10 × 10 array of CLBs, which allows 64 CLBs to be used for 16 local units connected in 4 groups (32 34-bit chains and 16 switches), leaving 2-CLB-wide vertical and horizontal strips in the middle free for testing and for simple on-chip generation of the global control signals (an off-chip solution is proposed for larger designs). The routing of the LCA is shown in Fig. 6. Only a small subset of the 84 pins is needed to close a toroidal geometry of bit streams for testing the single chip. The details of the connections and control switches used in this demonstration design are shown in Fig. 8.

The final design requires 100 of the large chips (20 × 20 CLBs), wired on a printed circuit board (PCB), to reach a population of 10,000 chains with an average length of 64 bits.
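The CLB counts above can be checked with a little arithmetic, assuming (as in the prototype) 4 CLBs per local unit with one switch and two 34-bit segments per unit, and, for the final design, no reserved test strips on the 20 × 20 chips. The function name is hypothetical.

```python
CLBS_PER_UNIT = 4      # one local unit = 4 CLBs
SEGMENT_BITS = 34      # each unit holds two 34-bit chain segments and one switch


def capacity(rows: int, cols: int, reserved: int = 0, chips: int = 1) -> dict:
    """Resources of a design whose CLB array has `reserved` rows and columns
    left free (e.g. for test and control logic)."""
    clbs = (rows - reserved) * (cols - reserved) * chips
    units = clbs // CLBS_PER_UNIT
    return {"clbs": clbs, "units": units, "segments": 2 * units,
            "switches": units, "bits": 2 * units * SEGMENT_BITS}


prototype = capacity(10, 10, reserved=2)   # XC4003 with 2-CLB-wide centre strips
final = capacity(20, 20, chips=100)        # 100 chips of 20 x 20 CLBs, no reserved strips (assumed)
print(prototype)  # {'clbs': 64, 'units': 16, 'segments': 32, 'switches': 16, 'bits': 1088}
print(final["bits"] >= 10_000 * 64)  # True
```

For the final design this gives 10,000 units and 680,000 bits of shift-register storage, enough for the 640,000 bits of 10,000 chains averaging 64 bits.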

The limited population size of this genetics processor can be partly overcome by opening the closed processor bit streams to the host computer's main RAM with a variable bit width. Up to 32 bits (depending on the host bus) can be written into and read out of the LCA array in parallel in two host bus cycles. A large population of 1 million 64-bit chains (8 MB) or more can then be processed by stepping the memory address linearly through host memory, replacing the bits fed into the LCA with the bits it outputs. Depending on the residence time of the genetic chains in the LCA, a range of effective population sizes can be obtained by varying the bit width of the interface. A schematic of this host interface is shown in Fig. 8.
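A minimal model of the host interface, under simplifying assumptions: the LCA is treated as a FIFO of words in transit (its genetic processing is replaced by a placeholder `process` function), and each host bus transfer feeds the word at the current address into the LCA and writes back the word that emerges. It illustrates that the population is conserved while it circulates, the effective population being host memory plus LCA contents. `stream_through_host` is a hypothetical name.

```python
from collections import deque
from typing import Callable, List, Tuple


def stream_through_host(host_mem: List[int], lca_words: List[int], passes: int = 1,
                        process: Callable[[int], int] = lambda w: w) -> Tuple[List[int], List[int]]:
    """Step the address linearly through host memory. At each transfer the word at the
    current address enters the LCA (modelled as a FIFO) and the word leaving the LCA
    is written back in its place. `process` stands in for the genetic operations."""
    mem = list(host_mem)
    lca = deque(lca_words)
    for _ in range(passes):
        for addr in range(len(mem)):
            lca.append(process(mem[addr]))
            mem[addr] = lca.popleft()
    return mem, list(lca)


mem, inside = stream_through_host(list(range(1, 11)), [100, 101, 102])
print(mem)     # [100, 101, 102, 1, 2, 3, 4, 5, 6, 7]
print(inside)  # [8, 9, 10]
```

Each word spends as many transfers inside the LCA as the FIFO holds words; widening the interface moves more bits per transfer, which is the lever the text describes for trading residence time against effective population size.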

3. Programming environment for hardware simulation

Usually months pass between the design of a circuit and its implementation. The optimized circuit must be handed over to a chip manufacturer, who transfers it onto selected silicon wafers by a sequence of photolithographic and chemical steps at a resolution of 1 micrometer. The cost of such a development cycle is high and can only be recovered through large sales volumes of identical chips. This is the obstacle to developing custom chips. In FPGAs, one particular regular and dense design has gone through this development cycle, and its circuitry contains memory cells (flip-flops) whose binary states, acting through transistors, open and close a large number of connections between design components (the PIPs) and also set combinatorial functions that link a group of signals to an output line (e.g. the F, G and H functions of the XILINX 4000 series). It follows that specifying the states of these memory cells has the same effect as implementing any of a very large class of circuit designs in hardware, at the cost of reducing the achievable gate density by about a factor of 10. The circuit design of a 4000-series chip is controlled by several tens of thousands of memory cells.

Of course, not every connection can be specified in this way. FPGAs offer a regular arrangement of dense local interconnect and a limited number of fast, longer-range interconnect lines. Particular attention has been paid to the effect of the programmable connections on the propagation time and voltage levels of binary and tri-state signals, but at this level these problems cannot be hidden from the user. On the other hand, it is also very easy to write programs in high-level computer languages that do not work either. An intelligent design-entry interface and an intelligent compiler are therefore needed to limit errors in a design and to test its usefulness. Since the product competes with conventional chip designs, great importance was attached to design entry using conventional CAE block-design methods. Block design, however, is completely general and must later be adapted to the programming constraints of the LCA. An experienced user can specify constraints for this process, but the packing density and speed (the maximum achievable clock rate) of a design often depend on global considerations that escape the currently available automatic implementation programs, e.g. the PPR (Partition, Place and Route) program offered by XILINX.

In keeping with the concept of FPGAs as regular logic arrays, we restricted ourselves to designs that contain a regular repeat unit. Because of the symmetry of the chip, the optimization problem reduces to the cell unit of this repeating design structure. A hierarchy of nested designs, combining several smaller units into a larger repeat unit, can also be considered. The XILINX software does not yet support user-defined "hard" macros, which would allow fully optimized, fixed subdesigns to be placed on a chip, but such support is assumed to be imminent. For regular arrays of subdesigns, systematic naming of the nets (signal lines) within each subdesign is needed to avoid duplicate names for internal signals and to allow the inputs and outputs of the subdesigns to be connected systematically by program.
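A sketch of the systematic naming called for above, using a hypothetical scheme: every net inside a subdesign copy is prefixed with the unit's row and column, so copies of the same subdesign never clash, and the inter-unit connections of a regular (here toroidal) array can be generated by a loop. The port names `OUT_E`, `IN_W`, `OUT_S` and `IN_N` are invented for illustration, and the output is plain pairs of net names, not XACT command syntax.

```python
from typing import List, Tuple


def net_name(row: int, col: int, local: str) -> str:
    """Hypothetical scheme: prefix every net of a subdesign copy with its grid position."""
    return f"U{row}_{col}/{local}"


def torus_connections(rows: int, cols: int) -> List[Tuple[str, str]]:
    """Connect each unit's east output to its eastern neighbour's west input and its
    south output to its southern neighbour's north input, wrapping at the edges."""
    conns = []
    for r in range(rows):
        for c in range(cols):
            conns.append((net_name(r, c, "OUT_E"), net_name(r, (c + 1) % cols, "IN_W")))
            conns.append((net_name(r, c, "OUT_S"), net_name((r + 1) % rows, c, "IN_N")))
    return conns


conns = torus_connections(4, 4)
print(len(conns), conns[0])  # 32 ('U0_0/OUT_E', 'U0_1/IN_W')
```

Because names are derived from grid position, the same generator works for any array size, and the wrap-around connections are what close the toroidal geometry used for testing.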

Programs written in C were developed to generate the systematic set of net-connection commands needed to implement a desired regular topology, such as that of the prototype design. A C program was also developed to generate the sequence of "soft" macro commands needed to place the subdesigns on the LCA. Because the XILINX LCA design editor (XACT) can execute command files, the step from small subdesigns to regular arrays of them can be automated using such C-generated command files. A particular connection scheme (geometry) can then be specified in the conventional simulation language and transferred automatically to the chip. Further automatic optimization and testing of the subdesigns are not treated here; we note only that a maximum of 16-25 CLBs is regarded as the primitive unit for this class of computer. XILINX provides very mature validation checks on design entry, with incremental routing if desired, and a Design Rules Checker (DRC) that finds further errors. An automatic routine reports the maximum signal-propagation delays between flip-flops, which makes it possible to determine the maximum clock speed and to optimize the design further. If desired, a timing simulation of the design can also be run in software, using the manufacturer's delay specifications to compute the signal-propagation behavior.
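The maximum clock speed follows from the slowest flip-flop-to-flip-flop path. As a sketch of that calculation (not of the XILINX timing tool), the following treats the combinational logic between flip-flops as an acyclic graph with a delay per node and finds the longest path; the node names and delay values are hypothetical.

```python
from functools import lru_cache
from typing import Dict, Iterable, List


def worst_path_delay(edges: Dict[str, List[str]], delay: Dict[str, float],
                     startpoints: Iterable[str], endpoints: Iterable[str]) -> float:
    """Longest delay from any flip-flop output (startpoint) to any flip-flop input
    (endpoint) through acyclic combinational logic; each node adds its own delay."""
    ends = frozenset(endpoints)

    @lru_cache(maxsize=None)
    def longest_from(node: str) -> float:
        if node in ends:
            return delay[node]
        succ = edges.get(node, [])
        if not succ:
            return float("-inf")  # path never reaches a flip-flop
        return delay[node] + max(longest_from(s) for s in succ)

    return max(longest_from(s) for s in startpoints)


# Hypothetical delays in ns: clock-to-out, two function generators, a long net, setup.
edges = {"FF_Q": ["LUT_F", "LUT_G"], "LUT_F": ["NET"], "NET": ["FF_D"], "LUT_G": ["FF_D"]}
delay = {"FF_Q": 3.0, "LUT_F": 6.0, "LUT_G": 6.0, "NET": 12.0, "FF_D": 4.0}
worst = worst_path_delay(edges, delay, ["FF_Q"], ["FF_D"])
print(worst, 1000.0 / worst)  # 25.0 40.0
```

In the example the worst path (through the long net) is 25 ns, which would limit the clock to 40 MHz; shortening that one path is where further design optimization would pay off.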

Extending the repeating units across chip boundaries requires feeding the geometric distribution of the pins into the C program that makes the logical connections from the subdesigns at the periphery to the chip's IO pins. This is a once-off task for each member of the 4000 family. The physical connections between the chips must be made on a PCB, i.e., by hard wiring; however, because the internal connections between the subdesign IO and the device's IO pins are programmable, a large number of parallel connections between corresponding IO pins suffices to preserve the general character of repeated unit designs across multiple chips. Since only neighboring chips are connected to one another and fan-out is minimal, delays should be short enough not to limit the computer's clock speed. C code was written to find the connection pin nearest to a given placed subdesign IO line and to generate the corresponding XACT connection commands.
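The pin-finding step can be sketched as a greedy nearest-neighbor assignment. The coordinate system, the pin and line names, and the greedy strategy (each IO line, taken in name order, claims the closest still-free pin) are assumptions for illustration; the original C code and its XACT output format are not reproduced here.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) position on the die, arbitrary units


def assign_nearest_pins(io_lines: Dict[str, Point],
                        pins: Dict[str, Point]) -> Dict[str, str]:
    """Greedily assign each subdesign IO line to the nearest free IO pin."""
    free = dict(pins)
    assignment: Dict[str, str] = {}
    for line in sorted(io_lines):  # fixed order keeps the result deterministic
        x, y = io_lines[line]
        if not free:
            raise ValueError("more IO lines than free pins")
        # Euclidean distance from this IO line to every still-free pin.
        best = min(free, key=lambda p: math.hypot(free[p][0] - x, free[p][1] - y))
        assignment[line] = best
        del free[best]  # each pin carries at most one line
    return assignment
```

A greedy choice is simple and deterministic but not globally optimal; a minimum-total-distance assignment would need a matching algorithm such as the Hungarian method.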

4. A general genetic switch

Much of the general design strategy was set out in the preceding sections. This section illustrates the general character of the approach by presenting the design of a genetic switching unit with separate write addressing (a write head), which can be moved step by step relative to the read address depending on local bit-sequence characteristics (of one of the two bit streams passing through the unit, or of a neighboring stream). An array of such elements, which resemble Turing machines, permits general editing of the embedded genetic sequences, including insertions and deletions within them. Fig. 9 shows how units of 3, 4, 5, 9 and 12 CLBs can be embedded evenly in the 14 × 14 CLB array of the 4005 LCA. Other unit sizes are possible with the 4005 LCA if the restriction to an even number of units per side is relaxed (in the final arrangement, however, this would require alternating variants of the design on neighboring LCAs to ensure spatial isotropy). This section presents a unit of 9 CLBs.
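As a rough capacity check, the sketch below computes, for each unit size, the largest even n such that a square n × n arrangement of units fits within the 196 CLBs of the 14 × 14 array by area alone. It ignores unit shape and routing, so it is only an upper bound; the actual layouts in Fig. 9 may differ.

```python
CLB_ROWS = CLB_COLS = 14          # 4005 LCA: 14 x 14 CLB array
TOTAL_CLBS = CLB_ROWS * CLB_COLS  # 196 CLBs


def max_even_square(unit_clbs: int, total: int = TOTAL_CLBS) -> int:
    """Largest even n with n*n units of `unit_clbs` CLBs fitting by area."""
    n = 0
    while (n + 2) ** 2 * unit_clbs <= total:
        n += 2
    return n


for size in (3, 4, 5, 9, 12):
    n = max_even_square(size)
    print(f"{size:2d}-CLB unit: at most {n} x {n} units, {n * n * size} CLBs used")
```

For the 9-CLB unit presented below, this bound gives a 4 × 4 arrangement using 144 of the 196 CLBs.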

Large parts of the prototype structure relating to the four CLBs mentioned (KAM, MEM, CTL and SWI) are retained, and the new structure is added as an extension. The write address must be stored locally to generate the 4-bit address chosen here; 4 flip-flops and 1 function generator in 2 CLBs suffice for this. The normal rotating bit sequence of addresses described earlier is used. The address at the next clock time can be incremented or decremented relative to the expected next address by changing the number of clock pulses applied to the flip-flops from 1 to 2 or 0. The simplest way to advance the address one step beyond the next address is to use a double-frequency clock that produces two cycles in this part of the circuit. The enable signal of the flip-flops can be held low to prevent updating and thus decrement the address relative to the expected next value. However, this additional function can also be packed into the two CLBs that generate the local address, without using a double-frequency clock (which could limit the maximum operating frequency). This is shown in Fig. 10. Address bits 0 and 2 are packed into one CLB and bits 1 and 3 into the other. The schematic shows how densely very complex functionality can be expressed. The relative address-shift signal is decomposed into two bits: the first selects decrement or no decrement, the second increment or no increment, and decrement takes precedence because it is tied to the flip-flop enable.
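The clock-pulse mechanism for moving the write head can be modeled abstractly. This sketch assumes the 4-bit write address is represented by its position (0 to 15) in the rotating address sequence described earlier; the mapping from position to actual address bits is not modeled. The shift signal is the two-bit pair (`dec`, `inc`), with `dec` taking precedence as in the text.

```python
from typing import Iterable, Tuple

SEQ_LEN = 16  # 4-bit write address -> 16 positions in the rotating sequence


def pulses(dec: bool, inc: bool) -> int:
    """Effective clock pulses seen by the address flip-flops in one cycle.
    dec holds the flip-flops (enable low) and overrides inc."""
    if dec:
        return 0
    return 2 if inc else 1


def step(position: int, dec: bool, inc: bool) -> int:
    """Advance the write head's position in the rotating address cycle."""
    return (position + pulses(dec, inc)) % SEQ_LEN


def offset_after(shifts: Iterable[Tuple[bool, bool]], start: int = 0) -> int:
    """Offset of the write head relative to a read head that always
    advances by exactly one position per cycle."""
    write = read = start
    for dec, inc in shifts:
        write = step(write, dec, inc)
        read = (read + 1) % SEQ_LEN
    return (write - read) % SEQ_LEN
```

After three consecutive increments the write head leads a normally advancing read head by three positions; a single decrement leaves it one position behind (offset 15 modulo 16).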

Separate read and write addressing of a bit stream as it passes through the second 16-bit register (in MEM) requires a double clock frequency to allow separate read and write access. An 8:4-bit MUX is required to select alternately between the read and write addresses. It will be readily apparent to those skilled in the art how to implement this MUX in 2 CLBs, so it is not described further here.
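The alternating use of the 8:4 MUX can be sketched as two fast half-cycles per main clock cycle: in the first, the MUX passes the read address and a bit is read from the 16-bit register; in the second, it passes the write address and the incoming bit is written. The read-before-write ordering and the function names are assumptions for illustration.

```python
from typing import List

REG_BITS = 16  # second 16-bit register in MEM


def mux8to4(select_write: bool, read_addr: int, write_addr: int) -> int:
    """8:4 MUX: pass one of the two 4-bit addresses (8 inputs, 4 outputs)."""
    return (write_addr if select_write else read_addr) & 0xF


def main_cycle(reg: List[int], read_addr: int, write_addr: int, bit_in: int) -> int:
    """One main clock cycle made of two double-frequency half-cycles."""
    assert len(reg) == REG_BITS
    # Half-cycle 1: MUX selects the read address; the stored bit is read out.
    bit_out = reg[mux8to4(False, read_addr, write_addr)]
    # Half-cycle 2: MUX selects the write address; the incoming bit is stored.
    reg[mux8to4(True, read_addr, write_addr)] = bit_in & 1
    return bit_out
```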

A final CLB can be used to generate the dependence of the increment/hold/decrement signal on the local and neighboring bit streams, as was done in the control unit (CTL) of the genetic-switch prototype. Optionally, a single three-input function generator (H) and one flip-flop suffice to supply the half-frequency clock pulse, should this be needed to improve stability. These could be packed into one of the two MUX CLBs.
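The half-frequency clock is a divide-by-two circuit: the flip-flop's output is fed back through the function generator, which inverts it, so the flip-flop toggles on every fast clock edge. The sketch models the three-input function generator H as an 8-entry truth table (lookup table); the particular table, which ignores two of its inputs, is a hypothetical programming chosen only to show the toggle.

```python
from typing import List, Sequence


def lut3(table: Sequence[int], a: int, b: int, c: int) -> int:
    """Evaluate a 3-input function generator given as an 8-entry truth table."""
    return table[(a << 2) | (b << 1) | c]


# Hypothetical programming of H: output = NOT a (inputs b and c unused).
H_TOGGLE = [1, 1, 1, 1, 0, 0, 0, 0]


def half_frequency_clock(fast_cycles: int) -> List[int]:
    """Trace of the flip-flop output Q over `fast_cycles` fast clock edges."""
    q = 0
    trace: List[int] = []
    for _ in range(fast_cycles):
        q = lut3(H_TOGGLE, q, 0, 0)  # flip-flop latches H(NOT Q) on each edge
        trace.append(q)
    return trace
```

The output repeats every two fast cycles, i.e., at half the fast clock frequency.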

The final design of the unit thus comprises 9 CLBs (KAM, split R/W MEM, 2 for the write address, 2 for the MUX, CTL, SWI and the increment/decrement control). This shows how substantial new functionality, namely insertion and deletion of up to 16 bits, can be incorporated while keeping the units relatively small. The basic idea of split addressing can be used to extend the functionality further.

5. Experimental suggestions for the genetics processor

Given the very large number of generations that can be simulated, the computer is ideally suited to addressing questions of evolutionary stagnation and punctuation on timescales comparable to the evolutionary history of the Earth. In particular, the difficult transition from independent replicators to the complex, functionally integrated components of higher organisms can be investigated (admittedly within the abstract logical genetic world modeled here). Questions of long-term optimization in interacting systems can also be studied. First, the mixing properties of the sequence-dependent swapping in the prototype model, and then the population dynamics of the purely replicative copying functionality, must be examined before variation and selection are investigated in the full prototype. Even at this stage, interesting questions arise concerning the timescale of optimization and evolutionary stability.

Once interest in such logical evolutionary processes, extended to the level of giga-generations, has been established in this way, other biologically relevant and more complex interaction patterns can be investigated. Studies of the numerous alternative genetic interaction schemes that can be used to investigate functional interactions in evolution may follow. The Turing-machine-like genetic switch presented in the preceding section gives an idea of the level of complexity achievable in small subdesigns.

Claims

1. Device for calculating fine-grained parallel problems that can be described as a plurality of interconnected discretizable processes with

- several programmable logic circuits arranged adjacent to one another in array form, each having
- several configurable logic circuit blocks which are composed of regularly arranged logic cells and are freely configurable,
- several programmable switch matrices for regularly connecting the logic circuit blocks to one another in accordance with the switch matrix programming,
- a plurality of programmable I/O blocks with I/O connections for regularly connecting the plurality of programmable logic circuits to one another and for connecting at least one of the logic circuits to a host computer,
- - wherein several logic circuit blocks are each grouped into identical groups, each of which is assigned a certain predetermined number of I/O connections that depends on the size of the group of logic circuit blocks and is available for regularly connecting groups of logic circuit blocks to one another and for connecting two groups of logic circuit blocks of adjacent logic circuits,
- the geometry of the groups of combined logic circuit blocks is the same from group to group,
- the connection between the groups of logic circuit blocks and the connection between the logic circuit blocks of the groups are the same,
- the connection between the individual logic circuits is such that the number of connections available to the user, given the type of grouping of logic circuit blocks, is maximal, and the connection of groups of logic circuit blocks of adjacent logic circuits is identical to the connection of the groups of logic circuit blocks within one and the same logic circuit, and
- a clock generator device connected to the logic circuits, their groups of logic circuit blocks and their logic cells in such a way that all logic cells are clocked globally and synchronously.

2. Device for calculating fine-grained parallel problems that can be described as a plurality of interlinked discretizable processes with

- a programmable logic circuit having
- several configurable logic circuit blocks which are composed of regularly arranged logic cells and are freely configurable,
- several programmable switch matrices for regularly connecting the logic circuit blocks to one another in accordance with the switch matrix programming,
- at least one programmable I/O block with I/O connections for connecting the logic circuit to a host computer,
- - wherein several logic circuit blocks are each grouped into identical groups, each of which is assigned a certain predetermined number of I/O connections that depends on the size of the group of logic circuit blocks and is available for regularly connecting groups of logic circuit blocks to one another,
- the geometry of the groups of combined logic circuit blocks is the same from group to group,
- the connection between the groups of logic circuit blocks and the connection between the logic circuit blocks of the groups are the same,
- the connection between the individual logic circuits is such that the number of connections available to the user, given the type of grouping of logic circuit blocks, is maximal, and
- a clock generator device connected to the groups of logic circuit blocks and their logic cells in such a way that all logic cells are clocked globally and synchronously.

3. A method for configuring a device for calculating fine-grained parallelizable problems, which can be described as a plurality of interlinked discretizable processes, in which

- a predetermined problem area to be calculated is represented as a certain geometric arrangement of identical groups of processes, wherein the processes of a problem area may be the same or different,
- the processes and the linking of the processes in the problem area are identified and defined,
- individual processes are combined into groups, and these groups are implemented in circuitry by combining and connecting several programmable logic cells to form logic circuit blocks of one or more logic circuits, and
- the individual identical groups of logic circuit blocks are connected in accordance with the linking of the processes of the problem area, these connections being the same from group to group and, within the groups, from logic circuit block to logic circuit block.

4. The method according to claim 3, characterized in that several programmable logic circuits are provided regularly adjacent to one another, the connection pattern between logic circuit block groups of adjacent logic circuits being identical to the connection pattern of the logic circuit block groups of one and the same logic circuit.

5. The method according to claim 3 or 4, characterized in that a connection from at least one logic circuit to a host computer is established.

6. Use of configurable logic circuit arrays with distributed memories for simulating scientific, business, economic and logistic parallelizable fine-grained problems, in particular for simulating biotechnological systems and evolving systems.

7. Use of configurable logic circuit arrays with distributed memories for calculating problems of combinatorial optimization, in particular optimal routing.

8. Use of configurable logic circuit arrays with distributed memories for implementing genetic algorithms, neural networks or other distributed systems.

9. Use of configurable logic circuit arrays with distributed memories for solving problems of parallel comparison, modification, sorting and/or searching of data strings in databases or the like.

10. Use of configurable logic circuit arrays with distributed memories for developing special digital circuits through the population dynamics of interactive partial reconfiguration of hardware and computations based on genetic manipulation of data strings representing local configurations.