WO2002009285A2

WO2002009285A2 - System, method and article of manufacture for dynamic programming of one reconfigurable logic device from another reconfigurable logic device

Info

Publication number: WO2002009285A2
Application number: PCT/GB2001/003246
Authority: WO
Inventors: Sanjay Ibrahim Maniku
Original assignee: Celoxica Limited
Priority date: 2000-07-20
Filing date: 2001-07-19
Publication date: 2002-01-31
Also published as: WO2002009285A3; AU2001270876A1

Abstract

A system, method and article of manufacture are provided for dynamically programming a reconfigurable logic device. Configuration data for configuring a first reconfigurable logic device is acquired. A second reconfigurable logic device is utilized to process the configuration data. The second reconfigurable logic device configures the first reconfigurable logic device based on the configuration data.

Description

SYSTEM, METHOD AND ARTICLE OF MANUFACTURE FOR DYNAMIC PROGRAMMING OF ONE RECONFIGURABLE LOGIC DEVICE FROM ANOTHER RECONFIGURABLE LOGIC DEVICE

FIELD OF THE INVENTION

The present invention relates to programming of logic devices, and more particularly to utilizing one reconfigurable logic device to configure another reconfigurable logic device.

BACKGROUND OF THE INVENTION

It is well known that software-controlled machines provide great flexibility in that they can be adapted to many different desired purposes by the use of suitable software. As well as being used in the familiar general purpose computers, software-controlled processors are now used in many products such as cars, telephones and other domestic products, where they are known as embedded systems.

However, for a given function, a software-controlled processor is usually slower than hardware dedicated to that function. One way of overcoming this problem is to use a special software-controlled processor, such as a RISC processor, which can be made to function more quickly for limited purposes by tailoring its parameters (for instance, size and instruction set) to the desired functionality.

Dedicated hardware, although it increases the speed of operation, lacks flexibility: it may be suitable for the task for which it was designed but not for a modified version of that task that is desired later. It is now possible to form the hardware on reconfigurable logic circuits, such as Field Programmable Gate Arrays (FPGAs), which are logic circuits that can be repeatedly reconfigured in different ways. They thus provide the speed advantages of dedicated hardware with some degree of flexibility for later updating or multiple functionality.

In general, though, it can be seen that designers face a problem in finding the right balance between speed and generality. They can build versatile chips which will be software controlled and thus perform many different functions relatively slowly, or they can devise application-specific chips that do only a limited set of tasks but do them much more quickly.

It would be desirable to allow reconfiguration of one reconfigurable logic device from another reconfigurable logic device to provide the speed advantages of dedicated hardware, with the associated degree of flexibility for rapid updating or multiple functionality.

SUMMARY OF THE INVENTION

In accordance with the invention, one reconfigurable logic device is utilized to configure another reconfigurable logic device. Accordingly, a system, method and article of manufacture are provided for dynamically programming a reconfigurable logic device. Configuration data for configuring a first reconfigurable logic device is acquired, and a second reconfigurable logic device is utilized to process the configuration data. The configuration data may come from any of a number of sources, including a network, a server, the second device itself, a local data source such as memory connected to the second reconfigurable logic device, or any other data source. The second reconfigurable logic device then configures the first reconfigurable logic device based on the configuration data.

In one embodiment of the present invention, the reconfigurable logic devices are field programmable gate arrays. The processing of the configuration data can be executed simultaneously with at least one other process on the second reconfigurable logic device.

The communication medium between the first and second reconfigurable logic devices can be a SelectMAP interface, a bus, a network such as a local area network or the Internet, a peripheral component interconnect (PCI), a universal serial bus (USB), and/or any other bus or communication medium. Preferably, the second reconfigurable logic device checks for errors during configuration of the first reconfigurable logic device.

The invention extends to a computer program comprising program code means for executing the method.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be better understood when consideration is given to the following detailed description of embodiments thereof. Such description makes reference to the annexed drawings wherein:

Figure 1 is a schematic diagram of a hardware implementation of one embodiment of the present invention;

Figure 2 is a flow diagram of a process for dynamically programming a reconfigurable logic device;

Figure 3 depicts a structure of a Handel-C module that allows the re-configuration of one of the FPGAs on a hardware board from another FPGA;

Figure 4 is a diagrammatic overview of a board of the resource management device according to an illustrative embodiment of the present invention;

Figure 5 depicts a JTAG chain for the board of Figure 4;

Figure 6 shows a structure of a Parallel Port Data Transmission System according to an embodiment of the present invention;

Figure 7 is a flowchart that shows the typical series of procedure calls when receiving data;

Figure 8 is a flow diagram depicting the typical series of procedure calls when transmitting data;

Figure 9 is a flow diagram illustrating several processes running in parallel; and

Figure 10 is a block diagram of an FPGA device according to an exemplary embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

A preferred embodiment of a system in accordance with the present invention is practiced in the context of a personal computer such as an IBM compatible personal computer, Apple Macintosh computer or UNIX based workstation. A representative hardware environment is depicted in Figure 1, which illustrates a typical hardware configuration of a workstation in accordance with a preferred embodiment having a central processing unit 110, such as a microprocessor, and a number of other units interconnected via a system bus 112. The workstation shown in Figure 1 includes a Random Access Memory (RAM) 114, Read Only Memory (ROM) 116, an I/O adapter 118 for connecting peripheral devices such as disk storage units 120 to the bus 112, a user interface adapter 122 for connecting a keyboard 124, a mouse 126, a speaker 128, a microphone 132, and/or other user interface devices such as a touch screen (not shown) to the bus 112, a communication adapter 134 for connecting the workstation to a communication network (e.g., a data processing network) and a display adapter 136 for connecting the bus 112 to a display device 138. The workstation typically has resident thereon an operating system such as the Microsoft Windows NT or Windows 95 operating system (OS), the IBM OS/2 operating system, the Mac OS, or the UNIX operating system. Those skilled in the art will appreciate that the present invention may also be implemented on platforms and operating systems other than those mentioned.

A preferred embodiment is written using JAVA, C, and the C++ language and utilizes object oriented programming methodology. Object oriented programming (OOP) has become increasingly used to develop complex applications. As OOP moves toward the mainstream of software design and development, various software solutions require adaptation to make use of the benefits of OOP. A need exists for these principles of OOP to be applied to a messaging interface of an electronic messaging system such that a set of OOP classes and objects for the messaging interface can be provided. OOP is a process of developing computer software using objects, including the steps of analyzing the problem, designing the system, and constructing the program. An object is a software package that contains both data and a collection of related structures and procedures. Because it contains both, it can be visualized as a self-sufficient component that does not require other additional structures, procedures or data to perform its specific task. OOP, therefore, views a computer program as a collection of largely autonomous components, called objects, each of which is responsible for a specific task. This concept of packaging data, structures, and procedures together in one component or module is called encapsulation.

In general, OOP components are reusable software modules which present an interface that conforms to an object model and which are accessed at run-time through a component integration architecture. A component integration architecture is a set of architecture mechanisms which allow software modules in different process spaces to utilize each other's capabilities or functions. This is generally done by assuming a common component object model on which to build the architecture. It is worthwhile at this point to differentiate between an object and a class of objects. An object is a single instance of the class of objects, which is often just called a class. A class of objects can be viewed as a blueprint from which many objects can be formed.
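To illustrate the distinction between a class and its objects, together with the encapsulation described above, the following minimal C++ sketch uses a hypothetical `Point` class (points on the plane are one of the user-defined types mentioned below). The class is the blueprint; each `Point` variable is an independent object formed from it. The class and its members are illustrative only and are not part of the described embodiment.

```cpp
#include <cmath>

// Hypothetical class used only to illustrate classes, objects and encapsulation.
// Point is the class (the blueprint); every Point variable is a separate object.
class Point {
public:
    Point(double x, double y) : x_(x), y_(y) {}

    // Public member functions are the only way other code can reach the data.
    void translate(double dx, double dy) {
        x_ += dx;
        y_ += dy;
    }
    double x() const { return x_; }
    double y() const { return y_; }
    double distanceTo(const Point& other) const {
        return std::hypot(x_ - other.x_, y_ - other.y_);
    }

private:
    // Encapsulated data: not directly accessible from outside the class.
    double x_;
    double y_;
};
```

Moving one object with `translate` leaves any other `Point` untouched, because each object carries its own copy of the encapsulated data.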

OOP allows the programmer to create an object that is a part of another object. For example, the object representing a piston engine is said to have a composition relationship with the object representing a piston. In reality, a piston engine comprises a piston, valves and many other components; the fact that a piston is an element of a piston engine can be logically and semantically represented in OOP by two objects.

OOP also allows creation of an object that "depends from" another object. If there are two objects, one representing a piston engine and the other representing a piston engine wherein the piston is made of ceramic, then the relationship between the two objects is not that of composition. A ceramic piston engine does not make up a piston engine. Rather, it is merely one kind of piston engine that has one more limitation than the piston engine; its piston is made of ceramic. In this case, the object representing the ceramic piston engine is called a derived object, and it inherits all of the aspects of the object representing the piston engine and adds further limitation or detail to it. The object representing the ceramic piston engine "depends from" the object representing the piston engine. The relationship between these objects is called inheritance.

When the object or class representing the ceramic piston engine inherits all of the aspects of the objects representing the piston engine, it inherits the thermal characteristics of a standard piston defined in the piston engine class. However, the ceramic piston engine object overrides these with ceramic-specific thermal characteristics, which are typically different from those associated with a metal piston. It skips over the original and uses new functions related to ceramic pistons. Different kinds of piston engines have different characteristics, but may have the same underlying functions associated with them (e.g., how many pistons are in the engine, ignition sequences, lubrication, etc.). To access each of these functions in any piston engine object, a programmer would call the same functions with the same names, but each type of piston engine may have different, overriding implementations of functions behind the same name. This ability to hide different implementations of a function behind the same name is called polymorphism, and it greatly simplifies communication among objects.
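The piston-engine example can be expressed as a short C++ sketch showing composition (an engine holds pistons), inheritance (`CeramicPistonEngine` derives from `PistonEngine`) and polymorphism (one function name, `thermalCharacteristics()`, with an overriding implementation). All class and function names are hypothetical, and the thermal characteristics are descriptive strings rather than real engineering data.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Composition: a piston is a component of a piston engine.
struct Piston {
    std::string material;
};

// Base class: a generic piston engine with standard (metal) pistons.
class PistonEngine {
public:
    explicit PistonEngine(int pistonCount, std::string material = "metal")
        : pistons_(static_cast<std::size_t>(pistonCount), Piston{std::move(material)}) {}
    virtual ~PistonEngine() = default;

    // Shared functions inherited unchanged by every kind of piston engine.
    int pistonCount() const { return static_cast<int>(pistons_.size()); }
    const std::vector<Piston>& pistons() const { return pistons_; }

    // Thermal characteristics of a standard piston; derived classes may override.
    virtual std::string thermalCharacteristics() const {
        return "standard metal-piston thermal characteristics";
    }

private:
    std::vector<Piston> pistons_;
};

// Inheritance: a ceramic piston engine is one kind of piston engine.
class CeramicPistonEngine : public PistonEngine {
public:
    explicit CeramicPistonEngine(int pistonCount)
        : PistonEngine(pistonCount, "ceramic") {}

    // Override: same name, ceramic-specific implementation.
    std::string thermalCharacteristics() const override {
        return "ceramic-piston thermal characteristics";
    }
};

// Polymorphism: callers use one function name for any kind of engine.
std::string describeThermals(const PistonEngine& engine) {
    return engine.thermalCharacteristics();
}
```

`describeThermals` never needs to know which kind of engine it was given; the virtual call selects the ceramic implementation when one exists and falls back to the inherited one otherwise.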

With the concepts of composition relationship, encapsulation, inheritance and polymorphism, an object can represent just about anything in the real world. In fact, one's logical perception of reality is the only limit on determining the kinds of things that can become objects in object-oriented software. Some typical categories are as follows:

Objects can represent physical objects, such as automobiles in a traffic-flow simulation, electrical components in a circuit-design program, countries in an economics model, or aircraft in an air-traffic-control system.

Objects can represent elements of the computer-user environment such as windows, menus or graphics objects.

An object can represent an inventory, such as a personnel file or a table of the latitudes and longitudes of cities.

An object can represent user-defined data types such as time, angles, and complex numbers, or points on the plane.

With this enormous capability of an object to represent just about any logically separable matters, OOP allows the software developer to design and implement a computer program that is a model of some aspects of reality, whether that reality is a physical entity, a process, a system, or a composition of matter. Since the object can represent anything, the software developer can create an object which can be used as a component in a larger software project in the future.

If 90% of a new OOP software program consists of proven, existing components made from preexisting reusable objects, then only the remaining 10% of the new software project has to be written and tested from scratch. Since 90% already came from an inventory of extensively tested reusable objects, the potential domain from which an error could originate is 10% of the program. This is possible because OOP enables software developers to build objects out of other, previously built objects.

This process closely resembles complex machinery being built out of assemblies and sub-assemblies. OOP technology, therefore, makes software engineering more like hardware engineering in that software is built from existing components, which are available to the developer as objects. All this adds up to an improved quality of the software as well as an increased speed of its development. Programming languages are beginning to fully support the OOP principles, such as encapsulation, inheritance, polymorphism, and composition-relationship. With the advent of the C++ language, many commercial software developers have embraced OOP. C++ is an OOP language that offers fast, machine-executable code. Furthermore, C++ is suitable for both commercial-application and systems-programming projects. For now, C++ appears to be the most popular choice among many OOP programmers, but there is a host of other OOP languages, such as Smalltalk, Common Lisp Object System (CLOS), and Eiffel. Additionally, OOP capabilities are being added to more traditional popular computer programming languages such as Pascal.

The benefits of object classes can be summarized, as follows:

• Objects and their corresponding classes break down complex programming problems into many smaller, simpler problems.

• Encapsulation enforces data abstraction through the organization of data into small, independent objects that can communicate with each other. Encapsulation protects the data in an object from accidental damage, but allows other objects to interact with that data by calling the object's member functions and structures.

• Subclassing and inheritance make it possible to extend and modify objects through deriving new kinds of objects from the standard classes available in the system. Thus, new capabilities are created without having to start from scratch.

• Polymorphism and multiple inheritance make it possible for different programmers to mix and match characteristics of many different classes and create specialized objects that can still work with related objects in predictable ways.

• Class hierarchies and containment hierarchies provide a flexible mechanism for modeling real-world objects and the relationships among them.

Libraries of reusable classes are useful in many situations, but they also have some limitations. For example:

• Complexity. In a complex system, the class hierarchies for related classes can become extremely confusing, with many dozens or even hundreds of classes.

• Flow of control. A program written with the aid of class libraries is still responsible for the flow of control (i.e., it must control the interactions among all the objects created from a particular library). The programmer has to decide which functions to call at what times for which kinds of objects.

• Duplication of effort. Although class libraries allow programmers to use and reuse many small pieces of code, each programmer puts those pieces together in a different way. Two different programmers can use the same set of class libraries to write two programs that do exactly the same thing but whose internal structure (i.e., design) may be quite different, depending on hundreds of small decisions each programmer makes along the way. Inevitably, similar pieces of code end up doing similar things in slightly different ways and do not work as well together as they should.

Class libraries are very flexible, but as programs grow more complex, more programmers are forced to reinvent basic solutions to basic problems over and over again. A relatively new extension of the class library concept is to have a framework of class libraries. This framework is more complex and consists of significant collections of collaborating classes that capture both the small-scale patterns and the major mechanisms that implement the common requirements and design in a specific application domain. Frameworks were first developed to free application programmers from the chores involved in displaying menus, windows, dialog boxes, and other standard user interface elements for personal computers.

Frameworks also represent a change in the way programmers think about the interaction between the code they write and code written by others. In the early days of procedural programming, the programmer called libraries provided by the operating system to perform certain tasks, but basically the program executed down the page from start to finish, and the programmer was solely responsible for the flow of control. This was appropriate for printing out paychecks, calculating a mathematical table, or solving other problems with a program that executed in just one way.

The development of graphical user interfaces began to turn this procedural programming arrangement inside out. These interfaces allow the user, rather than program logic, to drive the program and decide when certain actions should be performed. Today, most personal computer software accomplishes this by means of an event loop which monitors the mouse, keyboard, and other sources of external events and calls the appropriate parts of the programmer's code according to actions that the user performs. The programmer no longer determines the order in which events occur. Instead, a program is divided into separate pieces that are called at unpredictable times and in an unpredictable order. By relinquishing control in this way to users, the developer creates a program that is much easier to use. Nevertheless, individual pieces of the program written by the developer still call libraries provided by the operating system to accomplish certain tasks, and the programmer must still determine the flow of control within each piece after it is called by the event loop. Application code still "sits on top of" the system.
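The event-loop arrangement described above can be sketched in C++ as follows. The `EventLoop` class, the `EventType` values and the handler registration scheme are hypothetical. A real GUI event loop blocks waiting for input from the operating system, whereas this sketch simply drains a queue of already-posted events; the point it illustrates is that the application only registers handlers, and the loop decides when each one runs.

```cpp
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event sources standing in for mouse, keyboard, etc.
enum class EventType { MouseClick, KeyPress, Quit };

struct Event {
    EventType type;
    std::string detail;
};

// Minimal event loop: application code registers handlers, and the loop
// decides when each handler runs, in the order events arrive.
class EventLoop {
public:
    using Handler = std::function<void(const Event&)>;

    void on(EventType type, Handler handler) { handlers_[type] = std::move(handler); }
    void post(Event event) { queue_.push(std::move(event)); }

    // Dispatch queued events until a Quit event or until the queue is empty.
    // Returns how many events were passed to a handler.
    int run() {
        int dispatched = 0;
        while (!queue_.empty()) {
            Event event = std::move(queue_.front());
            queue_.pop();
            if (event.type == EventType::Quit) break;
            auto it = handlers_.find(event.type);
            if (it != handlers_.end()) {
                it->second(event);
                ++dispatched;
            }
        }
        return dispatched;
    }

private:
    std::map<EventType, Handler> handlers_;
    std::queue<Event> queue_;
};
```

Each handler is still an ordinary procedure whose internal flow of control the programmer writes, which matches the observation that application code still sits on top of the system.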

Even event loop programs require programmers to write a lot of code that should not need to be written separately for every application. The concept of an application framework carries the event loop concept further. Instead of dealing with all the nuts and bolts of constructing basic menus, windows, and dialog boxes and then making these things all work together, programmers using application frameworks start with working application code and basic user interface elements in place. Subsequently, they build from there by replacing some of the generic capabilities of the framework with the specific capabilities of the intended application. Application frameworks reduce the total amount of code that a programmer has to write from scratch. Moreover, because the framework is really a generic application that displays windows, supports copy and paste, and so on, the programmer can also relinquish control to a greater degree than event loop programs permit. The framework code takes care of almost all event handling and flow of control, and the programmer's code is called only when the framework needs it (e.g., to create or manipulate a proprietary data structure).

A programmer writing a framework program not only relinquishes control to the user (as is also true for event loop programs), but also relinquishes the detailed flow of control within the program to the framework. This approach allows the creation of more complex systems that work together in interesting ways, as opposed to isolated programs, having custom code, being created over and over again for similar problems.

Thus, as is explained above, a framework basically is a collection of cooperating classes that make up a reusable design solution for a given problem domain. It typically includes objects that provide default behavior (e.g., for menus and windows), and programmers use it by inheriting some of that default behavior and overriding other behavior so that the framework calls application code at the appropriate times.
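The inversion of control described above corresponds to what is commonly called the template method pattern. In the following hypothetical C++ sketch, the framework's `run()` fixes the flow of control; the application class keeps some default behavior, overrides other behavior, and supplies a required hook that the framework calls at the appropriate time. The class names and the strings they return are illustrative only.

```cpp
#include <string>
#include <vector>

// Hypothetical miniature application framework (template method pattern).
// run() owns the flow of control; application code only fills in hooks.
class ApplicationFramework {
public:
    virtual ~ApplicationFramework() = default;

    std::vector<std::string> run() {
        std::vector<std::string> trace;
        trace.push_back(createMainWindow());  // the framework calls application code...
        trace.push_back(buildMenus());
        trace.push_back(openDocument());      // ...at the times it chooses
        trace.push_back("framework: shut down");
        return trace;
    }

protected:
    // Default behavior that an application may keep or override.
    virtual std::string createMainWindow() { return "framework: default window"; }
    virtual std::string buildMenus() { return "framework: File and Edit menus"; }
    // Application-specific behavior the framework cannot supply.
    virtual std::string openDocument() = 0;
};

// Application code: inherits some defaults, overrides others.
class ConfigToolApp : public ApplicationFramework {
protected:
    std::string buildMenus() override { return "app: File, Edit and Configure menus"; }
    std::string openDocument() override { return "app: open configuration file"; }
};
```

The application never calls its own overrides; it only calls `run()`, and the framework decides when `buildMenus` and `openDocument` execute.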

There are three main differences between frameworks and class libraries:

• Behavior versus protocol. Class libraries are essentially collections of behaviors that you can call when you want those individual behaviors in your program. A framework, on the other hand, provides not only behavior but also the protocol or set of rules that govern the ways in which behaviors can be combined, including rules for what a programmer is supposed to provide versus what the framework provides.

• Call versus override. With a class library, the programmer's code instantiates objects and calls their member functions. It is possible to instantiate and call objects in the same way with a framework (i.e., to treat the framework as a class library), but to take full advantage of a framework's reusable design, a programmer typically writes code that overrides and is called by the framework.

The framework manages the flow of control among its objects. Writing a program involves dividing responsibilities among the various pieces of software that are called by the framework rather than specifying how the different pieces should work together.

• Implementation versus design. With class libraries, programmers reuse only implementations, whereas with frameworks, they reuse design. A framework embodies the way a family of related programs or pieces of software work. It represents a generic design solution that can be adapted to a variety of specific problems in a given domain. For example, a single framework can embody the way a user interface works, even though two different user interfaces created with the same framework might solve quite different interface problems.

Thus, through the development of frameworks for solutions to various problems and programming tasks, significant reductions in the design and development effort for software can be achieved.

A preferred embodiment of the invention utilizes HyperText Markup Language (HTML) to implement documents on the Internet together with a general-purpose secure communication protocol for a transport medium between the client and the Newco. HTTP or other protocols could be readily substituted for HTML without undue experimentation. Information on these products is available in T. Berners-Lee, D. Connolly, "RFC 1866: Hypertext Markup Language - 2.0" (Nov. 1995); and R. Fielding, H. Frystyk, T. Berners-Lee, J. Gettys and J.C. Mogul, "Hypertext Transfer Protocol -- HTTP/1.1: HTTP Working Group Internet Draft" (May 2, 1996). HTML is a simple data format used to create hypertext documents that are portable from one platform to another. HTML documents are SGML documents with generic semantics that are appropriate for representing information from a wide range of domains. HTML has been in use by the World-Wide Web global information initiative since 1990. HTML is an application of ISO Standard 8879:1986, Information Processing Text and Office Systems; Standard Generalized Markup Language (SGML).

To date, Web development tools have been limited in their ability to create dynamic Web applications which span from client to server and interoperate with existing computing resources. Until recently, HTML has been the dominant technology used in development of Web-based solutions. However, HTML has proven to be inadequate in the following areas:

• Poor performance;

• Restricted user interface capabilities;

• Can only produce static Web pages;

• Lack of interoperability with existing applications and data; and

• Inability to scale.

Sun Microsystems' Java language solves many of the client-side problems by:

• Improving performance on the client side;

• Enabling the creation of dynamic, real-time Web applications; and

• Providing the ability to create a wide variety of user interface components.

With Java, developers can create robust User Interface (UI) components. Custom "widgets" (e.g., real-time stock tickers, animated icons, etc.) can be created, and client-side performance is improved. Unlike HTML, Java supports the notion of client-side validation, offloading appropriate processing onto the client for improved performance. Dynamic, real-time Web pages can be created using these custom UI components. Sun's Java language has emerged as an industry-recognized language for "programming the Internet." Sun defines Java as: "a simple, object-oriented, distributed, interpreted, robust, secure, architecture-neutral, portable, high-performance, multithreaded, dynamic, buzzword-compliant, general-purpose programming language. Java supports programming for the Internet in the form of platform-independent Java applets." Java applets are small, specialized applications that comply with Sun's Java Application Programming Interface (API) allowing developers to add "interactive content" to Web documents (e.g., simple animations, page adornments, basic games, etc.). Applets execute within a Java-compatible browser (e.g., Netscape Navigator) by copying code from the server to client. From a language standpoint, Java's core feature set is based on C++. Sun's Java literature states that Java is basically, "C++ with extensions from Objective C for more dynamic method resolution."

Another technology that provides similar functionality to JAVA is Microsoft's ActiveX Technologies, which give developers and Web designers the wherewithal to build dynamic content for the Internet and personal computers. ActiveX includes tools for developing animation, 3-D virtual reality, video and other multimedia content. The tools use Internet standards, work on multiple platforms, and are being supported by over 100 companies. The group's building blocks are called ActiveX Controls, small, fast components that enable developers to embed parts of software in hypertext markup language (HTML) pages. ActiveX Controls work with a variety of programming languages including Microsoft Visual C++, Borland Delphi, Microsoft Visual Basic programming system and, in the future, Microsoft's development tool for Java, code named "Jakarta." ActiveX Technologies also includes ActiveX Server Framework, allowing developers to create server applications. One of ordinary skill in the art readily recognizes that ActiveX could be substituted for JAVA without undue experimentation to practice the invention.

Handel-C is a programming language that enables a software or hardware engineer to directly target FPGAs (Field Programmable Gate Arrays) in a similar fashion to classical microprocessor cross-compiler development tools, without recourse to a Hardware Description Language. This allows the designer to directly realise the raw real-time computing capability of the FPGA.

Handel-C is designed to enable the compilation of programs into synchronous hardware; it is aimed at compiling high level algorithms directly into gate level hardware.

The Handel-C syntax is based on that of conventional C so programmers familiar with conventional C will recognize almost all the constructs in the Handel-C language.

Sequential programs can be written in Handel-C just as in conventional C, but to gain the most performance benefit from the target hardware, its inherent parallelism must be exploited.

Handel-C includes parallel constructs that provide the means for the programmer to exploit this benefit in his applications. The compiler compiles and optimizes Handel-C source code into a file suitable for simulation or a netlist which can be placed and routed on a real FPGA.

The simulator allows a user to test a program without using real hardware. It can display the state of every variable (register) in your program at every clock cycle if required, the simulation steps and the number of cycles simulated being under program control.

Optionally, the source code that was executed at each clock cycle, as well as the program state, may be displayed in order to assist in debugging the source code. Further debugging options are provided in the toolset, notably the 'Logic Estimator'. This tool displays the source code in a color-highlighted form that relates to the logic depth and usage, so providing feedback to the designer for further optimizations.

Dynamic Reconfiguration of One Reconfigurable Logic Device from Another Reconfigurable Logic Device

Figure 2 is a flow diagram of a process 200 for dynamically programming a reconfigurable logic device. In operation 202, configuration data for configuring a first reconfigurable logic device is acquired. The configuration data may come from any of a number of sources, including a network, a server, the second reconfigurable logic device itself, a local data source such as memory connected to the second reconfigurable logic device, or any other data source. A second reconfigurable logic device is utilized in operation 204 to process the configuration data. In operation 206, the second reconfigurable logic device configures the first reconfigurable logic device based on the configuration data.
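The three operations of process 200 can be sketched in C++ as a single function run by the second device. The `ConfigSource` and `TargetDevice` types, and the reporting of configuration errors through a `bool`, are assumptions made for this illustration; in the described embodiment the operations are implemented in Handel-C on an FPGA, not as C++ callbacks. The sketch requires C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical interfaces standing in for the data source and the first device.
struct ConfigSource {
    // Data source for operation 202: network, server, local memory, etc.
    // Returns no value if no configuration data is available.
    std::function<std::optional<Bytes>()> fetch;
};

struct TargetDevice {
    // Loads prepared data into the first device; false if an error was detected.
    std::function<bool(const Bytes&)> load;
};

enum class Result { Configured, NoData, ConfigError };

// Process 200 as executed by the second reconfigurable logic device.
Result programDevice(const ConfigSource& source,
                     const std::function<Bytes(const Bytes&)>& process,
                     const TargetDevice& target) {
    std::optional<Bytes> raw = source.fetch();   // operation 202: acquire
    if (!raw) return Result::NoData;
    Bytes prepared = process(*raw);              // operation 204: process
    // Operation 206: configure, reporting any error detected during loading.
    return target.load(prepared) ? Result::Configured : Result::ConfigError;
}
```

Keeping the source, the processing step and the target behind separate interfaces mirrors the text's point that the data may come from many sources and travel over many kinds of communication medium.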

In one embodiment of the present invention, the reconfigurable logic devices are field programmable gate arrays. The processing of the configuration data can be executed simultaneously with at least one other process on the second reconfigurable logic device.

The communication medium between the first and second reconfigurable logic devices can be a SelectMAP interface, a bus, a network such as a local area network or the Internet, a peripheral component interconnect (PCI), a universal serial bus (USB), and/or any other bus or communication medium. Preferably, the second reconfigurable logic device checks for errors during configuration of the first reconfigurable logic device.

The ability to change the functionality of an FPGA on-the-fly lies at the heart of reconfigurable computing. The remaining portion of this section describes a Handel-C module that allows the reconfiguration of one of the FPGAs on a hardware board from another FPGA. See Figure 4 and the related discussion, below, for a description of such a hardware board.

Some of the features of FPGA to FPGA reconfiguration include:

• Independent process running in parallel to any existing processes.

• Accepts an entire .bit file as is and performs all the required parsing and reformatting internally

• Allows the controlling process to abort the configuration at any time.

• Works with Virtex FPGAs used in board configuration.

Operation

Figure 3 gives an overview of the structure of the Handel-C module. The module implements SelectMAP configuration of FPGAs. Two concurrent processes are used: a controlling process 302, which acquires the configuration data in the form of a .bit file from an external source and passes it to a configuration process 304, which controls the actual configuration of the FPGA.

The .bit file is passed to the configuration process on demand, one byte at a time. The configuration process parses the .bit file and performs all other processing of the byte stream required to bring the data into a suitable form for configuration.
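A minimal C sketch of the kind of byte-at-a-time parsing the configuration process performs is given below. It assumes the commonly described layout of a Xilinx .bit file (a 2-byte length followed by that many preamble bytes, a 2-byte field, keyed header fields 'a' to 'd' each with a 2-byte big-endian length, and key 'e' with a 4-byte big-endian length followed by the raw configuration stream); that layout is not specified in this document, and the names bit_parser, bit_init and bit_feed are hypothetical. The real module is written in Handel-C and may differ.

```c
#include <stdint.h>

/* Hypothetical streaming .bit parser, fed one byte at a time.
   ASSUMED layout: 2-byte length + preamble, a 2-byte field, keyed fields
   'a'..'d' with 2-byte lengths, key 'e' with a 4-byte length + stream. */
typedef enum { ST_LEN, ST_SKIP, ST_KEY, ST_STREAM, ST_DONE, ST_ERROR } bit_state;

typedef struct {
    bit_state state;
    int       phase;      /* 0: preamble length, 1: 2-byte field, 2: keyed fields */
    int       key;        /* key of the current header field */
    int       len_left;   /* length bytes still to read in ST_LEN */
    uint32_t  value;      /* big-endian length being assembled */
    uint32_t  remaining;  /* bytes left to skip or to stream */
} bit_parser;

void bit_init(bit_parser *p) {
    p->state = ST_LEN; p->phase = 0; p->key = 0;
    p->len_left = 2; p->value = 0; p->remaining = 0;
}

/* Called when a run of skipped bytes has ended. */
static void end_of_skip(bit_parser *p) {
    if (p->phase == 0) {            /* preamble done: read the 2-byte field */
        p->phase = 1; p->state = ST_LEN; p->len_left = 2; p->value = 0;
    } else {                        /* header field done: expect next key */
        p->state = ST_KEY;
    }
}

/* Start skipping p->value bytes (handles zero-length fields). */
static void start_skip(bit_parser *p) {
    p->remaining = p->value;
    p->state = ST_SKIP;
    if (p->remaining == 0) end_of_skip(p);
}

/* Feed one byte. Returns 1 and stores it in *out if it is a configuration
   byte, 0 if it was consumed as header data, -1 on a malformed header. */
int bit_feed(bit_parser *p, uint8_t b, uint8_t *out) {
    switch (p->state) {
    case ST_LEN:
        p->value = (p->value << 8) | b;
        if (--p->len_left > 0) return 0;
        if (p->phase == 0) {
            start_skip(p);                     /* skip the preamble */
        } else if (p->phase == 1) {
            p->phase = 2; p->state = ST_KEY;   /* field value is ignored */
        } else if (p->key == 'e') {
            p->remaining = p->value;           /* raw stream follows */
            p->state = p->remaining ? ST_STREAM : ST_DONE;
        } else {
            start_skip(p);                     /* skip name/part/date/time */
        }
        return 0;
    case ST_SKIP:
        if (--p->remaining == 0) end_of_skip(p);
        return 0;
    case ST_KEY:
        if (b >= 'a' && b <= 'd')  p->len_left = 2;
        else if (b == 'e')         p->len_left = 4;
        else { p->state = ST_ERROR; return -1; }
        p->key = b; p->value = 0; p->state = ST_LEN;
        return 0;
    case ST_STREAM:
        *out = b;
        if (--p->remaining == 0) p->state = ST_DONE;
        return 1;
    case ST_DONE:
        return 0;                              /* trailing bytes ignored */
    default:
        return -1;                             /* ST_ERROR */
    }
}
```

Consuming exactly one byte per call matches the on-demand, byte-at-a-time transfer from the controlling process; the sketch only strips the header, and any further reformatting the module performs is not shown.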

Messages are passed between the two processes using two 3-bit status registers. Initialization, resource arbitration and status checking are performed using these registers. Each process has write access to one of these registers. The configuration pins of the FPGA being configured are accessed via the Blizzard board CPLD by issuing various FP commands. The strobing of the configuration clock while the /write signal is low is also carried out by the CPLD and controlled through FP commands. At certain times during configuration, the configuration process requires sole access to the Flash RAM data pins, as these are also connected to the configuration pins of the other FPGA. This becomes a problem if the controlling process is acquiring the configuration data from the Flash RAM. To avoid resource conflicts, the configuration process sets the BUSY flag in its status register. While this flag is set, the controlling process must not access any resources via the CPLD or write any FP commands to the CPLD. As soon as sole access is no longer required, the BUSY flag is de-asserted.

The various states of the status registers and the flow of the processes are discussed in the next section.

For more information on the actual configuration process, refer to the Xilinx Virtex FPGA datasheet, available from the Xilinx website http://www.xilinx.com, which is herein incorporated by reference.

Usage

Include a file, configure.h, in the design. This contains the FPGA_Configure_Proc() procedure which controls the configuration process. It has the following interface:

FPGA_Configure_Proc(config_chan, status_in, status_out);

config_chan - an 8-bit wide channel into which the individual bytes of the .bit file are written.

status_in - the control status register, which allows other processes to send status messages to the configuration process.

status_out - the configuration status register which allows the configuration process to communicate with the outside world.

These should be defined in the configure.h file.

This process is executed in parallel with the process that generates the configuration data. When the configuration process is ready, status_out will be set to CONFIG_IDLE. Once this is the case, configuration is started by setting status_in to CONFIG_START_0 or CONFIG_START_1 (depending on which FPGA you wish to configure) and writing the configuration data (the .bit file) into the channel. After each byte, wait for 1 clock cycle before checking the status_out register. The configuration process will set status_out to CONFIG_BUSY while the byte just written is being processed.

Do not attempt to access the Flash RAM or any other resource requiring the use of FP commands while the process is in this state. When the process is ready for the next byte, status_out will be set to CONFIG_GET_NEXT. At this point the controlling process can access any of the resources and, when ready, write the next byte to the channel. This cycle is repeated until status_out equals either CONFIG_DONE or CONFIG_ERROR, indicating that configuration has succeeded or that an error has occurred. The controlling process then sets status_in to CONFIG_IDLE and waits until the configuration process responds by setting status_out to CONFIG_IDLE. This indicates that the configuration process has completed and that the Flash RAM and other resources can again be accessed as normal. The configuration process can be started again at any time by setting status_in to CONFIG_START_0 or CONFIG_START_1 and following the above procedure. The controlling process can abort the configuration by setting status_in to CONFIG_ABORT and holding it until the configuration process responds by setting status_out to CONFIG_ERROR. During this time, a prialt() statement should be used so that arbitrary data can be written to the data channel if the configuration process requires it.
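The handshake above can be modelled in plain C to check the ordering of its steps. The sketch below is a software simulation only, assuming one loop iteration per Handel-C clock cycle; the status names come from the text, but their numeric values, the mock_config stand-in for the configuration process and the run_configuration helper are hypothetical. The controller starts FPGA 0, writes a byte only when status_out is CONFIG_GET_NEXT, aborts if it runs out of data, and finally returns status_in to CONFIG_IDLE and waits for the acknowledgement.

```c
#include <stddef.h>

/* Status codes named in the text; the numeric values are hypothetical. */
enum {
    CONFIG_IDLE, CONFIG_START_0, CONFIG_START_1, CONFIG_BUSY,
    CONFIG_GET_NEXT, CONFIG_DONE, CONFIG_ERROR, CONFIG_ABORT
};

/* Hypothetical stand-in for the configuration process. */
typedef struct {
    int    status_out;   /* configuration status register */
    size_t received;     /* bytes taken from the channel so far */
    size_t expected;     /* bytes needed for a complete configuration */
    int    have_byte;    /* 1 while a byte is waiting on config_chan */
} mock_config;

/* Advance the mock by one clock cycle, given the control register. */
static void mock_step(mock_config *m, int status_in) {
    if (status_in == CONFIG_ABORT) { m->status_out = CONFIG_ERROR; return; }
    switch (m->status_out) {
    case CONFIG_IDLE:
        if (status_in == CONFIG_START_0 || status_in == CONFIG_START_1) {
            m->received = 0;
            m->status_out = CONFIG_GET_NEXT;
        }
        break;
    case CONFIG_GET_NEXT:
        if (m->have_byte) {            /* take the byte and process it */
            m->have_byte = 0;
            m->received++;
            m->status_out = CONFIG_BUSY;
        }
        break;
    case CONFIG_BUSY:
        m->status_out = (m->received == m->expected) ? CONFIG_DONE
                                                     : CONFIG_GET_NEXT;
        break;
    default:                           /* CONFIG_DONE or CONFIG_ERROR */
        if (status_in == CONFIG_IDLE) m->status_out = CONFIG_IDLE;
        break;
    }
}

/* Controlling process: configures FPGA 0 with nbytes of data and returns
   the final status (CONFIG_DONE or CONFIG_ERROR). */
int run_configuration(mock_config *m, size_t nbytes) {
    int status_in = CONFIG_IDLE;
    size_t sent = 0;
    while (m->status_out != CONFIG_IDLE)       /* wait for CONFIG_IDLE */
        mock_step(m, status_in);
    status_in = CONFIG_START_0;
    while (m->status_out != CONFIG_DONE && m->status_out != CONFIG_ERROR) {
        if (m->status_out == CONFIG_GET_NEXT && !m->have_byte) {
            /* Shared resources may be used here before the next write. */
            if (sent < nbytes) { m->have_byte = 1; sent++; }
            else status_in = CONFIG_ABORT;     /* out of data: abort */
        }
        mock_step(m, status_in);               /* one clock cycle */
    }
    int result = m->status_out;
    status_in = CONFIG_IDLE;                   /* release the process */
    while (m->status_out != CONFIG_IDLE)
        mock_step(m, status_in);
    return result;
}
```

Here a loop iteration stands in for a clock cycle; in Handel-C the controlling code would instead run in a par block alongside FPGA_Configure_Proc(), writing each byte to config_chan with a channel output.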

Performance

An illustrative rate at which reconfiguration data can be passed into a Virtex FPGA is 50 MHz. For a Virtex 1000, which requires a stream of 765,968 bytes for full configuration, this gives a minimum configuration time of around 15 ms. Preferably, the module allows a minimum of 3 Handel-C clock cycles per configuration byte and has an overhead of around 200 cycles for the parsing and error checking procedures.

Because each byte takes at least 3 cycles, the module would have to be clocked at a Handel-C rate of 150 MHz to reach the 50 MHz reconfiguration limit. The 150 MHz figure, however, far exceeds the rate at which FPGA circuits can currently be clocked, and hence the performance of the module is limited by the rate at which the data can be provided and by other characteristics of the circuit.

In a preferred embodiment, the FPGAs are clocked at a Handel-C rate of 20 MHz, which allows an FPGA to be configured in around 115 ms, provided the configuration data can be supplied at that rate.
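The figures above follow from simple cycle arithmetic. The helper below (its name is hypothetical) multiplies the number of configuration bytes by the cycles spent per byte, adds the fixed overhead and divides by the clock rate: 765,968 bytes at one byte per 50 MHz cycle give about 15.3 ms, while 3 cycles per byte plus 200 cycles of overhead at 20 MHz give 2,298,104 cycles, or about 114.9 ms.

```c
#include <stdint.h>

/* Configuration time in milliseconds for `bytes` of data, spending
   `cycles_per_byte` cycles on each byte plus a fixed overhead, at a
   clock of `clock_hz`. */
double config_time_ms(uint64_t bytes, unsigned cycles_per_byte,
                      uint64_t overhead_cycles, double clock_hz) {
    uint64_t cycles = bytes * cycles_per_byte + overhead_cycles;
    return 1000.0 * (double)cycles / clock_hz;
}
```

For example, config_time_ms(765968, 3, 200, 20e6) evaluates to roughly 114.9, matching the "around 115 ms" quoted for the preferred embodiment.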

Illustrative Device Development Platform

Figure 4 is a diagrammatic overview of a system board 400 according to an illustrative embodiment of the present invention. It should be noted that the following description is set forth as an illustrative embodiment of the present invention and, therefore, the various embodiments of the present invention should not be limited by this description. As shown, the board can include two Xilinx Virtex™ 2000e FPGAs 402, 404, an Intel StrongARM SA1110 processor 406, a large amount of memory 408, 410 and a number of I/O ports 412. Its main features are listed below:

Two XCV2000e FPGAs, each with sole access to the following devices:

• Two banks (1 MB each) of SRAM (256K x 32 bits wide)

• Parallel port

• Serial port

• ATA port

The FPGAs share the following devices:

• VGA monitor port

• Eight LEDs

• 2 banks of shared SRAM (also shared with the CPU)

• USB interface (also shared with the CPU)

The FPGAs are connected to each other through a General Purpose I/O (GPIO) bus, a 32-bit SelectLink bus and a 32-bit Expansion bus with connectors that allow external devices to be connected to the FPGAs. The FPGAs are mapped into the memory of the StrongARM processor as variable latency I/O devices.

The Intel StrongARM SA1110 processor has access to the following:

• 64 Mbytes of SDRAM

• 16 Mbytes of FLASH memory

• LCD port

• IRDA port

• Serial port

It shares the USB port and the shared SRAM with the FPGAs. In addition to these, the board also has a Xilinx XC95288XL CPLD, which implements a number of glue logic functions and acts as the shared RAM arbiter, as well as variable rate clock generators and JTAG and MultiLinx SelectMAP support for FPGA configuration.

A number of communication mechanisms are possible between the ARM processor and the FPGAs. The FPGAs are mapped into the ARM's memory, allowing them to be accessed from the ARM as though they were RAM devices. The FPGAs also share two 1 MB banks of SRAM with the processor, allowing DMA transfers to be performed. There are also a number of direct connections between the FPGAs and the ARM through the ARM's general purpose I/O (GPIO) registers.

The board is fitted with 4 clocks, 2 fixed frequency and 2 PLLs. The PLLs are programmable by the ARM processor.

The ARM is configured to boot into Angel, the ARM onboard debugging monitor, on power up, and this can be connected to the ARM debugger on the host PC via a serial link. This allows applications to be easily developed on the host and run on the board.

There are a variety of ways by which the FPGAs can be configured. These are:

• By an external host using JTAG or MultiLinx SelectMAP

• By the ARM processor, using data stored in either of the Flash RAMs or data acquired through one of the serial ports (USB, IRDA or RS232).

• By the CPLD from power-up with data stored at specific locations in the FPGA FlashRAM.

• By the other FPGA.

StrongARM

The board is fitted with an Intel SA1110 StrongARM processor. This has 64 Mbytes of SDRAM connected to it locally and 16 Mbytes of Intel StrataFLASH™ from which the processor may boot. The processor has direct connections to the FPGAs, which are mapped into its memory map as SRAM-like variable latency I/O devices, and has access to various I/O devices including USB, IRDA, an LCD screen connector and a serial port. It also has access to 2 MB of SRAM shared between the processor and the FPGAs.

Memory Map

The various devices have been mapped to the StrongARM memory locations as shown in Table 1:

Table 1

Address Location

The suggested settings for the StrongARM's internal memory configuration registers are shown in Table 2:

Table 2

Where the acronyms are defined as: MDCNFG - DRAM configuration register

MSC0, 1, 2 - Static memory control registers for banks 0, 1, 2

MDREF - DRAM refresh control register

MDCAS - CAS rotate control register for DRAM banks

The CPU clock should be set to 191.7MHz (CCF = 9). Please refer to the StrongARM Developers Manual, available from Intel Corporation, for further information on how to access these registers.

FLASH memory

The Flash RAM is very slow compared to the SRAM or SDRAM. It should only be used for booting; it is recommended that code be copied from Flash RAM to SDRAM for execution. If the StrongARM is used to update the Flash RAM contents, the code must not be running from the Flash, or the programming instructions in the Flash will be corrupted.

SDRAM

A standard 64MB SDRAM SODIMM is fitted to the board and this provides the bulk of the memory for the StrongARM. Depending upon the module fitted the SDRAM may not appear contiguous in memory.

Shared RAM banks

These RAM banks are shared with both FPGAs. This resource is arbitrated by the CPLD and may only be accessed once the CPLD has granted the ARM permission to do so. Requesting and receiving permission to access the RAMs is carried out through CPLD register 0x10. Refer to the CPLD section of this document for more information about accessing the CPLD and its internal registers from the ARM processor. See Appendix D.

FPGA access

The FPGAs are mapped into the ARM's memory, and the StrongARM can access the FPGAs directly using the specified locations. These locations support variable latency accesses, so the FPGA is able to prevent the ARM from completing an access until the FPGA is ready to receive or transmit the data. To the StrongARM these appear as static memory devices, with the FPGAs having access to the data, address and chip control signals of the RAMs.

The FPGAs are also connected to the GPIO block of the processor via the SAIO bus. The mapping of the GPIO pins to the SAIO bus is shown in Table 3.

Table 3

GPIO pins SAIO lines

Of these SAIO[0:10] connect to the FPGAs and SAIO[0:14] connect to connector CN25 on the board. The FPGAs and ARM are also able to access 2MB of shared memory, allowing DMA transfers between the devices to be performed.

I/O Devices

The following connectors are provided:

• LCD interface connector with backlight connector

• IRDA connector (not 5V tolerant)

• GPIO pins (not 5 V tolerant)

• Serial port

• Reset button to reboot the StrongARM

The connections between these and the ARM processor are defined below in Tables 4-7:

Table 4: ARM - LCD connections (CN27)

LCD connector

ARM pin Description

Table 5: ARM IRDA connections (CN8A)

IRDA connector pin ARM pin Description

Table 6: ARM GPIO - CN20AP connections

Table 7: ARM - Serial Port connections (CN23)

Serial Port

ARM pin Description connector pin no.

The serial port is wired in such a way that two ports are available with a special lead if handshaking isn't required.

Angel

Angel is the onboard debug monitor for the ARM processor. It communicates with the host PC over the serial port (a null modem serial cable will be required). The ARM is set up to automatically boot into Angel on startup; the startup code in the ARM's Flash RAM will need to be changed if this is not required.

When Angel is in use, 32 MB of SDRAM are mapped to 0x00000000 and are marked as cacheable and bufferable (except the top 1 MB). The Flash memory is remapped to 0x40000000 and is read-only and cacheable. The rest of memory is mapped one-to-one and is neither cacheable nor bufferable.

Under Angel it is possible to run the FPGA programmer software, which takes a .bit file from the host machine and programs the FPGAs with it. However, as the .bit files are over 1 MB in size and a serial link is used for the data transfer, this is a very slow way of configuring the FPGAs.

Virtex FPGAs

Two Virtex 2000e FPGAs are fitted to the board. They may be programmed from a variety of sources, including at power up from the FLASH memory. Although both devices feature the same components they have different pin definitions; Handel-C header files for the two FPGAs are provided.

One of the devices has been designated 'Master' and the other 'Slave'. This is essentially a means of identifying the FPGAs, with the Master having priority over the Slave when requests for the shared memory are processed by the CPLD. The FPGA below the serial number is the Master.

One pin on each of the FPGAs is defined as the Master/Slave define pin. This pin is pulled to GND on the Master FPGA and held high on the Slave. The pins are:

Master FPGA : C9 Slave FPGA: D33

The following part and family parameters should be used when compiling a Handel-C program for these chips:

set family = Xilinx4000E set part = "XV2000e-β-fg680" ;

Clocks

Two socketed clock oscillator modules may be fitted to the board. CLKA is fitted with a 50 MHz oscillator on dispatch, and the CLKB socket is left to be fitted by the user should other or multiple frequencies be required. A +5V oscillator module should be used for CLKB.

Two on board PLLs, VCLK and MCLK, provide clock sources between 8MHz and 100MHz (125MHz may well be possible). These are programmable by the ARM processor. VCLK may also be single stepped by the ARM.

This multitude of clock sources allows the FPGAs to be clocked at different rates, or to let one FPGA have multiple clock domains.

The clocks are connected to the FPGAs, as described in Table 8 and Appendices A and B.

Table 8

Master FPGA Slave FPGA

Programming the FPGAs

The FPGAs may be programmed from a variety of sources:

• Parallel III cable JTAG

• MultiLinx JTAG

• MultiLinx SelectMAP

• ARM processor

• From the other FPGA

• Power up from FLASH memory (see the FPGA FLASH Memory section).

When using any of the JTAG methods of programming the FPGAs, you must ensure that the Bitgen command is passed the option "-g startupclk:jtagclk". You will also need a .jed file for the CPLD or a .bsd file, which may be found in "Xilinx\xc9500xl\data\xc95288XL_tql44.bsd". The StrongARM also requires a .bsd file, which may be found on the Intel website http://developer.intel.com/design/ strong bsdl/sal 110 b 1.bsd. When downloaded, this file will contain HTML headers and footers, which need to be removed first. Alternatively, copies of the required .bsd files are included on the supplied disks.

The JTAG chain 500 for the board is shown in Figure 5.

The connections when using the Xilinx Parallel III cable and the 'JTAG Programmer' are set forth in Table 9:

Table 9: Parallel III Cable JTAG

CN24 pin number JTAG connector

With the Xilinx cables it may be easier to fit the flying ends into the Xilinx pod so that a number of cables may be connected to the board in one go.

MultiLinx JTAG

The board has support for programming using MultiLinx. CN3 is the only connector required for JTAG programming with MultiLinx and is wired up as described in Table 10. (Note that unused signals may be connected to the MultiLinx if required.)

Table 10

MultiLinx SelectMAP

JP3 must be fitted when using MultiLinx SelectMAP to configure the FPGAs. This link prevents the CPLD from accessing the FPGA data bus, avoiding bus contention. It also prevents the ARM from accessing the FPGA Flash memory and from attempting FPGA programming from power up. Connectors CN3 and CN4 should be used for Master FPGA programming and CN10 and CN11 for programming the Slave FPGA. See Tables 11 and 12.

Table 11

CN3/CN10 pin MultiLinx

Table 12

In practice, MultiLinx SelectMAP was found to be a very tiresome method of programming the FPGAs, due to the large number of flying leads involved and because the lack of support for multi-FPGA systems means that the leads have to be connected to a different connector to configure each FPGA.

ARM processor

The ARM is able to program each FPGA via the CPLD. The FPGAs are set up to be configured in SelectMAP mode. Please refer to the CPLD section of this document and to the Xilinx datasheets on Virtex configuration for more details of how to access the programming pins of the FPGAs and of the actual configuration process, respectively. An ARM program for configuring the FPGAs with a .bit file from the host PC under Angel is supplied. This is a very slow process, however, as the file is transferred over a serial link. Data could also be acquired from a variety of other sources, including USB and IRDA or the onboard Flash RAMs, and this should allow an FPGA to be configured in under 0.5 seconds.

Configuring one FPGA from the other FPGA

One FPGA is able to configure the other through the CPLD in a manner similar to when the ARM is configuring the FPGAs. Again, please refer to the CPLD section of this document and the Xilinx datasheets for more information.

Configuring on power up from Flash Memory

The board can be set to boot the FPGAs on power up using configuration data stored in the FPGA Flash memory. The following jumpers should be set if the board is required to boot from the Flash RAM:

• JP1 should be fitted if the Master FPGA is to be programmed from power up

• JP2 should be fitted if the Slave FPGA is to be programmed from power up.

If these jumpers are used the Flash RAM needs to be organized as shown in Table 13:

Table 13

The configuration data must be the configuration bit stream only, not the entire .bit file. The .bit file contains header information which must first be stripped out, and the bytes of the configuration stream as stored in the .bit file need to be mirrored; i.e. a configuration byte stored as 00110001 in the .bit file needs to be applied to the FPGA configuration data pins as 10001100.
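The mirroring step amounts to reversing the bit order of each byte. A minimal C sketch follows; the function name mirror_byte is hypothetical. It swaps the two nibbles, then adjacent bit pairs, then adjacent bits, which maps 00110001 (0x31) to 10001100 (0x8C) as in the example above.

```c
#include <stdint.h>

/* Reverse the bit order of one configuration byte (bit 7 <-> bit 0,
   bit 6 <-> bit 1, ...) before it is stored for power-up configuration. */
uint8_t mirror_byte(uint8_t b) {
    b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4));  /* swap nibbles  */
    b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2));  /* swap bit pairs */
    b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1));  /* swap bits      */
    return b;
}
```

Applying the function twice returns the original byte. In Handel-C hardware the same effect could be obtained without logic, by concatenating the data bits in reverse order.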

For more information on configuration of Xilinx FPGAs and the .bit format, refer to the appropriate Xilinx datasheets.

FPGA FLASH Memory

16 MB of Intel StrataFLASH™ Flash memory is available to the FPGAs. This is shared between the two FPGAs and the CPLD and is connected directly to them. The Flash RAM is much slower than the SRAMs on the board, having a read cycle time of 120 ns and a write cycle of around 80 ns.

The FPGAs are able to read and write the memory directly, while the ARM processor has access to it via the CPLD. Macros for reading and writing simple commands to the Flash RAM's internal state machine (such as retrieving identification and status information for the RAM) are provided in the klib.h macro library, but it is left up to the developer to extend these to implement the more complex procedures such as block programming and locking. The macros provided are intended to illustrate the basic mechanism for accessing the Flash RAM.

When an FPGA requires access to the Flash RAM, it must notify the CPLD by setting the Flash Bus Master signal low. This causes the CPLD to tri-state its Flash RAM pins to avoid bus contention. Similarly, as both FPGAs have access to the Flash RAM over a shared bus, care has to be taken that they do not try to access the memory at the same time (one or both of the FPGAs may be damaged if they are driven against each other). It is left up to the developer to implement a suitable arbitration system if sharing of this RAM across both FPGAs is required. The connections between this RAM and the FPGAs are set forth in Table 14:

Table 14

Local SRAM

Each FPGA has two banks of local SRAM, arranged as 256K words x 32bits. They have an access time of 15ns.

In order to allow single-cycle accesses to these RAMs, it is recommended that the external clock rate be divided by 2 or 3 for the Handel-C clock rate, i.e. include the following line in your code:

set clock = external divide "A20" 2; // or higher

For an external divide 2 clock rate the RAM should be defined as:

macro expr sram_local_bank0_spec =
{
    offchip = 1, wegate = 1,
    data = DATA_pins, addr = ADDRESS_pins,
    cs = { "E2", "F1", "J4", "F2", "H3" },
    we = { "H4" }, oe = { "E1" }
};

If the clock is divided by more than 2, replace the wegate parameter with

westart = 2, welength = 1,

The connections to these RAMs are as follows:

Table 15

Master FPGA Slave FPGA Master FPGA Slave FPGA

Shared SRAM

Each FPGA has access to two banks of shared SRAM, again arranged as 256K words x 32 bits. These have a 16 ns access time. A series of quick switches are used to switch these RAMs between the FPGAs, and these are controlled by the CPLD, which acts as an arbiter. To request access to a particular SRAM bank, the REQUEST pin should be pulled low. The code should then wait until the GRANT signal is pulled low by the CPLD in response.

The Handel-C code to implement this is given below:

// define the Request and Grant interfaces for the Shared SRAM
unsigned 1 shared_bank0_request = 1;
unsigned 1 shared_bank1_request = 1;

interface bus_out() sharedbk0reg(shared_bank0_request) with sram_shared_bank0_request_pin;
interface bus_out() sharedbk1reg(shared_bank1_request) with sram_shared_bank1_request_pin;
interface bus_clock_in(unsigned 1) shared_bank0_grant() with sram_shared_bank0_grant_pin;
interface bus_clock_in(unsigned 1) shared_bank1_grant() with sram_shared_bank1_grant_pin;

// Access to a shared RAM bank
{
    shared_bank0_request = 0;              // pull REQUEST low
    while (shared_bank0_grant.in) delay;   // wait until GRANT goes low
}

// perform accesses ....

// release bank
shared_bank0_request = 1;

The RAMs should be defined in the same manner as the local RAMs. (See above.)

The connections to the shared RAMs are given in Table 16:

Table 16

Connections to the StrongARM processor

The FPGAs are mapped into the StrongARM's memory as variable latency I/O devices and are treated by the ARM as though they were 1024-entry by 32-bit RAM devices. The address, data and control signals associated with these RAMs are attached directly to the FPGAs. The manner in which the FPGAs interact with the ARM using these signals is left to the developer.

The connections are as shown in Table 17:

Table 17

ARM pin Master FPGA pin Slave FPGA pin

Some of the ARM's general purpose I/O pins are also connected to the FPGAs. These go through connector CN25 on the board, allowing external devices to be connected to them (see also the ARM section). See Table 18.

Table 18

SAIO bus ARM GPI/O (ARM GPIO) pins Master FPGA Slave FPGA

CPLD Interfacing

Listed in Table 19 are the pins used for setting the Flash Bus Master signal and FP_COMs. Refer to the CPLD section for greater detail on this.

Table 19

Local I/O devices available to each FPGA

ATA port

33 FPGA I/O pins connect directly to the ATA port. These pins have 100Ω series termination resistors, which make the port 5 V I/O tolerant. These pins may also be used as general I/O if the ATA port isn't required. See Table 20.

Table 20

Parallel port

A conventional 25pin D-type connector and a 26way box header are provided to access this port. The I/O pins have 100Ω series termination resistors which also make the port 5 V I/O tolerant. These pins may also be used as I/O if the parallel port isn't required. See Table 21. See also Appendix C.

Table 21

PP line no. Parallel port pin Master FPGA pin Slave FPGA pin

Serial port

A standard 9pin D-type connector with a RS232 level shifter is provided. This port may be directly connected to a PC with a Null Modem cable. A box header with 5V tolerant I/O is also provided. These signals must NOT be connected to a standard RS232 interface without an external level shifter as the FPGAs may be damaged. See Table 22.

Table 22

Serial line no. Serial port pin no. Master FPGA pin Slave FPGA pin

Serial Header

Each FPGA also connects to a 10 pin header (CN9/CN16). The connections are shown in Table 23:

Table 23

(CN9/CN16) Master Slave

Header pin no. FPGA pin FPGA pin

Shared I/O Devices

These devices are shared directly between the two FPGAs and great care should be taken as to which FPGA accesses which device at any given time.

VGA Monitor

A standard 15pin High Density connector with an on-board 4bit DAC for each colour (Red, Green, Blue) is provided. This is connected to the FPGAs as set forth in Table 24:

Table 24

VGA line Master FPGA pin Slave FPGA pin

LEDs

Eight of the twelve LEDs on the board are connected directly to the FPGAs. See Table 25.

Table 25

Master FPGA pin Slave FPGA pin

GPIO connector

A 50-way box header with 5 V tolerant I/O is provided. 32 data bits ('E' bus) and two clock signals are available. The connector may be used to implement a SelectLink to another FPGA. +3V3 and +5V power supplies are provided via fuses. See Table 26.

Table 26

Expansion bus line GPI/O header pin Master FPGA pin Slave FPGA pin

SelectLink Interface

There is another 32bit general purpose bus connecting the two FPGAs which may be used to implement a SelectLink interface to provide greater bandwidth between the two devices. The connections are set forth in Table 27:

Table 27

USB

The FPGAs have shared access to the USB chip on the board. As in the case of the Flash RAM, the FPGA needs to notify the CPLD that it has taken control of the USB chip by setting the USBMaster pin low before accessing the chip. For more information on the USB chip, refer to the USB section of this document.

Table 28

CPLD

The board is fitted with a Xilinx XC95288XL CPLD, which provides a number of glue logic functions for shared RAM arbitration, interfacing between the ARM and the FPGAs, and configuration of the FPGAs. The latter can be used to configure the FPGAs either from power up or when one FPGA reconfigures the other (refer to section 2.3.2, 'Programming the FPGAs'). A full listing of the ABEL code contained in the CPLD can be found in Appendix D.

Shared SRAM bank controller

The CPLD implements a controller to manage the shared RAM banks. A Request-Grant system has been implemented to allow each SRAM bank to be accessed by one of the three devices. A priority system is employed if more than one device requests the same SRAM bank at the same time:

Highest priority: ARM

Master FPGA

Lowest priority: Slave FPGA

The FPGAs request access to the shared SRAM by pulling the corresponding REQUEST signals low and waiting for the CPLD to pull the GRANT signals low in response. Control is relinquished by setting the REQUEST signal high again. The ARM processor is able to request access to the shared SRAM banks via some registers within the CPLD - refer to the next section.
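The fixed-priority Request-Grant scheme can be modelled in software as below. This is a behavioural sketch, not the CPLD's ABEL code: the names are hypothetical, REQUEST and GRANT are active low as described above, and it assumes that a grant is held until the owner drives its REQUEST line high again, so the priority order only decides between pending requests and never pre-empts the current owner.

```c
/* Requesters in priority order, index 0 highest (ARM > Master > Slave). */
enum { REQ_ARM, REQ_MASTER, REQ_SLAVE, NUM_REQ };
#define OWNER_NONE (-1)

typedef struct { int owner; } bank_arbiter;   /* one per shared SRAM bank */

/* One arbitration step. request_n[i] is requester i's active-low REQUEST
   line; grant_n[i] receives its active-low GRANT line. */
void arbiter_step(bank_arbiter *a, const int request_n[NUM_REQ],
                  int grant_n[NUM_REQ]) {
    /* The owner relinquishes the bank by driving REQUEST high again. */
    if (a->owner != OWNER_NONE && request_n[a->owner] != 0)
        a->owner = OWNER_NONE;
    /* A free bank goes to the highest-priority pending requester. */
    if (a->owner == OWNER_NONE) {
        for (int i = 0; i < NUM_REQ; i++) {
            if (request_n[i] == 0) { a->owner = i; break; }
        }
    }
    for (int i = 0; i < NUM_REQ; i++)
        grant_n[i] = (i == a->owner) ? 0 : 1;
}
```

If the Master and Slave request a free bank in the same step, the Master is granted; an ARM request arriving while the Master holds the bank is served once the Master releases it.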

CPLD Registers for the ARM

The ARM can access a number of registers in the CPLD, as shown in Table 29:

Table 29

CPLD Registers for the FPGAs

The FPGAs can access the CPLD by setting a command on the FPCOM pins. Data is transferred on the FPGA (Flash RAM) databus. See Table 30.

Table 30

0x7 No Operation

These commands will mainly be used when one FPGA reconfigures the other. Refer to the FPGA configuration section and the appropriate Xilinx datasheets for more information.

CPLD LEDs

Four LEDs are directly connected to the CPLD. These are used to indicate the following:

D0 DONE LED for the Master FPGA; flashes during programming

D1 DONE LED for the Slave FPGA; flashes during programming

D2 Not used

D3 Flashes until an FPGA becomes programmed

Other Devices

USB

The board has a ScanLogic SL11H USB interface chip, capable of full-speed 12 Mbit/s transmission. The chip is directly connected to the FPGAs and can be accessed by the ARM processor via the CPLD (refer to the CPLD section of this document for further information).

The datasheet for this chip is available at http://www.scanlogic.com/pdf/sll lh /si l lhspec.pdf

PSU

This board may be powered from an external 12 V DC power supply through the 2.1 mm DC jack. The supply should be capable of providing at least 2.4 A.

Handel-C Library Reference

Introduction

This section describes the Handel-C libraries written for the board. The klib.h library provides a number of macro procedures to allow easier access to the various devices on the board, including the shared memory, the Flash RAM, the CPLD and the LEDs. Two other libraries are also presented, parallel_port.h and serial_port.h, which are generic Handel-C libraries for accessing the parallel and serial ports and communicating over these with external devices such as a host PC.

Also described is an example program which utilizes these various libraries to implement an echo server for the parallel and serial ports.

Also described here is a host side implementation of ESL's parallel port data transfer protocol, to be used with the data transfer macros in parallel_port.h.

The klib.h Library

Shared RAM arbitration

A request - grant mechanism is implemented to arbitrate the shared RAM between the two FPGAs and the ARM processor. Four macros are provided to make the process of requesting and releasing the individual RAM banks easier.

KRequestMemoryBank0();
KRequestMemoryBank1();
KReleaseMemoryBank0();
KReleaseMemoryBank1();

Arguments

None.

Return Values

None.

Execution Time

KRequestMemoryBank#() requires at least one clock cycle. KReleaseMemoryBank#() takes one clock cycle.

Description

These macro procedures will request and relinquish ownership of their respective memory banks. When a request for a memory bank is made the procedure will block the thread until access to the requested bank has been granted.

Note: The request and release functions for different banks may be called in parallel with each other to gain access to or release both banks in the same cycle.

Flash RAM Macros

These macros are provided as a basis through which interfacing to the Flash RAM can be carried out. The macros retrieve model and status information from the RAM to illustrate how the read/write cycle should work. Writing actual data to the Flash RAM is more complex and the implementation of this is left to the developer.

KSetFPGAFBM()

KReleaseFPGAFBM()

Arguments

None.

Return Values

None.

Execution Time

Both macros require one clock cycle.

Description

Before any communication with the Flash RAM is carried out, the FPGA needs to let the CPLD know that it is taking control of the Flash RAM. This causes the CPLD to tri-state the Flash bus pins, avoiding resource contention. KSetFPGAFBM() sets the Flash Bus Master (FBM) signal and KReleaseFPGAFBM() releases it. These macros are generally called by higher level macros such as KReadFlash() or KWriteFlash().

Note: These two procedures access the same signals and should NOT be called in parallel to each other.

KEnableFlash()

KDisableFlash()

Arguments

None.

Return Values

None.

Execution Time

Both macros require one clock cycle.

Description

These macros raise and lower the chip-select signal of the Flash RAM and tri-state the FPGA Flash RAM lines (data bus, address bus and control signals). This is necessary if the Flash RAM is to be shared between the two FPGAs, as only one chip can control the Flash at any given time. Both FPGAs trying to access the Flash RAM simultaneously can cause the FPGAs to 'latch up' or seriously damage the FPGAs or the Flash RAM chip. These macros are generally called by higher level macros such as KReadFlash() or KWriteFlash().

Note: These macros access the same signals and should NOT be called in parallel with each other.
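The ownership order these macros imply can be made explicit with a small host-side model. The C sketch below is illustrative only: FlashBus and the model_* functions are hypothetical names, not part of the board library, and the release order (disable the Flash before releasing FBM) is an assumption, since the text only requires FBM to be set before any Flash communication.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one FPGA's view of the shared Flash bus. */
typedef struct {
    bool fbm_set;       /* KSetFPGAFBM(): CPLD told this FPGA owns the bus */
    bool flash_enabled; /* KEnableFlash(): chip selected, lines driven     */
} FlashBus;

void model_set_fbm(FlashBus *b)     { b->fbm_set = true; }
void model_release_fbm(FlashBus *b) { b->fbm_set = false; }

/* Refuse to drive the Flash lines before bus mastership is claimed. */
bool model_enable_flash(FlashBus *b) {
    if (!b->fbm_set) return false;
    b->flash_enabled = true;
    return true;
}

void model_disable_flash(FlashBus *b) { b->flash_enabled = false; }

/* A read succeeds only while both conditions hold and addr is in range. */
bool model_read_flash(const FlashBus *b, const unsigned char *mem,
                      size_t size, size_t addr, unsigned char *data) {
    assert(b != NULL && mem != NULL && data != NULL);
    if (!b->fbm_set || !b->flash_enabled || addr >= size) return false;
    *data = mem[addr];
    return true;
}
```

Under this model a read attempted before both KSetFPGAFBM() and KEnableFlash() have run is simply refused; on the real board the equivalent mistake risks bus contention rather than a clean failure.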

KWriteFlash(address, data)

KReadFlash(address, data)

Arguments

24 bit address to be written or read; 8 bit data byte.

Return Values

KReadFlash() returns the value of the location specified by address in the data parameter.

Execution Time

Both procedures take 4 cycles. The procedures are limited by the timing characteristics of the Flash RAM device: a read cycle takes at least 120ns and a write cycle at least 100ns. The procedures have been set up for a Handel-C clock of 25MHz.

Description

The macros read data from and write data to the address location specified in the address parameter.
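The 4-cycle figure follows from simple arithmetic: at 25MHz one clock cycle lasts 40ns, so a 120ns read spans at least 3 cycles, and 4 cycles (160ns) cover it with a cycle to spare. The C helper below (min_cycles is a hypothetical name) computes the minimum whole number of cycles for a given access time and clock rate; it is a sketch that ignores any additional setup or hold margins the Flash datasheet may require.

```c
#include <assert.h>

/* Smallest whole number of clock cycles spanning an access time.
 * t_ns: required access time in nanoseconds; f_hz: clock rate in Hz.
 * Integer arithmetic avoids floating-point rounding surprises. */
unsigned long long min_cycles(unsigned long long t_ns,
                              unsigned long long f_hz) {
    assert(f_hz > 0);
    /* cycles >= t_ns * f_hz / 1e9, rounded up */
    return (t_ns * f_hz + 999999999ULL) / 1000000000ULL;
}
```

For instance, min_cycles(120, 50000000) evaluates to 6, which suggests why the 4-cycle procedures are tied to a 25MHz clock rather than anything faster.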

KSetFlashAddress(address)

Arguments

24 bit address value.

Return Values

None.

Execution Time

This macro requires one clock cycle.

Description

The macro sets the Flash address bus to the value passed in the address parameter. It is used when the data at the specified location does not need to be returned, as may be the case when one FPGA is configuring the other with data from the Flash RAM, since the configuration pins of the FPGAs are connected directly to the lower 8 data lines of the Flash RAM.

KReadFlashID(flash_component_ID, manufacturer_ID)

KReadFlashStatus(status)

Arguments

8 bit parameters to hold manufacturer, component and status information.

Return Values

The macros return the requested values in the parameters passed to them.

Execution Time

KReadFlashStatus() requires 10 cycles; KReadFlashID() requires 14 cycles.

Description

The macros retrieve component and status information from the Flash RAM. This is done by performing a series of writes and reads to the internal Flash RAM state machine.

Again, these macros are limited by the access time of the Flash RAM, and the number of cycles required depends on the rate at which the design is clocked. These macros are designed to be used with a Handel-C clock rate of 25MHz or less.

Although a system is in place for indicating to the CPLD that the Flash RAM is in use (the KSetFPGAFBM() and KReleaseFPGAFBM() macros), it is left to the developer to devise a method of arbitration between the two FPGAs. As all the Flash RAM lines are shared between the FPGAs, and there is no switching mechanism as there is for the shared RAM, problems will arise if both FPGAs attempt to access the Flash RAM simultaneously.

Note: These macros access the same signals and should NOT be called in parallel with each other. Also note that these macros provide only a basic interface for communication with the Flash RAM. For more in-depth information, please refer to the Flash RAM datasheet.

CPLD Interfacing

The following are macros for reading and writing to the CPLD status and control registers:

KReadCPLDStatus(status)

KWriteCPLDControl(control)

Arguments

8 bit word

Return Values

KReadCPLDStatus() returns an 8 bit word containing the bits of the CPLD's status register. (Refer to the CPLD section for more information.)

Execution Time

Both macros require six clock cycles, at a Handel-C clock rate of 25MHz or less.

Description

These macros read the status register and write to the control register of the CPLD.

KSetFPCOM(fp_command)

Arguments

3 bit word.

Return Values

None.

Execution Time

This macro requires three clock cycles, at a Handel-C clock rate of 25MHz or less.

Description

This macro is provided to make the sending of FP_COMMANDs to the CPLD easier. FP_COMMANDs are used when the reconfiguration of one FPGA from the other is desired (refer to the CPLD section for more information).
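How KSetFlashAddress() and the configuration clock commands might combine to load one FPGA from Flash can be sketched as a host-side C model. The sketch assumes the target FPGA latches the byte on the Flash's lower 8 data lines on each rising edge of CCLK (driven by the FP_CCLK_LOW and FP_CCLK_HIGH commands listed in Table 31) and that the bitstream is stored contiguously from a known start address. ConfigModel, set_flash_address, set_cclk and configure_from_flash are hypothetical names, and the target's program, init and done signals are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: one FPGA configures the other from Flash.
 * Assumed: the target latches Flash D[7:0] on each rising CCLK edge. */
typedef struct {
    const unsigned char *flash; /* Flash contents (the bitstream)          */
    size_t addr;                /* last value set on the Flash address bus */
    int cclk;                   /* current configuration clock level       */
    unsigned char *received;    /* bytes latched by the target FPGA;       */
                                /* caller sizes it for the whole load      */
    size_t count;               /* number of bytes latched so far          */
} ConfigModel;

/* Stands in for KSetFlashAddress(): the addressed byte appears on D[7:0],
 * which is wired straight to the target's configuration data pins. */
static void set_flash_address(ConfigModel *m, size_t a) { m->addr = a; }

/* Stands in for KSetFPCOM(FP_CCLK_LOW) / KSetFPCOM(FP_CCLK_HIGH). */
static void set_cclk(ConfigModel *m, int level) {
    if (level && !m->cclk)                 /* rising edge: target samples */
        m->received[m->count++] = m->flash[m->addr];
    m->cclk = level;
}

/* Clock nbytes of bitstream, starting at start_addr, into the target. */
void configure_from_flash(ConfigModel *m, size_t start_addr, size_t nbytes) {
    assert(m != NULL && m->flash != NULL && m->received != NULL);
    for (size_t i = 0; i < nbytes; i++) {
        set_flash_address(m, start_addr + i);
        set_cclk(m, 0);
        set_cclk(m, 1);
    }
}
```

If the calls ran sequentially on the board, each byte would cost roughly 7 clock cycles (one for KSetFlashAddress() and three for each KSetFPCOM() call, per the execution times given above), which bounds how quickly the target could be loaded this way.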

The possible fp_command values are set forth in Table 31:

Table 31

FP_SET_IDLE Sets CPLD to idle

FP_READ_STATUS Read the status register of the CPLD

FP_WRITE_CONTROL Write to the control register of the CPLD

FP_CCLK_LOW Set the configuration clock low

FP_CCLK_HIGH Set the configuration clock high

e.g.

KSetFPCOM(FP_READ_STATUS); KSetFPCOM(FP_SET_IDLE);

Note: These macros access the same signals and should NOT be called in parallel with each other.

LEDs

KSetLEDs(maskByte)

Arguments

8 bit word.

Return Values

None.

Execution Time

One clock cycle.

Description

This macro procedure has been provided for controlling the LEDs on the board. The maskByte parameter is applied to the LEDs on the board, with a 1 indicating to turn a light on and a 0 to turn it off. The MSB of maskByte corresponds to D12 and the LSB to D5 on the board.

Note: Only one of the FPGAs may access this function. If both attempt to do so the FPGAs will drive against each other and may 'latch-up', possibly damaging them.
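The text fixes only the two ends of the mask (MSB to D12, LSB to D5). The C helpers below assume the bits in between map in order to D6 through D11; led_for_bit and led_is_on are hypothetical names used only to make the bit ordering of maskByte concrete.

```c
#include <assert.h>

/* Bit i of maskByte (0 = LSB) is assumed to drive LED D(5 + i):
 * the MSB maps to D12 and the LSB to D5, as the text states. */
int led_for_bit(int bit) {
    assert(bit >= 0 && bit <= 7);
    return 5 + bit;
}

/* Returns 1 if LED D<led> (5..12) is lit for the given mask, else 0. */
int led_is_on(unsigned char mask, int led) {
    assert(led >= 5 && led <= 12);
    return (mask >> (led - 5)) & 1;
}
```

With this mapping a mask of 0x81 lights D12 and D5 only.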

Using the Parallel Port

Introduction

The library parallel_port.h contains routines for accessing the parallel port. This implements a parallel port controller as an independent process, modeled closely on the parallel port interface found on an IBM PC. The controller allows simultaneous access to the control, status and data ports (as defined on an IBM PC) of the parallel interface. These ports are accessed by reading and writing to channels into the controller process. The reads and writes to these channels are encapsulated in other macro procedures to provide an intuitive API.

Figure 6 shows a structure of a Parallel Port Data Transmission System 600 according to an embodiment of the present invention. An implementation of ESL's parallel data transfer protocol has also been provided, allowing data transfer over the parallel port, to and from a host computer 602. This is implemented as a separate process which utilizes the parallel port controller layer to implement the protocol. Data can be transferred to and from the host by writing and reading from channels into this process. Again, macro procedure abstractions are provided to make the API more intuitive.

A host side application for data transfer under Windows95/98 and NT is provided. Data transfer speeds of around 100 Kbytes/s can be achieved over this interface, limited by the speed of the parallel port.

Accessing the parallel port directly

The 17 used pins of the port have been split into data, control and status ports as defined in the IBM PC parallel port specification. See Table 32.

Table 32

The parallel port controller process needs to be run in parallel with those parts of the program wishing to access the parallel port. It is recommended that this is done using a par{} statement in the main() procedure.

The controller procedure is:

parallel_port(pp_data_send_channel, pp_data_read_channel, pp_control_port_read, pp_status_port_read, pp_status_port_write);

where the parameters are all channels through which the various ports can be accessed.

Parallel Port Macros

It is recommended that the following macros be used to access the parallel port rather than writing to the channels directly.

PpWriteData(byte)

PpReadData(byte)

Arguments

Unsigned 8 bit word.

Return Values

PpReadData() returns the value of the data pins in the argument byte.

Execution Time

Both macros require one clock cycle.

Description

These write the argument byte to the register controlling the data pins of the port, or return the value of the data port within the argument byte, respectively, with the MSB of the argument corresponding to data[7]. Whether or not the value is actually placed on the data pins depends on the direction setting of the data pins, controlled by bit 6 of the status register.

PpReadControl(control_port)

Arguments

Unsigned 4 bit word.

Return Values

PpReadControl() returns the value of the control port pins in the argument.

Execution Time

This macro requires one clock cycle.

Description

This procedure returns the value of the control port. The 4 bit nibble is made up of [nSelect_in @ Init @ nAutofeed @ nStrobe], where nSelect_in is the MSB.

PpReadStatus(status_port)

PpSetStatus(status_port)

Arguments

Unsigned 6 bit word.

Return Values

PpReadStatus() returns the value of the status port register in the argument.

Execution Time

This macro requires one clock cycle.

Description

These read and write to the status port. The 6 bit word passed to the macros is made up of [pp_direction @ busy @ nAck @ PE @ Select @ nError], where pp_direction indicates the direction of the data pins (i.e. whether they are in send [1] or receive [0] mode). It is important that this bit is set correctly before trying to write or read data from the port using PpWriteData() or PpReadData().

Note: All of the ports may be accessed simultaneously, but only one operation may be performed on each at any given time. Calls dealing with a particular port should not be made in parallel with each other.
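Because the control and status words are plain bit concatenations, their layouts can be written down as bit positions. The host-side C sketch below mirrors the orderings given above (the enum constants and function names are hypothetical) and shows how the pp_direction bit, the MSB of the 6-bit status word and therefore index 5 when counting from zero, would be set before a transfer.

```c
#include <assert.h>

/* 6-bit status word, MSB first:
 * [pp_direction @ busy @ nAck @ PE @ Select @ nError] */
enum {
    ST_NERROR = 0, ST_SELECT = 1, ST_PE = 2,
    ST_NACK = 3, ST_BUSY = 4, ST_DIRECTION = 5
};

/* 4-bit control nibble, MSB first:
 * [nSelect_in @ Init @ nAutofeed @ nStrobe] */
enum { CT_NSTROBE = 0, CT_NAUTOFEED = 1, CT_INIT = 2, CT_NSELECT_IN = 3 };

/* Extract one named bit from a status or control word. */
unsigned get_bit(unsigned word, int pos) {
    assert(pos >= 0 && pos <= 5);
    return (word >> pos) & 1u;
}

/* Copy of a status word with pp_direction set: 1 = send, 0 = receive.
 * Only the low 6 bits are kept. */
unsigned status_with_direction(unsigned status, int send) {
    assert(send == 0 || send == 1);
    status &= 0x3Fu;
    return send ? (status | (1u << ST_DIRECTION))
                : (status & ~(1u << ST_DIRECTION));
}
```

In Handel-C the same words are built with the @ concatenation operator, as the bracketed layouts above already show.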

Transferring data to and from the host PC

The library parallel_port.h also contains routines for transferring data to and from a host PC using ESL's data transfer protocol. The data transfer process, pp_coms(), which implements the transfer protocol, should be run in parallel with the parallel port controller process, again preferably in the main par{} statement. A host side implementation of the protocol, ksend.exe, is also provided.

pp_coms(pp_send_chan,   // channel to write data to when sending
        pp_recv_chan,   // channel to read data from when receiving
        pp_command,     // channel to write commands to
        pp_error);      // channel to receive error messages from

The following macros provide interfaces to the data transfer process:

OpenPP(error) - open the parallel port for data transfer

ClosePP(error) - close the port

Note: Make sure that the host side application, ksend.exe, is running. The macros will try to handshake with the host and will block (or time out) until a response is received. Also note that the following macros all access the same process and should NOT be called in parallel with each other.

Arguments

Unsigned 2 bit word.

Return Values

The argument will return an error code indicating the success or failure of the command.

Execution Time

This macro requires one clock cycle.

Description

These two macros open and close the port for receiving or sending data. They initiate a handshaking procedure to start communications with the host computer.

SetSendMode(error) - set the port to send mode

SetRecvMode(error) - set the port to receive mode

Arguments

Unsigned 2 bit word.

Return Values

The argument will return an error code indicating the success or failure of the command.

Execution Time

This macro requires one clock cycle.

Description

These set the direction of data transfer. The appropriate mode should be set before attempting to send or receive data over the port.

SendPP(byte, error) - send a byte over the port

ReadPP(byte, error) - read a byte from the port

Arguments

Unsigned 8 bit and unsigned 2 bit words.

Return Values

ReadPP() returns the 8 bit data value read from the host in the byte parameter.

Both macros will return an error code indicating the success or failure of the command.

Execution Time

How quickly these macros execute depends on the host. The whole sequence of handshaking actions for each byte must be completed before the next byte can be read or written.

Description

These two macros send and receive a byte over the parallel port once the port has been initialized and placed in the correct mode.

The procedures return a two bit error code indicating the result of the operation. These codes are defined as:

#define PP_NO_ERROR 0
#define PP_HOST_BUFFER_NOT_FINISHED 1
#define PP_OPEN_TIMEOUT 2

Note: SendPP and ReadPP will block the thread until a byte is transmitted or the timeout value is reached. If you need to do some processing while waiting for a communication, use a 'prialt' statement to read from the global pp_recv_chan channel or write to the pp_send_chan channel.

Typical macro procedure calls during Read / Write

Figure 7 is a flowchart that shows the typical series of procedure calls 700 when receiving data. Figure 8 is a flow diagram depicting the typical series of procedure calls 800 when transmitting data.


The Ksend application

The ksend.exe application is designed to transfer data to and from the board FPGAs over the parallel port. It implements the ESL data transfer protocol and is designed to communicate with the pp_coms() process running on the FPGA. This application is still in the development stage and may have a number of bugs in it.

Two versions of the program exist, one for Windows95/98 and one for WindowsNT. The NT version requires the GenPort driver to be installed. Refer to the GenPort documentation for details of how to do this.

In its current form, the ksend application is mainly intended for sending data to the board, as is done in the esl_boardtest program. It is, however, also able to accept output from the board. Again, please refer to the application note or the ksend help (invoked by calling ksend without any parameters) for further details.

Serial Port

Introduction

Each FPGA has access to an RS232 port allowing it to be connected to a host PC. A driver for transferring data to and from the FPGAs over the serial port is contained in the file serial_port.h.

RS232A Interface

There are numerous ways of implementing RS232 interfacing, depending on the capabilities of the host and device and what cables are used. This interface is implemented for a cross-wired null modem cable, which doesn't require any hardware handshaking. The option of software flow control is provided, though this probably won't be necessary, as the FPGA will be able to deal with the data at a much faster rate than the host PC can provide it. When software flow control is used, the host can stop and start the FPGA transmitting data by sending the XOFF and XON tokens respectively. This is only necessary when dealing with buffers that can fill up and either side needs to be notified.
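The software flow control described above reduces to a one-bit state machine on the transmitting side. The C sketch below assumes the driver's default codes of 17 for XON and 19 for XOFF; FlowState, flow_on_rx and flow_may_send are hypothetical names, not functions from serial_port.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define XON  17   /* default XON code for this driver  */
#define XOFF 19   /* default XOFF code for this driver */

/* Transmit-side software flow control state. */
typedef struct { bool paused; } FlowState;

/* Handle one byte received from the host. Returns true if it was a
 * flow-control token (consumed here, not passed on as data). */
bool flow_on_rx(FlowState *s, unsigned char c) {
    assert(s != NULL);
    if (c == XOFF) { s->paused = true;  return true; }
    if (c == XON)  { s->paused = false; return true; }
    return false;
}

/* The FPGA may transmit only while the host has not paused it. */
bool flow_may_send(const FlowState *s) {
    assert(s != NULL);
    return !s->paused;
}
```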

Serial port macros

Serial port communications have been implemented as a separate process that runs in parallel to the processes that wish to send/receive data. Figure 9 is a flow diagram illustrating several processes 902, 904 running in parallel.

The serial port controller process is serial_port(sp_input, sp_output);

where sp_input and sp_output are n bit channels through which data can be read from or written to the port. These reads and writes are again encapsulated in separate macro procedures to provide the user with a more intuitive API.

SpReadData(byte) - read a data byte from the port

SpWriteData(byte) - write a byte to the port

Arguments

n bit words, where n is the number of data bits specified.

Return Values

SpReadData() returns an n bit value corresponding to the transmitted byte in the argument.

Execution Time

The execution time depends on the protocol and the baud rate being used.

Description

These procedures send and receive data over the serial port using the RS232 protocol. The exact communications protocol must be set up using a series of #defines before including the serial_port.h library. To use an 8 data bit, 1 start bit and 1 stop bit protocol at 115200 baud on a null modem cable with no flow control, the settings would be:

#define BAUD_RATE 115200
#define START_BIT ((unsigned 1)0)
#define STOP_BIT ((unsigned 1)1)
#define NUM_DATA_BITS 8

Other options are:

For soft flow control:

#define SOFTFLOW
#define XON <ASCII CHARACTER CODE>
#define XOFF <ASCII CHARACTER CODE>

RTS/CTS flow control:

#define HARDFLO

The default settings are:

Baud rate 9600

Start bit 0

Stop bit 1

Num. data bits 8

XON 17

XOFF 19

Flow control off
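With the default settings (start bit 0, stop bit 1, 8 data bits, no flow control) each character goes out as a 10-bit frame. The C sketch below, using the hypothetical name rs232_frame, lists the bits in the order they appear on the line; sending data bits least significant bit first is the standard RS232 convention, and no parity bit is included because none of the options above configure one.

```c
#include <assert.h>
#include <stddef.h>

/* Build the line-level bit sequence for one character: start bit,
 * data bits least significant first, then stop bit.
 * out must hold num_data_bits + 2 entries; returns the count written. */
size_t rs232_frame(unsigned data, int num_data_bits,
                   int start_bit, int stop_bit, int *out) {
    assert(out != NULL && num_data_bits >= 5 && num_data_bits <= 8);
    size_t n = 0;
    out[n++] = start_bit;
    for (int i = 0; i < num_data_bits; i++)
        out[n++] = (int)((data >> i) & 1u);   /* LSB goes out first */
    out[n++] = stop_bit;
    return n;
}
```

For example, the character 'A' (0x41) is sent as 0, then 1 0 0 0 0 0 1 0, then 1.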

Any of the standard baud rate settings will work provided that the Handel-C clock rate is at least 8 times higher than the baud rate. Also ensure that the macro CLOCK_RATE is defined; this is generally found in the pin definition header for each of the FPGAs. For example:

#define CLOCK_RATE 25000000 // define the clock rate
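The clock-rate condition can be checked numerically. With CLOCK_RATE at 25000000, one bit at 115200 baud lasts about 217 clock cycles, far above the minimum of 8. The C helpers below are a hypothetical sketch of that arithmetic; rounding the per-bit count to the nearest whole cycle is an assumption about how a bit timer might be built, not a description of serial_port.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Clock cycles per serial bit, rounded to the nearest whole cycle. */
unsigned long clocks_per_bit(unsigned long clock_rate, unsigned long baud) {
    assert(baud > 0);
    return (clock_rate + baud / 2) / baud;
}

/* The stated requirement: clock rate at least 8 times the baud rate. */
bool baud_supported(unsigned long clock_rate, unsigned long baud) {
    assert(baud > 0);
    return clock_rate >= 8UL * baud;
}

/* Baud rate actually produced by the rounded per-bit cycle count. */
double actual_baud(unsigned long clock_rate, unsigned long baud) {
    unsigned long cpb = clocks_per_bit(clock_rate, baud);
    assert(cpb > 0);
    return (double)clock_rate / (double)cpb;
}
```

At 25MHz the rounded count of 217 cycles gives an actual rate of about 115207 baud, an error of well under 0.1%.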

Example Program

Shown here is an example Handel-C program that illustrates how to use the parallel and serial port routines found in the serial_port.h and parallel_port.h libraries. The program implements a simple echo server on the serial and parallel ports. The SetLEDs() function from the klib.h library is used to display the ASCII value received over the serial port on the LEDs in binary.

// Include the necessary header files
#define MASTER

#ifdef MASTER
#include "KompressorMaster.h"
#else
#include "KompressorSlave.h"
#endif

#include "stdlib.h"
#include "parallel_port.h"
#include "klib.h"

// Define the protocol and include the file
#define BAUD_RATE 9600
#define NUM_DATA_BITS 8
#define NULLMODEM
#include "serial_port.h"

//////////////////////////////////
// Process to echo any data received by the parallel port
// to verify it is working properly
macro proc EchoPP()
{
    unsigned 8 pp_data_in;
    unsigned 2 error with {warn = 0};
    unsigned 1 done;

    done = 0;
    OpenPP(error);                    // initiate contact with host
    while (!done)
    {
        // read a byte
        SetRecvMode(error);
        ReadPP(pp_data_in, error);

        // echo it
        SetSendMode(error);
        SendPP(pp_data_in, error);
    }
    ClosePP(error);                   // close connection
}

//////////////////////////////////
// Process to echo any data received by the serial port
// to verify it is working properly. We are always
// listening on the serial port so there is no need to open it.
macro proc EchoSP()
{
    unsigned 8 serial_in_data;

    while (1)
    {
        SpReadData(serial_in_data);   // read a byte from the serial port
        SetLEDs(serial_in_data);      // show its ASCII value on the LEDs
        SpWriteData(serial_in_data);  // write it back out
    }
    delay;                            // avoid combinational cycles
}

void main(void)
{
    while (1)
    {
        par
        {
            EchoPP();                 // Parallel port thread
            EchoSP();                 // Serial port thread

            ////// Start the services ////////
            // Parallel port stuff
            pp_coms(pp_send_chan, pp_recv_chan, pp_command, pp_error);
            parallel_port(pp_data_send_channel, pp_data_read_channel,
                          pp_control_port_read, pp_status_port_read,
                          pp_status_port_write);

            // Serial port stuff
            serial_port(sp_input, sp_output);
        }
    }
}

The code can be compiled for either FPGA by simply defining or undefining the MASTER macro at the top of the program.

More Information

Useful information pertaining to the subjects described herein can be found in the following: The Programmable Logic Data Book, Xilinx 1996; Handel-C Preprocessor Reference Manual, Handel-C Compiler Reference Manual, and Handel-C Language Reference Manual, Embedded Solutions Limited 1998; and Xilinx Datasheets and Application Notes, available from the Xilinx website http://www.xilinx.com, which are herein incorporated by reference.

Illustrative Embodiment

According to an embodiment of the present invention, a device encapsulates the Creative MP3 encoder engine into an FPGA device. Figure 10 is a block diagram of an FPGA device 1000 according to an exemplary embodiment of the present invention. The purpose of the device is to stream audio data directly from a CD 1002 or CDRW into the FPGA, compress the data, and push the data to a USB host 1004, which delivers it to the OASIS (Nomad 2) decoder. The entire operation of this device is independent of a PC.

The design of the FPGA uses the "Handel-C" compiler, described above, from Embedded Solutions Limited (ESL). The EDA tool provided by ESL is intended to rapidly deploy and modify software algorithms through the use of FPGAs without the need to redevelop silicon. Therefore, the ESL tools can be utilized as an alternative to silicon development and can be used in a broader range of products.

Feature Overview

The FPGA preferably contains the necessary logic for the following:

- MP3 Encoder 1006
- User Command Look Up Table
  - play
  - pause
  - eject
  - stop
  - skip song (forward / reverse)
  - scan song (forward / reverse)
  - record (rip to MP3) -> OASIS Unit
- ATAPI
  - command and control
  - command FIFO
  - data bus
  - command bus
- (2) 64 sample FIFOs (16bit * 44.100 kHz)
- Serial Port (16550 UART)
- optionally EEPROM Interface (I2C & I2S)
- USB Interface to host controller
- SDRAM controller
- 32-bit ARM or RISC processor

In addition to the FPGA, the following is preferably provided:
- USB Host / Hub controller (2 USB ports)
- 4MB SDRAM
- 128K EEPROM
- 9-pin serial port
- 6 control buttons
- 40-Pin IDE Interface for CD or CDRW

Interfaces

ATAPI (IDE) Interface

User Interface

USB Interface

While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of a preferred embodiment should not be limited by any ofthe above described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims

1. A method for programming a reconfigurable logic device, comprising the steps of:

(a) acquiring configuration data for configuring a first reconfigurable logic device; and

(b) utilizing a second reconfigurable logic device for processing the configuration data, wherein the second reconfigurable logic device configures the first reconfigurable logic device based on the configuration data.

2. A method as recited in claim 1, wherein the reconfigurable logic devices are field programmable gate arrays.

3. A method as recited in claim 1 or 2, wherein the processing of the configuration data is executed simultaneously with at least one other process on the second reconfigurable logic device.

4. A method as recited in claim 1, 2 or 3, wherein the configuration data is acquired from at least one of a network server, a local data source, and the second reconfigurable logic device.

5. A method as recited in claim 1, 2, 3 or 4, wherein the second reconfigurable logic device communicates with the first reconfigurable logic device via at least one of a select map interface, a bus, a network, a peripheral component interconnect, and a universal serial bus.


6. A method as recited in claim 1, 2, 3, 4, or 5, wherein the second reconfigurable logic device checks for errors during configuration of the first reconfigurable logic device.

7. A computer program embodied on a computer readable medium for programming a reconfigurable logic device, comprising:

(a) a code segment that acquires configuration data for configuring a first reconfigurable logic device; and

(b) a code segment that instructs a second reconfigurable logic device for processing the configuration data, wherein the second reconfigurable logic device configures the first reconfigurable logic device based on the configuration data.

8. A computer program as recited in claim 7, wherein the reconfigurable logic devices are field programmable gate arrays.

9. A computer program as recited in claim 7 or 8, wherein the processing of the configuration data is executed simultaneously with at least one other process on the second reconfigurable logic device.

10. A computer program as recited in claim 7, 8 or 9, wherein the configuration data is acquired from at least one of a network server, a local data source, and the second reconfigurable logic device.

11. A computer program as recited in claim 7, 8, 9, or 10, wherein the second reconfigurable logic device communicates with the first reconfigurable logic device via at least one of a select map interface, a bus, a network, a peripheral component interconnect, and a universal serial bus.

12. A computer program as recited in claim 7, 8, 9, 10 or 11, wherein the second reconfigurable logic device checks for errors during configuration of the first reconfigurable logic device.

13. A system for programming a reconfigurable logic device, comprising:

(a) a first reconfigurable logic device;

(b) logic that acquires configuration data for configuring a first reconfigurable logic device; and

(c) a second reconfigurable logic device for processing the configuration data, wherein the second reconfigurable logic device configures the first reconfigurable logic device based on the configuration data.

14. A system as recited in claim 13, wherein the reconfigurable logic devices are field programmable gate arrays.

15. A system as recited in claim 13 or 14, wherein the processing of the configuration data is executed simultaneously with at least one other process on the second reconfigurable logic device.

16. A system as recited in claim 13, 14 or 15, wherein the configuration data is acquired from at least one of a network server, a local data source, and the second reconfigurable logic device.

17. A system as recited in claim 13, 14, 15 or 16, wherein the second reconfigurable logic device communicates with the first reconfigurable logic device via at least one of a select map interface, a bus, a network, a peripheral component interconnect, and a universal serial bus.

18. A system as recited in claim 13, 14, 15, 16, or 17, wherein the second reconfigurable logic device checks for errors during configuration of the first reconfigurable logic device.