US20100242014A1 - Symmetric multi-processor operating system for asymmetric multi-processor architecture - Google Patents

Symmetric multi-processor operating system for asymmetric multi-processor architecture

Info

Publication number
US20100242014A1
US20100242014A1 (application US12/405,555)
Authority
US
United States
Prior art keywords
processors, processor, processing, instructions, functions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/405,555
Inventor
Xiaohan Zhu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Corp
Sony Electronics Inc
Original Assignee
Sony Corp
Sony Electronics Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Corp, Sony Electronics Inc filed Critical Sony Corp
Priority to US12/405,555
Assigned to SONY ELECTRONICS INC. and SONY CORPORATION (assignment of assignors interest; assignor: ZHU, XIAOHAN)
Publication of US20100242014A1
Legal status: Abandoned

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 - Arrangements for software engineering
    • G06F 8/40 - Transformation of program code
    • G06F 8/41 - Compilation
    • G06F 8/45 - Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F 8/451 - Code distribution
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5044 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This invention pertains generally to microprocessor devices and computing, and more particularly to multi-processing on an asymmetric architecture.
  • SMP: symmetric multi-processor
  • microprocessors are ubiquitous and found in various forms doing various functions at various levels in the hierarchy, within a system or even within a single embedded system. It will be noted that in these diverse multi-level computing environments each of the microprocessors is optimized for different purposes to achieve the best performance and power requirements.
  • SOC: System on Chip
  • one processor is perhaps optimized for performing digital signal processing (e.g., as a DSP) for video decoding, while another processor is directed at running applications and decoding audio.
  • An architecture of this form is referred to as an asymmetric multi-processor (AMP) or (ASP) architecture.
  • AMP: asymmetric multi-processor
  • ASP: asymmetric multi-processor
  • each processor may have completely different instruction sets and memory configurations.
  • one processor may have SIMD (Single instruction multiple data) instructions, while other processors may only provide standard RISC instructions.
  • Some processors may have specialized local memory and DMA engines attached.
  • SMP based operating systems (e.g., Linux)
  • all the processors have exactly the same instruction set and they share a unified memory view.
  • the system normally ensures cache coherency among the caches, so that when one processor modifies the contents of an address, all other processors in the system immediately see the same changes.
  • This cache coherency scheme is often accomplished by a Snoop protocol.
  • in a snoop protocol, each processor's cache monitors (snoops) the shared interconnect for transactions on addresses of which it holds a copy, invalidating or updating that copy so that all caches remain consistent.
  • the most computationally intensive process is that of video analysis and processing, in particular if the original video is in high definition.
  • a high performance processor is required, for instance, to first decode the original video sequence and then analyze each video and audio frame.
  • video and audio are normally encoded by specialized hardware, while the generic computing power for the microprocessor on the device can be very limited.
  • camcorders represent one of many devices which require processors tailored for specific forms of processing, whereby conventional SMP multiprocessing approaches are not applicable.
  • the present invention is directed at optimizing the use of processing resources within an AMP architecture. Toward this end the ability is provided for assigning tasks to the underlying AMP processing elements as in an SMP operating system, yet while retaining the ability to run programs optimized for asymmetric processors within the system.
  • processing power within the AMP environment can be cast according to the invention into an SMP architecture which takes advantage of the processor computing power of the asymmetric processing elements.
  • the present invention in essence creates a symmetric multi-processor operating system, or environment, for an asymmetric multi-processor architecture which contains processors having processor specific functionality.
  • both the typical hardware and software of the AMP environment must be modified. Instruction sets within processors having different functionalities are modified so that a portion of the functionality of these processors overlaps within a common set of instructions.
  • the invention also teaches compiler, assembler, and linker modifications which allow the binary code to be generated for execution on these diverse processors, and the execution of generic tasks, using the shared instructions, on any of the processors within the multiple processors. It will be noted, however, that the code loaded on one or more of these processors can be changed, such as in response to different operating modes.
  • the code generated for generic functions can be equivalent on different processors, while code containing function specific instructions can be based on similar generic functions therein allowing respectively for maximum reusability and minimum development effort.
  • the present invention can reduce processor requirements, because the processing load is shared across a diverse set of processors.
  • software latency can be reduced as tasks are performed on processors having fewer active tasks.
  • the invention is particularly well suited for use in SOC based embedded systems, such as for example associated with video and audio systems.
  • the invention is amenable to being embodied in a number of ways, including but not limited to the following descriptions.
  • One embodiment of the invention is an apparatus for asymmetric multi-processing, comprising: (a) a plurality of processors configured for executing instructions in response to tasks scheduled for execution within the plurality of processors; (b) a communication pathway interconnecting processors within the plurality of processors, wherein each of the processors in the plurality of processors is configured for executing an instruction set which includes a set of common instructions which are common to all processors in the plurality of processors; (c) one or more of the processors is configured with processor specific instructions for controlling processor specific functions which can not be executed by the other processors within the plurality of processors wherein the multi-processor apparatus is asymmetric; and (d) a task scheduler configured for assigning tasks containing only common instructions to any of the plurality of processors, while tasks containing processor specific functions are assigned to one or more specific processors configured for executing those specific functions.
  • processor specific functions can be supported within the apparatus, including digital signal processing, stream processing, video processing, audio processing, digital control, acceleration processing, single-instruction multiple-data processing (SIMD) and combinations thereof.
  • processors within the AMP system are embodied in an SOC device.
  • the instructions for execution by the plurality of processors are generated by a compiler or assembler, which is configured for generating binary code for each processor with common instructions generated for each processor, and including processor specific instructions generated within the binary code for processors configured for performing the associated processor specific functions.
  • the task scheduler for the apparatus is preferably executed in response to programming executing on at least one of the plurality of processors, such as within an operating system.
  • One embodiment of the invention is an apparatus for generating binary code in response to compiling or assembling source code for execution within an asymmetric multi-processing system, comprising: (a) receiving source code containing a plurality of functions for execution by processors within an asymmetric multi-processing system; (b) mapping functions from within said source code to indicate which system functions are generic and thus contain common instructions for all processors in the asymmetric multi-processing system, and which functions contain instructions directed to one or more specific processors capable of executing the processor specific instructions; (c) outputting binary code containing common instructions for each processor in said asymmetric multi-processing system, and a combination of common instructions and processor specific instructions for processors within the asymmetric multi-processing system which support processor specific functions.
  • the binary code generated for common instructions is configured for execution by at least one task executing on any of the processors within the asymmetric multi-processing system, and the binary code which is generated contains processor specific instructions configured for execution by at least one task configured for execution on one or more of the processors within the asymmetric multi-processing system which supports the processor specific functions.
  • the binary code (programming) generated by the apparatus is configured for execution, such as by tasks scheduled by an operating system that determines which tasks should be assigned to which processors in response to function mapping for the specific plurality of processors in the target system. It will be appreciated that directives are decoded from within the source code to tell the compiler/assembler which functions are directed to which specific processors, or alternatively to all processors.
  • a header and footer designate the portion of source code whose associated binary code is to be generated for one or more specific processors.
  • a macro designates the portion of source code whose associated binary code is to be generated for one or more specific processors.
  • text within a function definition designates whether the function is directed to any of the processors, or to one or more specific processors.
  • a linker is preferably adapted for assigning absolute addresses to functions for each of the processors within the asymmetric multi-processing system. It should be appreciated that the apparatus can support any desired processor specific instructions, including but not limited to, digital signal processing, stream processing, video processing, audio processing, digital control, acceleration processing, single-instruction multiple-data processing and combinations thereof.
  • processors within the asymmetric multi-processing system have an instruction set adapted so as to have a portion of the instruction set for each processor being shared in common, as common instructions, with other processors to be used within the asymmetric multi-processing system. Yet, one or more of the processors have processor specific instructions which extend beyond the common instructions that cannot be executed on all the other processors in the asymmetric multi-processing system.
  • the binary code generated by the apparatus is configured so that tasks using the task generic functions can be executed on any of the processors within the asymmetric multi-processing system, while tasks using processor specific functions can be executed only by one or more specific processors which are capable of executing those processor specific functions.
  • One embodiment of the invention is a method of controlling execution of general (e.g., generic, common, shared), and processor-specific tasks within an asymmetric multi-processing system having multiple interconnected processors capable of performing different functionality, comprising: (a) adapting the instruction set of each processing element within a multi-processing system so that a portion of the instruction set for each processor is shared in common, as common instructions, while one or more of the processors include processor specific instructions, associated with processor specific functions, which cannot be executed on all the other processors in the asymmetric multi-processing system; (b) generating binary code for execution on each of the processors within the asymmetric multi-processing system by, (b)(i) outputting binary code of the common instructions for each of the processors within the asymmetric multi-processing system, (b)(ii) creating a function map indicating which system functions are generic and which functions are directed to one or more specific processors capable of executing processor specific instructions, and (b)(iii) outputting binary code of the processor specific instructions for one or more of the processors which include processor specific instructions.
  • the method can be configured with a linker which is adapted to assign absolute addresses to functions for each of the processors within said asymmetric multi-processing system.
  • one or more of the processors is configured for executing a task scheduler that assigns generic tasks, those containing only common instructions, to any of the plurality of processors, while tasks containing processor specific functions are assigned to one or more specific processors configured for executing those specific functions.
  • the method can support any desired core of common processing functionality (and their respective instructions) and any desired processor specific functions (and respective instructions extending the core) including functions such as digital signal processing, stream processing, video processing, audio processing, digital control processing, hardware acceleration processing, single-instruction multiple-data processing and combinations thereof.
  • the present invention provides a number of beneficial aspects which can be implemented either separately or in any desired combination without departing from the present teachings.
  • An aspect of the invention is to provide SMP processing functionality within an AMP architecture.
  • Another aspect of the invention is to allow performing tasks within the AMP architecture on any processor having suitable processor functionality.
  • Another aspect of the invention is a method of modifying diverse processor instruction sets to overlap within a common instruction set, wherein generic tasks can be executed on any of the processors within the system.
  • Another aspect of the invention is a method of extending the common instruction set to support specific functions on one or more processors within the target asymmetric multi-processing system.
  • Another aspect of the invention is to provide a compiler or assembler which is adapted for generating binary code, while taking into account the common instructions and respecting the processor specific functions.
  • Another aspect of the invention is a system which can utilize available processor bandwidth from one processor to perform generic system tasks or tasks for another processor.
  • A still further aspect of the invention is a method of reducing the required computing power of processing elements and their requisite cost.
  • FIG. 1 is a block diagram of hardware within an asymmetric multi-processing core according to an aspect of the present invention.
  • FIG. 2 is a block diagram of general purpose tasks and processor specific tasks configured for being executed within an asymmetric multi-processing system according to an aspect of the present invention.
  • FIG. 3 is a task-data flow diagram of generic tasks and processor specific tasks being scheduled on different processor cores according to an aspect of the present invention.
  • FIG. 4-6 are pseudo-source code listings showing examples of designating which processor or processors a given section of code, or function, is directed to according to an aspect of the present invention.
  • FIG. 7 is a flowchart of generating binary code through compilation and linking processes according to an aspect of the present invention.
  • FIG. 8 is a timing diagram of task processing within an example asymmetric multi-processing system containing four processors according to an aspect of the present invention.
  • For illustrative purposes the present invention is embodied in the apparatus generally shown in FIG. 1 through FIG. 8. It will be appreciated that the apparatus may vary as to configuration and as to details of the parts, and that the method may vary as to the specific steps and sequence, without departing from the basic concepts as disclosed herein.
  • the invention teaches changes to both the hardware and software for existing AMP architectures.
  • the proposed architecture modifies the AMP architecture wherein a portion of it maps to an SMP architecture, without sacrificing processor specific functionality.
  • Each processor in the system is configured so that at least a portion of processor instructions are shared within a common instruction set with associated op-codes.
  • a generic software tool chain can then be configured which includes compiler, assembler and linker providing an SMP view of this architecture.
  • compilers, assemblers and linkers are software which execute as programming from the memory of a computer adapted for receiving source code and generating binary code. Therefore, as the configuration of general purpose computers for running compilers, assemblers and linkers is well known, it need not be discussed here.
  • these tool chains are only aware of the common instruction op-code of the processors and therefore the generated binary code files can be executed on any of the processors in the system.
  • on top of the common instruction op-codes, different instruction extensions are provided for specific processors.
  • processors may have instruction extensions optimized for signal processing applications (DSP), such as video, while other processors may have instruction extensions optimized for audio or stream processing types of applications.
  • DSP: digital signal processing
  • processors may also have local memory, or even digital control and/or acceleration processing (e.g., memory management unit, array processors and so forth), or single-instruction multiple-data (SIMD) processing that is not visible to other processors.
  • SIMD: single-instruction multiple-data
  • When the operating system loads an executable, it first checks if the software task is a generic task or a special optimized task.
  • a generic task is generated by the common tool chain and thus uses common instructions for the set of processors.
  • the generic tasks are treated as a normal process in the operating system.
  • the operating system uses standard context switches to schedule these tasks among all the processors in the system.
  • Special optimized tasks contain instruction op-codes that are optimized for one or more designated processors in the system.
  • the scheduler of the operating system is aware that this task can only be assigned to one or more designated cores in the system. According to one implementation of the invention, these special tasks can be executed in one of two modes. In a first mode, the task can be context switched out of the target processor by a generic software task or another specialized task that is targeting the same processor.
  • the other mode is an exclusive mode, wherein the operating system marks the processor as busy until the task explicitly exits, and wherein the scheduler would not trigger any context switches to this processor.
  • FIG. 1 illustrates an embodiment 10 showing the hardware and software architecture for an example of the inventive system. It should be appreciated that the figure is shown by way of example and not limitation, wherein the number of cores, types of extensions, types of switching, connection to I/O (input/output) and memory, as well as other variations can be implemented by one of ordinary skill in the art without departing from the teachings of the present invention.
  • Core 0 is shown in block 12 as an operating system (OS) host, also referred to as a scheduler, with an extension block 14 shown as optional (e.g., with “*”).
  • OS: operating system
  • scheduling may be performed by more than one processor and configured in a number of different ways as will be understood by one of ordinary skill in the art.
  • Interprocessor communication (communication pathway) 16 is represented as a cross-bar switch which allows moving information and tasks between processors.
  • Core 1 in block 18 is shown coupled with audio extensions 20.
  • Core 1 is thus configured for handling audio processing, but has a core which can perform the generic tasks.
  • Core 2 in block 22 is adapted with extensions 24 for processing video, such as performing digital signal processing.
  • Core 3 in block 26 is similarly adapted with extensions 28 for processing video.
  • Digital input/output 30 is represented as high speed I/O.
  • An interface to memory is depicted by way of example through a double data rate (DDR) controller 34 connected to the set of processing cores through data pipe 32 coupled through switch connection 16.
  • DDR controllers are known in the art, such as for providing double speed access and control in relation to synchronous dynamic random access memories.
  • One of ordinary skill in the art will appreciate that different forms of memory and memory interfacing can be utilized without departing from the teachings of the present invention.
  • Interfacing with analog I/O is shown in block 36 representing analog-to-digital (A/D) conversion as well as digital-to-analog (D/A) conversion, therein allowing analog signals to be measured and/or generated.
  • A/D: analog-to-digital
  • D/A: digital-to-analog
  • Block 38 depicts the connection of low speed digital I/O, for example that which is directed from or to a user and does not require rapid updates, and can in many instances be performed within a background task, or other “as-time-permits” processing (e.g., lowest priority task, polling loops, and so forth).
  • FIG. 2 illustrates that different generic and processor specific types of tasks can be executed on the asymmetric (AMP) system.
  • the tasks shown in the upper portion of the figure are represented with a circle as a task type designation for a generic task (associated with generic, or common, instructions) that can be performed on any of the processors.
  • these tasks are shown with the same size and shape blocks (cylinders), it should be appreciated that the amount, form, and complexity of the task can vary as desired.
  • the tasks shown in the lower portion of the figure are specially optimized tasks configured for being directed to processors having specific computational resources. To represent these resources and the different types of computation being performed, these cylindrical blocks are shown in different sizes and shown with geometric indicia (e.g., triangle, square, and star).
  • FIG. 3 illustrates one mode of scheduling according to the present invention in regards to the architecture shown in FIG. 1 .
  • the tasks which need to be processed are shown containing generic tasks, represented here as circles, in addition to three different sets of specific tasks, represented herein with triangles, squares, and stars.
  • a scheduler block 12, 14 as shown here can itself process generic tasks (circles) while scheduling out the remainder to other processors.
  • the scheduler oversees the execution of all the function specific tasks to be performed on the function specific processors.
  • processor block 18, 20 is shown receiving both generic tasks and tasks specific to its processor configuration, herein depicted as a triangle symbol.
  • blocks 22, 24 process generic tasks as well as specific tasks represented as squares.
  • blocks 26, 28 process generic tasks and specific tasks represented as stars.
  • the use of each of the cores can vary in response to the application being performed. For example, if the architecture shown in FIG. 1 is operating in an internet TV mode (IPTV), such as on a portable media player, then block 14 of core 0 may provide memory management functionality, while Core 3 may be put into a low-power state as not being needed.
  • IPTV: internet TV mode
  • processors performing specific task functionality can be subject to substantially different power requirements, wherein the system, such as in response to scheduler directive, is adapted to determine whether or not to power down cores when their specific functions are not being used and sufficient processing resource exists to execute the generic tasks.
  • the cores can be adapted for use in other ways, thus again optimizing processor utilization in response to the type of activity, level of activity, power consumption and other factors.
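  • By way of a non-limiting illustration, the power-down decision described above can be sketched in C as follows. The structure fields, the spare-capacity argument, and the function name are merely illustrative assumptions and are not defined by this disclosure.

```c
#include <stdbool.h>

/* Hypothetical per-core bookkeeping; the field names are illustrative. */
struct core_state {
    bool powered;
    int  pending_specific_tasks;  /* tasks only this core can execute   */
    int  generic_load;            /* generic work currently queued here */
};

/* A core may be powered down when none of its processor specific
 * functions are in use and the remaining cores have enough spare
 * capacity to absorb its generic workload. */
bool may_power_down(const struct core_state *c, int spare_capacity_elsewhere)
{
    return c->pending_specific_tasks == 0 &&
           c->generic_load <= spare_capacity_elsewhere;
}
```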
  • the present invention can be implemented with different forms of task “scheduling” as well as different forms of syntax for controlling a compiler in generating the necessary binary code.
  • An assembler configured according to the present invention can automatically determine if the code is directed to specific processors in response to detecting processor specific instructions within a given function, wherein this information can be passed into a function map.
  • in the case of a compiler (e.g., generating binary code from high level coding, instead of from assembly coding), there is often not a one-to-one correspondence between source code instructions and processor instructions, wherein it is preferred that directives be included in the high level source code as to which processor should fulfill the request.
  • processor specific functionality is not limited to instruction set, as certain processors may for example have access to select I/O or memory addresses, which may need to be accessed to fulfill specific tasks.
  • a compiler could actually generate binary code for either a generic processor or a specific processor using the extended instruction set.
  • the source code for the functions designates in some manner whether the source is to be rendered with generic instructions, or in response to one or more processor specific instruction extensions.
  • the following teachings provide a few examples of designating to the compiler which processor core the source code is to be compiled for.
  • FIG. 4 through FIG. 6 illustrate example coding styles to allow the programmer to direct compilation of code executable on the processors within the system, such as exemplified by FIG. 1 .
  • FIG. 4 depicts a mechanism (e.g., syntax) for directing the compiler to direct a group of instructions toward a specific processor.
  • the body of instructions between the header and footer is compiled for the specific processor listed as “CORE 1”.
  • FIG. 5 illustrates a second example in which macro instructions are used, which the compiler then expands out and directs to the specific processor. In this example, three sequential instructions are to be performed by “CORE 1” within a set of generic commands represented as “--------” in the example.
  • FIG. 6 illustrates a third alternative and/or additional mechanism which may be adopted, in which a specifier is encoded within the function definition as to whether a given function can be directed to any of the target processors, or must be directed at one or more of the specific processors within the target system.
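  • The listing below is a minimal C-style sketch of the three designation styles of FIG. 4 through FIG. 6. The pragma names, the CORE1_MAC macro, and the __core() specifier are hypothetical placeholders, since the disclosure describes the mechanisms but does not fix a particular syntax.

```c
/* Style of FIG. 4: a header/footer pair brackets code destined for CORE 1.
 * Unknown pragmas are ignored by a standard C compiler, so the sketch
 * still compiles as ordinary C. */
#pragma core_begin(CORE1)
void decode_audio_frame(const unsigned char *bitstream, int len);
#pragma core_end(CORE1)

/* Style of FIG. 5: a macro expands to a CORE 1 specific instruction
 * sequence inside otherwise generic code (stand-in expansion shown). */
#define CORE1_MAC(acc, a, b) ((acc) += (a) * (b))

/* Style of FIG. 6: a specifier within the function definition names the
 * target; __core(ANY) marks a generic function, __core(CORE1) a core
 * specific one. Here it is a no-op macro consumed by the tool chain. */
#define __core(target)

__core(ANY)   unsigned checksum(const unsigned char *p, int n);
__core(CORE1) void     mix_channels(short *left, short *right, int n);
```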
  • FIG. 7 illustrates an example embodiment 50 of generating code in response to functions accessed by tasks to be executed on the system as a whole.
  • the software source code in block 52, written per FIG. 4-6, is received into a compiler 54 which generates object code for each function 56 and provides mapping 58 of the functions for each of the cores.
  • the functions have names (non-absolute addressing) and association with specific cores, or are generic (for any cores) as shown in block 60.
  • Compiled code is then linked 62, generating a linked object code 64 with absolute function-core mapping 66, an example of which is shown in block 68 depicting absolute addresses for functions within the various cores.
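  • As one illustration of the FIG. 7 output, the linked function-core mapping of block 68 could be modeled by a table such as the one below. The field names, core masks, and addresses are assumptions chosen only to mirror the example; the actual addresses would be assigned by the link step 62.

```c
#include <stdint.h>

#define CORE_ANY 0xFFFFFFFFu            /* generic: any core may run it   */
#define CORE(n)  (1u << (n))            /* mask for one specific core     */

struct func_map_entry {
    const char *name;                   /* symbol name before linking     */
    uint32_t    core_mask;              /* cores permitted to execute it  */
    uint32_t    abs_addr;               /* absolute address after linking */
};

/* Placeholder addresses standing in for the values assigned at link time. */
static const struct func_map_entry func_core_map[] = {
    { "main",            CORE_ANY, 0x00010000u },
    { "func3_general",   CORE_ANY, 0x00011200u },
    { "func_for_core1",  CORE(1),  0x00020000u },
    { "func_for_core2",  CORE(2),  0x00030000u },
    { "func2_for_core2", CORE(2),  0x00030400u },
};
```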
  • FIG. 8 illustrates an example 70 of how the scheduler in the OS assigns tasks to Cores.
  • the diagram depicts processing for each of the cores (Core 0 through Core 3) with respect to time. Four general time period sections are shown to identify different portions of the function execution diagram.
  • the Main function 72 starts on Core 0 and issues a system call to create tasks with arguments of function address and execution priority.
  • the OS can determine which task should be assigned to which core using the function-core map as generated by the compiler and in response to execution priority.
  • func2_for_core2, represented in block 74, and func3_general are not executed at this point.
  • func_for_core2 on Core 2 issues a system call to tell the scheduler that it needs to wait for an event (e.g., “pend”) from the system and sleep until then, as seen in block 74.
  • the OS suspends func_for_core2 and assigns func2_for_core2 to Core 2.
  • the OS receives an event from the system. Since func_for_core1 and func_for_core2 are waiting for the event, and func3_general and func2_for_core2 have lower priority than the others, the waiting functions are resumed on their respective cores.
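  • The FIG. 8 sequence can be illustrated with the short sketch below. The os_task_create and os_event_pend primitives are hypothetical stubs standing in for whatever system calls the operating system actually provides, and the priority values are arbitrary.

```c
#include <stdio.h>

typedef void (*task_entry_t)(void);

/* Stub: a real scheduler would look the entry address up in the
 * function-core map and queue the task on an eligible core by priority. */
static void os_task_create(task_entry_t entry, int priority)
{
    (void)entry;   /* a real implementation would record the entry point */
    printf("create task at priority %d\n", priority);
}

/* Stub: a real kernel would suspend the caller until the event arrives,
 * freeing its core (e.g., Core 2) for func2_for_core2 in the meantime. */
static void os_event_pend(int event_id)
{
    printf("task pends on event %d\n", event_id);
}

#define EV_FRAME_READY 1

static void func_for_core2(void)   /* bound to Core 2 by the function map */
{
    os_event_pend(EV_FRAME_READY);
    /* ... continue with Core 2 specific processing once woken ... */
}

int main(void)                     /* "Main" (block 72) starts on Core 0 */
{
    os_task_create(func_for_core2, 2);  /* function address and priority */
    return 0;
}
```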
  • the present invention thus teaches a method and apparatus for multi-processing on an asymmetric system. Different aspects of this invention are described including target hardware and software, tools required for generating binary code for the target, and the method of creating an SMP-like environment over an AMP, asymmetric, system. It will be appreciated that the figures herein are shown by way of example toward understanding aspects of the present invention and are not intended to limit the practice of the invention. One of ordinary skill in the art will appreciate that the teachings of the present invention may be practiced in various ways and with various mechanisms without departing from the present invention.

Abstract

A method and system for supporting multi-processing within an asymmetric processor architecture in which processors support different processor specific functionality. Instruction sets within processors having different functionalities are modified so that a portion of the functionality of these processors overlaps within a common set of instructions. Code generation for the multi-processor system (e.g., compiler, assembler, and/or linker) is performed in a manner to allow the binary code to be generated for execution on these diverse processors, and the execution of generic tasks, using the shared instructions, on any of the processors within the multiple processors. Processor specific tasks are only executed by the processors having the associated processor specific functionality. Source code directives are exemplified for aiding the compiler or assembler in properly creating binary code for the diverse processors. The invention can reduce processor computation requirements, reduce software latency, and increase system responsiveness.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • Not Applicable
  • STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
  • Not Applicable
  • INCORPORATION-BY-REFERENCE OF MATERIAL SUBMITTED ON A COMPACT DISC
  • Not Applicable
  • NOTICE OF MATERIAL SUBJECT TO COPYRIGHT PROTECTION
  • A portion of the material in this patent document is subject to copyright protection under the copyright laws of the United States and of other countries. The owner of the copyright rights has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the United States Patent and Trademark Office publicly available file or records, but otherwise reserves all copyright rights whatsoever. The copyright owner does not hereby waive any of its rights to have this patent document maintained in secrecy, including without limitation its rights pursuant to 37 C.F.R. §1.14.
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • This invention pertains generally to microprocessor devices and computing, and more particularly to multi-processing on an asymmetric architecture.
  • 2. Description of Related Art
  • In traditional multi-processor operating systems, all the processors in the system are exactly the same. The operating system can assign a task or a process to any of the processors within the computer system. Computer architectures and operating systems of this kind are referred to as symmetric multi-processor (SMP) systems.
  • However, in the marketplace today microprocessors are ubiquitous and found in various forms doing various functions at various levels in the hierarchy, within a system or even within a single embedded system. It will be noted that in these diverse multi-level computing environments each of the microprocessors is optimized for different purposes to achieve the best performance and power requirements. As an example, in a SOC (System on Chip) for portable media players, one processor is perhaps optimized for performing digital signal processing (e.g., as a DSP) for video decoding, while another processor is directed at running applications and decoding audio. An architecture of this form is referred to as an asymmetric multi-processor (AMP) or (ASP) architecture.
  • It should be appreciated that in an AMP system, each processor may have completely different instruction sets and memory configurations. For example, one processor may have SIMD (Single instruction multiple data) instructions, while other processors may only provide standard RISC instructions. Some processors may have specialized local memory and DMA engines attached. As a consequence of these many differences, it is not surprising that different compilers, assemblers and linkers can be required for generating the code for each of the processors, and it is well understood that the generated binary code may only be loaded onto the designated processor.
  • Accordingly, it is not possible for current operating systems, such as SMP based operating systems (e.g., Linux), to take advantage of the multi-processor computing power which is available on diverse computational systems. In an SMP system, all the processors have exactly the same instruction set and they share a unified memory view. In most configurations, there are caches attached to each processor. The system normally ensures cache coherency among the caches, so that when one processor modifies the contents of an address, all other processors in the system immediately see the same changes. This cache coherency scheme is often accomplished by a snoop protocol, in which each processor's cache monitors (snoops) the shared interconnect for transactions on addresses of which it holds a copy, invalidating or updating that copy so that all caches remain consistent.
  • By way of example, in systems such as video cameras the most computationally intensive process is that of video analysis and processing, in particular if the original video is in high definition. A high performance processor is required, for instance, to first decode the original video sequence and then analyze each video and audio frame. In an embedded device like a camcorder, video and audio are normally encoded by specialized hardware, while the generic computing power for the microprocessor on the device can be very limited. Thus, camcorders represent one of many devices which require processors tailored for specific forms of processing, whereby conventional SMP multiprocessing approaches are not applicable.
  • Accordingly a need exists for a system and method of performing a form of multiprocessing utilizing the processing resources found within an asymmetric processing environment. These needs and others are met within the present invention, which overcomes the deficiencies of previously developed multiprocessing systems and methods.
  • BRIEF SUMMARY OF THE INVENTION
  • The present invention is directed at optimizing the use of processing resources within an AMP architecture. Toward this end the ability is provided for assigning tasks to the underlying AMP processing elements as in an SMP operating system, yet while retaining the ability to run programs optimized for asymmetric processors within the system. Thus, processing power within the AMP environment can be cast according to the invention into an SMP architecture which takes advantage of the processor computing power of the asymmetric processing elements. The present invention in essence creates a symmetric multi-processor operating system, or environment, for an asymmetric multi-processor architecture which contains processors having processor specific functionality.
  • In order to create this SMP environment over an AMP framework, both the typical hardware and software of the AMP environment must be modified. Instruction sets within processors having different functionalities are modified so that a portion of the functionality of these processors overlaps within a common set of instructions. The invention also teaches compiler, assembler, and linker modifications which allow the binary code to be generated for execution on these diverse processors, and the execution of generic tasks, using the shared instructions, on any of the processors within the multiple processors. It will be noted, however, that the code loaded on one or more of these processors can be changed, such as in response to different operating modes. The code generated for generic functions can be equivalent on different processors, while code containing function specific instructions can be based on similar generic functions therein allowing respectively for maximum reusability and minimum development effort.
  • It should be appreciated that the present invention can reduce processor requirements, because the processing load is shared across a diverse set of processors. In addition, software latency can be reduced as tasks are performed on processors having fewer active tasks. The invention is particularly well suited for use in SOC based embedded systems, such as for example associated with video and audio systems.
  • The invention is amenable to being embodied in a number of ways, including but not limited to the following descriptions.
  • One embodiment of the invention is an apparatus for asymmetric multi-processing, comprising: (a) a plurality of processors configured for executing instructions in response to tasks scheduled for execution within the plurality of processors; (b) a communication pathway interconnecting processors within the plurality of processors, wherein each of the processors in the plurality of processors is configured for executing an instruction set which includes a set of common instructions which are common to all processors in the plurality of processors; (c) one or more of the processors is configured with processor specific instructions for controlling processor specific functions which can not be executed by the other processors within the plurality of processors wherein the multi-processor apparatus is asymmetric; and (d) a task scheduler configured for assigning tasks containing only common instructions to any of the plurality of processors, while tasks containing processor specific functions are assigned to one or more specific processors configured for executing those specific functions. Any processor specific functions can be supported within the apparatus, including digital signal processing, stream processing, video processing, audio processing, digital control, acceleration processing, single-instruction multiple-data processing (SIMD) and combinations thereof. In one implementation of the invention the processors within the AMP system are embodied in an SOC device.
  • In the above apparatus the instructions for execution by the plurality of processors are generated by a compiler or assembler, which is configured for generating binary code for each processor with common instructions generated for each processor, and including processor specific instructions generated within the binary code for processors configured for performing the associated processor specific functions. It will be noted that conventional multi-processing is restricted to operation on symmetric architectures where each processor has the same instruction set, and the compiler/assembler need not modulate its binary code generation for the system in response to the different functionality of each processor and their multi-processing interrelationship. The task scheduler for the apparatus is preferably executed in response to programming executing on at least one of the plurality of processors, such as within an operating system.
  • One embodiment of the invention is an apparatus for generating binary code in response to compiling or assembling source code for execution within an asymmetric multi-processing system, comprising: (a) receiving source code containing a plurality of functions for execution by processors within an asymmetric multi-processing system; (b) mapping functions from within said source code to indicate which system functions are generic and thus contain common instructions for all processors in the asymmetric multi-processing system, and which functions contain instructions directed to one or more specific processors capable of executing the processor specific instructions; (c) outputting binary code containing common instructions for each processor in said asymmetric multi-processing system, and a combination of common instructions and processor specific instructions for processors within the asymmetric multi-processing system which support processor specific functions. In response to this compilation/assembly the binary code generated for common instructions is configured for execution by at least one task executing on any of the processors within the asymmetric multi-processing system, and the binary code which is generated contains processor specific instructions configured for execution by at least one task configured for execution on one or more of the processors within the asymmetric multi-processing system which supports the processor specific functions.
  • The binary code (programming) generated by the apparatus is configured for execution, such as by tasks scheduled by an operating system that determines which tasks should be assigned to which processors in response to function mapping for the specific plurality of processors in the target system. It will be appreciated that directives are decoded from within the source code to tell the compiler/assembler which functions are directed to which specific processors, or alternatively to all processors. In one mode of the invention, a header and footer designate the portion of source code whose associated binary code is to be generated for one or more specific processors. In another mode, a macro designates the portion of source code whose associated binary code is to be generated for one or more specific processors. In another example mode, text within a function definition designates whether the function is directed to any of the processors, or to one or more specific processors. In addition to the compiler/assembler, a linker is preferably adapted for assigning absolute addresses to functions for each of the processors within the asymmetric multi-processing system. It should be appreciated that the apparatus can support any desired processor specific instructions, including but not limited to, digital signal processing, stream processing, video processing, audio processing, digital control, acceleration processing, single-instruction multiple-data processing and combinations thereof.
  • It should be appreciated that the processors within the asymmetric multi-processing system have an instruction set adapted so as to have a portion of the instruction set for each processor being shared in common, as common instructions, with other processors to be used within the asymmetric multi-processing system. Yet, one or more of the processors have processor specific instructions which extend beyond the common instructions that cannot be executed on all the other processors in the asymmetric multi-processing system. The binary code generated by the apparatus is configured so that tasks using the task generic functions can be executed on any of the processors within the asymmetric multi-processing system, while tasks using processor specific functions can be executed only by one or more specific processors which are capable of executing those processor specific functions.
  • One embodiment of the invention is a method of controlling execution of general (e.g., generic, common, shared), and processor-specific tasks within an asymmetric multi-processing system having multiple interconnected processors capable of performing different functionality, comprising: (a) adapting the instruction set of each processing element within a multi-processing system so that a portion of the instruction set for each processor is shared in common, as common instructions, while one or more of the processors include processor specific instructions, associated with processor specific functions, which cannot be executed on all the other processors in the asymmetric multi-processing system; (b) generating binary code for execution on each of the processors within the asymmetric multi-processing system by, (b)(i) outputting binary code of the common instructions for each of the processors within the asymmetric multi-processing system, (b)(ii) creating a function map indicating which system functions are generic and which functions are directed to one or more specific processors capable of executing processor specific instructions, and (b)(iii) outputting binary code of the processor specific instructions for one or more of the processors which include processor specific instructions.
  • The method can be configured with a linker which is adapted to assign absolute addresses to functions for each of the processors within said asymmetric multi-processing system. In one implementation of the invention one or more of the processors is configured for executing a task scheduler that assigns generic tasks, those containing only common instructions, to any of the plurality of processors, while tasks containing processor specific functions are assigned to one or more specific processors configured for executing those specific functions. The method can support any desired core of common processing functionality (and their respective instructions) and any desired processor specific functions (and respective instructions extending the core) including functions such as digital signal processing, stream processing, video processing, audio processing, digital control processing, hardware acceleration processing, single-instruction multiple-data processing and combinations thereof.
  • The present invention provides a number of beneficial aspects which can be implemented either separately or in any desired combination without departing from the present teachings.
  • An aspect of the invention is to provide SMP processing functionality within an AMP architecture.
  • Another aspect of the invention is to allow performing tasks within the AMP architecture on any processor having suitable processor functionality.
  • Another aspect of the invention is a method of modifying diverse processor instruction sets to overlap within a common instruction set, wherein generic tasks can be executed on any of the processors within the system.
  • Another aspect of the invention is a method of extending the common instruction set to support specific functions on one or more processors within the target asymmetric multi-processing system.
  • Another aspect of the invention is to provide a compiler or assembler which is adapted for generating binary code, while taking into account the common instructions and respecting the processor specific functions.
  • Another aspect of the invention is a system which can utilize available processor bandwidth from one processor to perform generic system tasks or tasks for another processor.
  • A still further aspect of the invention is a method of reducing the required computing power of processing elements and their requisite cost.
  • Further aspects of the invention will be brought out in the following portions of the specification, wherein the detailed description is for the purpose of fully disclosing preferred embodiments of the invention without placing limitations thereon.
  • BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING(S)
  • The invention will be more fully understood by reference to the following drawings which are for illustrative purposes only:
  • FIG. 1 is a block diagram of hardware within an asymmetric multi-processing core according to an aspect of the present invention.
  • FIG. 2 is a block diagram of general purpose tasks and processor specific tasks configured for being executed within an asymmetric multi-processing system according to an aspect of the present invention.
  • FIG. 3 is a task-data flow diagram of generic tasks and processor specific tasks being scheduled on different processor cores according to an aspect of the present invention.
  • FIG. 4-6 are pseudo-source code listings showing examples of designating which processor or processors a given section of code, or function, is directed to according to an aspect of the present invention.
  • FIG. 7 is a flowchart of generating binary code through compilation and linking processes according to an aspect of the present invention.
  • FIG. 8 is a timing diagram of task processing within an example asymmetric multi-processing system containing four processors according to an aspect of the present invention.
  • DETAILED DESCRIPTION OF THE INVENTION
  • Referring more specifically to the drawings, for illustrative purposes the present invention is embodied in the apparatus generally shown in FIG. 1 through FIG. 8. It will be appreciated that the apparatus may vary as to configuration and as to details of the parts, and that the method may vary as to the specific steps and sequence, without departing from the basic concepts as disclosed herein.
  • In order to create an SMP environment for optimizing processor utilization, the invention teaches changes to both the hardware and software for existing AMP architectures.
  • On the hardware side, the proposed architecture modifies the AMP architecture wherein a portion of it maps to an SMP architecture, without sacrificing processor specific functionality. Each processor in the system is configured so that at least a portion of processor instructions are shared within a common instruction set with associated op-codes. Accordingly, a generic software tool chain can then be configured which includes compiler, assembler and linker providing an SMP view of this architecture. It is well known that compilers, assemblers and linkers are software which execute as programming from the memory of a computer adapted for receiving source code and generating binary code. Therefore, as the configuration of general purpose computers for running compilers, assemblers and linkers is well known, it need not be discussed here. In one mode of the invention, these tool chains are only aware of the common instruction op-code of the processors and therefore the generated binary code files can be executed on any of the processors in the system. On top of the common instruction op-codes, different instruction extensions are provided for specific processors.
  • By way of example and not limitation, some processors may have instruction extensions optimized for signal processing applications (DSP), such as video, while other processors may have instruction extensions optimized for audio or stream processing types of applications. These processors may also have local memory, or even digital control and/or acceleration processing (e.g., memory management unit, array processors and so forth), or single-instruction multiple-data (SIMD) processing that is not visible to other processors. Using the extended instruction set with these processors can require specific compilation and assembly techniques targeting each processor which is to be used. The same linker, when modified to be cognizant of the function-core mapping, can be used to link all of the sections of object code into a final executable.
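  • The contrast between common instructions and processor specific extensions may be sketched in C as shown below. The __CORE2_SIMD__ macro and the core2_simd_sad() intrinsic are invented placeholders for whatever extension a video core actually exposes; the fallback path relies only on generic operations.

```c
#include <stdint.h>

/* Generic function: plain C compiled to the common instruction set, so
 * the resulting binary can be scheduled onto any core in the system. */
uint32_t sum_u16(const uint16_t *v, int n)
{
    uint32_t s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Core specific function: a sum of absolute differences used in video
 * processing. When built for a core having the (hypothetical) SIMD
 * extension it maps to one extended op; otherwise only common
 * instructions are emitted. */
uint32_t sad_8x8(const uint8_t *a, const uint8_t *b)
{
#if defined(__CORE2_SIMD__)
    return core2_simd_sad(a, b);            /* placeholder intrinsic */
#else
    uint32_t s = 0;
    for (int i = 0; i < 64; i++)
        s += (a[i] > b[i]) ? (uint32_t)(a[i] - b[i])
                           : (uint32_t)(b[i] - a[i]);
    return s;
#endif
}
```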
  • Accordingly, on the software side, changes have to be made to the scheduler and loaders for the operating system. When the operating system loads an executable, it first checks if the software task is a generic task or a special optimized task. A generic task is generated by the common tool chain and thus uses common instructions for the set of processors. The generic tasks are treated as a normal process in the operating system. In one simple implementation of the present invention, the operating system uses standard context switches to schedule these tasks among all the processors in the system.
  • Special optimized tasks contain instruction op-codes that are optimized for one or more designated processors in the system. The scheduler of the operating system is aware that this task can only be assigned to one or more designated cores in the system. According to one implementation of the invention, these special tasks can be executed in one of two modes. In a first mode, the task can be context switched out of the target processor by a generic software task or another specialized task that is targeting the same processor. The other mode is an exclusive mode, wherein the operating system marks the processor as busy until the task explicitly exits, and wherein the scheduler would not trigger any context switches to this processor.
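  • A minimal sketch of that scheduling rule is given below. The task and core descriptors are illustrative assumptions, with a core-affinity mask standing in for the function-core map and a flag representing the exclusive mode.

```c
#include <stdbool.h>
#include <stdint.h>

#define CORE_ANY 0xFFFFFFFFu

struct task {
    uint32_t core_mask;      /* CORE_ANY for generic tasks, else targets  */
    bool     exclusive;      /* exclusive mode: hold the core until exit  */
    int      priority;
};

struct core {
    uint32_t id_mask;        /* 1u << core_number                         */
    bool     held_exclusive; /* set while an exclusive task owns the core */
};

/* A task may be dispatched to a core only if the core appears in the
 * task's mask and is not currently held by an exclusive task; generic
 * tasks therefore remain eligible for every free core. */
bool can_dispatch(const struct task *t, const struct core *c)
{
    if (c->held_exclusive)
        return false;
    return (t->core_mask & c->id_mask) != 0;
}
```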
  • FIG. 1 illustrates an embodiment 10 showing the hardware and software architecture for an example of the inventive system. It should be appreciated that the figure is shown by way of example and not limitation, wherein the number of cores, types of extensions, types of switching, connection to I/O (input/output) and memory, as well as other variations can be implemented by one of ordinary skill in the art without departing from the teachings of the present invention.
  • Core 0 is shown in block 12 as an operating system (OS) host, also referred to as a scheduler, with an extension block 14 shown as optional (e.g., with “*”). In the configuration shown Core 0 in block 12 would largely perform scheduling in addition to duties such as user interface functions. It should be appreciated that scheduling may be performed by more than one processor and configured in a number of different ways as will be understood by one of ordinary skill in the art. Interprocessor communication (communication pathway) 16 is represented as a cross-bar switch which allows moving information and tasks between processors.
  • Core 1 in block 18 is shown coupled with audio extensions 20. In this example Core 1 is thus configured for handling audio processing, but has a core which can perform the generic tasks. Core 2 in block 22 is adapted with extensions 24 for processing video, such as performing digital signal processing. Core 3 in block 26 is similarly adapted with extensions 28 for processing video.
  • Digital input/output 30 is represented as high speed I/O. An interface to memory is depicted by way of example through a double data rate (DDR) controller 34 connected to the set of processing cores through data pipe 32 coupled through switch connection 16. It will be noted that DDR controllers are known in the art, such as for providing double speed access and control in relation to synchronous dynamic random access memories. One of ordinary skill in the art will appreciate that different forms of memory and memory interfacing can be utilized without departing from the teachings of the present invention. Interfacing with analog I/O is shown in block 36, representing analog-to-digital (A/D) conversion as well as digital-to-analog (D/A) conversion, thereby allowing analog signals to be measured and/or generated. It will be appreciated that different applications will have different levels of need for analog functionality, and that these aspects are shown merely by way of example of processor specific functionality for which processor specific instructions are included in the instruction set. Block 38 depicts the connection of low speed digital I/O, for example that which is directed from or to a user and does not require rapid updates, and which can in many instances be performed within a background task or other "as-time-permits" processing (e.g., lowest priority task, polling loops, and so forth).
  • FIG. 2 illustrates that different generic and processor specific types of tasks can be executed on the asymmetric (AMP) system. By way of example, the tasks shown in the upper portion of the figure are represented with a circle as a task type designation for a generic task (associated with generic, or common, instructions) that can be performed on any of the processors. Although these tasks are shown with the same size and shape of blocks (cylinders), it should be appreciated that the amount, form, and complexity of the task can vary as desired. The tasks shown in the lower portion of the figure are specially optimized tasks configured for being directed to processors having specific computational resources. To represent these resources and the different types of computation being performed, these cylindrical blocks are shown in different sizes and with geometric indicia (e.g., triangle, square, and star). One of ordinary skill in the art will appreciate that the indicia and shape of the blocks are only used as a means of depicting task differences.
  • FIG. 3 illustrates one mode of scheduling according to the present invention in regard to the architecture shown in FIG. 1. The tasks which need to be processed are shown comprising generic tasks, represented here as circles, in addition to three different sets of specific tasks, represented with triangles, squares, and stars. A scheduler block 12, 14, as shown here, can itself process generic tasks (circles) while scheduling out the remainder to other processors. In addition, the scheduler oversees the execution of all the function specific tasks to be performed on the function specific processors. For example, processor block 18, 20 is shown receiving both generic tasks and tasks specific to its processor configuration, here depicted with a triangle symbol. Similarly, blocks 22, 24 process generic tasks as well as specific tasks represented as squares, while blocks 26, 28 process generic tasks and specific tasks represented as stars.
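The toy dispatch loop below mirrors the FIG. 3 arrangement under assumed names (the task list, `core_mask` values, and core numbering are illustrative): tasks carrying only common instructions are placed on any idle core, while extension-specific tasks may only be placed on their designated core.

```c
/* Toy dispatch pass mirroring FIG. 3 (all names and the core numbering are
 * illustrative): generic tasks go to any idle core, extension-specific
 * tasks only to their designated core. */
#include <stdbool.h>
#include <stdio.h>

#define NCORES 4

struct task { const char *name; unsigned core_mask; };   /* 0xF = any core */

int main(void)
{
    bool idle[NCORES] = { true, true, true, true };
    const struct task queue[] = {
        { "generic_a", 0xF },        /* circle: any core          */
        { "audio_fx",  1u << 1 },    /* triangle: Core1 only      */
        { "video_dsp", 1u << 2 },    /* square:   Core2 only      */
        { "video_enc", 1u << 3 },    /* star:     Core3 only      */
        { "generic_b", 0xF },        /* circle: any core          */
    };

    for (size_t t = 0; t < sizeof queue / sizeof queue[0]; t++) {
        bool placed = false;
        for (int c = 0; c < NCORES && !placed; c++) {
            if (idle[c] && ((queue[t].core_mask >> c) & 1u)) {
                printf("%s -> Core%d\n", queue[t].name, c);
                idle[c] = false;
                placed  = true;
            }
        }
        if (!placed)
            printf("%s waits (no eligible idle core)\n", queue[t].name);
    }
    return 0;
}
```

Running this, the specialized tasks land only on Core1, Core2 and Core3 respectively, while the generic task that arrives after all cores are occupied simply waits until a core frees up.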
  • Typically, the majority of applications in the operating system would run as generic tasks to take advantage of the multi-processor platform. It will be noted that performance critical tasks typically rely on library or middleware functionality which can be optimized for operation on the special (e.g., non-generic) processors.
  • It should also be appreciated that the functions performed by each of the cores can vary in response to the application being performed. For example, if the architecture shown in FIG. 1 is operating in an internet TV (IPTV) mode, such as within a portable media player, then block 14 of core 0 may provide memory management functionality, while Core 3 may be put into a low-power state as not being needed. It will be noted that processors performing specific task functionality can be subject to substantially different power requirements, wherein the system, such as in response to a scheduler directive, is adapted to determine whether or not to power down cores when their specific functions are not being used and sufficient processing resources exist to execute the generic tasks. In other modes, such as a camcorder mode, the cores can be adapted for use in other ways, thus again optimizing processor utilization in response to the type of activity, level of activity, power consumption and other factors.
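The power-down decision described above could be expressed roughly as follows; the function, its parameters, and the policy itself are assumptions sketched for illustration, since the patent leaves the exact criteria open.

```c
/* Sketch of a power-gating check (names and policy are assumptions):
 * a specialized core may be powered down only if no queued task needs its
 * extension and the remaining cores can absorb the generic backlog. */
#include <stdbool.h>

bool may_power_down(bool core_has_pending_specific_tasks,
                    unsigned spare_slots_on_other_cores,
                    unsigned queued_generic_tasks)
{
    if (core_has_pending_specific_tasks)
        return false;   /* the core's specific functions are still needed */
    /* Gate the core only if the others can still cover the generic work. */
    return spare_slots_on_other_cores >= queued_generic_tasks;
}
```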
  • It should be appreciated that the present invention can be implemented with different forms of task "scheduling" as well as different forms of syntax for controlling a compiler in generating the necessary binary code. An assembler configured according to the present invention can automatically determine whether the code is directed to specific processors in response to detecting processor specific instructions within a given function, wherein this information can be passed into a function map. A compiler (e.g., generating binary code from high level code, instead of from assembly code) according to the present invention, however, often does not yield a one-to-one correspondence between source code statements and processor instructions, wherein it is preferred that directives be included in the high level source code indicating which processor should fulfill the request. In this way the compiler can readily determine which set of processor instructions to use when generating the binary code, such as for a specific function. It should be noted that processor specific functionality is not limited to the instruction set, as certain processors may, for example, have access to select I/O or memory addresses which may need to be accessed to fulfill specific tasks. In some instances where a specific processor is not tied to a specific I/O, such as in regard to digital accelerator functions, a compiler could actually generate binary code for either a generic processor or a specific processor using the extended instruction set. In these instances it is also important that the source code for the functions designate in some manner whether the source is to be rendered with generic instructions, or in response to one or more processor specific instruction extensions. The following teachings provide a few examples of designating to the compiler which processor core the source code is to be compiled for.
  • FIG. 4 through FIG. 6 illustrate example coding styles to allow the programmer to direct compilation of code executable on the processors within the system, such as exemplified by FIG. 1. FIG. 4 depicts a mechanism (e.g., syntax) for directing the compiler to direct a group of instructions toward a specific processor. In response to the delineation of header and footer, the body of instructions between the header and footer are compiled for the specific processor listed as “CORE1”. FIG. 5 illustrates a second example in which macro instructions are used, which the compiler then expands out and directs to the specific processor. In this example three sequential instructions are to be performed by “CORE1” within a set of generic commands represented as “--------” in the example. Typically, absolute addresses are assigned to the functions after linking. FIG. 6 illustrates a third alternative and/or additional mechanism which may be adopted, in which a specifier is encoded within the function definition as to whether a given function can be directed to any of the target processors, or must be directed at one or more of the specific processors within the target system.
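The figures themselves are not reproduced here, so the snippet below is only one possible rendering of the three styles in C-like source; every pragma, macro, and specifier name in it is hypothetical and chosen merely to illustrate the idea.

```c
/* Hypothetical renderings of the three styles; none of these pragma, macro
 * or specifier names appear in the patent, they only illustrate the idea. */

/* FIG. 4 style: a header/footer pair bracketing code compiled for CORE1. */
#pragma target_core(CORE1)                /* header                       */
void filter_block(short *buf, int n);     /* compiled with the CORE1 ISA  */
#pragma target_core_end                   /* footer                       */

/* FIG. 5 style: macros that expand to CORE1-specific instructions,
 * embedded within otherwise generic code. */
extern int core1_mac_insn(int acc, int a, int b);
#define CORE1_MAC(acc, a, b)  core1_mac_insn((acc), (a), (b))

/* FIG. 6 style: a specifier in the function definition naming the target
 * (or marking the function as generic). */
#define FOR_CORE(n)   /* recognized by the modified compiler */
#define FOR_ANY_CORE  /* generic: any core may execute the function */

FOR_CORE(1)  void audio_mix(short *out, const short *in, int n);
FOR_ANY_CORE void ui_update(void);
```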
  • FIG. 7 illustrates an example embodiment 50 of generating code in response to functions accessed by tasks to be executed on the system as a whole. The software source code in block 52, written per FIG. 4 through FIG. 6, is received by a compiler 54 which generates object code for each function 56 and provides a mapping 58 of the functions for each of the cores. At this point the functions have names (non-absolute addressing) and are associated with specific cores, or are generic (for any core), as shown in block 60. The compiled code is then linked 62, generating linked object code 64 with an absolute function-core mapping 66; an example is shown in block 68 depicting absolute addresses for the functions on the various cores.
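Expressed as data, the function-core map might look something like the two tables below, before and after linking; the structure, field names and address values are illustrative assumptions, while the function names follow those used in FIG. 8.

```c
/* Illustrative data only: the function-core map before linking (names, no
 * addresses, as in block 60) and after linking (absolute addresses, as in
 * block 68).  Field names and address values are assumptions. */
#include <stdint.h>

struct func_map_entry {
    const char *func;   /* function name from the object code          */
    int         core;   /* -1 = generic (any core), else target core   */
    uint32_t    addr;   /* 0 before linking, absolute after linking    */
};

const struct func_map_entry compiled_map[] = {   /* after compilation */
    { "func_for_core1",  1, 0 },
    { "func_for_core2",  2, 0 },
    { "func2_for_core2", 2, 0 },
    { "func3_general",  -1, 0 },
};

const struct func_map_entry linked_map[] = {     /* after linking */
    { "func_for_core1",  1, 0x00012000u },
    { "func_for_core2",  2, 0x00024000u },
    { "func2_for_core2", 2, 0x00024800u },
    { "func3_general",  -1, 0x00001800u },
};
```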
  • FIG. 8 illustrates an example 70 of how the scheduler in the OS assigns tasks to Cores. The diagram depicts processing for each of the cores (Core0 through Core3) with respect to time. Four general time period sections are shown to identify different portions of the function execution diagram.
  • In the first (1) time period the Main function 72 starts on Core0 and issues system calls to create tasks, with arguments of function address and execution priority. The OS can determine which task should be assigned to which core using the function-core map generated by the compiler and in response to execution priority. In this example, func2_for_core2, represented in block 74, and func3_general are not executed at this point.
  • Moving into the second (2) time period, func_for_core2 on Core2 issues a system call to tell the scheduler that it needs to wait for an event (e.g., "pend") from the system and sleep until then, as seen in block 74. In response, the OS suspends func_for_core2 and assigns func2_for_core2 to Core2.
  • Moving into the third (3) time period, the same pend status is shown arising in regard to Core1, with representative operations shown in block 76. In this case, even if func_for_core2 were ready to execute, it could go only to Core2; therefore func3_general, which can be executed on any core, is assigned to Core1.
  • Finally, moving into the fourth (4) time period, the OS receives an event from the system. Since func_for_core1 and func_for_core2 are waiting for that event, and func3_general and func2_for_core2 have lower priority than the waiting tasks, the scheduler resumes func_for_core1 on Core1 and func_for_core2 on Core2, switching out the lower priority tasks.
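The FIG. 8 walk-through can be restated compactly against a hypothetical RTOS-style interface; `task_create`, `event_pend`, `event_post` and the priority values are assumptions rather than the patent's API, but the call pattern matches the description: only a function address and an execution priority are passed, and the scheduler resolves the eligible cores from the function-core map.

```c
/* Restatement of the FIG. 8 scenario against a hypothetical RTOS API
 * (task_create, event_pend, event_post and the priorities are assumptions).
 * Only a function address and priority are passed; the scheduler resolves
 * eligible cores from the function-core map. */
typedef void (*task_fn)(void);

extern void task_create(task_fn fn, int priority);  /* hypothetical syscall */
extern void event_pend(int event_id);   /* sleep until the event arrives    */
extern void event_post(int event_id);   /* wake tasks pending on the event  */

#define EVT_SYSTEM 1

void func_for_core1(void)   { for (;;) { event_pend(EVT_SYSTEM); /* Core1 work */ } }
void func_for_core2(void)   { for (;;) { event_pend(EVT_SYSTEM); /* Core2 work */ } }
void func2_for_core2(void)  { /* lower priority Core2 work */ }
void func3_general(void)    { /* lower priority work for any core */ }

void main_task(void)        /* period (1): Main 72 runs on Core0 */
{
    task_create(func_for_core1,  10);   /* higher priority, Core1 only          */
    task_create(func_for_core2,  10);   /* higher priority, Core2 only          */
    task_create(func2_for_core2,  5);   /* runs on Core2 while the above pends  */
    task_create(func3_general,    5);   /* picked up by whichever core frees    */
}
```

When the event of period (4) is posted, the two pending functions become the highest priority ready tasks for their respective cores, which is why the description has them displace func3_general and func2_for_core2.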
  • The present invention thus teaches a method and apparatus for multi-processing on an asymmetric system. Different aspects of this invention are described, including target hardware and software, the tools required for generating binary code for the target, and the method of creating an SMP-like environment over an AMP (asymmetric) system. It will be appreciated that the figures herein are shown by way of example toward understanding aspects of the present invention and are not intended to limit the practice of the invention. One of ordinary skill in the art will appreciate that the teachings of the present invention may be practiced in various ways and with various mechanisms without departing from the present invention.
  • Although the description above contains many details, these should not be construed as limiting the scope of the invention but as merely providing illustrations of some of the presently preferred embodiments of this invention.
  • Therefore, it will be appreciated that the scope of the present invention fully encompasses other embodiments which may become obvious to those skilled in the art, and that the scope of the present invention is accordingly to be limited by nothing other than the appended claims, in which reference to an element in the singular is not intended to mean “one and only one” unless explicitly so stated, but rather “one or more.” All structural and functional equivalents to the elements of the above-described preferred embodiment that are known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the present claims. Moreover, it is not necessary for a device or method to address each and every problem sought to be solved by the present invention, for it to be encompassed by the present claims. Furthermore, no element, component, or method step in the present disclosure is intended to be dedicated to the public regardless of whether the element, component, or method step is explicitly recited in the claims. No claim element herein is to be construed under the provisions of 35 U.S.C. 112, sixth paragraph, unless the element is expressly recited using the phrase “means for.”

Claims (20)

1. An apparatus for asymmetric multi-processing, comprising:
a plurality of processors configured for executing instructions in response to tasks scheduled for execution within said plurality of processors;
a communication pathway interconnecting individual processors within said plurality of processors;
wherein each of said processors in said plurality of processors is configured for executing an instruction set which includes a set of common instructions which are common to all processors in said plurality of processors;
wherein one or more of said processors is configured with processor specific instructions for controlling processor specific functions which cannot be executed by the other processors within said plurality of processors, wherein said multi-processor apparatus is asymmetric; and
a task scheduler configured for assigning tasks containing only common instructions to any of said plurality of processors, while tasks containing processor specific functions are assigned to one or more specific processors configured for executing those specific functions.
2. An apparatus as recited in claim 1, wherein said instructions for execution by said plurality of processors are generated by a compiler or assembler, which is configured for generating binary code for each processor with common instructions generated for each processor, and including processor specific instructions generated within the binary code for processors configured for performing the associated processor specific functions.
3. An apparatus as recited in claim 1, wherein said processor specific functions are selected from the group of processing activities consisting of digital signal processing, stream processing, video processing, audio processing, digital control, acceleration processing, single-instruction multiple-data processing (SIMD), and combinations thereof.
4. An apparatus as recited in claim 1, wherein said task scheduler comprises programming which executes on at least one of said plurality of processors.
5. An apparatus as recited in claim 1, wherein said task scheduler is executed within an operating system.
6. An apparatus for generating binary code in response to compiling or assembling source code for execution within an asymmetric multi-processing system, comprising:
a computer;
programming configured for executing from said computer for,
receiving source code containing a plurality of functions for execution by processors within an asymmetric multi-processing system,
mapping functions from within said source code to indicate which system functions are generic containing common instructions for all processors in the asymmetric multi-processing system, and which functions contain instructions directed to one or more specific processors capable of executing processor specific instructions,
outputting binary code containing common instructions for each processor in said asymmetric multi-processing system, and a combination of common instructions and processor specific instructions for processors within the asymmetric multi-processing system which support processor specific functions,
wherein said binary code generated for common instructions is configured for execution by at least one task configured for execution on any of the processors within the asymmetric multi-processing system, and said binary code generated containing processor specific instructions is configured for execution by at least one task configured for execution on one or more of the processors within the asymmetric multi-processing system which supports processor specific functions.
7. An apparatus as recited in claim 6, wherein said binary code is configured for execution directed by an operating system which determines which tasks should be assigned to which processors in response to said mapping of functions.
8. An apparatus as recited in claim 6, further comprising decoding directives contained within said source code indicating which functions are directed to a specific processor.
9. An apparatus as recited in claim 8, wherein a header and footer designate a portion of source code whose associated binary code is to be generated for one or more specific processors.
10. An apparatus as recited in claim 8, wherein a macro designates a portion of source code whose associated binary code is to be generated for one or more specific processors.
11. An apparatus as recited in claim 8, wherein text within a function definition designates whether the function is directed to any of the processors, or to one or more specific processors.
12. An apparatus as recited in claim 6, further comprising a linker adapted to assign absolute addresses to functions for each of the processors within the asymmetric multi-processing system.
13. An apparatus as recited in claim 6, wherein said processor specific instructions are selected from the group of non-generic processing activities consisting of digital signal processing, stream processing, video processing, audio processing, digital control, acceleration processing, single-instruction multiple-data processing (SIMD), and combinations thereof.
14. An apparatus as recited in claim 6:
wherein the processors within the asymmetric multi-processing system have an instruction set adapted with a portion of the instruction set for each processor being shared in common, as common instructions, with other processors to be used within the asymmetric multi-processing system; and
wherein one or more of the processors have processor specific instructions which extend beyond the common instructions that cannot be executed on all the other processors in the asymmetric multi-processing system.
15. An apparatus as recited in claim 6, wherein said binary code generated by said apparatus is configured so that tasks using generic functions can be executed by any of the processors within the asymmetric multi-processing system, while tasks using processor specific functions can be executed only by one or more specific processors which are capable of executing those processor specific functions.
16. A method of controlling execution of general and processor-specific tasks within an asymmetric multi-processing system, comprising:
adapting the instruction set of each processing element within a multi-processing system so that a portion of the instruction set for each processor is shared in common, as common instructions, while one or more of the processors includes processor specific instructions, associated with processor specific functions, which cannot be executed on all the other processors in the asymmetric multi-processing system;
generating binary code for execution on each of the processors within the asymmetric multi-processing system by,
outputting binary code of the common shared instructions for each of the processors within the asymmetric multi-processing system,
creating a function map indicating which system functions are generic and which functions are directed to one or more specific processors capable of executing processor specific instructions, and
outputting binary code of the processor specific instructions for said one or more of the processors which include processor specific instructions.
17. A method as recited in claim 16, further comprising a linker adapted to assign absolute addresses to functions for each of the processors within said asymmetric multi-processing system.
18. A method as recited in claim 16, wherein processors within the asymmetric multi-processing system are interconnected with a communication pathway.
19. A method as recited in claim 16, wherein one or more of said processors within the asymmetric multi-processing system is configured for executing a task scheduler which is configured for assigning tasks containing only common instructions to any of said plurality of processors, while tasks containing processor specific functions are assigned to one or more specific processors configured for executing those specific functions.
20. A method as recited in claim 16, wherein said processor specific functions comprise functions selected from the group of processing activities consisting of digital signal processing, stream processing, video processing, audio processing, digital control processing, hardware acceleration processing, single-instruction multiple-data processing (SIMD), and combinations thereof.
US12/405,555 2009-03-17 2009-03-17 Symmetric multi-processor operating system for asymmetric multi-processor architecture Abandoned US20100242014A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/405,555 US20100242014A1 (en) 2009-03-17 2009-03-17 Symmetric multi-processor operating system for asymmetric multi-processor architecture

Publications (1)

Publication Number Publication Date
US20100242014A1 (en) 2010-09-23

Family

ID=42738749

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/405,555 Abandoned US20100242014A1 (en) 2009-03-17 2009-03-17 Symmetric multi-processor operating system for asymmetric multi-processor architecture

Country Status (1)

Country Link
US (1) US20100242014A1 (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5867704A (en) * 1995-02-24 1999-02-02 Matsushita Electric Industrial Co., Ltd. Multiprocessor system having processor based idle state detection and method of executing tasks in such a multiprocessor system
US5835775A (en) * 1996-12-12 1998-11-10 Ncr Corporation Method and apparatus for executing a family generic processor specific application
US5999734A (en) * 1997-10-21 1999-12-07 Ftl Systems, Inc. Compiler-oriented apparatus for parallel compilation, simulation and execution of computer programs and hardware models
US7325232B2 (en) * 2001-01-25 2008-01-29 Improv Systems, Inc. Compiler for multiple processor and distributed memory architectures
US7840778B2 (en) * 2002-02-19 2010-11-23 Hobson Richard F Processor cluster architecture and associated parallel processing methods
US20060179275A1 (en) * 2005-02-08 2006-08-10 Takeshi Yamazaki Methods and apparatus for processing instructions in a multi-processor system
US20080229051A1 (en) * 2006-06-01 2008-09-18 International Business Machines Corporation Broadcasting Instructions/Data to a Plurality of Processors in a Multiprocessor Device Via Aliasing
US20080066056A1 (en) * 2006-09-08 2008-03-13 Sony Computer Entertainment Inc. Inspection Apparatus, Program Tampering Detection Apparatus and Method for Specifying Memory Layout
US20080114937A1 (en) * 2006-10-24 2008-05-15 Arm Limited Mapping a computer program to an asymmetric multiprocessing apparatus
US20100023709A1 (en) * 2008-07-22 2010-01-28 International Business Machines Corporation Asymmetric double buffering of bitstream data in a multi-core processor

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9442758B1 (en) 2008-01-21 2016-09-13 Marvell International Ltd. Dynamic processor core switching
US8656216B2 (en) * 2009-03-17 2014-02-18 Toyota Jidosha Kabushiki Kaisha Failure diagnostic system, electronic control unit for vehicle, failure diagnostic method
US20120005535A1 (en) * 2009-03-17 2012-01-05 Toyota Jidosha Kabushiki Kaisha Failure diagnostic system, electronic control unit for vehicle, failure diagnostic method
US20120096445A1 (en) * 2010-10-18 2012-04-19 Nokia Corporation Method and apparatus for providing portability of partially accelerated signal processing applications
US20130061237A1 (en) * 2011-09-06 2013-03-07 Ofer Zaarur Switching Tasks Between Heterogeneous Cores
US9069553B2 (en) * 2011-09-06 2015-06-30 Marvell World Trade Ltd. Switching tasks between heterogeneous cores
EP2824569A4 (en) * 2012-03-05 2016-06-01 Zte Microelectronics Technology Co Ltd Method and device for scheduling multiprocessor of system on chip (soc)
CN103294554A (en) * 2012-03-05 2013-09-11 中兴通讯股份有限公司 SOC multiprocessor dispatching method and apparatus
CN103019742A (en) * 2012-12-31 2013-04-03 清华大学 Method for generating automatic codes on multiple DSP (Digital Signal Processor) platform
US10437591B2 (en) 2013-02-26 2019-10-08 Qualcomm Incorporated Executing an operating system on processors having different instruction set architectures
WO2014133784A3 (en) * 2013-02-26 2014-10-23 Qualcomm Incorporated Executing an operating system on processors having different instruction set architectures
US10114756B2 (en) 2013-03-14 2018-10-30 Qualcomm Incorporated Externally programmable memory management unit
US9606818B2 (en) 2013-03-14 2017-03-28 Qualcomm Incorporated Systems and methods of executing multiple hypervisors using multiple sets of processors
US9396012B2 (en) 2013-03-14 2016-07-19 Qualcomm Incorporated Systems and methods of using a hypervisor with guest operating systems and virtual processors
US10133598B2 (en) 2013-03-14 2018-11-20 Qualcomm Incorporated Systems and methods of using a hypervisor to assign virtual processor priority based on task priority and to schedule virtual processors for guest operating systems
US9705985B1 (en) * 2013-03-18 2017-07-11 Marvell International Ltd. Systems and methods for cross protocol automatic sub-operation scheduling
CN104077106A (en) * 2013-03-26 2014-10-01 威盛电子股份有限公司 Asymmetric multi-core processor with native switching mechanism
US10423216B2 (en) 2013-03-26 2019-09-24 Via Technologies, Inc. Asymmetric multi-core processor with native switching mechanism
EP2784674A3 (en) * 2013-03-26 2015-04-29 VIA Technologies, Inc. Asymmetric multi-core processor with native switching mechanism
EP2799989A3 (en) * 2013-04-11 2014-12-24 Samsung Electronics Co., Ltd Apparatus and method of parallel processing execution
EP3092567A4 (en) * 2014-02-19 2017-05-31 Huawei Technologies Co., Ltd. System and method for isolating i/o execution via compiler and os support
US9772879B2 (en) 2014-02-19 2017-09-26 Futurewei Technologies, Inc. System and method for isolating I/O execution via compiler and OS support
CN106030538A (en) * 2014-02-19 2016-10-12 华为技术有限公司 System and method for isolating I/O execution via compiler and OS support
JP2018509716A (en) * 2015-03-17 2018-04-05 華為技術有限公司Huawei Technologies Co.,Ltd. Multi-multidimensional computer architecture for big data applications
GB2539037A (en) * 2015-06-05 2016-12-07 Advanced Risc Mach Ltd Apparatus having processing pipeline with first and second execution circuitry, and method
GB2539037B (en) * 2015-06-05 2020-11-04 Advanced Risc Mach Ltd Apparatus having processing pipeline with first and second execution circuitry, and method
US11074080B2 (en) 2015-06-05 2021-07-27 Arm Limited Apparatus and branch prediction circuitry having first and second branch prediction schemes, and method
WO2023010014A1 (en) * 2021-07-27 2023-02-02 Sonical Sound Solutions Fully customizable ear worn devices and associated development platform

Similar Documents

Publication Publication Date Title
US20100242014A1 (en) Symmetric multi-processor operating system for asymmetric multi-processor architecture
US10331615B2 (en) Optimization of loops and data flow sections in multi-core processor environment
US7577826B2 (en) Stall prediction thread management
Teich et al. Invasive computing: An overview
EP2460073B1 (en) Mapping processing logic having data parallel threads across processors
EP1912119B1 (en) Synchronization and concurrent execution of control flow and data flow at task level
US20070250682A1 (en) Method and apparatus for operating a computer processor array
KR101622266B1 (en) Reconfigurable processor and Method for handling interrupt thereof
KR101738941B1 (en) Reconfigurable array and control method of reconfigurable array
EP1365321A3 (en) Multiprocessor system
Gschwind et al. An open source environment for cell broadband engine system software
CN114895965A (en) Method and apparatus for out-of-order pipeline execution implementing static mapping of workloads
JP2004152305A (en) Hyper-processor
KR101603752B1 (en) Multi mode supporting processor and method using the processor
US20050278720A1 (en) Distribution of operating system functions for increased data processing performance in a multi-processor architecture
JP2021034023A (en) Methods and apparatus for configuring heterogenous components in accelerator
Gerzhoy et al. Nested mimd-simd parallelization for heterogeneous microprocessors
Aumage et al. Task-based performance portability in hpc
KR20080066402A (en) The method of designing parallel embedded software with common intermediate code
Rogers HSA overview
Song et al. A Low Cost Cross-Platform Video/Image Process Framework Empowers Heterogeneous Edge Application
Wan et al. Core-Based Parallelism
Porada A Many-core Parallelizing Processor
Lewis Performance and Programmability Trade-offs in the OpenCL 2.0 SVM and Memory Model
Verhulst Non-sequential processing: history and future of bridging the semantic gap left by the von Neumann architecture.

Legal Events

Date Code Title Description
AS Assignment

Owner name: SONY ELECTRONICS INC., NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ZHU, XIAOHAN;REEL/FRAME:022591/0699

Effective date: 20090411

Owner name: SONY CORPORATION, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ZHU, XIAOHAN;REEL/FRAME:022591/0699

Effective date: 20090411

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION