US8819242B2 - Method and system to transfer data utilizing cut-through sockets - Google Patents


Info

Publication number
US8819242B2
Authority
US
United States
Prior art keywords
operating system
source
destination
message
offload stack
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US11/468,942
Other versions
US20080059644A1 (en)
Inventor
Mark A. Bakke
David Patrick Thompson
Timothy J. Kuik
Paul Harry Gleichauf
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cisco Technology Inc
Original Assignee
Cisco Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cisco Technology Inc
Priority to US11/468,942
Assigned to CISCO TECHNOLOGY, INC. Assignors: GLEICHAUF, PAUL HARRY; THOMPSON, DAVID PATRICK; BAKKE, MARK A.; KUIK, TIMOTHY J.
Publication of US20080059644A1
Application granted
Publication of US8819242B2
Legal status: Active
Adjusted expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/28
    • H04L67/2814
    • H04L67/2866 Architectures; Arrangements
    • H04L67/2871 Implementation details of single intermediate entities
    • H04L67/289 Intermediate processing functionally located close to the data consumer application, e.g. in same machine, in same home or in same sub-network
    • H04L67/2895 Intermediate processing functionally located close to the data provider application, e.g. reverse proxies
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/563 Data redirection of data network streams
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/161 Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields
    • H04L69/162 Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields involving adaptations of sockets based mechanisms
    • H04L69/30 Definitions, standards or architectural aspects of layered protocol stacks
    • H04L69/32 Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L69/322 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
    • H04L69/326 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the transport layer [OSI layer 4]


Abstract

A method and system to exchange information between computer applications are provided. The system may include a source operating system, a destination operating system and an offload stack, all residing on the device. The source operating system and the destination operating system appear to users as distinct network entities. The offload stack may be configured to function as an intermediate network device for the source operating system. The offload stack, in one embodiment, comprises a back end to receive a message from the source operating system to the destination operating system, an analyzer to determine that the destination operating system resides on the device, and a cut-through socket module to process the message such that a network layer of the offload stack is bypassed.

Description

TECHNICAL FIELD
This application relates to a method and system to transfer data utilizing cut-through sockets.
BACKGROUND
Recent trends in CPU chip design provide multiple CPU cores on the same die. The cores may share a common communications bus and main memory, but cache designs may vary to include separate L1 and L2, options for shared L3, shared L2 but separate L1, and direct access L1 across cores. Shared memory allocation techniques that can draw memory from a large shared pool have been used in some data processing approaches. Both multi-core CPUs and shared memory allocation techniques are now used on high-performance servers.
In some high-power servers, the large amount of CPU power provided may be under-utilized. Therefore, server users have begun deploying virtualization software that permits running multiple operating system instances (guest operating systems) on a single server. The opportunities provided by virtualization, real-time monitor operating systems, and multi-core CPU chipsets may be combined and improved to produce a flexible open platform for I/O control and protection along with a common management interface as a beneficial side-effect. For example, one or more processors of an endpoint device may be dedicated as a network core. The network core may be configured to host a common offload stack to provide a unified network interface for the multiple operating system instances running on the endpoint device or host.
The common offload stack may appear to the guest operating systems as being on the network. As a result, the network, file, and storage I/O functionality may allow the offload stack to function, in effect, as an intermediate embedded network device capable of bridging, switching or even routing between operating systems on the server, and off of the server when operating in conjunction with other (external) network devices deeper in a network. An offload stack in the Open Systems Interconnection Reference Model (OSI model) may include, among other components, a Transmission Control Protocol (TCP) layer, an Internet Protocol (IP) layer and an Ethernet driver.
Data exchange between two operating systems using a TCP stack may involve converting the data into TCP segments, adding IP headers with IP addresses, and adding MAC addresses when the data is received at the offload stack, then sending the data from the offload stack, stripping the previously added headers, and reassembling the data from the TCP segments. The same operations may need to be performed even when data is exchanged between two operating systems residing on the same hardware.
BRIEF DESCRIPTION OF DRAWINGS
Embodiments of the present invention are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:
FIG. 1 shows a network environment within which an example embodiment may be implemented;
FIG. 2 is a block diagram illustrating a system utilizing a common offload stack, in accordance with an example embodiment;
FIG. 3 is a block diagram illustrating example operations performed by various components of a system to effectuate cut-through socket data transfer, in accordance with an example embodiment;
FIG. 4 is a flow chart illustrating a method to effectuate cut-through socket data transfer, in accordance with an example embodiment; and
FIG. 5 illustrates a diagrammatic representation of an example machine in the form of a computer system within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed.
DETAILED DESCRIPTION
In order to address issues associated with optimizing data transfers between two or more operating system images, a method and system are presented to transfer data utilizing cut-through sockets.
When sharing a network, block, or file system offload stack between multiple operating system images, the physical memory pages that comprise the send and receive buffer space may be assigned by a virtual machine monitor to any of the images at any time. In one embodiment, for data path connections between images that use socket protocols such as TCP, the entire TCP stack, along with the associated data copies and packetization, can be avoided by providing a cut-through socket layer that may be implemented as a part of a common offload stack.
For example, data sent on the source socket may be directly put into the receiving socket's buffer. Thus, the flow control may be provided to the sender based on the receiver's state, rather than based on the state of the sender's send buffer. The memory pages storing the data that is to be transferred from a source image to a destination image may be remapped into the memory of the destination image, thereby avoiding any data copies. In one embodiment, when both the source and the destination endpoints of the data transfer reside on the same hardware, the data transfer may be effectuated by changing the ownership of the associated data pages from the sender operating system image to the recipient operating system image instead of sending the transfer request through the TCP layer of the common offload stack.
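A minimal sketch may make the two ideas in the preceding paragraph concrete: the sender hands over ownership of the pages that hold the data instead of copying them, and back-pressure is driven by the state of the receiver's buffer. The sketch below is illustrative only; the `page`, `cut_socket`, and `cut_through_send` names are hypothetical, and a real system would remap pages through the virtual machine monitor rather than flip an `owner` field.

```c
/*
 * Minimal sketch of a cut-through send, assuming hypothetical types.
 * Ownership of the data page moves from sender to receiver; no copy is made.
 * Flow control is based on the receiver's buffer state, not the sender's.
 */
#include <stdio.h>
#include <stdlib.h>

#define RECV_SLOTS 4            /* receiver's per-connection buffer depth  */

struct page {                   /* stands in for a physical memory page    */
    int    owner;               /* guest OS image that currently owns it   */
    char   data[4096];
    size_t len;
};

struct cut_socket {             /* receiving socket's buffer of page pointers */
    struct page *slots[RECV_SLOTS];
    int          count;
    int          owner;         /* guest OS image that owns this socket    */
};

/* Returns 0 on success, -1 if the receiver has no room (flow control). */
int cut_through_send(struct cut_socket *dst, struct page *src_page)
{
    if (dst->count == RECV_SLOTS)
        return -1;                      /* back-pressure from receiver state */
    src_page->owner = dst->owner;       /* ownership transfer, not a copy    */
    dst->slots[dst->count++] = src_page;
    return 0;
}

int main(void)
{
    struct cut_socket rx = { .count = 0, .owner = 2 };   /* destination OS */
    struct page *p = calloc(1, sizeof *p);

    p->owner = 1;                                        /* source OS      */
    p->len = snprintf(p->data, sizeof p->data, "hello from guest 1");

    if (cut_through_send(&rx, p) == 0)
        printf("page now owned by guest %d: %s\n",
               rx.slots[0]->owner, rx.slots[0]->data);

    free(p);
    return 0;
}
```

In this toy version the receiver's slot count plays the role of the receive-side flow control described above; a full stack would also return credits to the sender as the receiving application drains its buffer.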
It will be noted that the technique described herein is not limited to a TCP layer of the offload stack, but may be used, in some embodiments, to optimize data transfers between operating system images that utilize other network protocols that are capable of transferring a data stream or a message via an IP network, e.g., User Datagram Protocol (UDP) or Stream Control Transmission Protocol (SCTP).
Example embodiments may be implemented in the context of a network environment. An example of such a network is illustrated in FIG. 1.
As shown in FIG. 1, a network environment 100 may include a plurality of endpoint devices, such as an endpoint device 110 and an endpoint device 120, coupled to a communications network 130. The communications network 130 may be a public network (e.g., the Internet, a wireless network, etc.) or a private network (e.g., LAN, WAN, Intranet, etc.).
The endpoint devices 110 and 120 may be, for example, server systems and may include a number of resources, such as multiple processor cores and memory, that are shared between operating systems 111, 112 and 113. Each one of the operating systems 111, 112 and 113 may be allocated some portion of the shared memory and some portion or all of the processing bandwidth of one or more processor cores. Such a system may be referred to as a virtual system because, while the operating systems 111, 112 and 113 may share resources, each of the operating systems may operate independently, utilizing its allocated resources, as if each were operating in a separate computer system. Thus, even though the operating systems 111 and 112 both reside on the same device 110, the operating systems 111 and 112 may function as separate network nodes (or, in other words, as separate end points in a network to which or from which data can be routed).
In the example endpoint device 110, the operating systems 111 and 112 have access to functions provided by a common offload stack 114. In one embodiment, a common offload stack may be run as a guest operating system, rather than as a software element that requires a dedicated processor core. This approach may allow the common offload stack to be hosted on only a portion of a core, on an entire core, or on a plurality of cores or, alternatively, it may run within a hyper thread on a CPU. Thus, a plurality of other guest operating systems running on other cores, CPUs, or virtualized domains can share networking, block, and file services provided by the common offload stack.
An example common offload stack may operate as described in the U.S. provisional patent application Ser. No. 60/693,133, entitled “Network Stack Offloading Approaches” filed on Jun. 22, 2005, and in U.S. patent application Ser. No. 11/386,487, entitled “Zero-copy Network and File Offload for Web and Application Servers” filed on Mar. 22, 2006, which are herein incorporated by reference.
It will be noted that, in one embodiment, the common offload stack 114 on the endpoint device 110 may be utilized when a data exchange is requested between operating systems residing on the same endpoint device, as well as when a data exchange is requested between operating systems residing on different endpoint devices. As described below, the common offload stack 114 may be configured such that communications between operating systems residing on the same endpoint device (e.g., communications between the operating systems 111 and 112) bypass the network layers of the common offload stack 114.
FIG. 2 is a block diagram illustrating components of a system 200 utilizing a common offload stack. In the example embodiment of FIG. 2, one or more processor cores host guest operating systems 214A and 214B, each of which hosts applications 216A and 216B respectively. The applications 216A and 216B may be completely unrelated and perform different functions or provide different services.
A common offload stack 250 may be hosted by a separate operating system, for example, by a BSD, Linux, Microsoft Windows, or embedded operating system that may be simplified with fewer functions than a typical general-purpose operating system and that may be structured with enhanced security features. Further, in an alternative embodiment, the functionality provided by the hosting operating system and the common offload stack 250 may be implemented in an optimized hardware such as in a special-purpose CPU core.
A guest operating system (e.g., the guest operating systems 214A and 214B), in one embodiment, may host a common stack interface (CSI) front end, e.g., 222A, 222B, which provides a secure interface to the common offload stack 250. The applications 216A and 216B may establish socket interfaces to the common offload stack 250 utilizing socket calls modules 230A and 230B and the CSI front end (e.g., 222A and 222B) in order to obtain certain functions from the common offload stack 250.
The common offload stack 250, in one embodiment, comprises a CSI back end 252 to receive calls from the guest operating systems, a kernel socket layer 254 to process the calls, a network protocol layer 256 and a network driver layer 258. The kernel socket layer 254 may, in turn, comprise a source/destination analyzer 255A to determine whether the source and the destination associated with a received call reside on the same hardware system, and a cut-through socket module 255B to process the call without invoking the functionality of the network protocol layer 256 and the network driver layer 258.
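The dispatch decision inside the kernel socket layer can be pictured roughly as below. This is a sketch under stated assumptions, not the patented implementation: `csi_call`, `same_endpoint_device`, `cut_through_deliver`, and `netstack_transmit` are hypothetical stand-ins for the source/destination analyzer 255A, the cut-through socket module 255B, and the network protocol and driver layers 256 and 258.

```c
/* Hypothetical dispatch inside the offload stack's kernel socket layer. */
#include <stdbool.h>
#include <stdio.h>

struct endpoint { int device_id; int guest_id; };

struct csi_call {               /* a socket call received by the CSI back end */
    struct endpoint src;
    struct endpoint dst;
    const void     *payload;    /* descriptors for the pages to transfer      */
};

/* Source/destination analyzer: are both socket ends on the same hardware? */
static bool same_endpoint_device(const struct csi_call *c)
{
    return c->src.device_id == c->dst.device_id;
}

static void cut_through_deliver(const struct csi_call *c)
{
    printf("cut-through: guest %d -> guest %d, network layers bypassed\n",
           c->src.guest_id, c->dst.guest_id);
}

static void netstack_transmit(const struct csi_call *c)
{
    printf("network path: device %d -> device %d via TCP/IP and driver\n",
           c->src.device_id, c->dst.device_id);
}

void kernel_socket_dispatch(const struct csi_call *c)
{
    if (same_endpoint_device(c))
        cut_through_deliver(c);     /* cut-through socket module        */
    else
        netstack_transmit(c);       /* network protocol + driver layers */
}

int main(void)
{
    struct csi_call local  = { {110, 1}, {110, 2}, NULL };
    struct csi_call remote = { {110, 1}, {120, 3}, NULL };
    kernel_socket_dispatch(&local);
    kernel_socket_dispatch(&remote);
    return 0;
}
```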
As shown in FIG. 2, a ring buffer interface 232 may be interposed between the guest operating systems 214A and 214B and the common offload stack 250. The ring buffer interface 232 may be configured to mediate mutually exclusive calls that may originate from the guest operating systems 214A and 214B. The calls mediated through the ring buffer interface 232 arrive at a CSI back end 252.
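One plausible shape for such a ring buffer interface is a single-producer, single-consumer descriptor ring per guest, sketched below. The `xfer_desc` and `desc_ring` names are assumptions for illustration; in a real system the ring would live in memory shared between the guest and the offload stack, and a push would raise an event toward the CSI back end rather than rely on polling.

```c
/* Hypothetical per-guest descriptor ring between a CSI front end (producer)
 * and the CSI back end (consumer). Power-of-two size keeps indexing cheap. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define RING_SIZE 8             /* must be a power of two */

struct xfer_desc {              /* what the front end places on the ring */
    void  *page;                /* pointer to the source page            */
    size_t length;              /* bytes of valid data in that page      */
};

struct desc_ring {
    struct xfer_desc slots[RING_SIZE];
    unsigned head;              /* next slot the producer writes  */
    unsigned tail;              /* next slot the consumer reads   */
};

bool ring_push(struct desc_ring *r, struct xfer_desc d)
{
    if (r->head - r->tail == RING_SIZE)
        return false;                           /* ring full */
    r->slots[r->head++ & (RING_SIZE - 1)] = d;
    return true;                                /* an event would be raised here */
}

bool ring_pop(struct desc_ring *r, struct xfer_desc *out)
{
    if (r->head == r->tail)
        return false;                           /* ring empty */
    *out = r->slots[r->tail++ & (RING_SIZE - 1)];
    return true;
}

int main(void)
{
    struct desc_ring tx = {0};
    char page[4096] = "source data";
    struct xfer_desc d = { page, 12 }, got;

    ring_push(&tx, d);                          /* CSI front end side */
    if (ring_pop(&tx, &got))                    /* CSI back end side  */
        printf("pulled %zu bytes at %p\n", got.length, got.page);
    return 0;
}
```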
In one embodiment, socket calls originated at the socket calls modules 230A and 230B of the guest operating systems terminate at the kernel socket layer 254 provided with the common offload stack 250. In certain embodiments, the functional elements of the operating system hosting the common offload stack 250 can supplement some of the functions of common offload stack 250. For example, in a system that utilizes FreeBSD to host the common offload stack 250, the common offload stack 250 may be configured to process messages via the TCP stack that is already provided with the FreeBSD.
FIG. 3 is a block diagram illustrating example operations performed by various components of a system 300 to effectuate cut-through socket data transfer. The system 300 comprises guest operating systems 310 and 320 running user applications 312 and 322, respectively, and a guest operating system 330 (e.g., FreeBSD) running a common offload stack 350. Each one of the guest operating systems 310, 320 and 330 has access to respective memory pages 30. A virtual machine monitor 340 may be configured to manage memory that may be passed between the guest operating systems 310, 320 and 330, e.g., via a transmit ring 40 and a receive ring 50. In one embodiment, the data structures of the transmit ring 40 and the receive ring 50 are managed via the ring buffer interface 232 of FIG. 2.
For the purposes of the discussion with reference to FIG. 3, the assumption is made that a socket has been established between the guest operating systems 310 and 320 in order to allow the guest operating systems 310 and 320 to communicate and exchange data with each other. It will be noted that the guest operating systems 310 and 320 may be different operating systems or different versions of the same operating system.
In one embodiment, in order to transfer subject data to the application 322 running on the guest operating system 320, the guest operating system 310 may initiate a send operation by writing the subject data to one or more memory pages (the source pages) from the pages 30 associated with the guest operating system 310 and sending the pointers to the source pages to a socket writer send call (e.g., the socket calls module 230A of FIG. 2). The pointers to the source pages and other information associated with the subject data (collectively referred to as descriptors) are then placed in a send (SND) buffer 314 that resides in the kernel space of the guest operating system 310. Next, this information is processed by the front end of the common offload stack interface (the CSI FE 316).
The CSI FE 316 transfers into transmit ring 40 the pointers to the source pages and other relevant information (e.g., the amount of data to be used out of each page, etc.) and sends an event to the CSI back end 353 to indicate that data is available to be transferred to the guest operating system 320.
The CSI back end 353 will detect the event, pull from the transmit ring 40 the available information (the pointers to the source pages, the length of the source pages, etc.) and send this information to a queue that it maintains (e.g., a transmit queue 355A). In an example embodiment, the CSI back end 353 allocates memory to manage the source memory pages and swaps the source memory pages with memory pages owned by a kernel of the offload stack 350.
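The page swap mentioned above can be pictured as the back end taking each source page (so it can be forwarded without a copy) and handing the guest a replacement page from a kernel-owned pool, keeping the guest's buffer capacity constant. The following sketch illustrates that exchange under those assumptions; `swap_with_kernel_page` and the pool structure are hypothetical, and an actual offload stack would perform the swap through the virtual machine monitor's page mappings.

```c
/* Hypothetical page swap: the offload-stack kernel takes the guest's source
 * page (to forward it without copying) and gives the guest a replacement
 * page from its own pool, so neither side loses buffer capacity. */
#include <stdio.h>

#define POOL_SIZE 4

struct page_pool {
    void *pages[POOL_SIZE];
    int   count;
};

/* Takes *src_page out of the guest's hands and returns it for forwarding,
 * leaving a kernel-owned page in its place; returns NULL if the pool is empty. */
void *swap_with_kernel_page(struct page_pool *kernel_pool, void **src_page)
{
    if (kernel_pool->count == 0)
        return NULL;
    void *taken       = *src_page;                      /* forwarded, not copied */
    void *replacement = kernel_pool->pages[--kernel_pool->count];
    *src_page = replacement;                            /* guest keeps a page    */
    return taken;
}

int main(void)
{
    static char guest_page[4096] = "data to transfer";
    static char spare_a[4096], spare_b[4096];
    struct page_pool kpool = { { spare_a, spare_b }, 2 };

    void *guest_slot = guest_page;
    void *forwarded  = swap_with_kernel_page(&kpool, &guest_slot);

    printf("forwarded page %p, guest now holds replacement %p\n",
           forwarded, guest_slot);
    return 0;
}
```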
The CSI back end 353 then determines whether the designated recipient for the subject data resides on the same hardware as the guest operating system 310 that originated the send request. This determination may be performed by the source/destination analyzer 255A illustrated in FIG. 2. For example, if the source/destination analyzer 255A determines that the other end of the socket is on a different machine, then the CSI back end 353 may effectuate network protocol layer calls and send the subject data and other relevant information over the network interface of the common offload stack 350 and the guest operating system 330 that hosts it.
If the CSI back end 353 determines that both ends of the socket established between a source OS and a destination OS (here, the guest operating systems 310 and 320) are on the same machine, the pointers to the source pages may be transferred from the transmit queue 355A, via the connection's receive buffer 355B, to the buffers of the receive ring 50. It will be noted that, in an example embodiment, the receive buffer 355B is a socket interface concept, where a “receive buffer” is provided per connection. The rings 40 and 50 are used by all connections of a guest operating system. Thus, there is an instance of the rings 40 and 50 for each guest operating system.
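The distinction drawn here between per-connection and per-guest structures can be summarized in a short sketch. The `connection` and `guest_rings` types below are hypothetical; the only point they illustrate is that each connection has its own receive buffer, while one transmit ring and one receive ring are shared by all of a guest's connections.

```c
/* Hypothetical layout: receive buffers are per connection, while each guest
 * operating system has exactly one transmit ring and one receive ring that
 * all of its connections share. */
#include <stddef.h>

struct desc { void *page; size_t length; };

struct ring { struct desc slots[8]; unsigned head, tail; };

struct connection {                 /* one per established socket        */
    int          local_guest;
    int          remote_guest;
    struct desc  recv_buffer[16];   /* per-connection receive buffer     */
    unsigned     recv_count;
};

struct guest_rings {                /* one per guest operating system    */
    struct ring tx;                 /* transmit ring (guest -> offload)  */
    struct ring rx;                 /* receive ring  (offload -> guest)  */
};

/* Local (same-device) delivery: a descriptor pulled from the sender's
 * transmit ring is staged in the connection's receive buffer and then
 * pushed onto the destination guest's receive ring. */
void deliver_local(struct connection *conn,
                   struct guest_rings *dst_guest,
                   struct desc d)
{
    conn->recv_buffer[conn->recv_count++ % 16] = d;       /* stage      */
    dst_guest->rx.slots[dst_guest->rx.head++ % 8] = d;    /* hand over  */
}

int main(void)
{
    struct guest_rings dst = {0};
    struct connection  conn = { .local_guest = 1, .remote_guest = 2 };
    char page[4096] = "payload";

    deliver_local(&conn, &dst, (struct desc){ page, 8 });
    return 0;
}
```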
From the receive ring 50, the pointers to the source pages may be transferred to the receive buffer 324 maintained in the kernel space of the guest operating system 320. For example, the guest operating system 320 may detect an indication that it has to pull information from the receive ring 50, obtain the descriptors including the pointers to the source pages off the receive ring 50 and then put them in its own kernel specific receive buffer structures 324. These operations may be accomplished utilizing the CSI FE 326 running in the kernel space of the guest operating system 320. From the kernel space of the guest operating system 320, the source pages may be accessed by an application 322 running in the user space of the guest operating system 320 by any means available to the guest operating system 320.
Thus, when one guest operating system sends data over to another guest operating system, the source data is written into a memory page. That memory page is transferred into the ownership of the receiving guest operating system such that there is no need for copying the memory. Furthermore, the network stack of the common offload stack may be bypassed if both the source OS and the destination OS reside on the same endpoint device, which may further improve performance. An example method of a cut-through socket data transfer is described with reference to FIG. 4.
FIG. 4 is a flow chart of a method 400 to effectuate cut-through socket data transfer, in accordance with an example embodiment. The method 400 may be performed by processing logic that may comprise hardware (e.g., dedicated logic, programmable logic, microcode, etc.), software (such as run on a general purpose computer system or a dedicated machine), or a combination of both. In an example embodiment, processing logic resides at a computer system 110 of FIG. 1.
As shown in FIG. 4, the method 400 commences at operation 402. At operation 404, the processing logic detects a request for data transfer at a common offload stack, e.g., at the CSI back end 252 illustrated in FIG. 2. At operation 406, the processing logic of the common offload stack (e.g., the source/destination analyzer 255A of FIG. 2) determines the source operating system and the destination operating system associated with the request.
At operation 408, the processing logic determines whether the source operating system and the destination operating system reside on the same endpoint device. If it is determined that the source operating system and the destination operating system do not reside on the same endpoint device, the common offload stack processes the request utilizing its network stack, e.g., the network protocol layers and the network driver layers (operation 410). If it is determined that the source operating system and the destination operating system share the same endpoint device, the common offload stack processes the request bypassing its network stack (operation 412), as discussed above with reference to FIG. 3.
In an example embodiment, this processing is performed during connection setup for TCP, such that when the data transfer is occurring, the process 400 is utilized as a quick check. For UDP, the full procedure is performed with each packet. Furthermore, the operations 410 and 412 may include a policy-based decision mechanism to determine whether to allow the page mapping based upon security settings or other rules (such as, e.g., compliance or licensing) that can restrict communications.
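The timing described in this paragraph can be sketched as follows: for TCP the locality and policy decision is made once at connection setup and cached on the connection, so the data path only checks a flag, whereas for UDP the full check runs for every packet. The `local_cached` flag, `policy_allows_mapping`, and the other names below are hypothetical illustrations rather than the patent's interfaces.

```c
/* Hypothetical sketch of when the same-device check runs: once at TCP
 * connection setup (cached), or per packet for UDP, with a policy hook
 * that may forbid the page mapping (e.g., for security or compliance). */
#include <stdbool.h>
#include <stdio.h>

enum proto { PROTO_TCP, PROTO_UDP };

struct conn {
    enum proto proto;
    int  src_device, dst_device;
    bool local_cached;          /* valid for TCP after connection setup */
};

static bool same_device(const struct conn *c)
{
    return c->src_device == c->dst_device;
}

static bool policy_allows_mapping(const struct conn *c)
{
    (void)c;
    return true;                /* placeholder for security/compliance rules */
}

/* Run once when a TCP connection is established. */
void tcp_setup(struct conn *c)
{
    c->local_cached = same_device(c) && policy_allows_mapping(c);
}

/* Run on the data path for every transfer request. */
bool use_cut_through(struct conn *c)
{
    if (c->proto == PROTO_TCP)
        return c->local_cached;                          /* quick check      */
    return same_device(c) && policy_allows_mapping(c);   /* full check (UDP) */
}

int main(void)
{
    struct conn t = { PROTO_TCP, 110, 110, false };
    struct conn u = { PROTO_UDP, 110, 120, false };

    tcp_setup(&t);
    printf("TCP local? %d  UDP local? %d\n",
           use_cut_through(&t), use_cut_through(&u));
    return 0;
}
```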
Although the embodiments are described herein with reference to an offload stack interface, the techniques may be advantageously utilized with other stacks, e.g., Message Passing Interface (MPI-2), Sockets Direct Protocol (SDP), or other stream or message-passing protocols.
FIG. 5 shows a diagrammatic representation of a machine in the example form of a computer system 500 within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed. In alternative embodiments, the machine operates as a standalone device or may be connected to other machines. In a networked deployment, the machine may operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may be a personal computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a portable music player (e.g., a portable hard drive audio device such as an MP3 player), a web appliance, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
The example computer system 500 includes a processor 502 (e.g., a central processing unit (CPU), a graphics processing unit (GPU) or both), a main memory 504 and a static memory 506, which communicate with each other via a bus 508. The computer system 500 may further include a video display unit 510 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)). The computer system 500 also includes an alphanumeric input device 512 (e.g., a keyboard), a user interface (UI) navigation device 514 (e.g., a mouse), a disk drive unit 516, a signal generation device 518 (e.g., a speaker) and a network interface device 520.
The disk drive unit 516 includes a machine-readable medium 522 on which is stored one or more sets of instructions and data structures (e.g., software 524) embodying or utilized by any one or more of the methodologies or functions described herein. The software 524 may also reside, completely or at least partially, within the main memory 504 and/or within the processor 502 during execution thereof by the computer system 500, the main memory 504 and the processor 502 also constituting machine-readable media.
The software 524 may further be transmitted or received over a network 526 via the network interface device 520 utilizing any one of a number of well-known transfer protocols (e.g., HTTP).
While the machine-readable medium 522 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “machine-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present invention, or that is capable of storing, encoding or carrying data structures utilized by or associated with such a set of instructions. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic media, and carrier wave signals. Such media may also include, without limitation, hard disks, floppy disks, flash memory cards, digital video disks, random access memories (RAMs), read-only memories (ROMs), and the like.
The embodiments described herein may be implemented in an operating environment comprising software installed on a computer, in hardware, or in a combination of software and hardware.
Although embodiments have been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims (18)

The invention claimed is:
1. A system to exchange information, the system comprising:
a source operating system residing on a device comprising one or more processors, the source operating system configured to write data to a memory page;
a destination operating system residing on the device, the source operating system and the destination operating system being distinct network nodes and having a shared memory, the shared memory of the source operating system and the destination operating system comprising the memory page; and
an offload stack residing on the device and hosted by a third operating system separate from the source operating system and the destination operating system, the offload stack to function as an intermediate network device for the source operating system;
wherein the offload stack comprises:
a back end to receive a message from the source operating system to the destination operating system, the message from the source operating system comprising a request to transfer source data from the source operating system to the destination operating system,
an analyzer to determine that the source operating system and the destination operating system both reside on the same device, and
a cut through socket module to process the message such that a network layer of the offload stack is bypassed in response to the analyzer determining that the source operating system and the destination operating system both reside on the device, wherein processing the message such that the network layer is bypassed comprises transferring a pointer to the memory page from the source operating system to the destination operating system,
the offload stack being configured to transfer the source data from the source operating system to the destination operating system by transferring ownership of memory pages storing the source data.
2. The system of claim 1, wherein the offload stack is to run on a first processor core and the source operating system is to run on a second processor core.
3. The system of claim 1, wherein the offload stack comprises a transmit queue to receive pointers to the memory pages storing the source data.
4. The system of claim 1, further comprising:
an offload stack front end to run on the source operating system; and
a ring buffer interface to bridge the source operating system and the offload stack.
5. The system of claim 4, wherein the ring buffer interface is to receive pointers to memory pages storing data associated with the message from the source operating system.
6. The system of claim 1, wherein the source operating system and the destination operating system are distinct operating systems.
7. The system of claim 1, wherein the source operating system and the destination operating system are two versions of an operating system.
8. A method to exchange information between computer applications, the method comprising:
receiving, at an offload stack hosted on a device comprising one or more processors, a message from a source operating system to a destination operating system, the source operating system and the destination operating system being distinct network nodes and having a shared memory, the shared memory of the source operating system and the destination operating system comprising a memory page to which data from the source operating system has been written, the offload stack being hosted by a third operating system separate from the source operating system and the destination operating system, the message from the source operating system comprising a request to transfer source data from the source operating system to the destination operating system;
determining, by the offload stack, that the source operating system and the destination operating system are both hosted on the device;
transferring the message to the destination operating system such that a network layer of the offload stack is bypassed in response to the determination that the source operating system and the destination operating system are both hosted on the device, wherein transferring the message such that the network layer is bypassed comprises transferring a pointer to the memory page from the source operating system to the destination operating system; and
transferring, by the offload stack, the source data from the source operating system to the destination operating system by transferring ownership of memory pages storing the source data.
9. The method of claim 8, wherein transferring the message such that the network layer of the offload stack is bypassed comprises:
determining a pointer to a memory page associated with the message from the source operating system;
placing the pointer into a transmit queue of the offload stack; and
sending the pointer to a socket layer of the destination operating system.
10. The method of claim 9, further comprising:
allocating memory to manage the memory page associated with the message from the source operating system; and
swapping the memory page associated with the message from the source operating system with a memory page owned by a kernel of the offload stack.
11. The method of claim 10, further comprising swapping the memory page owned by the kernel of the offload stack with a memory page owned by the destination operating system.
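
Claims 9-11 describe a zero-copy hand-off: a pointer to the page is queued, and pages are then swapped between the source, the offload-stack kernel, and the destination so that ownership moves while the payload stays in place. The minimal C model below uses invented names (page_pool, swap_pages) to illustrate that exchange under those assumptions; it is not the patented implementation.

    /* Minimal model of the page-swap idea: only ownership changes hands. */
    #include <stdio.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    struct page_pool {
        const char *owner;   /* "source", "offload-kernel" or "destination" */
        char       *page;    /* the single page this owner currently holds  */
    };

    /* Exchange page ownership between two pools; no data is copied. */
    static void swap_pages(struct page_pool *a, struct page_pool *b)
    {
        char *tmp = a->page;
        a->page = b->page;
        b->page = tmp;
    }

    int main(void)
    {
        static char src_page[PAGE_SIZE];
        static char kernel_page[PAGE_SIZE];
        static char dst_page[PAGE_SIZE];

        struct page_pool source      = { "source",         src_page    };
        struct page_pool kernel      = { "offload-kernel", kernel_page };
        struct page_pool destination = { "destination",    dst_page    };

        /* 1. The source writes its payload into the page it owns. */
        strcpy(source.page, "payload written by the source operating system");

        /* 2. The offload stack swaps the source's page with one of its own,
         *    so the source immediately gets a clean page back.            */
        swap_pages(&source, &kernel);

        /* 3. The kernel-held page is swapped with a destination-owned page;
         *    a pointer to it is what the destination's socket layer reads. */
        swap_pages(&kernel, &destination);

        printf("destination now reads: \"%s\"\n", destination.page);
        printf("source holds a fresh page at %p\n", (void *)source.page);
        return 0;
    }
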
12. The method of claim 8, wherein the source operating system and the destination operating system are distinct operating systems.
13. The method of claim 8, wherein the source operating system and the destination operating system are two versions of an operating system.
14. The method of claim 8, further comprising establishing a socket between the source operating system and the destination operating system.
15. The method of claim 14, wherein the established socket uses a Transmission Control Protocol (TCP).
16. The method of claim 14, wherein the established socket uses a User Datagram Protocol (UDP).
17. A non-transitory machine-readable storage medium having instructions which, when executed by a machine, cause the machine to:
receive, at an offload stack hosted on a device, a message from a source operating system to a destination operating system, the source operating system and the destination operating system being distinct network nodes and having a shared memory, the shared memory of the source operating system and the destination operating system comprising a memory page to which data from the source operating system has been written, the offload stack being hosted by a third operating system separate from the source operating system and the destination operating system, the message from the source operating system comprising a request to transfer source data from the source operating system to the destination operating system;
determine, at the offload stack, that the source operating system and the destination operating system are both hosted on the device;
transfer the message to the destination operating system such that a network layer of the offload stack is bypassed in response to the determination that the source operating system and the destination operating system are both hosted on the device, wherein transferring the message such that the network layer is bypassed comprises transferring a pointer to the memory page from the source operating system to the destination operating system; and
transfer, by the offload stack, the source data from the source operating system to the destination operating system by transferring ownership of memory pages storing the source data.
18. A system to exchange information between computer applications, the system comprising:
means for receiving, at an offload stack hosted on a device, a message from a source operating system to a destination operating system, the source operating system and the destination operating system being distinct network nodes and having a shared memory, the shared memory of the source operating system and the destination operating system comprising a memory page to which data from the source operating system has been written, the offload stack being hosted by a third operating system separate from the source operating system and the destination operating system, the message from the source operating system comprising a request to transfer source data from the source operating system to the destination operating system;
means for determining, at the offload stack, that the source operating system and the destination operating system are both hosted on the device;
means for transferring the message to the destination operating system such that a network layer of the offload stack is bypassed in response to the determination that the source operating system and the destination operating system are both hosted on the device, wherein the means for transferring the message comprises means for transferring a pointer to the memory page from the source operating system to the destination operating system; and
means for transferring, at the offload stack, the source data from the source operating system to the destination operating system by transferring ownership of memory pages storing the source data.
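
The ring buffer interface of claims 4-5 and 10 bridges the offload stack front end running in the source operating system and the back end in the offload stack by carrying pointers to memory pages rather than the data itself. Below is a simplified single-producer, single-consumer ring in C; the names and the fixed slot count are assumptions made for illustration, and a real interface shared between operating systems would additionally need memory barriers or locking.

    /* Sketch of a ring that carries page pointers between front end and back end. */
    #include <stddef.h>
    #include <stdio.h>

    #define RING_SLOTS 8   /* must be a power of two for the index mask below */

    struct page_ring {
        void    *slot[RING_SLOTS];  /* pointers to pages, never the data itself */
        unsigned head;              /* next slot the producer will fill         */
        unsigned tail;              /* next slot the consumer will drain        */
    };

    /* Front end (producer): post a page pointer; fails if the ring is full. */
    static int ring_post(struct page_ring *r, void *page)
    {
        if (r->head - r->tail == RING_SLOTS)
            return -1;                               /* ring full */
        r->slot[r->head & (RING_SLOTS - 1)] = page;
        r->head++;
        return 0;
    }

    /* Back end (consumer): take the oldest page pointer, or NULL if empty. */
    static void *ring_take(struct page_ring *r)
    {
        if (r->head == r->tail)
            return NULL;                             /* ring empty */
        return r->slot[r->tail++ & (RING_SLOTS - 1)];
    }

    int main(void)
    {
        static char page_a[4096], page_b[4096];
        struct page_ring ring = { { NULL }, 0, 0 };

        ring_post(&ring, page_a);   /* source OS posts two pages ...        */
        ring_post(&ring, page_b);

        void *p;
        while ((p = ring_take(&ring)) != NULL)       /* ... back end drains them */
            printf("offload stack received page pointer %p\n", p);
        return 0;
    }
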
US11/468,942 2006-08-31 2006-08-31 Method and system to transfer data utilizing cut-through sockets Active 2031-05-20 US8819242B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/468,942 US8819242B2 (en) 2006-08-31 2006-08-31 Method and system to transfer data utilizing cut-through sockets

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/468,942 US8819242B2 (en) 2006-08-31 2006-08-31 Method and system to transfer data utilizing cut-through sockets

Publications (2)

Publication Number Publication Date
US20080059644A1 (en) 2008-03-06
US8819242B2 (en) 2014-08-26

Family

ID=39153351

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/468,942 Active 2031-05-20 US8819242B2 (en) 2006-08-31 2006-08-31 Method and system to transfer data utilizing cut-through sockets

Country Status (1)

Country Link
US (1) US8819242B2 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080086603A1 (en) * 2006-10-05 2008-04-10 Vesa Lahtinen Memory management method and system
US7941812B2 (en) * 2007-01-30 2011-05-10 Hewlett-Packard Development Company, L.P. Input/output virtualization through offload techniques
US8739179B2 (en) * 2008-06-30 2014-05-27 Oracle America Inc. Method and system for low-overhead data transfer
WO2010145709A1 (en) * 2009-06-18 2010-12-23 Telefonaktiebolaget Lm Ericsson (Publ) Data flow in peer-to-peer networks
US8635632B2 (en) * 2009-10-21 2014-01-21 International Business Machines Corporation High performance and resource efficient communications between partitions in a logically partitioned system
US8630173B2 (en) * 2010-11-19 2014-01-14 Cisco Technology, Inc. Dynamic queuing and pinning to improve quality of service on uplinks in a virtualized environment
GB2528441B (en) * 2014-07-21 2016-05-18 Ibm Routing communication between computing platforms
CN105791315B (en) * 2016-04-25 2019-05-14 网宿科技股份有限公司 A kind of udp protocol acceleration method and system

Patent Citations (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4145739A (en) * 1977-06-20 1979-03-20 Wang Laboratories, Inc. Distributed data processing system
US4484264A (en) * 1980-10-20 1984-11-20 Inventio Ag Multiprocessor system
US4945473A (en) * 1987-05-15 1990-07-31 Bull Hn Information Systems Inc. Communications controller interface
US5175818A (en) * 1988-02-23 1992-12-29 Hitachi, Ltd. Communication interface for independently generating frame information that is subsequently stored in host memory and sent out to transmitting fifo by dma
US5247616A (en) * 1989-10-23 1993-09-21 International Business Machines Corporation Computer system having different communications facilities and data transfer processes between different computers
US5517662A (en) * 1991-11-19 1996-05-14 International Business Machines Corporation Multiprocessor system with distributed memory
US5557744A (en) * 1992-12-18 1996-09-17 Fujitsu Limited Multiprocessor system including a transfer queue and an interrupt processing unit for controlling data transfer between a plurality of processors
US6366583B2 (en) * 1996-08-07 2002-04-02 Cisco Technology, Inc. Network router integrated onto a silicon chip
US5884046A (en) * 1996-10-23 1999-03-16 Pluris, Inc. Apparatus and method for sharing data and routing messages between a plurality of workstations in a local area network
US6141701A (en) * 1997-03-13 2000-10-31 Whitney; Mark M. System for, and method of, off-loading network transactions from a mainframe to an intelligent input/output device, including off-loading message queuing facilities
US6427171B1 (en) * 1997-10-14 2002-07-30 Alacritech, Inc. Protocol processing stack for use with intelligent network interface device
US6085277A (en) * 1997-10-15 2000-07-04 International Business Machines Corporation Interrupt and message batching apparatus and method
US6360262B1 (en) * 1997-11-24 2002-03-19 International Business Machines Corporation Mapping web server objects to TCP/IP ports
US6678726B1 (en) * 1998-04-02 2004-01-13 Microsoft Corporation Method and apparatus for automatically determining topology information for a computer within a message queuing network
US6052737A (en) * 1998-04-15 2000-04-18 International Business Machines Corporation Computer system, program product and method for dynamically optimizing a communication protocol for supporting more users
US6233619B1 (en) * 1998-07-31 2001-05-15 Unisys Corporation Virtual transport layer interface and messaging subsystem for high-speed communications between heterogeneous computer systems
US20040095237A1 (en) * 1999-01-09 2004-05-20 Chen Kimball C. Electronic message delivery system utilizable in the monitoring and control of remote equipment and method of same
US6757744B1 (en) * 1999-05-12 2004-06-29 Unisys Corporation Distributed transport communications manager with messaging subsystem for high-speed communications between heterogeneous computer systems
US20010005381A1 (en) * 1999-12-27 2001-06-28 Nec Corporation ATM edge node switching equipment utilized IP-VPN function
US6751676B2 (en) * 2000-02-04 2004-06-15 Fujitsu Limited Network control system, network apparatus, repeater, and connecting apparatus
US6697868B2 (en) * 2000-02-28 2004-02-24 Alacritech, Inc. Protocol processing stack for use with intelligent network interface device
US6757725B1 (en) * 2000-04-06 2004-06-29 Hewlett-Packard Development Company, Lp. Sharing an ethernet NIC between two sub-systems
US20020062389A1 (en) * 2000-09-01 2002-05-23 Airsys Atm Sa Multiprocess computer system
US20030014544A1 (en) * 2001-02-15 2003-01-16 Banderacom Infiniband TM work queue to TCP/IP translation
US20020143962A1 (en) * 2001-03-14 2002-10-03 Siemens Information And Communication Networks, Inc. Dynamic loading of protocol stacks under signaling control
US20040039672A1 (en) * 2001-06-19 2004-02-26 Predrag Zivic Trust model router
US7362709B1 (en) * 2001-11-02 2008-04-22 Arizona Board Of Regents Agile digital communication network with rapid rerouting
US20040003131A1 (en) * 2002-06-28 2004-01-01 International Business Machines Corporation Apparatus and method for monitoring and routing status messages
US20040013117A1 (en) * 2002-07-18 2004-01-22 Ariel Hendel Method and apparatus for zero-copy receive buffer management
US20040042487A1 (en) * 2002-08-19 2004-03-04 Tehuti Networks Inc. Network traffic accelerator system and method
US20040250253A1 (en) * 2003-03-20 2004-12-09 Hisham Khartabil Method and apparatus for providing multi-client support in a sip-enabled terminal
US20040199732A1 (en) * 2003-04-07 2004-10-07 Kelley Timothy M. System and method for processing high priority data elements
US20040230794A1 (en) * 2003-05-02 2004-11-18 Paul England Techniques to support hosting of a first execution environment by a second execution environment with protection for the first execution environment
US20050021680A1 (en) * 2003-05-12 2005-01-27 Pete Ekis System and method for interfacing TCP offload engines using an interposed socket library
US20040249957A1 (en) * 2003-05-12 2004-12-09 Pete Ekis Method for interface of TCP offload engines to operating systems
US20060005186A1 (en) * 2004-06-30 2006-01-05 Microsoft Corporation Systems and methods for stack-jumping between a virtual machine and a host environment
US20060004933A1 (en) * 2004-06-30 2006-01-05 Sujoy Sen Network interface controller signaling of connection event
US7937447B1 (en) * 2004-07-22 2011-05-03 Xsigo Systems Communication between computer systems over an input/output (I/O) bus
US20060036570A1 (en) * 2004-08-03 2006-02-16 Softricity, Inc. System and method for controlling inter-application association through contextual policy control
US20060104295A1 (en) * 2004-11-16 2006-05-18 Secure64 Software Corporation Queued, asynchronous communication architecture interface
US20060206904A1 (en) * 2005-03-11 2006-09-14 Microsoft Corporation Systems and methods for supporting device access from multiple operating systems
US20070011272A1 (en) * 2005-06-22 2007-01-11 Mark Bakke Offload stack for network, block and file input and output
US20060294234A1 (en) * 2005-06-22 2006-12-28 Cisco Technology, Inc. Zero-copy network and file offload for web and application servers
US20070083638A1 (en) * 2005-08-31 2007-04-12 Microsoft Corporation Offloaded neighbor cache entry synchronization
US20070124474A1 (en) * 2005-11-30 2007-05-31 Digital Display Innovations, Llc Multi-user display proxy server
US7941800B2 (en) * 2006-02-23 2011-05-10 Microsoft Corporation Transferring data between virtual machines by way of virtual machine bus in pipe mode
US20070204265A1 (en) * 2006-02-28 2007-08-30 Microsoft Corporation Migrating a virtual machine that owns a resource such as a hardware device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
McLaughlin, L., "Making Multicore Fly", [online]. Technology Review [observed on Dec. 16, 2005]. Retrieved from the Internet: <URL: http://www.technologyreview.com/read_article.aspx?id=16060&ch=infotech>, (©2005), 3 pgs.

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9787529B1 (en) * 2015-01-16 2017-10-10 Juniper Networks, Inc. Systems and methods for tunneling socket calls across operating systems
US9882972B2 (en) 2015-10-30 2018-01-30 International Business Machines Corporation Packet forwarding optimization without an intervening load balancing node

Also Published As

Publication number Publication date
US20080059644A1 (en) 2008-03-06

Similar Documents

Publication Publication Date Title
US8819242B2 (en) Method and system to transfer data utilizing cut-through sockets
US11934341B2 Virtual RDMA switching for containerized applications
US7996569B2 (en) Method and system for zero copy in a virtualized network environment
US8830870B2 (en) Network adapter hardware state migration discovery in a stateful environment
US9588807B2 (en) Live logical partition migration with stateful offload connections using context extraction and insertion
US8156230B2 (en) Offload stack for network, block and file input and output
US9473596B2 (en) Using transmission control protocol/internet protocol (TCP/IP) to setup high speed out of band data communication connections
US9910687B2 (en) Data flow affinity for heterogenous virtual machines
US8370855B2 (en) Management of process-to-process intra-cluster communication requests
CN114745341A (en) Application level network queuing
US9936049B2 (en) Protocol independent way for dynamically selecting data compression methods for redirected USB devices
US9009214B2 (en) Management of process-to-process inter-cluster communication requests
Yu et al. Freeflow: High performance container networking
US9098354B2 (en) Management of application to I/O device communication requests between data processing systems
US8521895B2 (en) Management of application to application communication requests between data processing systems
US11561916B2 (en) Processing task deployment in adapter devices and accelerators
US10523741B2 (en) System and method for avoiding proxy connection latency
US8560594B2 (en) Management of process-to-process communication requests
Neville-Neil Whither Sockets? High bandwidth, low latency, and multihoming challenge the sockets API.

Legal Events

Date Code Title Description
AS Assignment

Owner name: CISCO TECHNOLOGY, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BAKKE, MARK A.;THOMPSON, DAVID PATRICK;KUIK, TIMOTHY J.;AND OTHERS;REEL/FRAME:018195/0231;SIGNING DATES FROM 20060807 TO 20060829

Owner name: CISCO TECHNOLOGY, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BAKKE, MARK A.;THOMPSON, DAVID PATRICK;KUIK, TIMOTHY J.;AND OTHERS;SIGNING DATES FROM 20060807 TO 20060829;REEL/FRAME:018195/0231

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551)

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8