WO2002089000A1 - A system for caching data during peer-to-peer data transfer - Google Patents

Info

Publication number: WO2002089000A1
Authority: WO, WIPO (PCT)
Prior art keywords: data, stored, server, peer, transfer
Application number: PCT/AU2002/000518
Other languages: French (fr)
Inventors: Mark David Haselden, Clayton Andrew Bell
Original Assignee: Iinet Limited
Application filed by: Iinet Limited

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9574 Browsing optimisation, e.g. caching or content distillation of access to content, e.g. by caching
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/30 Definitions, standards or architectural aspects of layered protocol stacks
    • H04L69/32 Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L69/322 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
    • H04L69/329 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the application layer [OSI layer 7]

Definitions

  • the control means determines whether the data is stored in the storage means by searching a second database containing information that identifies the data stored in the first database and their storage locations.
  • Figure 1 is a schematic representation of a peer-to-peer file sharing system of the prior art
  • Figure 2 is a schematic representation of a system for caching data during peer-to-peer data transfer of the present invention.
  • Figure 2 illustrates a system for caching data during peer-to-peer data transfer of the present invention.
  • a user 4 is connected to the Internet via his ISP, in a known manner, and can access web sites by typing in the desired URL in his web browser.
  • the ISP is provided with an ISP server 6 which acts as a gateway to provide connection to the Internet for the user 4 (as is well known in the art).
  • the ISP is also provided with a cache system, which communicates with the ISP server 6.
  • the cache system comprises a proxy server 8, a cache director 7, a cache manager 9 and one or more cache servers 10 - whose functions will be described in more detail below.
  • In response to the connection request from the user 4, the ISP server 6 will redirect the user 4 to a cache director 7 for connection thereto. In response to this connection, the cache director 7 returns the address of the proxy server 8 via ISP server 6.
  • the ISP server 6 is able to capture the initial connection request because it uses the TCP/IP protocol with a known port number.
  • the cache director 7 may be omitted. It is used in this embodiment to emulate the meta-server of existing peer-to-peer file sharing systems such as the Napster system operated by Napster Inc. of Redwood City, California.
  • the user 4 then connects to the proxy server 8 via ISP server 6.
  • Upon receipt of a connection attempt from the user 4, the proxy server 8 connects to the meta-server 2 by a direct TCP/IP connection and retrieves an IP address of a system server 3.
  • Meta-server 2 contains a large collection of valid system server addresses - the proxy server 8 retrieves one of these by querying meta-server 2.
  • the proxy server 8 then makes a direct connection to the system server 3 using the address retrieved from its query of meta-server 2.
  • the proxy server 8 acts as an intermediary between the user 4 and the system server 3, and is operable to forward all messages it receives from the user 4 to the system server 3, and vice versa. These messages are forwarded verbatim until the system server 3 sends either a BROWSE_RESPONSE or SEARCH_RESPONSE message, in reply to a BROWSE_REQUEST or SEARCH_REQUEST message from the user 4 to the system server 3 to browse or search the files available for download.
  • When a BROWSE_RESPONSE or SEARCH_RESPONSE message is received by the proxy server 8, the proxy server 8 is operable to store details of the files available for sharing in its memory. These details include the filename, the MD5 hash of the first 300kB, the file size, encoding bitrate, length (time), the nickname (or nick) of the user sharing the file and the IP address of the user sharing the file. Some or all of these details will also be displayed at the user's remote terminal in the usual manner.
  • the user 4 issues a DOWNLOAD_REQUEST message that is sent - via proxy server 8 - to system server 3.
  • the system server 3 responds with a DOWNLOAD_ACK message, which is sent to the proxy server 8.
  • the proxy server 8 in response to the received DOWNLOAD_ACK message, connects to cache manager 9, which - in response - will return the address of one of the cache servers 10 to the proxy server 8.
  • the cache manager 9 includes a database 12.
  • the database is an SQL database, but could be any mechanism for persistent storage.
  • the address returned by the cache manager 9 in response to a received message from the proxy server 8 will depend upon whether the file requested by the user 4 is already stored (or cached) locally in one of the cache servers 10, or whether it must be downloaded from a remote user 5 who has the file available for sharing.
  • the cache manager 9 will search its database 12, firstly to determine if a file exists with the same MD5 hash value. If this is the case, the cache manager 9 will determine that it has found the requested file and will return the address of the cache server 10 that contains that file.
  • the MD5 hash is an algorithm used to provide a degree of certainty that two files are the same, and is well known to the person skilled in the art.
  • the cache manager 9 will attempt to find an approximate match.
  • the algorithm for this is to, firstly, strip the path information from the filename, and attempt to match the filename exactly. If a match is found, then the cache manager 9 will then compare the size of the requested file with the size of the stored file and the encoding bit rate of the requested file with that of the stored file. If the encoding bit rates match, and the requested file size is no longer than that of the stored file (to prevent partially downloaded files being stuck in the cache), then a match will be recorded, and, again, the address of the cache server 10 containing the file will be returned.
  • the algorithm is readily extendable to include matching on any of the stored details of the file.
  • the cache manager 9 selects a cache server 10 that has available storage capacity and the address of this cache server 10 will be returned, as it is to this cache server 10 that the requested file will be downloaded.
  • a record is then written into database 12 that contains details passed in the ASK request, including the address of the remote user 5 from which the file should be downloaded.
  • the proxy server 8 will rewrite the DOWNLOAD_ACK message with the address of the respective cache server 10 (rather than the address of the remote user 5, which will have been included in the DOWNLOAD_ACK message sent from the system server 3 to the proxy server 8). The proxy server 8 will then send the rewritten DOWNLOAD_ACK message to the user 4.
  • Upon receipt of the rewritten DOWNLOAD_ACK message, the user 4 will then attempt to directly connect to the particular cache server 10 (rather than the remote client 5), identified by the address inserted in the rewritten DOWNLOAD_ACK message from the proxy server 8.
  • the cache server 10 will send the username of the remote user 5 and the full pathname for the requested file - as sent with the request from the user 4 to the cache server 10.
  • the cache manager 9 will compare this data with that stored in the database 12, and return the MD5 value. Using this MD5 hash value, the cache server 10 will search its own memory for a file identified by the returned MD5 hash value. If the memory contains this file, then this is copied from the cache server 10 to the user 4 in the usual way.
  • This second lookup is necessary so that the cache server can serve the correct file on request from the user - the protocol used in the Napster system referred to above specifies that download requests contain only the filename to be downloaded, while the cache server 10 only indexes on the MD5 sum.
  • the cache manager 9, via the data stored in the database 12 is the only location where such a correlation can be made.
  • If the cache server 10 does not find a copy of this file in its database from the returned MD5 hash value, then it is assumed that the file is not available locally and must be downloaded from the remote user 5. In this case, the cache server 10 retrieves the IP address of the remote user 5 that is sharing the file from the information stored when the proxy server 8 received the DOWNLOAD_ACK message. The cache server 10 then makes an outgoing connection to the remote user 5, and requests a copy of the file from the remote user 5 in a known manner. The downloaded file is copied to the user 4, and is also copied into the memory on the cache server 10 at the same time.
  • the DELETECACHED clause directs the cache manager 9 to remove the database row in the database 12 referencing the partially downloaded file. If an error does occur, the user 4 will be advised accordingly and may retry if desired.
  • the cache manager 9 and cache servers 10 operate a least recently used policy to remove files from the cache as needed, as is known in the art.
  • hashing methods other than the MD5 method may be used.
  • unique identifier encoding methods other than hashing methods may be used.
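The cache manager lookup described in the list above (an exact MD5 match first, then an approximate match on filename, bit rate and size, and otherwise the selection of a cache server with available capacity) might be sketched as follows. This is an illustrative Python sketch only: the `FileRecord` fields, the function names, the in-memory list standing in for database 12 and the `free_space` mapping are all assumptions made for the example and are not taken from the specification.

```python
import ntpath
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FileRecord:
    """Hypothetical row layout standing in for a record in database 12."""
    filename: str      # full path as advertised by the sharing user
    md5: str           # MD5 hash value reported for the file
    size: int          # file size in bytes
    bitrate: int       # encoding bit rate
    cache_server: str  # address of the cache server 10 holding the file
    remote_user: str   # address of the remote user 5 sharing the file


def strip_path(filename: str) -> str:
    """Remove path information; ntpath accepts both '\\' and '/' separators."""
    return ntpath.basename(filename)


def find_cached(records: List[FileRecord], md5: str, filename: str,
                size: int, bitrate: int) -> Optional[FileRecord]:
    # Step 1: exact match on the MD5 hash value.
    for rec in records:
        if rec.md5 == md5:
            return rec
    # Step 2: approximate match. The bare filename must match exactly,
    # the bit rates must be equal, and the requested size must not
    # exceed the stored size.
    wanted = strip_path(filename)
    for rec in records:
        if (strip_path(rec.filename) == wanted
                and rec.bitrate == bitrate
                and size <= rec.size):
            return rec
    return None


def resolve_cache_address(records: List[FileRecord], free_space: Dict[str, int],
                          md5: str, filename: str, size: int, bitrate: int,
                          remote_user: str) -> str:
    """Return the cache server address to place in the rewritten DOWNLOAD_ACK."""
    hit = find_cached(records, md5, filename, size, bitrate)
    if hit is not None:
        return hit.cache_server
    # Miss: choose a cache server with capacity and record where the file
    # should be fetched from (assumed policy: first server that fits).
    for server, free in free_space.items():
        if free >= size:
            records.append(FileRecord(filename, md5, size, bitrate,
                                      server, remote_user))
            return server
    raise RuntimeError("no cache server has available storage capacity")
```

In this sketch a miss writes a record immediately, which mirrors the description of a record being written with the remote user's address before the download takes place; a real deployment would also need the cleanup path for failed downloads.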

Abstract

A system for caching data during peer-to-peer data transfer between computers in a network of computers, the system including a database (12) for dynamically storing information regarding the data available for transfer, proxy server (8) and cache manager (9), when combined, operable to transfer data selected from the available data to user (4) in response to a request for the selected data from the user (4); and cache server (10) for storing a copy of data already transferred by the system. The proxy server (8) and cache manager (9), in combination, are further operable to search the database (12) in response to a request for the data selected, to determine if the data selected is stored in the cache server (10), and to transfer the data selected from the cache server (10) to the user (4) if the data selected is stored therein, and, if the data selected is not available in the database (12), to transfer the data selected from a remote user (5) in the network having the data selected stored therein, to the cache server (10) and to the user (4).

Description

"A system for caching data during peer-to-peer data transfer"
Field of the Invention
The present invention relates to a system for caching data during peer-to-peer data transfer in a server mediated peer-to-peer file-sharing system, particularly, although not exclusively, using the Internet, and for sharing audio files.
Throughout the specification, unless the context requires otherwise, the word "comprise" or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.
Background Art
The following discussion of the background to the invention is intended to facilitate an understanding of the present invention. However, it should be appreciated that the discussion is not an acknowledgement or admission that any of the material referred to was published, known or part of the common general knowledge of the relevant person skilled in the art at the priority date of the application.
The Internet is a network of computers that communicate via communication links such as a telephone network, and allow the exchange of information between the computers. The World Wide Web is a part of the Internet, which allows networked computers to send graphical images and data (including audio and video information) between the computers.
Typically, computers - commonly referred to as servers - provide data (or "web pages") in the form of text and graphical images that can be downloaded to remote terminals such as a personal computer ("PC"), where a user can view the web pages. These pages can be used to display information on the display of the remote terminal, for example to give information about a particular service provided by the host of a web site, or products and services provided by the host. The pages may include video and audio information and allow viewers of the web page to view the information. These web pages are viewed at the remote terminal using a so-called web browser. Each web site (and individual web pages) is uniquely identifiable by means of a web address, or Uniform Resource Locator ("URL"). Hyper Text Mark Up Language ("HTML") is a text and graphics formatting language that is commonly used to produce and view web pages, and the protocol used by web browsers and servers transferring HTML based files is the Hypertext Transfer Protocol ("HTTP"). In data transfer, when one application is in communication with another on a host computer, that application is specified in each data transmission by using its so-called port number. The well-known port number for HTTP transmission is port 80, for example.
The Transmission Control Protocol - Internet Protocol ("TCP/IP") is the standard communications protocol for sending data over the Internet. As is well known in the art, protocols that work together as a group are commonly referred to as a protocol stack, with different layers. The HTTP protocol operates at a higher layer than the TCP/IP protocol.
Data is sent in packets (datagrams), which are routed via an Internet Protocol address ("IP address"). The required IP address is obtained by cross-referencing the specified URL.
When a user wishes to access the Internet they usually connect to an Internet Service Provider ("ISP"). The ISP receives a request from the user to access a specific web site and connects the user's terminal to the required server having the requested URL. The method of connection and subsequent communication is as has been described above.
However, to reduce the amount of information that is transmitted over the Internet, ISPs use a caching system where data that is frequently accessed is stored at the ISP. When that data is next requested by a user, it is provided from the local cache rather than connecting the user's terminal to the required server. Existing caching systems use the known port number of HTTP transfers to capture requests for data and to redirect those requests to a local cache, if appropriate.
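The port-based interception used by such conventional caches can be pictured with a short sketch. The Python below is a simplified model only: the class name, the `fetch_from_origin` callable and the dictionary used as the cache are assumptions for illustration, and a real ISP cache would also deal with expiry, validation and uncacheable responses.

```python
from typing import Callable, Dict

HTTP_PORT = 80  # well-known port number for HTTP


class PortBasedCache:
    """Toy model of an ISP cache that recognises HTTP traffic by its port."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # hypothetical origin fetcher
        self._store: Dict[str, bytes] = {}   # URL -> cached response body

    def handle(self, dest_port: int, url: str) -> bytes:
        if dest_port != HTTP_PORT:
            # Traffic on any other port is not recognised as cacheable
            # and is simply passed through to the origin.
            return self._fetch(url)
        if url not in self._store:
            self._store[url] = self._fetch(url)  # first request: fill cache
        return self._store[url]                  # later requests: local copy
```

The sketch also shows why a transfer on an arbitrary, negotiated port escapes this kind of cache: the only test applied is the destination port.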
The Internet can also be used to facilitate the sharing of files between users. One popular use is the sharing of audio files - typically those in the so-called MP3 format - using a server mediated peer-to-peer file sharing system. In this case, users access a Web site for details of other remote users connected to the web site who are willing to share files. The user then accesses another remote user's computer directly and downloads the required files from that remote user's computer. This is illustrated in Figure 1, and is described in more detail below.
In the known file-sharing system 1, a user 4 accesses a file-sharing system meta-server 2 using the Internet and the services of his ISP. Typically, the user 4 will be accessing the meta-server 2 using a terminal such as a personal computer equipped with a modem and the usual user interfaces as is well known. The meta-server 2 will return the address of one or more file-sharing system servers 3 which the user 4 will use to access details of files available for download. Upon connection to the system server 3 - using for example, known login techniques - the user 4 will search or browse for files of interest and available for sharing by other remote users 5 of the system and currently connected to it. This is done by sending either a SEARCH_REQUEST or BROWSE_REQUEST to the system server 3. In a SEARCH_REQUEST the user 4 will specify certain characteristics of a file that is required, and the lists of all remote users 5 connected to the system servers 3 will be examined to see if a requested file, with these characteristics, is available to share. In a BROWSE_REQUEST the user will supply details of a particular known remote user, and all files that the remote user (if connected to the system server 3) is willing to share will be examined. The system server 3 then issues either a SEARCH_RESPONSE or BROWSE_RESPONSE back to the user 4 containing information on the files that are of interest to the user 4. Typically the results of the search or browse operation will be displayed on the user's remote terminal. If the user wishes to download one of the files, he issues a DOWNLOAD_REQUEST to the system server 3. The system server 3 then issues a DOWNLOAD_ACK message that contains the address of the remote user 5 who has the file available for download. The user 4 - without intervention from the system server 3 - makes a connection to the remote user 5 and the file is copied to the user 4.
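The request and response exchange of the known system can be summarised in a small sketch. The message names (SEARCH_REQUEST, BROWSE_REQUEST, DOWNLOAD_REQUEST and their replies) come from the description above; everything else in this Python model, including the dictionary message layout, the `SharedFile` fields, the substring search and the `ERROR` reply, is a hypothetical simplification rather than the actual wire protocol.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SharedFile:
    filename: str
    owner: str          # nickname of the remote user 5 sharing the file
    owner_address: str  # address returned in DOWNLOAD_ACK


class SystemServer:
    """Toy stand-in for system server 3: it indexes files, never stores them."""

    def __init__(self) -> None:
        self._shared: List[SharedFile] = []

    def register(self, shared: SharedFile) -> None:
        self._shared.append(shared)

    def handle(self, message: Dict[str, str]) -> Dict[str, object]:
        kind = message["type"]
        if kind == "SEARCH_REQUEST":
            term = message["term"].lower()
            hits = [f for f in self._shared if term in f.filename.lower()]
            return {"type": "SEARCH_RESPONSE", "files": hits}
        if kind == "BROWSE_REQUEST":
            hits = [f for f in self._shared if f.owner == message["user"]]
            return {"type": "BROWSE_RESPONSE", "files": hits}
        if kind == "DOWNLOAD_REQUEST":
            for f in self._shared:
                if f.owner == message["user"] and f.filename == message["filename"]:
                    # The server only hands back the peer's address; the
                    # file itself is then copied directly between the peers.
                    return {"type": "DOWNLOAD_ACK", "address": f.owner_address}
            return {"type": "ERROR", "reason": "file not shared"}  # assumed reply
        raise ValueError(f"unknown message type: {kind}")
```

Because the DOWNLOAD_ACK carries only an address, the transfer that follows never passes through the system server, which is the property that makes the traffic hard to cache.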
MP3 audio files are often large in size, and, because this type of file sharing can be extremely popular, it can use a significant proportion of an ISP's bandwidth. Because this file transfer is peer-to-peer, it is not carried over any specific port - the port to be used for a particular file transfer is negotiated between the two communicating peers on a transfer-by-transfer basis. This makes it difficult to cache. Further, the data is transient - it is only accessible while the remote user 5 that is sharing the file is connected to the Internet, further increasing the difficulty of caching the data.
Disclosure of the Invention
According to a first aspect of the present invention, there is provided a system for caching data during peer-to-peer data transfer between computers in a network of computers, the system including:
a first database for dynamically storing information regarding the data available for transfer from one or more computers in the network of computers;
control means operable to transfer data selected from the available data to a first computer in the network in response to a request for the selected data from the first computer; and
storage means for storing a copy of data already transferred by the system;
wherein the control means is further operable to search the first database in response to a request for the selected data, to determine if the data is stored in the storage means, and to transfer the data from the storage means to the first computer if the data is stored therein, and, if the data is not available in the first database, to transfer the selected data from another computer in the network having the selected data stored therein, to the storage means for storage therein, and to the first computer.
This has the advantage of allowing data that has already been transferred by the system to be easily downloaded to additional users without having to connect to the original storage medium. Thus, files shared through a peer-to-peer file transfer system can be easily cached.
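The behaviour of the first aspect (serve from the storage means when the data is held there, otherwise fetch from another computer and keep a copy) can be illustrated with a minimal sketch. The Python below is an assumption-laden model: the class name, the use of plain dictionaries for the peers and the storage means, and the returned source label are all invented for the example.

```python
from typing import Dict, Tuple


class CachingTransferSystem:
    """Toy model of the control means plus storage means."""

    def __init__(self, peers: Dict[str, Dict[str, bytes]]) -> None:
        self.peers = peers                   # peer address -> files it shares
        self.storage: Dict[str, bytes] = {}  # storage means (the cache)

    def transfer(self, name: str, peer_address: str) -> Tuple[bytes, str]:
        """Return the data and a label saying where it was served from."""
        if name in self.storage:
            # Already transferred once: serve the stored copy.
            return self.storage[name], "cache"
        # Not stored: fetch from the other computer (raises KeyError if
        # that peer is no longer connected), store a copy, then deliver.
        data = self.peers[peer_address][name]
        self.storage[name] = data
        return data, "peer"
```

A short usage example shows the stated advantage: once a file has been transferred, it remains available even after the original peer disconnects.

```python
peers = {"203.0.113.5": {"song.mp3": b"audio"}}
system = CachingTransferSystem(peers)
system.transfer("song.mp3", "203.0.113.5")         # served from the peer
del peers["203.0.113.5"]                           # peer goes offline
print(system.transfer("song.mp3", "203.0.113.5"))  # still served, from the cache
```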
Preferably, the control means comprises a first server and a second server, the first server being coupled to a remote server for receiving information regarding the available data selected therefrom and for storing the information in the first database, the second server including a second database having information to identify the data selected stored in the storage means and their storage locations therein and being coupled to the first server for communication therewith. More preferably, the second database identifies the data selected stored in the storage means by means of hashing methods or other unique identifier encoding methods.
Still more preferably, the second database identifies the data selected stored in the storage means by means of an MD5 hash value.
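As an illustration of identifying stored data by an MD5 hash value, the following Python sketch hashes only a leading portion of a file, in line with the mention elsewhere in this document of the MD5 hash of the first 300kB. It uses the standard `hashlib` module; the function name, the chunked reading and the interpretation of 300kB as 300 × 1024 bytes are assumptions for the example.

```python
import hashlib
from typing import BinaryIO

PREFIX_BYTES = 300 * 1024  # assumed meaning of "300kB"


def prefix_md5(stream: BinaryIO, limit: int = PREFIX_BYTES,
               chunk: int = 64 * 1024) -> str:
    """Return the hex MD5 digest of at most `limit` leading bytes of `stream`."""
    digest = hashlib.md5()
    remaining = limit
    while remaining > 0:
        block = stream.read(min(chunk, remaining))
        if not block:  # end of stream reached before the limit
            break
        digest.update(block)
        remaining -= len(block)
    return digest.hexdigest()
```

Hashing a prefix keeps the identifier cheap to compute, at the cost that two files sharing the same leading bytes receive the same identifier; the additional checks on filename, size and bit rate described in this document help guard against treating different files as the same.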
Preferably, the second server may be operable, in response to a request from the first server, to search the second database for the identifying data to thereby determine if the data selected is stored in the storage means, and its storage location therein.
Preferably, the second server may be operable, if the data selected is determined to be stored in the storage means, to send information on the storage location to the first server, the first server being further operable, in response to the received storage location information, to send this information to the first computer.
Preferably, the second server may be operable, if the selected data is determined not to be stored in the storage means, to send information on a storage location available for the selected data to the first server, the first server being further operable, in response to the received storage location information, to send this information to the first computer.
According to a second aspect of the invention there is provided a method of caching data during peer-to-peer data transfer including:
selecting data to transfer;
searching a first database of dynamically stored information regarding data available for transfer from one or more computers in a network of computers to determine if the selected data is stored in a storage means;
if the search of the first database determines that the selected data is stored in the storage means, transferring the selected data from the storage means to a first computer in the network of computers; and
if the search of the first database determines that the selected data is not stored in the storage means, transferring the selected data from another computer in the network of computers having the selected data stored therein to the storage means and to the first computer.
Preferably, the method includes the steps of
receiving information regarding the data available for transfer from a remote server; and
selecting data to transfer from the received information.
According to a third aspect of the invention there is provided data transferred between computers in a network of computers, the data having been transferred by a control means in response to a request from a first computer in the network wherein the data has been selected from a set of dynamic data available for transfer stored in a first database and the control means has searched a storage means that stores a copy of data previously transferred by the control means to determine if the data is stored in the storage means and, if the control means determines that the data is stored in the storage means, to initiate the transfer of the data to the first computer from the storage means and, if the control means determines that the data is not stored in the storage means, to initiate the transfer of the data to the first computer and the storage means from a second computer in the network having the selected data stored therein.
Preferably, the control means determines whether the data is stored in the storage means by searching a second database containing information that identifies the data stored in the first database and their storage locations.
Brief Description of the Drawings
One specific embodiment of the invention will now be described, by way of example only, with reference to the accompanying drawings, of which:
Figure 1 is a schematic representation of a peer-to-peer file sharing system of the prior art; and
Figure 2 is a schematic representation of a system for caching data during peer-to-peer data transfer of the present invention.
Best Mode(s) for Carrying Out the Invention
Figure 2 illustrates a system for caching data during peer-to-peer data transfer of the present invention. Those features that are known from the prior art, and discussed above in the preamble, will use the same reference numerals for clarity.
In the embodiment, a user 4 is connected to the Internet via his ISP, in a known manner, and can access web sites by typing in the desired URL in his web browser.
The ISP is provided with an ISP server 6 which acts as a gateway to provide connection to the Internet for the user 4 (as is well known in the art). The ISP is also provided with a cache system, which communicates with the ISP server 6. The cache system comprises a proxy server 8, a cache director 7, a cache manager 9 and one or more cache servers 10 - whose functions will be described in more detail below.
When the user 4 wishes to download files from a remote user 5, he starts a file-sharing application that requests connection to a file sharing service using the TCP/IP protocol.
In response to the connection request from the user 4, the ISP server 6 will redirect the user 4 to a cache director 7 for connection thereto. In response to this connection, the cache director 7 returns the address of the proxy server 8 via ISP server 6. The ISP server 6 is able to capture the initial connection request because it uses the TCP/IP protocol with a known port number.
It should be appreciated that in other embodiments, the cache director 7 may be omitted. It is used in this embodiment to emulate the meta-server of existing peer-to-peer file sharing systems such as the Napster system operated by Napster Inc. of Redwood City, California.
The user 4 then connects to the proxy server 8 via ISP server 6. Upon receipt of a connection attempt from the user 4, the proxy server 8 connects to the meta-server 2 by a direct TCP/IP connection and retrieves an IP address of a system server 3. Meta-server 2 contains a large collection of valid system server addresses; the proxy server 8 retrieves one of these by querying meta-server 2. The proxy server 8 then makes a direct connection to the system server 3 using the address retrieved from its query of meta-server 2.
The proxy server 8 acts as an intermediary between the user 4 and the system server 3, and is operable to forward all messages it receives from the user 4 to the system server 3, and vice versa. These messages are forwarded verbatim until the system server 3 sends either a BROWSE_RESPONSE or SEARCH_RESPONSE message, in reply to a BROWSE_REQUEST or SEARCH_REQUEST message from the user 4 to the system server 3 to browse or search the files available for download.
When a BROWSE_RESPONSE or SEARCH_RESPONSE message is received by the proxy server 8, the proxy server 8 is operable to store details of the files available for sharing in its memory. These details include the filename, the MD5 hash of the first 300kB, the file size, encoding bitrate, length (time), the nickname (or nick) of the user sharing the file and the IP address of the user sharing the file. Some or all of these details will also be displayed at the user's remote terminal in the usual manner.
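The forwarding and recording behaviour of the proxy server 8 can be sketched as follows. This is only an illustration: the message type names come from the description, while the dictionary-based message layout, the `ProxyIndex` class and the field names are assumptions, since the wire format is not given here.

```python
from dataclasses import dataclass

# Message type names are taken from the description. The dictionary-based
# message layout and the field names are assumptions for illustration only.
INTERCEPTED = {"BROWSE_RESPONSE", "SEARCH_RESPONSE"}


@dataclass(frozen=True)
class SharedFile:
    filename: str
    md5: str          # MD5 hash of the first 300 kB of the file
    size: int         # file size in bytes
    bitrate: int      # encoding bit rate
    length: int       # playing time in seconds
    nick: str         # nickname of the user sharing the file
    ip_address: str   # IP address of the user sharing the file


class ProxyIndex:
    """In-memory store of the files seen in browse and search responses."""

    def __init__(self):
        self.files = {}

    def relay(self, message):
        """Record file details where applicable, then return the message unchanged."""
        if message["type"] in INTERCEPTED:
            for entry in message["files"]:
                shared = SharedFile(**entry)
                self.files[(shared.nick, shared.filename)] = shared
        return message  # forwarded verbatim in either case
```

Keying the store on the nickname and filename is a design choice of this sketch; it lets a later download request, which names only the remote user and the file, be correlated with the stored details.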
If there is a file that the user 4 wishes to download, then the user 4 issues a DOWNLOAD_REQUEST message that is sent - via proxy server 8 - to system server 3. The system server 3 then responds with a DOWNLOAD_ACK message, which is sent to the proxy server 8.
The proxy server 8, in response to the received DOWNLOAD_ACK message, connects to cache manager 9, which - in response - will return the address of one of the cache servers 10 to the proxy server 8.
The cache manager 9 includes a database 12. In the present embodiment, the database is an SQL database, but could be any mechanism for persistent storage. The cache manager 9 will accept one of four request types, namely ASK=PROXY, ASK=CACHE, ACTION=UPDATECACHED and ACTION=DELETECACHED. These will be discussed in more detail below. The address returned by the cache manager 9 in response to a received message from the proxy server 8 will depend upon whether the file requested by the user 4 is already stored (or cached) locally in one of the cache servers 10, or whether it must be downloaded from a remote user 5 who has the file available for sharing.
The proxy server 8 will send an ASK=PROXY request to the cache manager 9 to request a file with a specific filename, from a specific remote user, with a specific MD5 hash value, in response to a received DOWNLOAD_ACK message from the system server 3. The cache manager 9 will search its database 12, firstly to determine if a file exists with the same MD5 hash value. If this is the case, the cache manager 9 will determine that it has found the requested file and will return the address of the cache server 10 that contains that file. The MD5 hash is an algorithm used to provide a degree of certainty that two files are the same, and is well known to the person skilled in the art.
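As an illustration of the identifier used for the exact match, the MD5 hash of the leading portion of a file can be computed as in the sketch below. The function name `md5_of_prefix` is hypothetical, and reading "300kB" as 300 × 1024 bytes is an assumption; the description does not say whether 1000 or 1024 bytes per kB is meant.

```python
import hashlib
import io

# Assumption: "300kB" is read here as 300 * 1024 bytes.
PREFIX_BYTES = 300 * 1024


def md5_of_prefix(stream, limit=PREFIX_BYTES):
    """Return the hex MD5 digest of at most `limit` leading bytes of a binary stream."""
    digest = hashlib.md5()
    remaining = limit
    while remaining > 0:
        chunk = stream.read(min(65536, remaining))
        if not chunk:          # stream ended before the limit was reached
            break
        digest.update(chunk)
        remaining -= len(chunk)
    return digest.hexdigest()


# Usage: two streams that agree on their first 300 kB yield the same identifier.
first = md5_of_prefix(io.BytesIO(b"a" * 400_000))
second = md5_of_prefix(io.BytesIO(b"a" * 500_000))
```

Because only the leading bytes are hashed, files that differ after the prefix receive the same identifier, which is why the hash gives a degree of certainty rather than proof that two files are the same.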
If the cache manager 9 cannot detect a match for the MD5 hash value, then the cache manager 9 will attempt to find an approximate match. The algorithm for this is to, firstly, strip the path information from the filename, and attempt to match the filename exactly. If a match is found, then the cache manager 9 will then compare the size of the requested file with the size of the stored file and the encoding bit rate of the requested file with that of the stored file. If the encoding bit rates match, and the requested file size is no longer than that of the stored file (to prevent partially downloaded files being stuck in the cache), then a match will be recorded, and, again, the address of the cache server 10 containing the file will be returned. The algorithm is readily extendable to include matching on any of the stored details of the file.
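The exact and approximate matching steps can be sketched as below. The record layout and the names `CacheRecord` and `find_cached` are assumptions made for the example; in particular, the sketch assumes that path information is stripped from the stored filename as well as from the requested one.

```python
import ntpath
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheRecord:
    filename: str
    md5: str
    size: int
    bitrate: int
    cache_server: str   # address of the cache server holding the file


def strip_path(filename):
    # ntpath.basename splits on both "\\" and "/", so Windows style and
    # POSIX style pathnames are handled alike.
    return ntpath.basename(filename)


def find_cached(records, filename, md5, size, bitrate):
    """Return the matching record, or None if the file is assumed not cached."""
    records = list(records)
    # Step 1: exact match on the MD5 hash value.
    for record in records:
        if record.md5 == md5:
            return record
    # Step 2: approximate match on the bare filename, the encoding bit rate,
    # and a requested size no larger than the stored size.
    wanted = strip_path(filename)
    for record in records:
        if (strip_path(record.filename) == wanted
                and record.bitrate == bitrate
                and size <= record.size):
            return record
    return None
```

Further stored details, such as playing time, could be added to the second step as extra conditions.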
Should there be no exact or approximate match, then an assumption is made that the file is not stored in one of the cache servers 10, i.e. that it is not stored locally. In this case, the cache manager 9 selects a cache server 10 that has available storage capacity and returns the address of this cache server 10, as it is to this cache server 10 that the requested file will be downloaded. A record is then written into database 12 that contains details passed in the ASK request, including the address of the remote user 5 from which the file should be downloaded.
In response to the received cache server 10 address, the proxy server 8 will rewrite the DOWNLOAD_ACK message with the address of the respective cache server 10 (rather than the address of the remote user 5, which will have been included in the DOWNLOAD_ACK message sent from the system server 3 to the proxy server 8). The proxy server 8 will then send the rewritten DOWNLOAD_ACK message to the user 4.
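The handling of a cache miss and the rewriting of the DOWNLOAD_ACK message might look like the following sketch. All function and field names are hypothetical. Choosing the server with the most free space, and writing the new record with a "downloading" state, are assumptions of the sketch; the description requires only that the chosen server has available capacity and that the request details are recorded.

```python
def choose_cache_server(free_bytes, needed):
    """Return the address of a cache server with room for `needed` bytes, or None."""
    candidates = [addr for addr, free in free_bytes.items() if free >= needed]
    # Assumption: prefer the server with the most free space.
    return max(candidates, key=lambda addr: free_bytes[addr]) if candidates else None


def handle_miss(database, free_bytes, request):
    """Pick a cache server for a file that is not cached and record the request."""
    address = choose_cache_server(free_bytes, request["size"])
    if address is None:
        return None
    # The record keeps the details passed in the request, including the
    # address of the remote user from which the file should be downloaded.
    database.append({**request, "cache_server": address, "state": "downloading"})
    return address


def rewrite_download_ack(ack, cache_address):
    """Return a copy of the acknowledgement that points at the cache server."""
    rewritten = dict(ack)
    rewritten["address"] = cache_address   # replaces the remote user's address
    return rewritten
```

Returning a copy rather than modifying the message in place keeps the original acknowledgement, with the remote user's address, available to the proxy server.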
Upon receipt of the rewritten DOWNLOAD_ACK message, the user 4 will then attempt to directly connect to the particular cache server 10 (rather than the remote client 5), identified by the address inserted in the rewritten DOWNLOAD_ACK message from the proxy server 8.
In response to a connection by the user 4, the cache server 10 issues an ASK=CACHE request to the cache manager 9 denoting a request to the cache manager 9 for the MD5 hash value for the requested file. The cache server 10 will send the username of the remote user 5 and the full pathname for the requested file, as sent with the request from the user 4 to the cache server 10. In response to this request, the cache manager 9 will compare this data with that stored in the database 12, and return the MD5 value. Using this MD5 hash value, the cache server 10 will search its own memory for a file identified by the returned MD5 hash value. If the memory contains this file, then this is copied from the cache server 10 to the user 4 in the usual way. This second lookup is necessary so that the cache server can serve the correct file on request from the user: the protocol used in the Napster system referred to above specifies that download requests contain only the filename to be downloaded, while the cache server 10 only indexes on the MD5 sum. The cache manager 9, via the data stored in the database 12, is the only location where such a correlation can be made.
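The second lookup can be pictured with the sketch below, in which the cache manager correlates the remote username and full pathname with an MD5 value and the cache server indexes its files on that value alone. The class and method names are illustrative assumptions, and plain dictionaries stand in for the database 12 and for the cache server's storage.

```python
class CacheManager:
    """Holds the correlation between (username, pathname) and MD5 value."""

    def __init__(self):
        self._md5_by_request = {}

    def record(self, nick, pathname, md5):
        self._md5_by_request[(nick, pathname)] = md5

    def ask_cache(self, nick, pathname):
        """Answer an ASK=CACHE style request; None if nothing is recorded."""
        return self._md5_by_request.get((nick, pathname))


class CacheServer:
    """Stores file contents indexed on the MD5 value only."""

    def __init__(self, manager):
        self.manager = manager
        self.store = {}   # md5 -> file contents

    def serve(self, nick, pathname):
        """Return the cached contents, or None if the file must be fetched remotely."""
        md5 = self.manager.ask_cache(nick, pathname)
        if md5 is None:
            return None
        return self.store.get(md5)
```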
If the cache server 10 does not find a copy of the file identified by the returned MD5 hash value, then it is assumed that the file is not available locally and must be downloaded from the remote user 5. In this case, the cache server 10 retrieves the IP address of the remote user 5 that is sharing the file from the information stored when the proxy server 8 received the DOWNLOAD_ACK message. The cache server 10 then makes an outgoing connection to the remote user 5, and requests a copy of the file from the remote user 5 in a known manner. The downloaded file is copied to the user 4, and is also copied into the memory on the cache server 10 at the same time. When the file has been copied to the memory, the cache server 10 will send an ACTION=UPDATECACHED message to the cache manager 9 informing it of the file transfer, and including details of the newly stored file, so that it may be copied directly from the cache server 10, should another request be made for that particular file. This flags that the file has been downloaded completely and therefore should not be removed if a script is run to clean up the cache information. Its explicit function is to update the state column in the database 12 for a particular file from "downloading" to "cached".
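Copying the downloaded file to the user and into the cache at the same time can be sketched as a simple chunked loop over file-like objects. The name `tee_copy` and the chunk size are assumptions; in practice the source and the client would be network connections rather than in-memory buffers.

```python
import io


def tee_copy(source, client, cache, chunk_size=64 * 1024):
    """Copy `source` to both `client` and `cache`; return the number of bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:            # end of the remote file
            break
        client.write(chunk)      # delivered to the requesting user
        cache.write(chunk)       # stored locally for later requests
        total += len(chunk)
    return total


# Usage with in-memory buffers standing in for the connections and the cache file.
remote = io.BytesIO(b"x" * 200_000)
to_user, to_cache = io.BytesIO(), io.BytesIO()
copied = tee_copy(remote, to_user, to_cache)
```

Writing each chunk to both destinations as it arrives means the user does not wait for the cache copy to finish before receiving data.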
If an error arose during download, then the cache manager 9 will also be informed by an ACTION=DELETECACHED request from the cache server 10 to the cache manager 9. In this situation, it is undesirable for further requests to be directed to the partially downloaded file. The DELETECACHED clause directs the cache manager 9 to remove the database row in the database 12 referencing the partially downloaded file. If an error does occur, the user 4 will be advised accordingly and may retry if desired.
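The effect of the UPDATECACHED and DELETECACHED requests on the database 12 can be illustrated with an in-memory SQLite database. The table layout, column names and function names are assumptions; only the "downloading" and "cached" state values and the removal of the row on error come from the description.

```python
import sqlite3


def open_database():
    """Create an in-memory database with a hypothetical `files` table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files ("
        " md5 TEXT PRIMARY KEY,"
        " filename TEXT,"
        " cache_server TEXT,"
        " state TEXT NOT NULL)"
    )
    return db


def start_download(db, md5, filename, cache_server):
    db.execute(
        "INSERT INTO files VALUES (?, ?, ?, 'downloading')",
        (md5, filename, cache_server),
    )


def update_cached(db, md5):
    """UPDATECACHED: the download completed, so mark the file as cached."""
    db.execute("UPDATE files SET state = 'cached' WHERE md5 = ?", (md5,))


def delete_cached(db, md5):
    """DELETECACHED: the download failed, so remove the row entirely."""
    db.execute("DELETE FROM files WHERE md5 = ?", (md5,))


def state_of(db, md5):
    row = db.execute("SELECT state FROM files WHERE md5 = ?", (md5,)).fetchone()
    return row[0] if row else None
```

Once the row is deleted, a later request for the same file finds no match and is treated as a fresh cache miss.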
All the messages and data transfer discussed above are carried out in accordance with the TCP/IP protocol.
The cache manager 9 and cache servers 10 operate a least recently used policy to remove files from the cache as needed, as is known in the art.
It should be appreciated by the person skilled in the art that the scope of this invention is not limited to the particular embodiment described above. In particular, hashing methods other than the MD5 method may be used. Further, and in the alternative, unique identifier encoding methods other than hashing methods may be used.
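A least recently used policy of this kind can be sketched with an ordered dictionary keyed on the MD5 value. The class name `LruCache` and the byte-based capacity are assumptions; the description does not specify how capacity is measured.

```python
from collections import OrderedDict


class LruCache:
    """Tracks cached files by MD5 value and evicts the least recently used."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self._files = OrderedDict()   # md5 -> size, least recently used first

    def touch(self, md5):
        """Mark a file as just used; return False if it is not cached."""
        if md5 in self._files:
            self._files.move_to_end(md5)
            return True
        return False

    def add(self, md5, size):
        """Add a file, evicting old files as needed; return the evicted MD5 values."""
        evicted = []
        if md5 in self._files:
            self.used -= self._files.pop(md5)
        while self._files and self.used + size > self.capacity:
            old_md5, old_size = self._files.popitem(last=False)
            self.used -= old_size
            evicted.append(old_md5)
        self._files[md5] = size
        self.used += size
        return evicted
```

In this sketch a file larger than the whole capacity empties the cache and is still stored; a fuller implementation would probably refuse such a file instead.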

Claims

1. A system for caching data during peer-to-peer data transfer between computers in a network of computers, the system including:
a first database for dynamically storing information regarding the data available for transfer from one or more computers in the network of computers;
control means operable to transfer data selected from the available data to a first computer in the network in response to a request for the selected data from the first computer; and
storage means for storing a copy of data already transferred by the system;
wherein the control means is further operable to search the first database in response to a request for the data selected, to determine if the data selected is stored in the storage means, and to transfer the data selected from the storage means to the first computer if the data selected is stored therein, and, if the data selected is not available in the first database, to transfer the data selected from another computer in the network having the data selected stored therein, to the storage means therein, and to the first computer.
2. A system for caching data during peer-to-peer data transfer as claimed in claim 1, wherein the control means comprises a first server and a second server, the first server being coupled to a remote server for receiving information regarding the available data selected therefrom and for storing the information in the first database, the second server including a second database having information to identify the data selected stored in the storage means and their storage locations therein and being coupled to the first server for communication therewith.
3. A system for caching data during peer-to-peer data transfer according to claim 2 wherein the second database identifies the data selected stored in the storage means by means of hashing methods or other unique identified encoding methods.
4. A system for caching data during peer-to-peer data transfer according to claim 3 wherein the second database identifies the data selected stored in the storage means by means of an MD5 hash value.
5. A system for caching data during peer-to-peer data transfer according to any one of claims 2 to 4, wherein the second server may be operable, in response to a request from the first server, to search the second database for the identifying data to thereby determine if the data selected is stored in the storage means, and its storage therein.
6. A system for caching data during peer-to-peer data transfer according to any one of claims 2 to 5, wherein the second server may be operable, if the data selected is determined to be stored in the storage means, to send information on the storage location to the first server, the first server being further operable, in response to the received storage location information, to send this information to the first computer.
7. A system for caching data during peer-to-peer data transfer according to any one of claims 2 to 6, wherein the second server may be operable, if the selected data is determined not to be stored in the storage means, to send information on an available for storage location for the selected data to the first server, the first server being further operable, in response to the received storage location information, to send this information to the first computer.
8. A method of caching data during peer-to-peer data transfer including:
selecting data to transfer;
searching a first database of dynamically stored information regarding data available for transfer from one or more computers in a network of computers to determine if the selected data is stored in a storage means;
if the search of the first database determines that the selected data is stored in the storage means, transferring the selected data from the storage means to a first computer in the network of computers; and if the search of the first database determines that the selected data is not stored in the storage means, transferring the selected data from another computer in the network of computers having the selected data stored therein to the storage means and to the first computer.
9. A method of caching data during peer-to-peer data transfer according to claim 8, further including the steps of
receiving information regarding the data available for transfer from a remote server; and
selecting data to transfer from the received information.
10. Data transferred between computers in a network of computers, the data having been transferred by a control means in response to a request from a first computer in the network wherein the data has been selected from a set of dynamic data available for transfer stored in a first database and the control means has searched a storage means that stores a copy of data previously transferred by the control means to determine if the data is stored in the storage means and, if the control means determines that the data is stored in the storage means, to initiate the transfer of the data to the first computer from the storage means and, if the control means determines that the data is not stored in the storage means, to initiate the transfer of the data to the first computer and the storage means from a second computer in the network having the selected data stored therein.
11. Data according to claim 10 wherein the control means determines whether the data is stored in the storage means by searching a second database containing information that identifies the data stored in the first database and their storage locations.
12. A system for caching data during peer-to-peer data transfer substantially as described herein with reference to Figure 2.
13. A method of caching data during peer-to-peer data transfer substantially as described herein with reference to Figure 2.
PCT/AU2002/000518 2001-04-26 2002-04-26 A system for caching data during peer-to-peer data transfer WO2002089000A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AUPR4589A AUPR458901A0 (en) 2001-04-26 2001-04-26 Cache for a peer-to-peer data transfer
AUPR4589 2001-04-26

Publications (1)

Publication Number Publication Date
WO2002089000A1 (en) 2002-11-07

Family

ID=3828583

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2002/000518 WO2002089000A1 (en) 2001-04-26 2002-04-26 A system for caching data during peer-to-peer data transfer

Country Status (2)

Country Link
AU (1) AUPR458901A0 (en)
WO (1) WO2002089000A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6092080A (en) * 1996-07-08 2000-07-18 Survivors Of The Shoah Visual History Foundation Digital library system
WO2000042519A1 (en) * 1999-01-11 2000-07-20 Edgix Corporation Internet content delivery acceleration system
US6167438A (en) * 1997-05-22 2000-12-26 Trustees Of Boston University Method and system for distributed caching, prefetching and replication

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2395086A (en) * 2002-10-30 2004-05-12 Hewlett Packard Co Resources caching in distributed peer-to-peer networks
WO2004077313A1 (en) * 2003-02-27 2004-09-10 Techsell Interaktiv Ab A method and apparatus for advertising objects
WO2005027457A1 (en) * 2003-09-12 2005-03-24 Telefonaktiebolaget Lm Ericsson (Publ) Data sharing in a multimedia communication system
JP2007534202A (en) * 2003-09-12 2007-11-22 テレフオンアクチーボラゲット エル エム エリクソン(パブル) Data sharing in multimedia communication systems
US8880698B2 (en) 2004-10-18 2014-11-04 Sony United Kingdom Limited Storage of content data in a peer-to-peer network
GB2440762B (en) * 2006-08-11 2011-11-02 Cachelogic Ltd Content distribution network
US7995473B2 (en) 2006-08-11 2011-08-09 Velocix Ltd. Content delivery system for digital object
US8010748B2 (en) 2006-08-11 2011-08-30 Velocix Ltd. Cache structure for peer-to-peer distribution of digital objects
US8200906B2 (en) 2006-08-11 2012-06-12 Velocix Limited Cache structure for peer-to-peer distribution of digital objects
US8244867B2 (en) 2006-08-11 2012-08-14 Velocix Limited System and method for the location of caches
GB2440762A (en) * 2006-08-11 2008-02-13 Cachelogic Ltd Cache tracker and peer tracker for peer to peer network
US9241032B2 (en) 2006-08-11 2016-01-19 Alcatel Lucent Storage performance
WO2009092240A1 (en) * 2007-12-29 2009-07-30 Shenzhen Huawei Communication Technologies Co., Ltd. A communication device and application method, system thereof
WO2009097002A1 (en) * 2008-01-31 2009-08-06 Sony Ericsson Mobile Communications Ab Improved data sharing

Also Published As

Publication number Publication date
AUPR458901A0 (en) 2001-05-24

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP