US5854754A - Scheduling computerized backup services - Google Patents

Info

Publication number
US5854754A
Authority
US
United States
Prior art keywords
clients
subsets
utilization
service
networks
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US08/598,488
Inventor
Luis Felipe Cabrera
Claudia Beinglas Dragoescu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Trend Micro Inc
Original Assignee
International Business Machines Corp
Application filed by International Business Machines Corp
Priority to US08/598,488
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Assignment of assignors interest (see document for details). Assignors: DRAGOESCU, CLAUDIA BEINGLAS; CABRERA, LUIS FELIPE
Application granted
Publication of US5854754A
Assigned to TREND MICRO INCORPORATED. Assignment of assignors interest (see document for details). Assignor: INTERNATIONAL BUSINESS MACHINES CORPORATION
Legal status: Expired - Lifetime

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs

Definitions

  • The invention described herein relates to the scheduling of software services in a computing system. More particularly, the invention relates to scheduling software services such as backup in a networked client/server system.
  • A computer installation may consist of a vast array of computers and related devices interconnected by a plurality of networks.
  • There are many types of computers and at least several types of networks which are commonly interconnected. Scheduling services for a significant number of devices in an installation can be a very difficult problem which cannot practically be solved by a trial and error approach because of the huge number of possible combinations presented in typical installations. As will be illustrated in detail below, it is realistic to expect that the total number of possible service scenarios for a multi-network system will be in the tens of millions.
  • There are many types of service for a computer installation which might require scheduling; some examples are compilations of very large programs, regeneration of databases, consolidation of distributed data and backup of data. The following discussion will focus on backup of data, but could be applicable to other services as well. Since large amounts of critical data are stored in computer systems, there is a continuing need to regularly back up data by making redundant copies of the data on hard drives, on diskettes, on optical drives, on magnetic tape, etc.
  • The invention is an apparatus and method for scheduling a service (such as backup) in a computer installation which has clients of more than one type, a plurality of interconnected networks (potentially of different types) and at least one server.
  • The schedule is prepared using constraints of elapsed time and/or resource utilization, provided that a schedule satisfying the constraints exists.
  • The apparatus form of the invention has two major components: a Modeler and a Scheduler.
  • The Modeler calculates utilization of computer installation resources and elapsed time for the service for any subset of clients and/or networks using definitions of client types, network types and interconnection of clients and networks in the computer installation.
  • The Modeler estimates the time required as well as the percentage of resources needed to provide the service concurrently for a subset of clients by modeling the nodes of the installation which are involved in providing the service to those clients.
  • The Scheduler invokes the Modeler with subsets of clients to find utilizations and elapsed times and adjusts the subsets to generate a schedule which is a list of subsets of clients which can be serviced sequentially without exceeding a utilization criterion and/or an elapsed time criterion.
  • The Scheduler uses a heuristic involving client and network types to converge rapidly on the list of subsets of clients for the schedule, which is a great improvement over a trial and error approach.
  • FIG. 1 is a block diagram of a multi-network computer system on which the invention might be applied.
  • FIGS. 2a and 2b combined are a flowchart of the method for modeling the multi-network computer system according to the invention.
  • FIG. 3 is a flowchart of the method for scheduling the backup process for the multi-network computer system according to the invention.
  • FIG. 4 is a flowchart of the "calcNetGroups" function used in the described detailed embodiment of the invention.
  • FIG. 5 is a flowchart of the "findHowManyClientsAreUsed" function used in the described detailed embodiment of the invention.
  • FIG. 6 is a block diagram of a CAP system.
  • The invention is a means and method to specify how to schedule software services in a client/server environment.
  • The invention allows the determination of a schedule, from a very large number of possible schedules, for the backup process so that the process can complete in a specified amount of time and use only a specified amount of resources.
  • The backup server 17 has access to one or more I/O devices 18 which might be tape units, optical drives or any high capacity storage system on which the backup data can be stored (and retrieved if necessary).
  • The backup server is typically a computer which has a CPU and an I/O channel to which the I/O devices are attached.
  • The backup server is shown connected to two sets of networks 11, 19 which are of different types (i.e. "T" and "E").
  • The network sets will be called "inter-networks".
  • The inter-networks can consist of any number (1..n) of interconnected networks.
  • The networks are typically connected to the backup server by devices called gateways. Although an installation might have many gateways, only one (20) is shown. Any number of inter-networks can be connected to the backup server. It is also possible for a single network to be connected to the backup server. An installation will typically have several different types of devices which should have their stored data backed up. The devices could include mainframe computers, minicomputers, servers, personal computers, etc.; and each of these could be further divided into differing brands, models and so on. In FIG. 1, there are only two categories of devices which are labeled type-A 13 and type-B 14. In this example, type-A devices and type-B devices are also connected to either type-T or type-E networks.
  • The client-server terminology labels the devices which use the backup server as clients of the backup server. At least some of these devices will typically be servers which have their own clients. In the following, the term "server" will be used to mean "backup server" unless otherwise indicated.
  • CAP stands for Capacity Planner.
  • FIG. 6 is a block diagram of the embodiment of the invention which will be described below.
  • The Scheduler 61 coordinates the overall activity of the system by accepting the parameters and definitions from the user, invoking the Modeler 63 with various defined groupings, and generating the actual schedules.
  • The Modeler accepts as its parameters the particular groupings currently under study by the Scheduler and calculates the throughput and utilizations for the groupings which are then returned to the Scheduler.
  • The Scheduler uses the heuristics, which will be defined below, to create certain groupings until one meets the parameters of throughput, utilization, and elapsed time which have been set by the user. (This assumes that an acceptable schedule meeting the criteria exists.)
  • The output from the Scheduler can be simply a soft or hard copy showing the schedule which was selected along with the corresponding throughput, utilization, and elapsed time. It would also be relatively simple to have the output from the Scheduler be in such a form that it could be directly executed by the backup system, thus automating the process.
  • The CAP algorithm requires that the various aspects of the computer installation be defined by the user. These include the number and types of networks, number and types of clients on each, etc.
  • The CAP algorithm also requires, as input, values of system parameters used to estimate performance indices such as the duration of individual backup sessions, the utilization of the main processor of a client machine, the utilization of a network connecting client machines to a server machine, and the utilization of the main processor of the server machine. Even though there may be multiple networks in the system, a single network utilization value is used as a reasonable simplification of the procedure. Reasonable default values for some or all of these parameters could be used by the algorithm in a practical application.
  • CAP calculates, in an iterative manner, possible backup schedules subject to the resource utilization constraints imposed by the user. CAP calculates first which networks can be backed up at the same time, such that the relative resource utilization of the networks and the CPU utilization of the server are within the bounds set by the user. CAP then calculates which clients on these networks can be backed up at the same time, such that the absolute value of the server utilization is at or below the desired value.
  • CAP also calculates how many identical groups there are, and how long it takes to back up each group, and all groups, one after another.
  • The criterion used to limit the number of clients that can be backed up at the same time is that the network and the server utilizations have to be at or below the value specified by the user. Utilization is defined to be the fraction of the total time that a certain device is busy while performing the backup services.
  • The measure of throughput, which is the data rate in megabytes/second observed at the network, is used to tie together the resource utilization of the server and that of the network. This is done by calculating the server utilization when it receives data at a given rate.
  • CAP uses the fact that system throughput, equal to the ratio of the target utilization to the total service demand, should be the same at the network and at the server.
  • Service demand refers to the actual time that a service center is busy.
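  • The throughput relation described above can be written compactly. The symbols $U_{\text{net}}$, $U_{\text{srv}}$ (target utilizations) and $D_{\text{net}}$, $D_{\text{srv}}$ (total service demands) are names chosen here for illustration and are not notation taken from the source.

```latex
X \;=\; \frac{U_{\text{net}}}{D_{\text{net}}} \;=\; \frac{U_{\text{srv}}}{D_{\text{srv}}}
\quad\Longrightarrow\quad
\frac{D_{\text{net}}}{D_{\text{srv}}} \;=\; \frac{U_{\text{net}}}{U_{\text{srv}}}
```

  • Under this reading, requiring equal throughput at the network and at the server is the same as requiring that the ratio of service demands match the ratio of target utilizations.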
  • CAP divides the installation into groups.
  • A group consists of networks, clients, and of course the server, that can back up at the same time while keeping the network and server utilizations lower than the specified limits.
  • A group can contain multiple networks of multiple types, and each network can have attached multiple clients of multiple types. In some cases, the whole installation is a group.
  • CAP determines a certain division of the installation into groups. The number of possible groupings can be very large, as will be shown later. The algorithm does not attempt to try all groupings; it is a heuristic and does not guarantee that the result is the optimal one.
  • In a proposed installation configuration, CAP first tries to determine which networks (with all of their clients) satisfy the requirement that the ratio of the service demand at the networks and at the server should be approximately the same as the desired ratio of the network and server utilizations. To determine which networks can be backed up at the same time, CAP calculates the estimate of the total time that each network and the server are busy doing backup, then it sorts the network times in decreasing order. CAP keeps adding networks to a group while the ratio of the total time spent on all networks in the group to the total time spent at the server by the clients attached to these networks is smaller than the ratio of the desired utilizations at the networks and server. Then the next group is created. CAP keeps iterating down the list of networks until all are assigned to groups.
  • Each group is further examined to ensure that the utilization at the server and networks is not too high in absolute value. This is done by calculating the performance estimates for each group with each client type being examined separately. (Alternatively the client types could be examined in sets rather than one at a time.) If the utilizations are higher than the user-specified threshold, the group is further subdivided by allowing only some of the clients in the group to back up at the same time.
  • The performance of the system with only the clients of one type backing up is calculated for two cases: one client active, and all clients of that type active concurrently.
  • The number of clients is varied by binary search until all the device utilizations are around the target values. If, on the contrary, the device utilization with all clients of a type active concurrently is lower than the user-specified threshold, clients of the next type are added to the group. This procedure is repeated until the utilizations are around their target values and all clients of all types are assigned to subgroups.
  • The CAP parameters of interest for the model are:
  • Backup characteristics such as compressed or not compressed and selective or incremental. If compressed, the compression factor is also a parameter. If incremental, the fraction of the files changed is also a parameter.
  • The gateways connected to the server, networks connected to a gateway or to the server, and the clients connected to a network can be of multiple types; and each type can have multiple machines.
  • The CAP parameters of interest for the scheduling algorithm are:
  • The system being modeled is composed of service centers (where requests queue for service), which are the networks, gateways, server CPU and server I/O devices, and of "waiting elements" (the clients or jobs in the system), which generate units of work.
  • The service centers are characterized by service demand and the length of the queue of requests. Queues are commonly used in modeling to simulate processes where waiting time is involved.
  • The installation can be pictured as a tree structure with each leaf generating a workload which is processed by the nodes (service centers). The processing by the nodes requires a finite amount of time. Graphically inputting a tree structure is one way of defining the structure to the CAP system.
  • The root of the tree is the backup server.
  • The workload generated during the backup process by each client will be processed by a certain number of service centers.
  • For example, backing up data on a personal computer might require work by the personal computer, one or more networks/gateways through which the data must pass, the server CPU, and the server's I/O devices.
  • The model maps these nodes into its service centers which give a measure of the times required to process the data from its home on a client to a backup I/O device.
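  • The tree picture described above can be sketched as follows. This is a minimal illustration only; the `Node` class, the `service_path` helper and the node names are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One element of the installation tree (hypothetical structure)."""
    name: str
    kind: str  # "server", "gateway", "network" or "client"
    parent: Optional["Node"] = None


def service_path(client: Node) -> List[str]:
    """Names of the nodes that handle a client's backup data,
    walking from the client (leaf) up to the backup server (root)."""
    path = []
    node: Optional[Node] = client
    while node is not None:
        path.append(node.name)
        node = node.parent
    return path


# Illustrative installation: server <- gateway <- network <- client.
server = Node("backup-server", "server")
gateway = Node("gateway-20", "gateway", parent=server)
token_ring = Node("type-T-net", "network", parent=gateway)
pc = Node("type-A-client", "client", parent=token_ring)

print(service_path(pc))
```

  • Each name on the returned path corresponds to a node whose processing time would contribute to the service demand for that client's workload.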
  • The term "class" will be used to refer to a client type and a workload type.
  • An example of a class would be all clients that are IBM PS/2 personal computers which have approximately 300 MB to back up, consisting of approximately 1000 files, and which are connected to a TokenRing network.
  • The term "job" refers here to all the work that the system needs to perform. In the backup system, this translates to all the files to be backed up by a set of clients.
  • Conventionally, a job is a unit of work executed one at a time by each client. For example, units might be pages of data to be transmitted (for example, 4KB of data) or transactions to be finished.
  • The sizes of the complete client workload are used instead, to keep the floating point arithmetic operations to a minimum. The final result depends only on the total size of the client workload as the individual job sizes cancel out in the calculations.
  • The backup system (BUS) which will be modeled in the embodiment described below is a "closed type" system, which means that the number of jobs is kept constant and equal to the multiprogramming level. (Other types of BUS can be used with the invention by altering the model.)
  • The multiprogramming level N is the number of clients in the system. In reality, even though one transaction is processed at a time, both the network and server operate simultaneously, processing small pieces of files. Therefore, the effective number of clients of each class is double the number computed and the service demand at each service center is smaller by the same factor.
  • FIGS. 2a and 2b comprise a high-level flowchart of the modeling process, which will be described below.
  • The service demand is calculated for each class at each service center. This is done by adding up the times to execute BUS operations like processing a file, processing metadata (the system data about files), processing the reading of files, sending files over the network, and compressing files. These processing times are determined in terms of the number of instructions necessary to execute an operation on the client or the server, the speeds of the I/O devices, the speeds of the client and server CPU, the number of files processed, the total size of the files processed, and the network and gateway speeds.
  • The client service demand is the sum of the time to read a file from disk, to process the file data, to process the file metadata, and to process all the BUS transactions generated during the request.
  • The read time can be computed as the total file size, divided by the effective I/O speed of the disk subsystem.
  • The other times are computed using characteristic values of instruction counts or operation counts, obtained from laboratory measurements, multiplied by the file size (for data operations) or by the number of files (for metadata operations), and divided by the CPU speed as measured in MIPS or SPECints.
  • The server time is calculated in a similar manner.
  • The network time is also calculated similarly, as the sum of the time to transfer data and the time to transfer metadata (which is usually much smaller in size). These times are calculated as the total size transmitted, divided by the effective network speed (taking into account bandwidth availability).
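  • The service demand arithmetic described above can be sketched as below. All function names, parameter names and numeric values are hypothetical and chosen only for illustration; the per-transaction term mentioned in the text is left out for brevity, and speeds are expressed in bytes or instructions per second as an assumption.

```python
def client_service_demand(total_bytes, n_files, io_bytes_per_s,
                          instr_per_byte, instr_per_file, cpu_instr_per_s):
    """Seconds a client is busy: disk read + data processing + metadata processing.

    The BUS transaction term from the text is omitted in this sketch.
    """
    read_time = total_bytes / io_bytes_per_s
    data_cpu_time = instr_per_byte * total_bytes / cpu_instr_per_s
    meta_cpu_time = instr_per_file * n_files / cpu_instr_per_s
    return read_time + data_cpu_time + meta_cpu_time


def network_service_demand(data_bytes, metadata_bytes, effective_bytes_per_s):
    """Seconds the network is busy: total bytes sent over the effective speed."""
    return (data_bytes + metadata_bytes) / effective_bytes_per_s


# Made-up figures: 300 MB in 1000 files, 1 MB/s disk, 10 MIPS CPU.
client_s = client_service_demand(total_bytes=300e6, n_files=1000,
                                 io_bytes_per_s=1e6, instr_per_byte=10,
                                 instr_per_file=50_000, cpu_instr_per_s=10e6)
network_s = network_service_demand(data_bytes=300e6, metadata_bytes=1e6,
                                   effective_bytes_per_s=0.5e6)
print(client_s, network_s)
```

  • With these made-up inputs the client is busy for about 605 seconds and the network for about 602 seconds; real values would come from the laboratory measurements the text refers to.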
  • For each class i, we calculate the time $Q_{ij}$ spent at each service center j 23. This time is calculated as the service demand $D_{ij}$ times the number of jobs in the queue, $(q_j + 1 - q_{ij}/N)$. The total queue length at the service center is $q_j$, and the queue length of class i is $q_{ij}$. The correction factor $q_{ij}/N$ accounts for the fact that the client does not see itself waiting in the queue, and the 1 accounts for the client itself. Then we calculate the response time $R_i$ as the sum of all times $Q_{ij}$ spent at all service centers, plus the "waiting time" $Z_i$ at the client 24.
  • Total throughput is the sum of all the class throughputs $X_i$, calculated above in the loop, expressed on a KB/second basis, i.e. multiplied by the job size $S_i$ 27.
  • The average service time is the total size of all clients (the sum of all $S_i$), divided by the total throughput $X$ 27.
  • The utilization of each device is the sum of the utilizations of each class i at the device j 28. This utilization is calculated as the product of the service demand of the class at the device, $D_{ij}$, and the class throughput $X_i$, divided by the number of identical devices $N_i$ 28.
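  • The per-class quantities described above can be evaluated for a given set of queue lengths as in the sketch below. It is not the patent's code. Two points are assumptions: N is read as the number of clients of class i, and the class throughput is taken as the number of clients divided by the response time (the usual closed-system relation), since the throughput step itself is not spelled out in this text. All function names are hypothetical.

```python
def residence_times(D, q_class, N):
    """Q[i][j] = D[i][j] * (q_j + 1 - q_ij / N_i), with q_j = sum over classes of q_ij."""
    n_classes, n_centers = len(D), len(D[0])
    q_total = [sum(q_class[i][j] for i in range(n_classes)) for j in range(n_centers)]
    return [[D[i][j] * (q_total[j] + 1 - q_class[i][j] / N[i])
             for j in range(n_centers)]
            for i in range(n_classes)]


def response_times(Q, Z):
    """R_i = sum over service centers of Q_ij, plus the waiting time Z_i."""
    return [sum(q_row) + z for q_row, z in zip(Q, Z)]


def class_throughputs(N, R):
    """Assumption: X_i = N_i / R_i (jobs per second for class i)."""
    return [n / r for n, r in zip(N, R)]


def utilizations(D, X, n_devices):
    """U_j = sum over classes of D_ij * X_i, divided by the number of identical devices."""
    return [sum(D[i][j] * X[i] for i in range(len(D))) / n_devices[j]
            for j in range(len(n_devices))]


# One class, two service centers (network, server); made-up numbers.
D = [[2.0, 1.0]]          # service demand per job, seconds
q_class = [[3.0, 1.0]]    # current queue lengths of the class
N = [4]                   # clients in the class
Z = [0.0]                 # waiting time at the client

Q = residence_times(D, q_class, N)
R = response_times(Q, Z)
X = class_throughputs(N, R)
U = utilizations(D, X, n_devices=[1, 1])
print(Q, R, X, U)
```

  • A full model would repeat such a step until the queue lengths stop changing; that outer iteration is not shown here.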
  • Nmax is the maximum number of clients of class i that can back up at the same time without saturating the server 31. This number is calculated based on the knee point of the curve of throughput versus the number of clients in the system.
  • The knee point is the number of clients at which the throughput of the system is about 70% to 80% of the maximum possible. Its value is calculated as the ratio of the maximum throughput of the system (achieved when all clients are participating in the backup) to the minimum throughput (achieved when only one client is participating in the backup).
  • Nmax is taken as twice this value. It is calculated separately for each class, and we loop over all available classes to calculate it.
  • Nmax for each class i is found as two times the ratio of the optimum throughput for the configuration to the throughput for one member of the class.
  • The optimum throughput for the configuration is obtained by running the model with all clients backing up.
  • The throughput for the class is obtained by running the model with only one member of the class.
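  • The Nmax rule stated above reduces to a one-line calculation. In the sketch below, the function name is hypothetical, rounding down to a whole client is an assumption (the text does not say how fractions are handled), and the single-client throughput of 100 KB/s is a made-up input.

```python
import math


def n_max(optimum_throughput, single_client_throughput):
    """Two times the ratio of optimum throughput to one client's throughput,
    rounded down to a whole number of clients (rounding is an assumption)."""
    return math.floor(2 * optimum_throughput / single_client_throughput)


# Made-up inputs: 1550 KB/s with all clients, 100 KB/s with one client.
print(n_max(1550.0, 100.0))
```

  • Both throughputs would come from two runs of the model: one with all clients backing up and one with a single member of the class.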
  • The "calcNetGroups" logic is depicted in FIG. 4.
  • The networks are ordered in decreasing order of the service demand spent on them by all classes. They are added to the current group until the total network service time (the sum of the service times of all networks divided by the network target utilization) is larger than the server service demand, divided by the server target utilization. At that point, the group is closed, and another group is started, until all the networks are assigned to a group.
  • CAP first calculates the service demand at the network plus at the gateway (when it exists) of each class, then calculates the total network service demand by adding all the contributions from the different classes 41. Then it goes through an iteration loop, repeated while there are still networks not accounted for.
  • The loop starts by checking to see if the group is complete 43 (the criterion). It does this by checking that the difference of the server and the network group times is positive, meaning that the time spent on the server is larger than the time spent on all networks, i.e., the server is slower than the group of networks. If yes, a branch is taken where a new group is created 44. (Initially the criterion is set to true to start a new group the first time the loop is executed 42.) If no, another branch is taken where another network is added to the current group 45. The first time in the loop, or if a group is complete, we take the branch where a new group is started. The network with the largest service demand is found, and assigned to the group.
  • CAP calculates how many networks of this type can back up concurrently with the server, by comparing the ratios of the service demand and target utilization, as described above. The minimum of these three numbers (maximum possible, currently available, and suggested by the user) is taken.
  • The suggested concurrency is a preset number, added to bring in the benefit of experience with existing enterprises.
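  • The greedy grouping can be sketched as below. This is a simplified illustration, not the function from the patent's appendix. It follows the stopping rule as worded in the paragraph on network ordering (a group is closed once the scaled network time exceeds the scaled server time), checks the rule after each network is added (an assumption), and ignores network types and the suggested concurrency limit. The function name, tuple layout and numbers are hypothetical.

```python
def calc_net_groups(networks, u_net, u_srv):
    """Group networks greedily.

    networks: list of (name, network_demand, server_demand) tuples, where
    server_demand is the server time used by that network's clients.
    u_net, u_srv: target utilizations for the networks and the server.
    """
    # Largest network service demand first.
    ordered = sorted(networks, key=lambda n: n[1], reverse=True)
    groups, current = [], []
    net_time = srv_time = 0.0
    for name, net_demand, srv_demand in ordered:
        current.append(name)
        net_time += net_demand
        srv_time += srv_demand
        # Close the group once scaled network time exceeds scaled server time.
        if net_time / u_net > srv_time / u_srv:
            groups.append(current)
            current, net_time, srv_time = [], 0.0, 0.0
    if current:  # networks left over form the last group
        groups.append(current)
    return groups


# Made-up demands in seconds.
example = [("ET1", 3, 4), ("TR1", 10, 4), ("ET2", 2, 4), ("TR2", 9, 4)]
groups = calc_net_groups(example, u_net=0.8, u_srv=0.8)
print(groups)
```

  • With these inputs each heavily loaded network ends up alone, while the two lightly loaded networks share a group.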
  • After returning from the "calcNetGroups" function, CAP has the groups of networks that can be backed up together. Now, CAP determines which clients within these networks can back up together at the same time, i.e., it further fragments the schedules into smaller groups (FIG. 3, 33). In the routine "calcNetGroups", we only made sure that the ratio of the network and server utilizations has the desired value. Now CAP has to actually calculate the utilizations using its internal analytic model of BUS and make sure that they are below the desired values. To do this, CAP traverses the groups of networks and for each group calculates the groups of clients and the backup times.
  • CAP loops over all the groups of networks, and for each group calculates the numbers of clients still available. Then, it calculates the number of clients that keep the network and server below the desired utilization. This is done by function "findHowManyClientsAreUsed", described below.
  • CAP records the calculated utilizations of the devices for this run, calculates how many times one can repeat this run with identical groups, updates the numbers of remaining available clients of each type, and records the performance statistics for this run (throughput, response time, service center utilizations) 34.
  • CAP then checks whether any clients remain. If the total number of clients is equal to the number of clients used so far, we exit the loop 35. Otherwise, we still have available clients, so we repeat the loop, create a new client group 36, and repeat the above process. This is also the end of the loop where we traverse the network groups: on reaching this point we have finished a pass for one network group, and we either proceed to the next network group or, if we have traversed all of them, exit 37.
  • FIG. 5 illustrates the logic flow for the "findHowManyClientsAreUsed" function.
  • The function has an iteration loop which traverses all networks in a group of networks looking for the first network with available clients 51. Then, it goes into an internal loop, over all the classes that belong to this network, finding clients to be assigned to this run 52.
  • The number of clients which can be run while meeting the utilization criteria is found by assuming a high and a low value, running the model for each value, then adjusting the value using binary search techniques to converge on a number of clients that yields utilizations close to the target values 53. (Of course, for some target utilizations, there might not be a solution.)
  • The inner loop is repeated until all classes have been included 54.
  • The outer loop is repeated for all networks 55.
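  • The binary search step can be sketched as follows. This is a minimal illustration that assumes utilization does not decrease as clients are added; the function names are hypothetical and `toy_utilization` is a made-up stand-in for a run of the model.

```python
def find_clients(max_clients, utilization_of, target):
    """Largest n in [1, max_clients] with utilization_of(n) <= target.

    Returns 0 when even one client exceeds the target (no solution).
    Assumes utilization_of is non-decreasing in n.
    """
    lo, hi, best = 1, max_clients, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if utilization_of(mid) <= target:
            best = mid       # mid clients fit; try more
            lo = mid + 1
        else:
            hi = mid - 1     # too many clients; try fewer
    return best


calls = []


def toy_utilization(n):
    """Stand-in for the model: 3% utilization per client, capped at 100%."""
    calls.append(n)
    return min(1.0, 0.03 * n)


best = find_clients(100, toy_utilization, target=0.8)
print(best, len(calls))
```

  • In this toy case the search settles on 26 clients after 7 evaluations of the stand-in model, rather than trying every count from 1 to 100.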
  • It can happen that the time window to do the backup is smaller than the shortest backup time of any group. In this case the problem has no solution as stated.
  • The system administrator then has to modify the base configuration and retry using CAP. Possible changes that the administrator can use include decreasing the workload on each machine, decreasing the number of clients per network, upgrading the clients to be faster models, and upgrading the networks or the servers.
  • Appendix A includes C++ source code illustrating a possible embodiment of the invention's scheduling algorithm, including the functions described above.
  • The general problem being solved is how to schedule a relatively large number of centrally managed tasks using a finite amount of system resources to accomplish the tasks. Compiling a large number of source code files or updating databases might be examples where similar problems could arise.
  • The model could be changed to predict performance of the general software service.
  • The scheduling algorithm uses given constraints to arrive at an acceptable schedule out of a very large number of possibilities.
  • In CAP, the parameter varied is the number of BUS clients of each type.
  • For other services, the parameter would change accordingly.
  • In CAP, the constraint is that the utilization has to be less than a preset value. CAP produces values of the utilization, which we compare against a preset maximum value. In the other cases, some other parameter would be produced and its value compared against the constraint.
  • The following sample calculation illustrates how much time the invention can save in creating a schedule.
  • Each scenario has a unique combination of clients and networks and requires running the model once.
  • A scenario can have any number of clients of any type present, so the total number of scenarios is the sum of all possible combinations of client types and numbers (and the networks attached to them).
  • Any client type i that has $N_i$ clients can be present in $N_i + 1$ possible ways: no clients of this type present, one client present, two clients present, up to all $N_i$ clients present. This is the number of cases for the clients attached to one network of type j, but there are $N_j$ networks of type j, and these scenarios apply for each network independently.
  • The total number of scenarios for the clients of type i attached to networks of type j is then $(N_i + 1)$ multiplied by itself $N_j$ times, which means $(N_i + 1)$ to the power $N_j$.
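  • The total number of scenarios for all client types attached to all network types is the product of the scenarios for all client types and network types:
  • $\prod_{j}\prod_{i} (N_i + 1)^{N_j}$, where $N_i$ is the number of clients of type i attached to each network of type j and $N_j$ is the number of networks of type j.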
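  • The counting argument above can be checked numerically: each pairing of a client type with a network type contributes (clients per network + 1) raised to the number of networks, and the contributions multiply. The function name and the input figures below are hypothetical and are not the figures of the source's own example.

```python
from math import prod


def scenario_count(pairs):
    """Number of trial-and-error scenarios.

    pairs: iterable of (clients_per_network, n_networks), one entry per
    combination of client type and network type.
    """
    return prod((n_clients + 1) ** n_networks for n_clients, n_networks in pairs)


# Made-up installation: 3 clients of one type on each of 2 networks,
# and 1 client of another type on each of 3 networks.
print(scenario_count([(3, 2), (1, 3)]))
```

  • Here the count is 4^2 * 2^3 = 128; with realistic client and network counts the product grows very quickly.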
  • The algorithm tries to put together all clients of one type before continuing to the next type.
  • The number of clients is determined by binary search from one client to a maximum number previously calculated, which in most practical cases is less than 100. This is done once for each network type, because the number of networks present is known since it was calculated. Since a binary search divides the interval in half at each iteration, the number of iterations is, at most, the base-2 logarithm of 100, which is approximately seven. Therefore, in this case, the total number of scenarios run is less than:
  • Nct is the number of client types present.
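  • The figure of about seven iterations for a search range of 100 clients can be verified directly from powers of two:

```latex
2^{6} = 64 \;<\; 100 \;\le\; 128 = 2^{7}
\quad\Longrightarrow\quad
\lceil \log_2 100 \rceil = 7
```

  • Halving a range of at most 100 candidates therefore needs no more than seven model runs per search.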
  • The second case is the opposite, in which we have one client of each type.
  • The scenarios are recalculated while adding one client at a time until the network or server utilizations become higher than the limit. Assuming there are on average Nl clients in one configuration, this means that there are approximately Nl scenarios run to determine each configuration, and there are a total of Ncls/Nl configurations. The total number of scenarios is then:
  • Ncls is the total number of clients (and types) present.
  • The workstations are the clients.
  • Each network has two types of clients, with nine clients of each type.
  • The total number of network types in the enterprise is two, and the total number of networks is four.
  • The total number of scenarios that need to be run in a trial and error approach is:
  • The clients are actually file servers themselves, so there are only 20 clients in the whole enterprise, each of a different type, each attached to its own network.
  • "Network 1" in FIG. 1 is a collection of four interconnected Token Ring (TR) networks and "Network 2" is a collection of four interconnected Ethernet (ET) networks. There are four TR networks and four ET networks.
  • Each TR network has two classes of clients attached: the first class, "Client 1" in FIG. 1, is 20 RS/6000 workstations (AIX-TR) per network, or a total of 80.
  • The second class, "Client 2" in FIG. 1, is 80 PS/2 personal computers (OS/2-TR) per network, or a total of 320.
  • Each ET network also has two classes of clients attached: the first class, "Client 3" in FIG. 1, is 10 RS/6000 workstations (AIX-ET) per network, or a total of 40.
  • The second class, "Client 4" in FIG. 1, is 40 PS/2 personal computers (OS/2-ET) per network, or a total of 160.
  • the characteristics of each class, including its workload (WL), are shown in Table 1.
  • the suggested utilization at the server and network is 80% and the time window available for backup is six hours.
  • the total workload size per enterprise is 248 GB.
  • the resulting number of networks per group is two, also shown in Table 3. Therefore, we have two identical groups, of two TR networks each.
  • the total backup time for the whole enterprise is 44.5 hours, with an average throughput of 1550 KB/sec.
  • the invention has been described by way of a preferred embodiment, but those skilled in the art will understand that various changes in form and detail may be made without deviating from the spirit or scope of the invention.
  • the invention may be implemented using any combination of computer programming software, firmware or hardware.
  • the computer programming code (whether software or firmware) according to the invention will typically be stored in one or more machine readable storage devices such as fixed (hard) drives, diskettes, optical disks, magnetic tape, semiconductor memories such as ROMs, PROMs, etc., thereby making an article of manufacture according to the invention.
  • the article of manufacture containing the computer programming code is used by either executing the code directly from the storage device, by copying the code from the storage device into another storage device such as a hard disk, RAM, etc. or by transmitting the code on a network for remote execution.
  • the method form of the invention may be practiced by combining one or more machine readable storage devices containing the code according to the invention with appropriate standard computer hardware to execute the code contained therein.
  • An apparatus for practicing the invention could be one or more computers and storage systems containing or having network access to computer programming code according to the invention.

Abstract

An apparatus and method is disclosed for scheduling a service (such as backup) in a complex computer installation given constraints of elapsed time and resource utilization. The apparatus form of the invention has two major components: a Modeler and a Scheduler. The Modeler calculates utilization of computer installation resources and elapsed time for the service for any subset of clients and/or networks using a model of the various client types, network types and their interconnections. The Scheduler invokes the Modeler with subsets of clients to find utilizations and elapsed times and adjusts the subsets to generate a schedule (if one exists) which is a list of subsets of clients which can be serviced concurrently without exceeding the utilization criterion or an elapsed time criterion. The Scheduler uses a heuristic involving client and network types to rapidly converge on the list of subsets of clients for the schedule to greatly improve over a trial and error approach.

Description

FIELD OF THE INVENTION
The invention described herein relates to the scheduling of software services in a computing system. More particularly, the invention relates to scheduling software services such as backup in a networked client/server system.
BACKGROUND OF THE INVENTION
A computer installation may consist of a vast array of computers and related devices interconnected by a plurality of networks. There are many types of computers and at least several types of networks which are commonly interconnected. Scheduling services for a significant number of devices in an installation can be a very difficult problem which cannot practically be solved by a trial and error approach because of the huge number of possible combinations presented in typical installations. As will be illustrated in detail below, it is realistic to expect that the total number of possible service scenarios for a multi-network system will be in the tens of millions. There are many types of service for a computer installation which might require scheduling; some examples are compilations of very large programs, regeneration of databases, consolidation of distributed data and backup of data. The following discussion will focus on backup of data, but could be applicable to other services as well. Since large amounts of critical data are stored in computer systems, there is a continuing need to regularly back up data by making redundant copies of the data on hard drives, on diskettes, on optical drives, on magnetic tape, etc.
In a networked installation the client/server paradigm is often used to describe the roles of various components in the network. In a large installation, only a fraction of all clients can be backed up at the same time because the server and network cannot handle the load of all clients doing backup simultaneously. Traditionally, system administrators have manually tried different configuration alternatives to decide where to place the backup server machines in a proposed installation and which clients to connect to each backup server. This trial and error process is simplified if all the clients and all the workloads are of the same type, but becomes quite complex if clients are of different types and have different workloads.
Thus, there is a need for an invention which determines schedules for backup or other services in a complex installation without the need for a trial and error approach.
SUMMARY OF THE INVENTION
The invention is an apparatus and method for scheduling a service (such as backup) in a computer installation which has clients of more than one type, a plurality of interconnected networks (potentially of different types) and at least one server. The schedule is prepared using constraints of elapsed time and/or resource utilization if such a schedule exists given the constraints. The apparatus form of the invention has two major components: a Modeler and a Scheduler. The Modeler calculates utilization of computer installation resources and elapsed time for the service for any subset of clients and/or networks using definitions of client types, network types and interconnection of clients and networks in the computer installation. Using this configuration data, the Modeler estimates the time required as well as the percentage of resources needed to provide the service concurrently for a subset of clients by modeling the nodes of the installation which are involved in providing the service to those clients. The Scheduler invokes the Modeler with subsets of clients to find utilizations and elapsed times and adjusts the subsets to generate a schedule which is a list of subsets of clients which can be serviced sequentially without exceeding a utilization criterion and/or an elapsed time criterion. The Scheduler uses a heuristic involving client and network types to rapidly converge on the list of subsets of clients for the schedule to greatly improve over a trial and error approach.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a multi-network computer system on which the invention might be applied.
FIGS. 2a and 2b combined are a flowchart of the method for modeling the multi-network computer system according to the invention.
FIG. 3 is a flowchart of the method for scheduling the backup process for the multi-network computer system according to the invention.
FIG. 4 is a flowchart of the "calcNetGroup" function used in the described detailed embodiment of the invention.
FIG. 5 is a flowchart of the "findHowManyClientsAreUsed" function used in the described detailed embodiment of the invention.
FIG. 6 is a block diagram of a CAP system.
DETAILED DESCRIPTION OF THE INVENTION
The invention is a means and method to specify how to schedule software services in a client/server environment. In the embodiment, which will be described in detail, the invention allows the determination of a schedule, from a very large number of possible schedules, for the backup process so that the process can complete in a specified amount of time and use only a specified amount of resources.
An example of a computer installation in which the invention could be employed is shown in simplified block form in FIG. 1. The backup server 17 has access to one or more I/O devices 18 which might be tape units, optical drives or any high capacity storage system on which the backup data can be stored (and retrieved if necessary). The backup server is typically a computer which has a CPU and an I/O channel to which the I/O devices are attached. There can be multiple backup servers as will be described in the details presented below, but the principles remain the same. The backup server is shown connected to two sets of networks 11, 19 which are of different types (i.e. "T" and "E"). The network sets will be called "inter-networks". The inter-networks can consist of any number (1..n) of interconnected networks. The networks are typically connected to the backup server by devices called gateways. Although an installation might have many gateways, only one is shown 20. Any number of inter-networks can be connected to the backup server. It is also possible for a single network to be connected to the backup server. An installation will typically have several different types of devices which should have their stored data backed up. The devices could include mainframe computers, minicomputers, servers, personal computers, etc.; and each of these could be further divided into differing brands, models and so on. In FIG. 1, there are only two categories of devices which are labeled type-A 13 and type-B 14. In this example, type-A devices and type-B devices are also connected to either type-T or type-E networks. The client-server terminology labels the devices which use the backup server as clients of the backup server. At least some of these devices will typically be servers which have their own clients. In the following, the term "server" will be used to mean "backup server" unless otherwise indicated.
The preferred embodiments of the invention which will be described below will be called the Capacity Planner (CAP). CAP is a software system that estimates the behavior of a particular backup system installation using software modeling techniques and then generates a schedule using the results. FIG. 6 is a block diagram of the embodiment of the invention which will be described below. The Scheduler 61 coordinates the overall activity of the system by accepting the parameters and definitions from the user, invoking the Modeler 63 with various defined groupings, and generating the actual schedules. The Modeler accepts as its parameters the particular groupings currently under study by the Scheduler and calculates the throughput and utilizations for the groupings which are then returned to the Scheduler. The Scheduler uses the heuristics, which will be defined below, to create certain groupings until one meets the parameters of throughput, utilization, and elapsed time which have been set by the user. (This assumes that an acceptable schedule meeting the criteria exists.) The output from the Scheduler can be simply a soft or hard copy showing the schedule which was selected along with the corresponding throughput, utilization, and elapsed time. It would also be relatively simple to have the output from the Scheduler be in such a form that it could be directly executed by the backup system, thus automating the process.
The CAP algorithm requires that the various aspects of the computer installation be defined by the user. These include the number and types of networks, number and types of clients on each, etc. The CAP algorithm also requires as input values of system parameters to estimate performance indices such as the duration of individual backup sessions, the utilization of the main processor of a client machine, the utilization of a network connecting client machines to a server machine, and the utilization of the main processor of the server machine. Even though there may be multiple networks in the system, a single network utilization value is used as a reasonable simplification of the procedure. Reasonable default values for some or all of these parameters could be used by the algorithm in a practical application. Using the device utilization performance indices, CAP calculates, in an iterative manner, possible backup schedules subject to the resource utilization constraints imposed by the user. CAP calculates first which networks can be backed up at the same time, such that the relative resource utilization of the networks and the CPU utilization of the server are within the bounds set by the user. CAP then calculates which clients on these networks can be backed up at the same time, such that the absolute value of the server utilization is at or below the desired value.
CAP also calculates how many identical groups there are, and how long it takes to backup each group, and all groups, one after another. The criterion used to limit the number of clients that can be backed up at the same time is that the network and the server utilizations have to be at or below the value specified by the user. Utilization is defined to be the fraction of the total time that a certain device is busy while performing the backup services.
The measure of throughput, which is the data rate in megabytes/second observed at the network, is used to tie together the resource utilization of the server and that of the network. This is done by calculating the server utilization when it receives data at a given rate.
CAP uses the throughput to tie together the utilization of the networks and that of the server. The throughput is set the same for both, with the network as a data-producing service and the server as a data-consuming service. The throughput is defined to be the utilization divided by the total time spent providing the service. Thus, CAP uses the fact that system throughput, equal to the ratio of the target utilization and total service demand, should be the same at the network and at the server. Service demand refers to the actual time that a service center is busy.
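The relation described above can be written out as a short derivation. The symbols X (throughput), U (target utilization) and D (service demand), with subscripts for the network and the server, are notation assumed here for illustration; the source states the relation only in words.

```latex
X = \frac{U_{\mathrm{net}}}{D_{\mathrm{net}}} = \frac{U_{\mathrm{srv}}}{D_{\mathrm{srv}}}
\quad\Longrightarrow\quad
\frac{D_{\mathrm{net}}}{D_{\mathrm{srv}}} = \frac{U_{\mathrm{net}}}{U_{\mathrm{srv}}}
```

In words, when both devices sit at their target utilizations, the ratio of their service demands equals the ratio of their target utilizations.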
CAP divides the installation into groups. A group consists of networks, clients, and of course the server, that can backup at the same time while keeping the networks and server utilizations lower than the specified limits. A group can contain multiple networks of multiple types, and each network can have attached multiple clients of multiple types. In some cases, the whole installation is a group. CAP determines a certain division of the installation into groups. The number of possible groupings can be very large, as will be shown later. The algorithm does not attempt to try all groups, but rather is a heuristic which does not guarantee that the result is the optimal one.
In a proposed installation configuration, CAP tries to determine first which networks (with all of their clients) satisfy the requirement that the ratio of the service demand at the networks and at the server should be approximately the same as the desired ratio of the network and server utilizations. To determine which networks can be backed up at the same time, CAP calculates the estimate of the total time that each network and the server are busy doing backup, then it sorts the network times in decreasing order. CAP keeps adding networks to a group while the ratio of the total time spent on all networks in the group to the total time spent at the server by the clients attached to these networks is smaller than the ratio of the desired utilizations at the networks and server. Then the next group is created. CAP keeps iterating down the list of networks until all are assigned to groups.
After all networks (along with all their clients) are assigned to groups, each group is further examined to ensure that the utilization at the server and networks is not too high in absolute value. This is done by calculating the performance estimates for each group with each client type being examined separately. (Alternatively the client types could be examined in sets rather than one at a time.) If they are higher than the user specified threshold, the group is further subdivided by allowing only part of the clients in the group to backup at the same time.
The performance of the system with only the clients of one type backing up is calculated for one and all clients active concurrently. The number of clients is varied by binary search until all the device utilizations are around the target values. If, on the contrary, the device utilization with all clients of a type active concurrently is lower than the user-specified threshold, clients of the next type are added to the group. This procedure is repeated until the utilizations are around their target values and all clients of all types are assigned to subgroups.
We start by assuming that the maximum number of clients possible participates in the process simultaneously. If the utilization obtained in this case is at target, we are finished. If it is too large, we reduce the number of clients and repeat the process until we find a smaller number of clients which produces a utilization below the target. If the utilization is too low, we continue to the next client type and repeat the loop, by calculating how many clients of the second type to add to the group.
If we need to find fewer clients to be used, we keep halving the number of clients and executing the model; then, we select the half interval based on the value of the utilization yielded by running the model with the mid-point number of clients. If the utilizations are too high at the midpoint, we select the lower half interval. If they are too low, we select the higher half interval. If they are at target, we return. We start the interval with the lower bound of one client, and the upper bound of Nmax, the maximum number of clients considered, as calculated above.
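The interval-halving search just described might be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `find_client_count`, the `utilization_for` callback (standing in for a run of the model) and the `tolerance` parameter are all assumptions, and the sketch further assumes that utilization does not decrease as clients are added.

```python
from typing import Callable


def find_client_count(
    utilization_for: Callable[[int], float],  # hypothetical stand-in for one model run
    n_max: int,
    target: float,
    tolerance: float = 0.02,  # assumed meaning of "at target"
) -> int:
    """Binary search over [1, n_max] for a client count near the target utilization.

    Returns the first midpoint whose utilization is within `tolerance` of the
    target; otherwise the largest count found below the target, or 0 if even
    a single client exceeds it.
    """
    low, high = 1, n_max
    best = 0
    while low <= high:
        mid = (low + high) // 2
        u = utilization_for(mid)
        if abs(u - target) <= tolerance:
            return mid           # at target: stop
        if u > target:
            high = mid - 1       # too high: keep the lower half interval
        else:
            best = mid           # too low: remember it, try the higher half
            low = mid + 1
    return best
```

With an upper bound below 100 clients, the loop runs at most about seven times, so each search costs only a handful of model runs.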
The CAP parameters of interest for the model are:
server CPU speed;
server I/O speed;
number of I/O devices connected to server;
gateway speed;
number of gateways connected to server;
network speed;
network available bandwidth;
number of networks connected to a gateway or to the server;
client CPU speed;
client I/O speed;
number of clients connected to a network;
workload total size;
workload number of files;
workload number of backup transactions generated, a characteristic of the average file size and the file size distribution; and
backup characteristics such as compressed or not compressed and selective or incremental. If compressed, the compression factor is also a parameter. If incremental, the fraction of the files changed is also a parameter.
The gateways connected to the server, networks connected to a gateway or to the server, and the clients connected to a network can be of multiple types; and each type can have multiple machines.
The CAP parameters of interest for the scheduling algorithm are:
target server utilization;
target network utilization; and
maximum time window available for backing up the whole installation.
Outputs from the model are:
total throughput of the system, in kilobytes (KB)/second and throughput by class;
backup time for each class;
total size of the backup data in megabytes (MB); and
utilization of each device (client, network, server) (utilization = service demand divided by total backup time).
Outputs from the scheduling algorithm are:
throughput of the system, in KB/second;
average response time of the whole system;
total size backed up, in MB; and
a list of all groups which can be backed up concurrently. For each list, the number of identical groups that exist in the system, the backup time, throughput, participating clients and networks, the amounts of data for each class and network type participating, and their utilizations.
The system being modeled is composed of service centers (where requests queue for service), which are the networks, gateways, server CPU and server I/O devices, and of "waiting elements" (the clients or jobs in the system), which generate units of work. The service centers are characterized by service demand and the length of the queue of requests. Queues are commonly used in modeling to simulate processes where waiting time is involved. The installation can be pictured as a tree structure with each leaf generating a workload which is processed by the nodes (service centers). The processing by the nodes requires a finite amount of time. Graphically inputting a tree structure is one way of defining the structure to the CAP system. The root of the tree is the backup server. The workload generated during the backup process by each client will be processed by a certain number of service centers. For example, backing up data on a personal computer might require work by the personal computer, one or more networks/gateways through which the data must pass, the server CPU, and the server's I/O devices. The model maps these nodes into its service centers which give a measure of the times required to process the data from its home on a client to a backup I/O device.
In the following, the term "class" will be used to refer to a client type and a workload type. An example of a class would be all clients that are IBM PS/2 personal computers which have approximately 300 MB to back up, consisting of approximately 1000 files, and which are connected to a TokenRing network.
The term "workload" refers to all the work that the system needs to perform. In the backup system, this translates to all the files to be backed up by a set of clients. A job is a unit of work executed one at a time by each client. For example, units might be pages of data to be transmitted (for example, 4KB of data) or transactions to be finished. However, for the purpose of calculating the modeling parameters, the sizes of the complete client workload are used to keep the floating point arithmetic operations to a minimum. The final result depends only on the total size of the client workload as the individual job sizes cancel out in the calculations.
The backup system (BUS), which will be modeled in the description of the embodiment described below, is a "closed type" system, which means that the number of jobs is kept constant and equal to the multiprogramming level. (Other types of BUS's can be used with the invention by altering the model.) The multiprogramming level N is the number of clients in the system. In reality, even though one transaction is processed at a time, both the network and server operate simultaneously, processing small pieces of files. Therefore, the effective number of clients of each class is double the number computed and the service demand at each service center is smaller by the same factor.
FIGS. 2a and 2b comprise a high-level flowchart of the modeling process, which will be described below. To compute estimates, the service demand is calculated for each class at each service center. This is done by adding up the times to execute BUS operations like processing a file, processing metadata (the system data about files), processing the reading of files, sending files over the network, and compressing files. These processing times are determined in terms of the number of instructions necessary to execute an operation on the client or the server, the speeds of the I/O devices, the speeds of the client and server CPU, the number of files processed, the total size of the files processed, and the network and gateway speeds.
The client service demand is the sum of the time to read a file from disk, to process the file data, to process the file metadata, and to process all the BUS transactions generated during the request. The read time can be computed as the total file size, divided by the effective I/O speed of the disk subsystem. The other times are computed using characteristic values of instruction counts or operation counts, obtained from laboratory measurements, times the file size for data, or times the number of files for metadata operations, divided by the CPU speed as measured in MIPS or SPECints. The server time is calculated in a similar manner. The network time is also calculated similarly, as the sum of times to transfer data and the time to transfer metadata (which is much smaller in size usually). These times are calculated as the total size transmitted, divided by the effective network speed (taking into account bandwidth availability).
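The service demand arithmetic above could look roughly like the following sketch. All names (`Workload`, `client_service_demand`, `network_service_demand`) and the per-byte, per-file and per-transaction instruction counts are hypothetical; the source says only that such counts come from laboratory measurements. Charging transactions per transaction count is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    total_bytes: float     # total size of the files to back up
    num_files: int         # number of files (drives the metadata cost)
    num_transactions: int  # backup transactions generated


def client_service_demand(w: Workload, io_bytes_per_sec: float,
                          cpu_instr_per_sec: float, instr_per_byte: float,
                          instr_per_file: float,
                          instr_per_transaction: float) -> float:
    """Seconds a client is busy: disk read plus CPU work on data, metadata, transactions."""
    read_time = w.total_bytes / io_bytes_per_sec
    data_time = instr_per_byte * w.total_bytes / cpu_instr_per_sec
    metadata_time = instr_per_file * w.num_files / cpu_instr_per_sec
    transaction_time = (instr_per_transaction * w.num_transactions
                        / cpu_instr_per_sec)
    return read_time + data_time + metadata_time + transaction_time


def network_service_demand(data_bytes: float, metadata_bytes: float,
                           net_bytes_per_sec: float,
                           available_fraction: float) -> float:
    """Seconds the network is busy, using only the available share of bandwidth."""
    effective_speed = net_bytes_per_sec * available_fraction
    return (data_bytes + metadata_bytes) / effective_speed
```

For instance, a 300 MB workload read at 3 MB/second contributes 100 seconds of read time before any CPU cost is added.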
After all these calculations are done, we have the values of Dij, the service demand of class i at service center j 21. Each queue length is given an initial estimate, the same for all queues, and it is the total number of clients in the system divided by the total number of service centers 22. The next step is to estimate the throughput of the whole system, which requires calculating the queue length qij of each class i at each service center j, because it determines the time spent waiting at each service center 23. Once we know these times, we calculate the total time spent by a client job in the complete system, by adding all the times spent waiting and executing at each service center 24. Next, the value of the queue length of each class at each service center is calculated by an iterative process 25. Each iteration yields a new value of the queue length qij. This value is compared with the previous value of the queue length, and the loop is repeated until the values converge by differing by less than an acceptable value.
Inside this iteration loop there is an additional loop over each class: for each class i we calculate the time Qij spent at each service center j 23. This time is calculated as the service demand Dij times the number of jobs in the queue (qj + 1 - qij/N). The total queue length at the service center is qj, and the queue length of the class i is qij. The correction factor qij/N accounts for the fact that the client does not see itself waiting in the queue, and the 1 accounts for the client itself. Then, we calculate the response time Ri, as the sum of all times Qij spent at all service centers, plus the "waiting time" Zi at the client 24. Then, we calculate the class throughput Xi using Little's law, as the number of clients of this class Ni divided by the response time Ri 24. Then, we recalculate qij as the product of the class throughput Xi and the time spent at device j, Qij 25. We add this new value to the total queue length qj, and continue to iterate 25. For each service center, the error is calculated in a standard manner as the absolute value of the difference between the new and old values of the queue length, divided by the old value. The exit criterion is that each error has to be less than a preset value 26. For robustness, there can also be an exit criterion after a preset number of iterations through the loop, to prevent an infinite loop.
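The iteration described above can be sketched as a fixed-point loop; it resembles the well-known Schweitzer approximation to mean value analysis for closed queueing networks. The function name `solve_closed_model` and its argument layout are assumptions. Two further assumptions are made where the text is not specific: the initial per-class queue length is taken as Ni divided by the number of service centers, and the N in the correction term is read as Ni, the client count of class i.

```python
from typing import List, Tuple


def solve_closed_model(
    D: List[List[float]],  # D[i][j]: service demand of class i at center j
    N: List[int],          # N[i]: number of clients of class i (must be > 0)
    Z: List[float],        # Z[i]: "waiting time" of class i at the client
    eps: float = 1e-6,     # relative error allowed per service center
    max_iter: int = 10_000,  # safety exit to prevent an infinite loop
) -> Tuple[List[float], List[List[float]]]:
    """Return class throughputs X[i] and queue lengths q[i][j]."""
    n_classes, n_centers = len(D), len(D[0])
    # Same initial estimate for every queue of a class.
    q = [[N[i] / n_centers] * n_centers for i in range(n_classes)]
    X = [0.0] * n_classes
    for _ in range(max_iter):
        q_total = [sum(q[i][j] for i in range(n_classes))
                   for j in range(n_centers)]
        new_q = []
        for i in range(n_classes):
            # Time at each center: demand times the queue seen on arrival.
            Q = [D[i][j] * (q_total[j] + 1.0 - q[i][j] / N[i])
                 for j in range(n_centers)]
            R = sum(Q) + Z[i]   # response time of class i
            X[i] = N[i] / R     # Little's law
            new_q.append([X[i] * t for t in Q])
        new_total = [sum(new_q[i][j] for i in range(n_classes))
                     for j in range(n_centers)]
        converged = all(abs(n - o) <= eps * max(o, 1e-12)
                        for n, o in zip(new_total, q_total))
        q = new_q
        if converged:
            break
    return X, q
```

A device utilization can then be read off as the sum over classes of D[i][j] times X[i], as the following paragraph of the description explains.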
After the loop is exited, the final values of the total throughput, response time and device utilization are calculated. Total throughput is the sum of all the class throughputs Xi, calculated above in the loop, expressed on a KB/second basis, i.e. multiplied by the job size Si 27. The average service time is the total size of all clients (the sum of all Si), divided by the total throughput X 27. The utilization of each device is the sum of the utilizations of each class i at the device j 28. This utilization is calculated as the product of the service demand of the class at the device, Dij, and the class throughput Xi, divided by the number of identical devices Ni 28.
The scheduling algorithm which computes the schedules in CAP will now be described with reference to FIG. 3. First, we calculate Nmax, the maximum number of clients of class i that can backup at the same time without saturating the server 31. This number is calculated based on the knee point of the curve of throughput versus the number of clients in the system. The knee point is the number of clients where the throughput of the system is about 70% to 80% of the maximum possible, and its value is calculated as the ratio of the maximum throughput of the system (achieved when all clients are participating in the backup) to the minimum throughput (achieved when only one client is participating in the backup). In order to consider enough clients, we start with Nmax as twice this value. This value is calculated separately for each class, and we loop over all available classes to calculate it. Nmax for each class i is found as two times the ratio of the optimum throughput for the configuration to the throughput for one member of the class. The optimum throughput for the configuration is obtained by running the model with all clients backing up. The throughput for the class is obtained by running the model with only one member of the class.
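As a small illustration of the Nmax rule above, the sketch below takes the two throughputs as already computed by the model. The function name `max_clients_per_class`, the rounding up to a whole client, and the single-client throughput figures in the usage note are assumptions made for the example.

```python
import math
from typing import Dict


def max_clients_per_class(
    throughput_all_clients: float,               # model run with all clients backing up
    throughput_single_client: Dict[str, float],  # model run with one client, per class
) -> Dict[str, int]:
    """Nmax per class: twice the ratio of optimum to single-client throughput."""
    return {
        cls: max(1, math.ceil(2.0 * throughput_all_clients / single))
        for cls, single in throughput_single_client.items()
    }
```

With a hypothetical optimum throughput of 1550 KB/second and a single-client throughput of 100 KB/second for one class, the search for that class would start with an upper bound of 31 clients.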
Next, we calculate the number of networks that can backup at the same time while keeping the server and the networks below the desired utilization values 32. The function which does this is called "calcNetGroups", and it works by comparing service demands at the server, with service demands at the networks, taking into account the utilizations desired. This computation can be done because the system throughput is the same for all devices. This means that the ratio of the service demand to the utilization should be the same at the server and networks, when they are all at their target utilization.
The "calcNetGroups" logic is depicted in FIG. 4. In order to determine which networks can participate in the backup at the same time, the networks are ordered in decreasing order of the service demand spent on them by all classes. They are added to the current group until the total network service time (the sum of the service times of all networks divided by the network target utilization) is larger than the server service demand, divided by the server target utilization. At that point, the group is closed, and another group is started, until all the networks are assigned to a group.
CAP first calculates the service demand at the network plus at the gateway (when it exists) of each class, then calculates the total network service demand by adding all the contributions from the different classes 41. Then it goes through an iteration loop, repeated while there are still networks not accounted for.
The loop starts by checking to see if the group is complete 43 (the criterion). It does this by checking that the difference of the server and the network group times is positive, meaning that the time spent on the server is larger than the time spent on all networks, i.e., the server is slower than the group of networks. If yes, a branch is taken where a new group is created 44. (Initially the criterion is set to true to start a new group the first time the loop is executed 42.) If no, another branch is taken where another network is added to the current group 45. The first time in the loop, or if a group is complete, we take the branch where a new group is started. The network with the largest service demand is found, and assigned to the group. Then we calculate the number of networks and gateways still available of this type. Then CAP calculates how many networks of this type can backup concurrently with the server, by comparing the ratios of the service demand and target utilization, as described above. The minimum of these three numbers (maximum possible, currently available, and suggested by the user) is taken.
The suggested number of concurrent networks is a preset number, added to incorporate experience with existing enterprises. We calculate how many times we can repeat a group, based on the number of networks available and on the number allowed to backup at once. Then CAP checks to see if there are any networks left 46 as remainder; and if the answer is yes, it creates a last, smaller group 47. Then it recalculates the criterion for group completion 48. The process stops when all the networks have been accounted for 49; otherwise, we repeat the loop 43.
After returning from the "calcNetGroups" function, CAP has the groups of networks that can be backed up together. Now, CAP determines which clients within these networks can back up together at the same time, i.e., it further fragments the schedules into smaller groups (FIG. 3, 33). In the routine "calcNetGroups", we only made sure that the ratio of the network and server utilizations has the desired value. Now CAP has to actually calculate the utilizations using its internal analytic model of BUS and make sure that they are below the desired values. To do this, CAP traverses the groups of networks and for each group calculates the groups of clients and the backup times.
CAP loops over all the groups of networks, and for each group calculates the numbers of clients still available. Then, it calculates the number of clients that keep the network and server below the desired utilization. This is done by function "findHowManyClientsAreUsed", described below. CAP records the calculated utilizations of the devices for this run, calculates how many times one can repeat this run with identical groups, updates the numbers of remaining available clients of each type, and records the performance statistics for this run (throughput, response time, service center utilizations) 34.
Then, CAP proceeds to the next group of clients. If the total number of clients is equal to the number of clients used so far, we exit the loop 35. Otherwise, we still have available clients; so we repeat the loop, create a new client group 36, and repeat the above process. This is also the end of the loop where we traverse the network groups, i.e., when exiting this loop, we have finished a pass through the loop for the current network group, and we proceed to the next network group; or if we have traversed all of them, we exit 37.
To determine how many clients can participate in each run, we have function "findHowManyClientsAreUsed". FIG. 5 illustrates the logic flow for this function. In it, we have an iteration loop which traverses all networks in a group of networks looking for the first network with available clients 51. Then, it goes into an internal loop, over all the classes that belong to this network, finding clients to be assigned to this run 52. The number of clients which can be run while meeting the utilization criteria is found by assuming a high and a low value, running the model for each value, then adjusting the value using binary search techniques to converge on a number of clients which yields utilizations close to the target values 53. (Of course, for some target utilizations, there might not be a solution.) The inner loop is repeated until all classes have been included 54. The outer loop is repeated for all networks 55.
Upon completion of the scheduling algorithm we have subdivided the system into groups that can be backed up at the same time while keeping the server and network at or below a target utilization set by the user.
To decide how many BUS servers are needed and where to place them, we compare the total backup time for the whole enterprise with the available time window to perform the backup, which is one of the parameters of the program that is requested from the user. If the total backup time is less than the available time window, we are done and have to do nothing.
If the total backup time is greater than the available time window, this means that we need to have more than one BUS server in the enterprise. We then examine the groups output by the algorithm, and cluster them such that each cluster takes less time to back up than the available time window. For each such cluster, we need one BUS server. We have thus achieved two goals: we determine how many servers we need, and where to place them in the installation.
There is also the possibility that the time window to do the backup is smaller than the shortest backup time of any group. In this case the problem has no solution as stated. The system administrator has to modify the base configuration and then retry using CAP. Possible changes that the administrator can use include decreasing the workload on each machine, decreasing the number of clients per network, upgrading the clients to be faster models, and upgrading the networks or the servers.
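The server-placement decision can be illustrated with a short sketch. The text does not say how the groups are clustered, so the first-fit-decreasing strategy below is an assumption chosen for simplicity, and the function name and the sample durations are hypothetical. Each resulting cluster corresponds to one BUS server; a group longer than the window is reported as having no solution.

```python
def assign_groups_to_servers(group_hours, window_hours):
    """Cluster backup groups so that each cluster fits in the time window.

    Returns one list of group durations per server. Uses first-fit
    decreasing, which is an assumed strategy, not the patent's.
    """
    if any(hours > window_hours for hours in group_hours):
        # The window is shorter than a single group's backup time:
        # the base configuration has to be changed first.
        raise ValueError("a group exceeds the backup window")
    clusters = []
    for hours in sorted(group_hours, reverse=True):
        for cluster in clusters:
            if sum(cluster) + hours <= window_hours:
                cluster.append(hours)
                break
        else:
            clusters.append([hours])  # open a new server
    return clusters


# Hypothetical group durations in hours, with a 6-hour window.
servers = assign_groups_to_servers([4.0, 3.5, 2.0, 2.0, 1.5], window_hours=6.0)
```

If the total backup time already fits in the window, the sketch returns a single cluster, which corresponds to the case where one server is enough.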
Appendix A includes C++ source code illustrating a possible embodiment of the invention's scheduling algorithm, including the functions described above.
Extending the Method to Software Services other than Backup
It is possible to use the invention described to schedule software services other than backup. The general problem being solved is how to schedule a relatively large number of centrally managed tasks using a finite amount of system resources to accomplish the tasks. Compiling a large number of source code files or updating databases might be examples where similar problems could arise. For the general case of a generic software service, the model could be changed to predict performance of the general software service. The scheduling algorithm uses given constraints to arrive at an acceptable schedule out of a very large number of possibilities. In the backup case, the parameter varied is the number of BUS clients of each type. In the case for other software services, the parameter would change accordingly. In CAP, the constraint is that the utilization has to be less than a preset value. CAP produces values of the utilization, which we compare against a preset maximum value. In the other cases, some other parameter would be produced and its value compared against the constraint.
Benefits of Using the Invention
The following sample calculation illustrates how much time the invention can save in creating a schedule. We will first calculate the total number of scenarios that have to be run in a trial-and-error approach. Each scenario has a unique combination of clients and networks and requires running the model once. A scenario can have any number of clients of any type present, so the total number of scenarios is the sum of all possible combinations of client types and numbers (and the networks attached to them). Any client type i that has $N_i$ clients can be present in $N_i + 1$ possible ways: no clients of this type present, one client present, two clients present, up to all $N_i$ clients present. This is the number of cases for the clients attached to one network of type j, but there are $N_j$ networks of type j, and these scenarios apply for each network independently. The total number of scenarios for the clients of type i attached to networks of type j is then $(N_i + 1)$ multiplied by itself $N_j$ times, which means $(N_i + 1)$ to the power $N_j$. The total number of scenarios for all client types attached to all network types is the product of the scenarios for all client types:
$$N_{total} = \prod_{i} (N_i + 1)^{N_j}$$
where the product runs over all client types i, $N_i$ is the number of clients of type i attached to each network of type j, and $N_j$ is the number of networks of type j present.
We will now calculate the number of scenarios run when the algorithm of the invention is used.
There are two typical cases. In the first, there are many clients of each type, and there are a total of $N_{ct}$ client types. In this case, the algorithm tries to put together all clients of one type before continuing to the next type. For each client type, the number of clients is determined by binary search from one client to a maximum number previously calculated, which in most practical cases is less than 100. This is done once for each network type, because the number of networks present is already known from the earlier calculation. Since a binary search divides the interval in half at each iteration, the number of iterations is, at most, the logarithm in base 2 of 100, which is approximately seven. Therefore, in this case, the total number of scenarios run is less than:
$$N_{total} = 7 \times N_{ct}$$
where $N_{ct}$ is the number of client types present.
The second case is the opposite, in which we have one client of each type. In this case, there is no need to calculate the total number of clients of the type; instead, the scenarios are recalculated while adding one client at a time until the network or server utilizations become higher than the limit. Assuming there are on average $N_l$ clients in one configuration, approximately $N_l$ scenarios are run to determine each configuration, and there are a total of $N_{cls}/N_l$ configurations. The total number of scenarios is then:
$$N_{total} = N_l \times (N_{cls}/N_l) = N_{cls}$$
where $N_{cls}$ is the total number of clients (and types) present.
In each case described, we have reduced the number of scenarios run from an exponential dependence on the number of clients present, to a linear dependence.
To illustrate the time savings, consider two enterprises that exemplify the two situations described above:
a) A large enterprise with many workstations of the same type. The workstations are the clients. There are two types of networks, with two networks of each type. Each network has two types of clients, with nine clients of each type. The total number of network types in the enterprise is two, and the total number of networks is four. The total number of client types in the enterprise is four, and the total number of clients is 9×8=72. The total number of scenarios that need to be run in a trial and error approach is:
$$N_{total} = (9+1)^2 \times (9+1)^2 \times (9+1)^2 \times (9+1)^2 = 10^8 = 100{,}000{,}000$$
When using the algorithm, this number is reduced to less than:
Ntotal=7×4=28.
Thus, the savings are enormous.
b) The clients are actually file servers themselves, so there are only 20 clients in the whole enterprise, each of a different type, each attached to its own network.
In the absence of the algorithm, the total number of scenarios that need to be run is:
$$N_{total} = (1+1)^{20} = 2^{20} = 1{,}048{,}576$$
Using the algorithm, the number of scenarios that have to be run is:
Ntotal=20.
This, again, is a significant time savings.
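The two scenario counts above can be reproduced with a few lines of arithmetic. This is only a check of the numbers in the text; the function name `trial_and_error_scenarios` and the variable names are hypothetical.

```python
from math import ceil, log2, prod


def trial_and_error_scenarios(classes):
    """Product of (N_i + 1) ** N_j over all client types.

    `classes` is a list of (clients_per_network, networks_of_that_type).
    """
    return prod((n_i + 1) ** n_j for n_i, n_j in classes)


# Case (a): four client types, nine clients per network, two networks per type.
case_a_trial = trial_and_error_scenarios([(9, 2)] * 4)
# Binary search over at most 100 clients needs at most 7 model runs per type.
case_a_search_bound = ceil(log2(100)) * 4

# Case (b): 20 client types with one client each, each on its own network.
case_b_trial = trial_and_error_scenarios([(1, 1)] * 20)
case_b_search = 20  # one model run per client added
```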
Example Showing Application of the Method
Consider the enterprise structure shown in FIG. 1, which is similar to the big enterprise described above. In the following, specific current hardware and software will be assigned to the generic components for purposes of illustration; but any computers capable of acting as servers could be substituted by altering the associated characteristics used for the calculations. Assume that the server is an RS/6000 workstation with four disk drives attached. "Network 1" is a collection of four interconnected Token Ring (TR) networks and "Network 2" is a collection of four interconnected Ethernet (ET) networks. There are four TR networks and four ET networks. Each TR network has two classes of clients attached: the first class, "Client 1" in FIG. 1, is 20 RS/6000 workstations (AIX-TR) per network, or a total of 80. The second class, "Client 2" in FIG. 1, is 80 PS/2 personal computers (OS/2-TR) per network, or a total of 320. Each ET network has also two classes of clients attached: the first class, "Client 3" in FIG. 1, is 10 RS/6000 workstations (AIX-ET) per network, or a total of 40. The second class, "Client 4" in FIG. 1, is 40 PS/2 personal computers (OS/2-ET) per network, or a total of 160. The characteristics of each class, including its workload (WL), are shown in Table 1. The suggested utilization at the server and network is 80% and the time window available for backup is six hours.
The total number of network types in the enterprise is two, and the total number of networks is 2×4=8. The total number of client types in the enterprise is four, and the total number of clients is 4×(20+80)+4×(10+40)=600. The total workload size per enterprise is 248 GB.
              TABLE 1
______________________________________________________________________________
Classes in sample configuration
                                      Number of   WL size per   Number of files
Class    Network type   Client type   clients     client (MB)   per client
______________________________________________________________________________
AIX-TR   TR             AIX           80          1000          100
OS2-TR   TR             OS/2          320         300           3000
AIX-ET   ET             AIX           40          1000          1000
OS2-ET   ET             OS/2          160         200           20000
______________________________________________________________________________
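The totals quoted for the sample configuration follow directly from the per-network client counts and the workload sizes in Table 1. The short check below is illustrative only; the variable names are hypothetical, and it assumes 1 GB = 1000 MB.

```python
# (class, number of networks, clients per network, workload per client in MB)
sample_classes = [
    ("AIX-TR", 4, 20, 1000),
    ("OS2-TR", 4, 80, 300),
    ("AIX-ET", 4, 10, 1000),
    ("OS2-ET", 4, 40, 200),
]

total_clients = sum(nets * per_net for _, nets, per_net, _ in sample_classes)
total_workload_mb = sum(nets * per_net * mb for _, nets, per_net, mb in sample_classes)
total_workload_gb = total_workload_mb / 1000  # assumes 1 GB = 1000 MB
```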
First, we calculate for each class the maximum number of clients that the system can handle, Nmax. We run the model with all clients of all classes participating, and determine the maximum throughput attainable, which is 2,395 KB/sec. Then, we run the model with only one AIX-TR client in the configuration, and get a throughput of 961 KB/sec. The maximum number for the AIX-TR class is:
$$N_{max} = \lfloor (2395/961) \times 2 \rfloor = 4$$
We perform similar calculations for the other classes: we run the model with only one client of the class in the configuration; then, we calculate Nmax. The results are shown in Table 2.
              TABLE 2
___________________________________________________
Maximum number of clients calculated for each class
         Throughput for
Class    1 client (KB/sec)   Nmax
___________________________________________________
AIX-TR   961                 4
OS2-TR   245                 19
AIX-ET   916                 5
OS2-ET   158                 30
___________________________________________________
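The Nmax values in Table 2 can be reproduced from the two throughput figures. The sketch below assumes the formula is read as twice the throughput ratio, truncated to a whole number of clients; that reading matches all four rows of Table 2 but is an interpretation, and the names used are hypothetical.

```python
MAX_THROUGHPUT = 2395  # KB/sec with all clients of all classes participating

single_client_throughput = {  # KB/sec, one client of the class alone
    "AIX-TR": 961,
    "OS2-TR": 245,
    "AIX-ET": 916,
    "OS2-ET": 158,
}


def n_max(single_kb_per_sec, max_kb_per_sec=MAX_THROUGHPUT):
    # Twice the throughput ratio, truncated (assumed reading of the formula).
    return int(2 * max_kb_per_sec / single_kb_per_sec)


n_max_by_class = {name: n_max(tp) for name, tp in single_client_throughput.items()}
```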
Next, we determine the groups of networks: first we calculate the service time spent by all the clients associated with each network type at the network and server queues. The results are shown in Table 3.
              TABLE 3
___________________________________________________________________________________________________
Service times for classes
          Service demand   Service demand    Service demand    Service demand    Networks
Network   at network       at server CPU     at server I/O     at server log     per
type      (sec)            (sec)             (sec)             (sec)             group
___________________________________________________________________________________________________
TR        91256            57265             9778              24130             2
ET        39900            47782             4000              6899              1
___________________________________________________________________________________________________
Next, we select the largest time spent at any server queue: it's the server CPU for all networks; therefore, we compare the time spent on the client and network to the time spent on the server CPU. Next, we arrange the networks in decreasing order of service demand; in our case, the TR network is first and the ET is second.
Next, we calculate the number of networks that can participate in the first group, for the network type at the top of the list--the TR network: we divide the service demand at the network (Table 3, column 2) by the service demand at the server (Table 3, column 3), normalized by the respective suggested utilizations at the network and server (both 80% in our case):
$$N = \lfloor (91256/0.80)/(57265/0.80) + 1 \rfloor = 2$$
The resulting number of networks per group is two, also shown in Table 3. Therefore, we have two identical groups, of two TR networks each.
A similar calculation for the ET network type results in four groups of one network each, also shown in Table 3.
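The networks-per-group calculation for both network types can be checked with the Table 3 figures. This is a sketch under the assumption that the result is truncated to a whole number of networks, which reproduces the values in Table 3; the function and variable names are hypothetical.

```python
def networks_per_group(network_demand, server_demand,
                       network_target=0.80, server_target=0.80):
    # Ratio of the normalized service demands plus one, truncated
    # (assumed reading of the formula in the text).
    ratio = (network_demand / network_target) / (server_demand / server_target)
    return int(ratio + 1)


# Table 3: demand at network, then at server CPU, server I/O, server log (sec)
service_demands = {
    "TR": (91256, 57265, 9778, 24130),
    "ET": (39900, 47782, 4000, 6899),
}
networks_available = {"TR": 4, "ET": 4}

group_plan = {}
for net_type, (net_demand, *server_queues) in service_demands.items():
    slowest_server_queue = max(server_queues)  # the server CPU in both rows
    per_group = networks_per_group(net_demand, slowest_server_queue)
    group_plan[net_type] = (per_group, networks_available[net_type] // per_group)
```

Each entry of `group_plan` holds the networks per group and the number of identical groups: two groups of two TR networks, and four groups of one ET network.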
Now, we calculate the number of clients in each group that can participate at the same time. We start with class AIX-ET. We run the model for Nmax (5 clients). The resulting service center utilizations, which are our matching criterion, are shown in Table 4. For Nmax (5 clients) the network utilization is too high, i.e., it exceeds the target of 80%. Then, we run the model for one client. The utilizations are found to be too low, i.e., not close enough to the target utilizations. So we run the model at the mid-interval (3 clients). The network utilization is again too high, so we run the model again at the mid-interval (2 clients). Now, both the network and server utilizations are close to or below the target; so we stop here. We have reached the right number of clients.
              TABLE 4
________________________________________________________________________________
Results of model runs for the AIX-ET class
Number of   Network           Server            Server            Server
clients     utilization (%)   utilization (%)   utilization (%)   utilization (%)
________________________________________________________________________________
5           96                46                11                8
1           46                22                20                4
3           88                42                13                7
2           74                36                16                6
________________________________________________________________________________
This means that this group will have one ET network backing up at once, with two AIX clients attached to it participating. We need to repeat each group 20 times to finish backing up all 40 AIX-ET clients.
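The search for the AIX-ET class can be replayed with a small binary-search sketch. The real model is replaced here by a lookup of the Table 4 utilizations, and the stopping rule is simplified to "highest utilization at or below the target", which is an assumption; the text also requires the result to be close to the target. The names `find_client_count` and `run_model` are hypothetical.

```python
def find_client_count(run_model, n_max, target, n_min=1):
    """Return (largest client count meeting the target, counts tried).

    `run_model(n)` stands in for one model run and returns the highest
    utilization (%); it is assumed not to decrease as n grows.
    """
    tried = []

    def meets_target(n):
        tried.append(n)
        return run_model(n) <= target

    if meets_target(n_max):
        return n_max, tried
    if not meets_target(n_min):
        return 0, tried  # no solution for this target
    low, high = n_min, n_max  # low meets the target, high exceeds it
    while high - low > 1:
        mid = (low + high) // 2
        if meets_target(mid):
            low = mid
        else:
            high = mid
    return low, tried


# Utilizations (%) per number of clients, copied from Table 4.
table4 = {5: (96, 46, 11, 8), 1: (46, 22, 20, 4), 3: (88, 42, 13, 7), 2: (74, 36, 16, 6)}
clients, runs_tried = find_client_count(lambda n: max(table4[n]), n_max=5, target=80)
```

With the Table 4 data the sketch visits 5, 1, 3 and 2 clients in that order and settles on 2 clients, so four model runs are enough.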
We repeat the above calculation for the other classes, and the final results for all classes are shown in Table 5:
              TABLE 5
___________________________________________________________________________________________
Final results of model runs for all classes
         Final number   Network           Server            Server            Server
         of clients     utilization       utilization       utilization       utilization
Class    in group       (%)               (%)               (%)               (%)
___________________________________________________________________________________________
AIX-TR   2              72                65                16                6
OS2-TR   4              46                70                10                40
AIX-ET   2              74                36                16                6
OS2-ET   5              40                78                4                 11
___________________________________________________________________________________________
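From the client counts in Table 5 and the networks per group in Table 3, the number of times each group has to be repeated follows by division. The sketch below assumes that the Table 5 client count applies to each network in the group; the variable names are hypothetical.

```python
from math import ceil

# class: (total clients, networks per group, clients per network per run)
class_plan = {
    "AIX-TR": (80, 2, 2),
    "OS2-TR": (320, 2, 4),
    "AIX-ET": (40, 1, 2),
    "OS2-ET": (160, 1, 5),
}

# Number of identical runs needed to back up every client of the class.
repeats = {
    name: ceil(total / (networks * per_network))
    for name, (total, networks, per_network) in class_plan.items()
}
```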
The final results of all runs are shown in Table 6: the composition of each group, how many identical groups there are, and the total backup time and throughput for each.
              TABLE 6
_______________________________________________________________________________________
Final composition of groups and backup performance characteristics
         Number of                     Clients per                    Backup
         identical   Networks          network        Backup time     throughput
Group    groups      included          included       (hr:min)        (KB/sec)
_______________________________________________________________________________________
AIX-TR   20          2 TR              2 AIX-TR       7:45            2864
OS2-TR   40          2 TR              4 OS2-TR       15:32           1716
AIX-ET   20          1 ET              2 AIX-ET       7:33            1470
OS2-ET   32          1 ET              5 OS2-ET       13:37           652
_______________________________________________________________________________________
The total backup time for the whole enterprise is 44.5 hours, with an average throughput of 1550 KB/sec.
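The backup times and the enterprise totals can be cross-checked from the class workloads in Table 1 and the throughputs in Table 6. This is an illustrative check only; it assumes 1 MB = 1000 KB and truncates times to whole minutes, and all names are hypothetical.

```python
# class: (total workload in MB, backup throughput in KB/sec)
group_data = {
    "AIX-TR": (80 * 1000, 2864),
    "OS2-TR": (320 * 300, 1716),
    "AIX-ET": (40 * 1000, 1470),
    "OS2-ET": (160 * 200, 652),
}


def backup_seconds(workload_mb, kb_per_sec):
    return workload_mb * 1000 / kb_per_sec  # assumes 1 MB = 1000 KB


def as_hr_min(seconds):
    minutes = int(seconds // 60)  # truncate to whole minutes
    return f"{minutes // 60}:{minutes % 60:02d}"


backup_times = {name: backup_seconds(mb, tp) for name, (mb, tp) in group_data.items()}
total_hours = sum(backup_times.values()) / 3600
average_throughput = (
    sum(mb for mb, _ in group_data.values()) * 1000 / sum(backup_times.values())
)
window_hours = 6
needs_more_servers = total_hours > window_hours
```

Under these assumptions the per-class times match Table 6, the total comes to about 44.5 hours, and the average throughput is close to the reported 1550 KB/sec. Because the total exceeds the six-hour window, more than one server is needed in this example.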
Sample output from a software embodiment of the invention using the input in the foregoing example is included as Appendix B.
The invention has been described by way of a preferred embodiment, but those skilled in the art will understand that various changes in form and detail may be made without deviating from the spirit or scope of the invention. The invention may be implemented using any combination of computer programming software, firmware or hardware. As a preparatory step to practicing the invention or constructing an apparatus according to the invention, the computer programming code (whether software or firmware) according to the invention will typically be stored in one or more machine readable storage devices such as fixed (hard) drives, diskettes, optical disks, magnetic tape, semiconductor memories such as ROMs, PROMs, etc., thereby making an article of manufacture according to the invention. The article of manufacture containing the computer programming code is used by executing the code directly from the storage device, by copying the code from the storage device into another storage device such as a hard disk, RAM, etc., or by transmitting the code on a network for remote execution. The method form of the invention may be practiced by combining one or more machine readable storage devices containing the code according to the invention with appropriate standard computer hardware to execute the code contained therein. An apparatus for practicing the invention could be one or more computers and storage systems containing or having network access to computer programming code according to the invention.

Claims (13)

We claim:
1. A method for scheduling a backup service in a computer installation having a plurality of clients consisting of more than one client type, a plurality of interconnected networks and at least one server, the method comprising the steps of:
(a) building a model of the computer installation which calculates elapsed time and utilization of computer installation resources for the selected service for a selected subset of clients, the model using definitions of client types, network types and interconnection of clients and networks in the computer installation;
(b) repeatedly invoking the model with a minimum number of clients and then a maximum number of clients of a single type to find elapsed times and utilization of computer installation resources, then invoking the model with varying numbers of clients between the minimum and maximum numbers of clients until finding a number of clients for the subset for which the utilization or elapsed time criteria are met and adjusting the subsets to contain clients which can be serviced sequentially without exceeding an elapsed time criterion or a utilization criterion; and
(c) generating a schedule by arranging the subsets into a sequence.
2. The method of claim 1 wherein the invoking step initially groups the clients into subsets by network and type.
3. An apparatus for scheduling a backup service in a computer installation having a plurality of clients consisting of more than one client type, a plurality of interconnected networks and at least one server, comprising:
(a) a Modeler which calculates utilization of computer installation resources and elapsed time for the selected service for a selected subset of clients, the calculations using definitions of client types, network types and interconnection of clients and networks in the computer installation, using queues to model elements in the installation which work to provide the services and using Little's Law to calculate throughput; and
(b) a Scheduler which repeatedly invokes the Modeler with subsets of clients to find utilizations and elapsed times and adjusts the subsets to generate a schedule which is a list of subsets of clients which can be serviced sequentially without exceeding a utilization criterion or an elapsed time criterion.
4. The apparatus of claim 3 wherein the services being scheduled are backup services.
5. The apparatus of claim 4 wherein the Modeler uses queues to model elements in the installation which work to provide the services.
6. An apparatus for scheduling services in a computer installation having a plurality of clients consisting of more than one client type, a plurality of interconnected networks and at least one server, comprising:
(a) first means for modeling utilization of the computer installation resources for services for a selected subset of clients, which calculates utilization of installation resources using definitions of client types, network types and interconnection of clients and networks in the computer installation;
(b) second means for modeling which calculates elapsed time for the selected service for a selected subset of clients using Little's Law to calculate throughput;
(c) means for scheduling which repeatedly invokes the first and second means for modeling with subsets of clients to find utilizations and elapsed times for the subsets and adjusts the subsets to generate a schedule which is a list of subsets of clients which can be serviced sequentially without exceeding a utilization criterion or an elapsed time criterion.
7. The apparatus of claim 6 wherein the service being scheduled is a backup service.
8. The apparatus of claim 6 wherein the first means for modeling uses queues to model elements in the installation which work to provide the service.
9. The apparatus of claim 7 wherein the means for scheduling initially divides the clients into subsets based on client type and network to which the clients are attached.
10. A computer readable storage device containing program code for scheduling a selected service in a computer installation having a plurality of clients consisting of more than one client type, a plurality of interconnected networks and at least one server, the program code comprising:
(a) program code for modeling utilization of the computer installation resources and elapsed time for the selected service for a selected subset of clients, using Little's Law to calculate throughput, the modeling using definitions of client types, network types and interconnection of clients and networks in the computer installation; and
(b) program code for scheduling which repeatedly invokes the program code for modeling with subsets of clients to find utilizations and elapsed times and adjusts the subsets to generate a schedule which is a list of subsets of clients which can be serviced sequentially without exceeding a utilization criterion or an elapsed time criterion.
11. The computer readable storage device of claim 10 wherein the service being scheduled is a backup service.
12. The computer readable storage device of claim 10 wherein the program code for modeling uses queues to model elements in the installation which work to provide the service.
13. The computer readable storage device of claim 10 wherein the program code for scheduling initially divides the clients into subsets based on client type and network to which the clients are attached.
US08/598,488 1996-02-12 1996-02-12 Scheduling computerized backup services Expired - Lifetime US5854754A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US08/598,488 US5854754A (en) 1996-02-12 1996-02-12 Scheduling computerized backup services

Publications (1)

Publication Number Publication Date
US5854754A true US5854754A (en) 1998-12-29

Family

ID=24395749

Country Status (1)

Country Link
US (1) US5854754A (en)

Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999046660A2 (en) * 1998-03-12 1999-09-16 Fairbanks Systems Group System and method for backing up computer files over a wide area computer network
US6038665A (en) * 1996-12-03 2000-03-14 Fairbanks Systems Group System and method for backing up computer files over a wide area computer network
US6078932A (en) * 1998-01-13 2000-06-20 International Business Machines Corporation Point-in-time backup utilizing multiple copy technologies
US20010037471A1 (en) * 2000-03-01 2001-11-01 Ming-Kang Liu System and method for internal operation of multiple-port xDSL communications systems
US6332161B1 (en) * 1998-09-25 2001-12-18 Charles Schwab & Co., Inc. Customer web log-in architecture
US20010056524A1 (en) * 2000-06-27 2001-12-27 Masayuki Ono Information processing system
US20020023086A1 (en) * 2000-06-30 2002-02-21 Ponzio, Jr. Frank J. System and method for providing signaling quality and integrity of data content
US20020169816A1 (en) * 2001-05-08 2002-11-14 David Meiri Selection of a resource in a distributed computer system
US6526418B1 (en) 1999-12-16 2003-02-25 Livevault Corporation Systems and methods for backing up data files
US20030126247A1 (en) * 2002-01-02 2003-07-03 Exanet Ltd. Apparatus and method for file backup using multiple backup devices
US6606658B1 (en) * 1997-10-17 2003-08-12 Fujitsu Limited Apparatus and method for server resource usage display by comparison of resource benchmarks to determine available performance
US6625623B1 (en) 1999-12-16 2003-09-23 Livevault Corporation Systems and methods for backing up data files
US20030229653A1 (en) * 2002-06-06 2003-12-11 Masashi Nakanishi System and method for data backup
US6675195B1 (en) * 1997-06-11 2004-01-06 Oracle International Corporation Method and apparatus for reducing inefficiencies caused by sending multiple commands to a server
US6704755B2 (en) 1999-12-16 2004-03-09 Livevault Corporation Systems and methods for backing up data files
US20040088147A1 (en) * 2002-10-31 2004-05-06 Qian Wang Global data placement
US6779003B1 (en) * 1999-12-16 2004-08-17 Livevault Corporation Systems and methods for backing up data files
US20050010677A1 (en) * 2003-07-09 2005-01-13 Krissell Daniel L. Methods, systems and computer program products for controlling data transfer for data replication or backup based on system and/or network resource information
US6847984B1 (en) 1999-12-16 2005-01-25 Livevault Corporation Systems and methods for backing up data files
US20050071390A1 (en) * 2003-09-30 2005-03-31 Livevault Corporation Systems and methods for backing up data files
US20050081099A1 (en) * 2003-10-09 2005-04-14 International Business Machines Corporation Method and apparatus for ensuring valid journaled file system metadata during a backup operation
US20060117221A1 (en) * 2004-11-05 2006-06-01 Fisher David J Method, apparatus, computer program and computer program product for adjusting the frequency at which data is backed up
US20060288183A1 (en) * 2003-10-13 2006-12-21 Yoav Boaz Apparatus and method for information recovery quality assessment in a computer system
US20080016217A1 (en) * 2006-06-28 2008-01-17 International Business Machines Corporation System and method for distributed utility optimization in a messaging infrastructure
US20080195447A1 (en) * 2007-02-09 2008-08-14 Eric Bouillet System and method for capacity sizing for computer systems
GB2448566A (en) * 2007-03-27 2008-10-22 Symantec Corp A load balancing backup system
US20080263551A1 (en) * 2007-04-20 2008-10-23 Microsoft Corporation Optimization and utilization of media resources
WO2009086326A1 (en) * 2007-12-20 2009-07-09 Akorri Networks, Inc. Evaluating and predicting computer system performance using kneepoint analysis
US20090300633A1 (en) * 2008-05-28 2009-12-03 International Business Machines Corporation Method and System for Scheduling and Controlling Backups in a Computer System
US7664797B1 (en) * 2005-01-27 2010-02-16 Symantec Operating Corporation Method and apparatus for using statistical process control within a storage management system
US20100198793A1 (en) * 2009-02-03 2010-08-05 Ofer Elrom Methods of multi-server application synchronization without stopping i/o
US20110137790A1 (en) * 2009-12-08 2011-06-09 Bank Of Communications Mainframe-based far-distance bicentric transaction information processing method and system
US20110231172A1 (en) * 2010-03-21 2011-09-22 Stephen Gold Determining impact of virtual storage backup jobs
US8478952B1 (en) * 2006-06-13 2013-07-02 Emc Corporation Flexible optimized group-based backups
US9009724B2 (en) 2010-09-24 2015-04-14 Hewlett-Packard Development Company, L.P. Load balancing data access in virtualized storage nodes
US9772908B1 (en) * 2013-12-05 2017-09-26 EMC IP Holding Company LLC Method and system for concurrently backing up data streams of multiple computers based on backup time estimates
US20180067819A1 (en) * 2016-09-02 2018-03-08 Vmware, Inc. Efficient scheduling of backups for cloud computing systems
US9959138B1 (en) * 2015-09-11 2018-05-01 Cohesity, Inc. Adaptive self-maintenance scheduler
US9977721B2 (en) 2007-12-20 2018-05-22 Netapp, Inc. Evaluating and predicting computer system performance using kneepoint analysis
US20180276033A1 (en) * 2017-03-24 2018-09-27 Canon Kabushiki Kaisha Information processing apparatus, control method for information processing apparatus, and application management method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5005122A (en) * 1987-09-08 1991-04-02 Digital Equipment Corporation Arrangement with cooperating management server node and network service node
US5133065A (en) * 1989-07-27 1992-07-21 Personal Computer Peripherals Corporation Backup computer program for networks
US5179702A (en) * 1989-12-29 1993-01-12 Supercomputer Systems Limited Partnership System and method for controlling a highly parallel multiprocessor using an anarchy based scheduler for parallel execution thread scheduling
US5345587A (en) * 1988-09-14 1994-09-06 Digital Equipment Corporation Extensible entity management system including a dispatching kernel and modules which independently interpret and execute commands
US5369570A (en) * 1991-11-14 1994-11-29 Parad; Harvey A. Method and system for continuous integrated resource management
US5381546A (en) * 1987-04-13 1995-01-10 Gte Laboratories Incorporated Control process for allocating services in communications systems
US5452459A (en) * 1993-01-08 1995-09-19 Digital Equipment Corporation Method and apparatus for allocating server access in a distributed computing environment

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5381546A (en) * 1987-04-13 1995-01-10 Gte Laboratories Incorporated Control process for allocating services in communications systems
US5005122A (en) * 1987-09-08 1991-04-02 Digital Equipment Corporation Arrangement with cooperating management server node and network service node
US5345587A (en) * 1988-09-14 1994-09-06 Digital Equipment Corporation Extensible entity management system including a dispatching kernel and modules which independently interpret and execute commands
US5608907A (en) * 1988-09-14 1997-03-04 Digital Equipment Corp. Extensible entity management system including an information manager for obtaining, storing and retrieving information from entities
US5133065A (en) * 1989-07-27 1992-07-21 Personal Computer Peripherals Corporation Backup computer program for networks
US5179702A (en) * 1989-12-29 1993-01-12 Supercomputer Systems Limited Partnership System and method for controlling a highly parallel multiprocessor using an anarchy based scheduler for parallel execution thread scheduling
US5369570A (en) * 1991-11-14 1994-11-29 Parad; Harvey A. Method and system for continuous integrated resource management
US5452459A (en) * 1993-01-08 1995-09-19 Digital Equipment Corporation Method and apparatus for allocating server access in a distributed computing environment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
E.D. Lazowska, J. Zahorian, G.S. Graham and K.C. Sevak, Quantitative System Performance, Prentice Hall, 1994, pp. 127-151. *
E.D. Lazowska, J. Zahorian, G.S. Graham and K.C. Sevak, Quantitative System Performance, Prentice Hall, 1994, pp. 127-151.

Cited By (86)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6038665A (en) * 1996-12-03 2000-03-14 Fairbanks Systems Group System and method for backing up computer files over a wide area computer network
US6675195B1 (en) * 1997-06-11 2004-01-06 Oracle International Corporation Method and apparatus for reducing inefficiencies caused by sending multiple commands to a server
US6606658B1 (en) * 1997-10-17 2003-08-12 Fujitsu Limited Apparatus and method for server resource usage display by comparison of resource benchmarks to determine available performance
US6078932A (en) * 1998-01-13 2000-06-20 International Business Machines Corporation Point-in-time backup utilizing multiple copy technologies
WO1999046660A3 (en) * 1998-03-12 1999-10-21 Fairbanks Systems Group System and method for backing up computer files over a wide area computer network
WO1999046660A2 (en) * 1998-03-12 1999-09-16 Fairbanks Systems Group System and method for backing up computer files over a wide area computer network
US6332161B1 (en) * 1998-09-25 2001-12-18 Charles Schwab & Co., Inc. Customer web log-in architecture
US7644113B2 (en) 1999-12-16 2010-01-05 Iron Mountain Incorporated Systems and methods for backing up data files
US20030074378A1 (en) * 1999-12-16 2003-04-17 Livevault Corporation Systems and methods for backing up data files
US20070208783A1 (en) * 1999-12-16 2007-09-06 Christopher Midgley Systems and methods for backing up data files
US20050193031A1 (en) * 1999-12-16 2005-09-01 Livevault Corporation Systems and methods for backing up data files
US6847984B1 (en) 1999-12-16 2005-01-25 Livevault Corporation Systems and methods for backing up data files
US6779003B1 (en) * 1999-12-16 2004-08-17 Livevault Corporation Systems and methods for backing up data files
US6704755B2 (en) 1999-12-16 2004-03-09 Livevault Corporation Systems and methods for backing up data files
US6625623B1 (en) 1999-12-16 2003-09-23 Livevault Corporation Systems and methods for backing up data files
US6526418B1 (en) 1999-12-16 2003-02-25 Livevault Corporation Systems and methods for backing up data files
US7295571B2 (en) 2000-03-01 2007-11-13 Realtek Semiconductor Corp. xDSL function ASIC processor and method of operation
US20020008256A1 (en) * 2000-03-01 2002-01-24 Ming-Kang Liu Scaleable architecture for multiple-port, system-on-chip ADSL communications systems
US20010037471A1 (en) * 2000-03-01 2001-11-01 Ming-Kang Liu System and method for internal operation of multiple-port xDSL communications systems
US20010049757A1 (en) * 2000-03-01 2001-12-06 Ming-Kang Liu Programmable task scheduler for use with multiport xDSL processing system
US7085285B2 (en) 2000-03-01 2006-08-01 Realtek Semiconductor Corp. xDSL communications systems using shared/multi-function task blocks
US7075941B2 (en) 2000-03-01 2006-07-11 Real Communications, Inc. Scaleable architecture for multiple-port, system-on-chip ADSL communications systems
US20010047434A1 (en) * 2000-03-01 2001-11-29 Ming-Kang Liu xDSL communications systems using shared/multi-function task blocks
US20020049581A1 (en) * 2000-03-01 2002-04-25 Ming-Kang Liu Physical medium dependent sub-system with shared resources for multiport xDSL system
US20020010849A1 (en) * 2000-03-01 2002-01-24 Ming-Kang Liu Data object architecture and method for xDSL ASIC processor
US8325751B2 (en) * 2000-03-01 2012-12-04 Realtek Semiconductor Corp. Mixed hardware/software architecture and method for processing communications
US7032223B2 (en) 2000-03-01 2006-04-18 Realtek Semiconductor Corp. Transport convergence sub-system with shared resources for multiport xDSL system
US6839889B2 (en) * 2000-03-01 2005-01-04 Realtek Semiconductor Corp. Mixed hardware/software architecture and method for processing xDSL communications
US7818748B2 (en) 2000-03-01 2010-10-19 Realtek Semiconductor Corporation Programmable task scheduler
US6986073B2 (en) 2000-03-01 2006-01-10 Realtek Semiconductor Corp. System and method for a family of digital subscriber line (XDSL) signal processing circuit operating with an internal clock rate that is higher than all communications ports operating with a plurality of port sampling clock rates
US7200138B2 (en) 2000-03-01 2007-04-03 Realtek Semiconductor Corporation Physical medium dependent sub-system with shared resources for multiport xDSL system
US20050071800A1 (en) * 2000-03-01 2005-03-31 Realtek Semiconductor Corporation Mixed hardware/software architecture and method for processing xDSL communications
US20020010810A1 (en) * 2000-03-01 2002-01-24 Ming-Kang Liu xDSL function ASIC processor & method of operation
US20010056524A1 (en) * 2000-06-27 2001-12-27 Masayuki Ono Information processing system
US6711592B2 (en) * 2000-06-27 2004-03-23 Oki Electric Industry Co., Ltd. Information processing system
US20020023086A1 (en) * 2000-06-30 2002-02-21 Ponzio, Jr. Frank J. System and method for providing signaling quality and integrity of data content
US6886164B2 (en) * 2001-05-08 2005-04-26 Emc Corporation Selection of a resource in a distributed computer system
US20020169816A1 (en) * 2001-05-08 2002-11-14 David Meiri Selection of a resource in a distributed computer system
US20030126247A1 (en) * 2002-01-02 2003-07-03 Exanet Ltd. Apparatus and method for file backup using multiple backup devices
WO2003060761A1 (en) * 2002-01-02 2003-07-24 Exanet Ltd. An apparatus and method for file backup using multiple backup devices
US20030229653A1 (en) * 2002-06-06 2003-12-11 Masashi Nakanishi System and method for data backup
US7080105B2 (en) * 2002-06-06 2006-07-18 Hitachi, Ltd. System and method for data backup
US7225118B2 (en) * 2002-10-31 2007-05-29 Hewlett-Packard Development Company, L.P. Global data placement
US20040088147A1 (en) * 2002-10-31 2004-05-06 Qian Wang Global data placement
US7287086B2 (en) 2003-07-09 2007-10-23 International Business Machines Corporation Methods, systems and computer program products for controlling data transfer for data replication or backup based on system and/or network resource information
US20050010677A1 (en) * 2003-07-09 2005-01-13 Krissell Daniel L. Methods, systems and computer program products for controlling data transfer for data replication or backup based on system and/or network resource information
US7860832B2 (en) 2003-09-30 2010-12-28 Iron Mountain Incorporated Systems and methods for maintaining data files
US7225208B2 (en) 2003-09-30 2007-05-29 Iron Mountain Incorporated Systems and methods for backing up data files
US20050071390A1 (en) * 2003-09-30 2005-03-31 Livevault Corporation Systems and methods for backing up data files
US20070294321A1 (en) * 2003-09-30 2007-12-20 Christopher Midgley Systems and methods for backing up data files
US20050081099A1 (en) * 2003-10-09 2005-04-14 International Business Machines Corporation Method and apparatus for ensuring valid journaled file system metadata during a backup operation
US8244792B2 (en) 2003-10-13 2012-08-14 Emc (Benelux) B.V., S.A.R.L. Apparatus and method for information recovery quality assessment in a computer system
US20060288183A1 (en) * 2003-10-13 2006-12-21 Yoav Boaz Apparatus and method for information recovery quality assessment in a computer system
US7484119B2 (en) * 2004-11-05 2009-01-27 International Business Machines Corporation Method, apparatus, computer program and computer program product for adjusting the frequency at which data is backed up
US20060117221A1 (en) * 2004-11-05 2006-06-01 Fisher David J Method, apparatus, computer program and computer program product for adjusting the frequency at which data is backed up
US7664797B1 (en) * 2005-01-27 2010-02-16 Symantec Operating Corporation Method and apparatus for using statistical process control within a storage management system
US8478952B1 (en) * 2006-06-13 2013-07-02 Emc Corporation Flexible optimized group-based backups
US20080016217A1 (en) * 2006-06-28 2008-01-17 International Business Machines Corporation System and method for distributed utility optimization in a messaging infrastructure
US7689695B2 (en) * 2006-06-28 2010-03-30 International Business Machines Corporation System and method for distributed utility optimization in a messaging infrastructure
US20080195447A1 (en) * 2007-02-09 2008-08-14 Eric Bouillet System and method for capacity sizing for computer systems
GB2448566A (en) * 2007-03-27 2008-10-22 Symantec Corp A load balancing backup system
GB2448566B (en) * 2007-03-27 2011-11-02 Symantec Corp Method and apparatus for allocating resources among backup tasks in a data backup system
US20080263551A1 (en) * 2007-04-20 2008-10-23 Microsoft Corporation Optimization and utilization of media resources
US8091087B2 (en) * 2007-04-20 2012-01-03 Microsoft Corporation Scheduling of new job within a start time range based on calculated current load and predicted load value of the new job on media resources
US8805647B2 (en) 2007-12-20 2014-08-12 Netapp, Inc. Evaluating and predicting computer system performance using kneepoint analysis
US9977721B2 (en) 2007-12-20 2018-05-22 Netapp, Inc. Evaluating and predicting computer system performance using kneepoint analysis
WO2009086326A1 (en) * 2007-12-20 2009-07-09 Akorri Networks, Inc. Evaluating and predicting computer system performance using kneepoint analysis
US8566285B2 (en) * 2008-05-28 2013-10-22 International Business Machines Corporation Method and system for scheduling and controlling backups in a computer system
US20090300633A1 (en) * 2008-05-28 2009-12-03 International Business Machines Corporation Method and System for Scheduling and Controlling Backups in a Computer System
US8321610B2 (en) 2009-02-03 2012-11-27 International Business Machines Corporation Methods of multi-server application synchronization without stopping I/O
CN102308286A (en) * 2009-02-03 2012-01-04 国际商业机器公司 Method and system for multi-server application synchronization without stopping I/O
US8108575B2 (en) * 2009-02-03 2012-01-31 International Business Machines Corporation Methods of multi-server application synchronization without stopping I/O
CN102308286B (en) * 2009-02-03 2013-11-13 国际商业机器公司 Method and system for multi-server application synchronization without stopping I/O
US8656073B2 (en) 2009-02-03 2014-02-18 International Business Machines Corporation Methods of multi-server application synchronization without stopping I/O
US20100198793A1 (en) * 2009-02-03 2010-08-05 Ofer Elrom Methods of multi-server application synchronization without stopping i/o
US8352363B2 (en) * 2009-12-08 2013-01-08 Bank Of Communications Mainframe-based far-distance bicentric transaction information processing method and system
US20110137790A1 (en) * 2009-12-08 2011-06-09 Bank Of Communications Mainframe-based far-distance bicentric transaction information processing method and system
US20110231172A1 (en) * 2010-03-21 2011-09-22 Stephen Gold Determining impact of virtual storage backup jobs
US9158653B2 (en) * 2010-03-21 2015-10-13 Hewlett-Packard Development Company, L.P. Determining impact of virtual storage backup jobs
US9009724B2 (en) 2010-09-24 2015-04-14 Hewlett-Packard Development Company, L.P. Load balancing data access in virtualized storage nodes
US9772908B1 (en) * 2013-12-05 2017-09-26 EMC IP Holding Company LLC Method and system for concurrently backing up data streams of multiple computers based on backup time estimates
US9959138B1 (en) * 2015-09-11 2018-05-01 Cohesity, Inc. Adaptive self-maintenance scheduler
US10303508B2 (en) 2015-09-11 2019-05-28 Cohesity, Inc. Adaptive self-maintenance scheduler
US20180067819A1 (en) * 2016-09-02 2018-03-08 Vmware, Inc. Efficient scheduling of backups for cloud computing systems
US11023330B2 (en) * 2016-09-02 2021-06-01 Vmware, Inc. Efficient scheduling of backups for cloud computing systems
US20180276033A1 (en) * 2017-03-24 2018-09-27 Canon Kabushiki Kaisha Information processing apparatus, control method for information processing apparatus, and application management method

Similar Documents

Publication Publication Date Title
US5854754A (en) Scheduling computerized backup services
US6009455A (en) Distributed computation utilizing idle networked computers
Deelman et al. Pegasus: mapping large-scale workflows to distributed resources
Carman et al. Towards an economy-based optimisation of file access and replication on a data grid
Das et al. Parallel processing of adaptive meshes with load balancing
Gaussier et al. Online tuning of EASY-backfilling using queue reordering policies
Teng et al. Simmapreduce: A simulator for modeling mapreduce framework
US20130268941A1 (en) Determining an allocation of resources to assign to jobs of a program
CN110347489B (en) Multi-center data collaborative computing stream processing method based on Spark
Ivashko et al. A survey of desktop grid scheduling
Lee et al. On the performance of a dual-objective optimization model for workflow applications on grid platforms
Feitelson et al. Self-tuning systems
Al-Mistarihi et al. On fairness, optimizing replica selection in data grids
Molka et al. Memory-aware sizing for in-memory databases
Linderoth et al. Computational grids for stochastic programming
Ivashko et al. Batch of tasks completion time estimation in a desktop grid
Li et al. Data balancing-based intermediate data partitioning and check point-based cache recovery in Spark environment
Khajehvand et al. Multi-objective and scalable heuristic algorithm for workflow task scheduling in utility grids
Legrand et al. Monarc simulation framework
Bora et al. The tiny-tasks granularity trade-off: Balancing overhead versus performance in parallel systems
Benitez et al. Parallel performance model for vertex repositioning algorithms and application to mesh partitioning
Janardhanan et al. Analysis and modeling of resource management overhead in Hadoop YARN Clusters
Friese et al. Optimizing distributed data-intensive workflows
Moghadam et al. A new data-intensive task scheduling in optorsim, an open source grid simulator
Lim Resource management techniques for multi-stage jobs with deadlines running on clouds

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CABRERA, LUIS FELIPE;DRAGOESCU, CLAUDIA BEINGLAS;REEL/FRAME:007885/0084;SIGNING DATES FROM 19960211 TO 19960212

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

AS Assignment

Owner name: TREND MICRO INCORPORATED, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTERNATIONAL BUSINESS MACHINES CORPORATION;REEL/FRAME:024390/0164

Effective date: 20100331

FPAY Fee payment

Year of fee payment: 12