WO2009020976A1 - Event prediction - Google Patents

Info

Publication number
WO2009020976A1
Authority
WO
WIPO (PCT)
Prior art keywords
statistics
variables
event
proposed
indicator
Prior art date
Application number
PCT/US2008/072245
Other languages
French (fr)
Inventor
Ralf Herbrich
Thore Graepel
Joaquin Quinonero Candela
Onno Zoeter
Phillip Trelford
Original Assignee
Microsoft Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corporation
Priority to EP08797215A (EP2176787A4)
Publication of WO2009020976A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud

Definitions

  • There are many situations in which it is desired to predict outcomes of events and in many cases it is required to make these predictions in real time and where huge amounts (such as terabytes) of information about past events are available to assist with the prediction. For example, in the field of fraud detection it is often required to process large amounts of data about credit card transaction behavior and to use that information to make predictions as to whether ongoing or recent transactions are likely to be fraudulent. Other examples include email filtering where it is required to predict whether an email is likely to be spam or not on the basis of past examples of emails being labeled implicitly or explicitly as spam.
  • the process involves combining the accessed statistics and mapping them into a number representing the probability of the proposed event having a specified outcome by using a link function.
  • a machine learning process using assumed density filtering is used to learn the statistics from data about observed events.
  • the event prediction system is used as part of an internet advertising system to predict whether a proposed advertisement will be clicked or not.
  • the event prediction system is used as part of an email filtering system and in another example it is used as part of a system for detecting fraudulent credit card transactions.
  • FIG. 1 is a schematic diagram of an event prediction system
  • FIG. 2 is a schematic diagram of an internet advertising system
  • FIG. 3 is a schematic diagram of an email filtering system
  • FIG. 4 is a schematic diagram of a credit card fraud detection system
  • FIG. 5 is a block diagram of an example method of training an event prediction system
  • FIG. 6 is a block diagram of an example method of making a prediction for a proposed event
  • FIG. 7 is a block diagram of an example method of billing an internet advertiser;
  • FIG. 8 is a block diagram of an example method of email filtering;
  • FIG. 9 is a block diagram of an example method of credit card fraud detection;
  • FIG. 10 is a block diagram of an example of part of a method of training an event prediction system
  • FIG. 11 illustrates an exemplary computing-based device in which embodiments of an event prediction system may be implemented.
  • Like reference numerals are used to designate like parts in the accompanying drawings.
  • indicator variable is used herein to refer to a variable which may take only one of two values such as 0 and 1.
  • Each indicator variable is associated with a feature which describes or is associated with an event.
  • a “variable” may take any real value. For example, suppose a feature 'price' is specified. A variable associated with this feature may take any real value such as a number of cents. An “indicator variable” with this feature may take a value of say 0 or 1, to indicate for a given event, into which of a specified set of price ranges the event falls.
  • FIG. 1 is a schematic diagram of an event prediction system comprising an event monitor 100 which observes events which occur and their outcomes.
  • the event monitor 100 comprises functionality to access information about the events such as features associated with those events as well as about outcomes of the events. This information may be stored in a data store 103 by the event monitor or other suitable means.
  • a training engine 102 is able to access the historical data about events and event outcomes from the data store 103 and to use this to carry out a training process in order to learn information about weights or other parameters modeling the behavior or process producing the events.
  • the learnt information may be stored in the data store 103.
  • a prediction engine is able to access the learnt information and to use that to predict likelihoods of outcomes for proposed events.
  • the event prediction system may in some embodiments be an internet advertisement system as illustrated in FIG. 2.
  • an advertisement monitor 200 observes advertisements that are displayed as well as whether those advertisements are clicked or not by one or more end users.
  • the advertisement monitor may observe information about the event in which an advertisement is displayed and clicked or not.
  • the advertisement may be presented by a search engine as a result of a search query input by an end user.
  • the monitor may observe features associated with the presentation of the advertisement such as any keywords used in the search query, a time of day of the presentation, information about the advertiser, information about the end user making the search query, or any other information about presentation of the advertisement.
  • the observed information may be stored in a data store 203 and used by a training engine 202 in a similar manner to that described above with reference to FIG. 1.
  • a prediction engine 201 uses the learnt information to predict how likely a proposed advertisement is to be clicked and that prediction information may be used in real time by a billing engine 204 to bill an advertiser 206.
  • One or more such advertisers 206 are in communication with the internet advertisement system via a communications network 205 as are one or more end users or clients 207, 208.
  • the event prediction system may be an anti-spam system for email.
  • an email monitor 300 observes information about or associated with email messages such as information about the sender, words used in the subject line, presence of attachments and other information.
  • the email monitor 300 also observes information about whether those email messages are spam or not.
  • This information may be stored in a data store 303 and used by a training engine 302 in a similar manner as described above with reference to FIG. 1.
  • the results of the training engine may also be stored in the data store 303 and used by a prediction engine 301 to predict whether a given email message is spam or not.
  • the prediction results may be used by an email filter mechanism in real time to block the email, alert users or allow the email as appropriate.
  • the email monitor may receive information about email over a communications network 305 from any suitable source, and clients 306, 307 are observed to send and/or receive email.
  • the prediction system is part of a credit card transaction fraud detection system.
  • Credit card transaction systems 405 provide data to the prediction system so that a credit card transaction monitor 400 is able to observe credit card transactions and to obtain information about those transactions. For example, information about one or more parties to the transaction, information about the time of the transaction, information about the amounts and other information.
  • the information may be stored in a data store 403 together with information about whether the transactions are fraudulent or not.
  • a training engine 402 uses the information in the data store to learn statistics or parameters of a model of credit card transaction behavior in a similar manner as described above with reference to FIG. 1.
  • the results are stored in the data store 403 and used by a prediction engine 401 in real time to predict whether a new credit card transaction is likely to be fraudulent.
  • the prediction results are used by a transaction alert mechanism 404 which may provide output to the credit card transaction systems.
  • Exemplary training method
  • FIG. 5 is a block diagram of an example method of training carried out at a training engine such as any of the training engines of FIGs. 1 to 4.
  • a set of variables are received describing an event (block 500). For example, these variables are from historical data about past events and their outcomes.
  • the variables received at the training engine may be received from a data store such as any of the data stores of FIGs. 1 to 4.
  • Also received at the training engine is information about an outcome of the event (block 501).
  • a plurality of features describing or associated with events are pre-specified and for each of these features one or more variables can exist.
  • an example of a feature may be a time of day of a search query input by a user and resulting in display of an advertisement.
  • Each variable is considered as having an associated weight and information about those weights is learnt during the training process.
  • the weights are used to control how much influence each variable may have on the prediction to be made.
  • Belief about each weight is modeled using any suitable distribution such as a Gaussian distribution and statistics are used to describe those distributions. For example, a mean and a standard deviation are used to describe a Gaussian distribution representing belief about a given weight. However, it is not essential to use a Gaussian distribution; other types of distribution may be used. Also, other statistics may be used instead of or in addition to the mean and standard deviation.
  • the training engine accesses statistics describing belief about a weight for the variable (block 502). For example, if the training process has not encountered the particular variables before, the statistics are given default, initial values. Otherwise, the statistics are accessed from the data store. The statistics are then updated on the basis of the received information and using a Bayesian update process (block 503). An example of a suitable Bayesian update process is described in more detail below. However, it is not essential to use that exact update process; any suitable Bayesian update process may be used. The updated statistics are stored (block 504), for example in a data store such as any of those of FIGs. 1 to 4.
  • the pruning process involves discarding some of the statistics because it is typically not practical to store all these due to the huge amounts of data involved (for example, terabytes of information).
  • the pruning process may be carried out at specified time intervals, or when memory availability is running low, or when any combination of these or other conditions occurs. If the decision is made not to carry out pruning, then training continues for another set of variables associated with another observed event. For example, in the field of internet advertising, hundreds of millions of advertisements may be shown in any 24 hour period.
  • the training process may be carried out offline, or during operation of the prediction process to predict event outcomes.
  • a combination of offline training and online training may also be used.
  • it is also possible for the training process to be carried out using indicator variables as opposed to general variables taking real values. For example, there could be twenty four indicator variables for the time of day feature, one indicator variable for each hour of the day. In this case, only one indicator variable may be "on" for a given event because the event occurs at some point during only one hour of the day.
  • each indicator variable is considered as having an associated weight and information about those weights is learnt during the training process as described above with reference to FIG. 5.
  • the prediction engine receives a set of variables for the proposed event (block 600).
  • the prediction engine accesses, for each variable, stored statistics describing belief about values of a weight (block 601). For example, this information is accessed from a data store such as any of those data stores shown in FIGs. 1 to 4.
  • the stored statistics have been formed during the training process or, if unavailable, are initialized to default values.
  • the statistics of the weights are combined, for example (and not exclusively) in a way consistent with a linear combination of the weights (block 602), and are then mapped to a number representing the probability that the proposed event will have a specified outcome (block 603).
  • the mapping process may comprise using any suitable function. A non-exhaustive list of examples is: inverse probit function, logit function or other link function. An inverse probit function and a logit function are examples of link functions.
  • the probability information may then be used in any suitable manner to control a system.
  • the method of FIG. 6 may also be used with indicator variables in place of the general variables taking real values.
  • probability information for a proposed advertisement being clicked is accessed (FIG. 7, block 700), a bid is received from an advertiser for the advertisement (block 701), and a price for the advertisement (should it be clicked) is calculated on the basis of the bid, the probability information and possibly other information (block 702). The price is then stored and the advertiser billed as appropriate (block 703).
  • the probability information may relate to an internet advertisement being clicked and that click resulting in a sale or other successful outcome for the advertiser. This is referred to as a successful conversion of the internet advertisement into a sale or other successful outcome for the advertiser.
  • the process of FIG. 7 is similar and the price is calculated on the basis of the bid and the probability of successful conversion.
  • the probability information relates to whether a proposed email is spam or not.
  • the probability information is accessed (block 800) by the anti-spam system and compared with one or more specified thresholds (block 801).
  • the anti-spam system then blocks the email, alerts a user or allows the email on the basis of the comparison (block 802).
  • the probability information relates to whether a credit card transaction is fraudulent or not.
  • the probability information is accessed (block 900) and compared with one or more specified thresholds (block 901).
  • the anti-fraud system then blocks the transaction, allows the transaction and/or triggers alerts on the basis of this comparison (block 902).
  • the methods described herein comprise modeling belief about weights for variables describing factors relating to an event.
  • Any suitable model may be used.
  • a probability distribution is used to model the belief.
  • a bell-curve belief distribution such as a Gaussian distribution may be used, or any other suitable probability distribution.
  • For example, a bimodal or skewed distribution may be used.
  • Statistics describing the distribution are used in the models as mentioned above. For example, in the case that a Gaussian distribution is used, its mean μ and standard deviation σ may be selected.
  • the update mechanism may use techniques based on
  • the value of x in the above update equations is either 0 or 1 depending on whether an indicator variable is "on" or not as mentioned above. That is, in some embodiments, indicator variables are grouped into N groups with one group per feature.
  • an example feature may be the age of an end user (advertisement viewer, email receiver, credit card transaction party etc.).
  • a plurality of indicator variables for the feature may be age ranges, for example, 0 to 9, 10 to 19, 20 to 29, 30 to 39 etc. However, for a given event only one of the age ranges may be on. That is, an end user's age is only present in one of the bins.
  • 0 and 1 may be used to represent whether an indicator variable is on or not.
  • By using groups of indicator variables in this way, it is possible to reduce processing and memory requirements, which is especially important in many applications where the quantities of data to be analyzed are huge.
  • x in the above equations may have values other than 0 or 1.
  • The update equations include a parameter which is the variance of the feedback around the weight of each variable. This variance is thus a configurable parameter and, for example, is set to 1.
  • N and F represent the density of the Gaussian distribution function and the cumulative distribution function of the Gaussian, respectively.
  • the symbol t is simply an argument to the functions. Any suitable numerical or analytic methods can be used to evaluate these functions such as those described in Press et al., Numerical Recipes in C: the Art of Scientific Computing (2nd Ed.), Cambridge, Cambridge University Press, ISBN 0-521-43108-5. These update equations can be thought of as Bayesian update equations.
  • the statistics may be stored in any suitable manner, for example using vectors. Learning the distribution for observed data over such a vector of statistics for the weights is a computationally difficult task and the assumed density filtering technique enables a solution to be obtained. Given a value of the mean and standard deviation for each weight, the predicted probability of outcome A for a given event is given by:
  • P(A | event)
  • the sums are over all the features weighted by feature values for the given event.
  • the function Φ(x) is the cumulative normal distribution function which is also known as the inverse probit function. However, it is also possible to use other mapping functions here such as a logit function or other link function.
  • a prediction for a particular proposed event may be made by adding the weights of all the variables for the event. The resulting sum is a real number. An inverse probit function may be used to map this number to a probability between 0.0 and 1.0.
  • FIG. 10 is a block diagram of an example method of setting initial values for weight statistics and also of pruning. This method may be carried out as part of the training process of FIG. 5 for example.
  • If the training engine is presented with variables for an event where it has not previously seen those variables, it sets initial values of weight statistics for those unseen variables (block 1000). These initial values may be referred to as the prior.
  • the means are all initialized to 0.0 except for a "dummy" mean μ0 which is set to a specified value in order to provide a bias (block 1001).
  • the a-priori prediction probability is approximately 2%.
  • biasing mean and an associated biasing variance may be set at other values depending on the particular application, and can be learnt from a separate set of training data. When a previously unseen variable is introduced, this may inappropriately influence the prediction results.
  • the biasing mean may be used to prevent or reduce the effects of this.
  • the following equation may be used to determine an appropriate initial value for the biasing mean.
  • the biasing mean and variance may be associated with an indicator variable which is always on and which may be referred to as a bias indicator variable.
  • the biasing mean and variance may be learnt. Since all observations help in this learning process it is relatively fast.
  • Other values for the sum of the variances can be chosen by appropriately tuning on a separate set of data during training time. For example, different values of σ_i² may lead to a slightly different learning behavior. Larger variances tend to result in faster adaptation and smaller variances in more conservative updates. The variances may be chosen differently for different variables.
  • the training engine proceeds to update the statistics during the training process (block 1002) as described above. If the pruning process is entered, then, for a given variable, the weight statistics are reset to their initial values (re-initialized) and an assessment is made about the impact of this reset on the prediction performance (block 1003). For example, in some embodiments this is achieved by computing a difference as follows:
  • the pruning process then reverts to the previous weight statistics or continues with the reset values depending on the impact assessment (block 1004).
  • An optional check for memory availability is made (block 1005), for example if the pruning process is carried out only until memory availability is sufficient to continue the training process.
  • the pruning process then repeats for another variable (block 1006).
  • the methods described above with reference to FIG. 10 may also be used with indicator variables in place of the general variables taking real values.
  • a plurality of specified features are used during the training and prediction process. The particular features chosen depend on the particular application concerned whether it be internet advertising, credit card fraud detection or other applications.
  • the features may be selected by making offline analysis of the training data in order to select those features which are most effective for use in the prediction process.
  • FIG. 11 illustrates various components of an exemplary computing-based device 1100 which may be implemented as any form of a computing and/or electronic device, and in which embodiments of an event prediction system may be implemented.
  • the computing-based device 1100 comprises one or more inputs 1102 which are of any suitable type for receiving media content, Internet Protocol (IP) input, information about email, information about internet advertisements, information about credit card transactions, information about events whose outcomes are to be predicted etc. Also provided is an output 1103 for providing output comprising at least prediction results to another system for controlling that system.
  • Computing-based device 1100 also comprises one or more processors 1101 which may be microprocessors, controllers or any other suitable type of processors for processing computer executable instructions to control the operation of the device in order to predict outcomes of events.
  • Platform software comprising an operating system 1105 or any other suitable platform software may be provided at the computing-based device to enable application software 1106 to be executed on the device.
  • the computer executable instructions may be provided using any computer-readable media, such as memory 1107.
  • the memory is of any suitable type such as random access memory (RAM), a disk storage device of any type such as a magnetic or optical storage device, a hard disk drive, or a CD, DVD or other disc drive. Flash memory, EPROM or EEPROM may also be used.
  • a display interface 1104 may be provided such as an audio and/or video output to a display system integral with or in communication with the computing-based device.
  • the display system may provide a graphical user interface, or other user interface of any suitable type although this is not essential.
  • the term 'computer' is used herein to refer to any device with processing capability such that it can execute instructions. Those skilled in the art will realize that such processing capabilities are incorporated into many different devices and therefore the term 'computer' includes PCs, servers, mobile telephones, personal digital assistants and many other devices.
  • the methods described herein may be performed by software in machine readable form on a storage medium.
  • the software can be suitable for execution on a parallel processor or a serial processor such that the method steps may be carried out in any suitable order, or simultaneously.
  • the local computer may download pieces of the software as needed, or execute some software instructions at the local terminal and some at the remote computer (or computer network).
  • the software instructions may be carried out by a dedicated circuit, such as a DSP, programmable logic array, or the like.
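Returning to the training method described above, in which the mean and standard deviation of each weight are updated by a Bayesian process based on assumed density filtering, the following is a minimal sketch of one such update. It assumes the standard assumed-density-filtering update for a probit model with independent Gaussian beliefs; the equations of the described method may differ. The names `adf_update`, `beta` and the encoding of the outcome as +1 or -1 are hypothetical choices made for illustration.

```python
import math


def normal_pdf(t: float) -> float:
    """Density of the standard Gaussian."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def normal_cdf(t: float) -> float:
    """Cumulative distribution function of the standard Gaussian."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def adf_update(x, means, variances, outcome, beta=1.0):
    """One assumed-density-filtering step for a probit model (illustrative sketch).

    x         -- variable values for the observed event (0/1 for indicator variables)
    means     -- current mean of the belief about each weight
    variances -- current variance of the belief about each weight
    outcome   -- +1 if the specified outcome occurred, -1 otherwise (assumed encoding)
    beta      -- assumed noise standard deviation; beta**2 plays the role of the
                 configurable variance parameter
    """
    total_var = beta ** 2 + sum(v * xi * xi for v, xi in zip(variances, x))
    total_sd = math.sqrt(total_var)
    t = outcome * sum(m * xi for m, xi in zip(means, x)) / total_sd

    # Correction terms for a truncated Gaussian. For very negative t the
    # cumulative function underflows; a production version would guard this.
    v = normal_pdf(t) / normal_cdf(t)
    w = v * (v + t)

    new_means = [
        m + outcome * xi * var / total_sd * v
        for m, var, xi in zip(means, variances, x)
    ]
    new_variances = [
        var * (1.0 - (xi * xi * var / total_var) * w)
        for var, xi in zip(variances, x)
    ]
    return new_means, new_variances


# Two indicator variables; only the first is "on" for this event.
new_means, new_variances = adf_update([1, 0], [0.0, 0.0], [1.0, 1.0], outcome=+1)
```

In this sketch, a variable that is off (x = 0) keeps its statistics unchanged, while an active variable has its mean moved toward the observed outcome and its variance reduced.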

Abstract

There are many situations in which it is desired to predict outcomes of events. In an example, an event prediction system is described which receives variables for a proposed event. The system accesses learnt statistics describing belief about weights associated with the variables and uses the weights to determine probability information that the proposed event will have a specified outcome. The process involves combining the accessed statistics and mapping them into a number representing the probability. In another example, a machine learning process using assumed density filtering is used to learn the statistics from data about observed events. The event prediction system may be used as part of any suitable type of system such as an internet advertising system, an email filtering system, or a fraud detection system.
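The step of combining the accessed statistics and mapping them into a probability can be sketched as follows. This is a minimal illustration, assuming Gaussian beliefs summarized by a mean and a variance per weight, a linear combination of the weights, and the cumulative normal distribution as the mapping function. The function name `predict_probability` and the noise parameter `beta` are hypothetical, and the exact formula used in the described system may differ.

```python
import math


def normal_cdf(t: float) -> float:
    """Cumulative distribution function of the standard Gaussian."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def predict_probability(x, means, variances, beta=1.0):
    """Map per-weight statistics to a probability for a proposed event.

    x         -- variable values for the proposed event (0/1 for indicator variables)
    means     -- stored mean of the belief about each weight
    variances -- stored variance of the belief about each weight
    beta      -- assumed noise standard deviation (hypothetical parameter)
    """
    # Linear combination of the weight means, weighted by the variable values.
    total_mean = sum(m * xi for m, xi in zip(means, x))
    # Uncertainty of that combination plus the assumed noise variance.
    total_var = beta ** 2 + sum(v * xi * xi for v, xi in zip(variances, x))
    return normal_cdf(total_mean / math.sqrt(total_var))


# Three indicator variables, two of which are "on" for the proposed event.
p = predict_probability([1, 0, 1], [0.4, -1.2, 0.1], [0.5, 0.5, 0.5])
```

A larger variance on the active weights pulls the prediction toward 0.5, so weights that are still uncertain contribute less confident predictions.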

Description

EVENT PREDICTION
BACKGROUND
[0001] There are many situations in which it is desired to predict outcomes of events and in many cases it is required to make these predictions in real time and where huge amounts (such as terabytes) of information about past events are available to assist with the prediction.
[0002] For example, in the field of fraud detection it is often required to process large amounts of data about credit card transaction behavior and to use that information to make predictions as to whether ongoing or recent transactions are likely to be fraudulent. Other examples include email filtering where it is required to predict whether an email is likely to be spam or not on the basis of past examples of emails being labeled implicitly or explicitly as spam. This type of prediction is also required in the field of internet advertising where advertisers may often be billed an amount depending on a bid made by that advertiser for an advertisement and whether that advertisement, when displayed, is selected by one or more end users (by clicking on a link for example). Thus, internet advertisement channel providers typically need to predict so called "click-through rates", or the probability that a proposed advertisement will be clicked on by one or more end users.
[0003] Previously it has been difficult to make such predictions of event outcomes with acceptable levels of accuracy and to do so in real time, for example, before a credit card transaction is complete, before delivery of an email, or before presentation of a proposed internet advertisement. This is especially difficult where there are large amounts of data about past events to be processed.
[0004] It is noted that the invention described herein is not intended to be limited to implementations that solve any or all of the above mentioned disadvantages.
SUMMARY
[0005] The following presents a simplified summary of the disclosure in order to provide a basic understanding to the reader. This summary is not an extensive overview of the disclosure and it does not identify key/critical elements of the invention or delineate the scope of the invention. Its sole purpose is to present some concepts disclosed herein in a simplified form as a prelude to the more detailed description that is presented later.
[0006] There are many situations in which it is desired to predict outcomes of events. In an example, an event prediction system is described which receives variables for a proposed event. The system accesses learnt statistics describing beliefs about weights associated with the variables and uses the weights to determine probability information that the proposed event will have a specified outcome. The process involves combining the accessed statistics and mapping them into a number representing the probability of the proposed event having a specified outcome by using a link function. In an example, a machine learning process using assumed density filtering is used to learn the statistics from data about observed events. In an example, the event prediction system is used as part of an internet advertising system to predict whether a proposed advertisement will be clicked or not. In another example, the event prediction system is used as part of an email filtering system and in another example it is used as part of a system for detecting fraudulent credit card transactions.
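As an illustration of how the predicted probability might drive the email filtering use mentioned in this summary, the sketch below compares a spam probability with two thresholds and chooses an action. The function name `filter_action` and both threshold values are hypothetical and chosen only for the example.

```python
def filter_action(p_spam: float,
                  block_threshold: float = 0.9,
                  alert_threshold: float = 0.5) -> str:
    """Choose an action for an email from its predicted spam probability.

    Both thresholds are illustrative assumptions, not values from the source.
    """
    if p_spam >= block_threshold:
        return "block"
    if p_spam >= alert_threshold:
        return "alert"
    return "allow"


actions = [filter_action(p) for p in (0.95, 0.6, 0.1)]
```

The same pattern would apply to the fraud detection use, with the actions replaced by blocking a transaction, raising an alert or allowing it.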
[0007] Many of the attendant features will be more readily appreciated as the same becomes better understood by reference to the following detailed description considered in connection with the accompanying drawings.
DESCRIPTION OF THE DRAWINGS
[0008] The present description will be better understood from the following detailed description read in light of the accompanying drawings, wherein:
FIG. 1 is a schematic diagram of an event prediction system;
FIG. 2 is a schematic diagram of an internet advertising system;
FIG. 3 is a schematic diagram of an email filtering system;
FIG. 4 is a schematic diagram of a credit card fraud detection system;
FIG. 5 is a block diagram of an example method of training an event prediction system;
FIG. 6 is a block diagram of an example method of making a prediction for a proposed event;
FIG. 7 is a block diagram of an example method of billing an internet advertiser;
FIG. 8 is a block diagram of an example method of email filtering;
FIG. 9 is a block diagram of an example method of credit card fraud detection;
FIG. 10 is a block diagram of an example of part of a method of training an event prediction system;
FIG. 11 illustrates an exemplary computing-based device in which embodiments of an event prediction system may be implemented.
Like reference numerals are used to designate like parts in the accompanying drawings.
DETAILED DESCRIPTION
[0009] The detailed description provided below in connection with the appended drawings is intended as a description of the present examples and is not intended to represent the only forms in which the present example may be constructed or utilized. The description sets forth the functions of the example and the sequence of steps for constructing and operating the example. However, the same or equivalent functions and sequences may be accomplished by different examples.
[0010] Although the present examples are described and illustrated herein as being implemented in an internet advertising system, an email filtering system, or a credit card transaction fraud detection system, the system described is provided as an example and not a limitation. As those skilled in the art will appreciate, the present examples are suitable for application in a variety of different types of systems which require event prediction. A non-exhaustive list of examples is: credit scoring system, search engine, binary classification system and information filtering system.
[0011] The term "indicator variable" is used herein to refer to a variable which may take only one of two values such as 0 and 1. Each indicator variable is associated with a feature which describes or is associated with an event. In contrast, a "variable" may take any real value. For example, suppose a feature 'price' is specified. A variable associated with this feature may take any real value such as a number of cents. An "indicator variable" for this feature may take a value of 0 or 1 to indicate whether, for a given event, the price falls into a particular one of a specified set of price ranges.
An exemplary system
[0012] FIG. 1 is a schematic diagram of an event prediction system comprising an event monitor 100 which observes events which occur and their outcomes. The event monitor 100 comprises functionality to access information about the events such as features associated with those events as well as about outcomes of the events. This information may be stored in a data store 103 by the event monitor or other suitable means. A training engine 102 is able to access the historical data about events and event outcomes from the data store 103 and to use this to carry out a training process in order to learn information about weights or other parameters modeling the behavior or process producing the events. The learnt information may be stored in the data store 103. A prediction engine is able to access the learnt information and to use that to predict likelihoods of outcomes for proposed events.
[0013] For example, the event prediction system may in some embodiments be an internet advertisement system as illustrated in FIG. 2. Here an advertisement monitor 200 observes advertisements that are displayed as well as whether those advertisements are clicked or not by one or more end users. The advertisement monitor may observe information about the event in which an advertisement is displayed and clicked or not. For example, the advertisement may be presented by a search engine as a result of a search query input by an end user. The monitor may observe features associated with the presentation of the advertisement such as any keywords used in the search query, a time of day of the presentation, information about the advertiser, information about the end user making the search query, or any other information about presentation of the advertisement. The observed information may be stored in a data store 203 and used by a training engine 202 in a similar manner to that described above with reference to FIG. 1. A prediction engine 201 uses the learnt information to predict how likely a proposed advertisement is to be clicked and that prediction information may be used in real time by a billing engine 204 to bill an advertiser 206. One or more such advertisers 206 are in communication with the internet advertisement system via a communications network 205 as are one or more end users or clients 207, 208.
[0014] In another example, the event prediction system may be an anti-spam system for email. As illustrated in FIG. 3, an email monitor 300 observes information about or associated with email messages such as information about the sender, words used in the subject line, presence of attachments and other information. The email monitor 300 also observes information about whether those email messages are spam or not. This information may be stored in a data store 303 and used by a training engine 302 in a similar manner as described above with reference to FIG. 1. The results of the training engine may also be stored in the data store 303 and used by a prediction engine 301 to predict whether a given email message is spam or not. The prediction results may be used by an email filter mechanism in real time to block the email, alert users or allow the email as appropriate. The email monitor may receive information about email over a communications network 305 from any suitable source and where clients 306, 307 are observed to send and/or receive email.
[0015] In another example, described with reference to FIG. 4, the prediction system is part of a credit card transaction fraud detection system. Credit card transaction systems 405 provide data to the prediction system so that a credit card transaction monitor 400 is able to observe credit card transactions and to obtain information about those transactions, for example information about one or more parties to the transaction, information about the time of the transaction, information about the amounts and other information. The information may be stored in a data store 403 together with information about whether the transactions are fraudulent or not. A training engine 402 uses the information in the data store to learn statistics or parameters of a model of credit card transaction behavior in a similar manner as described above with reference to FIG. 1. The results are stored in the data store 403 and used by a prediction engine 401 in real time to predict whether a new credit card transaction is likely to be fraudulent. The prediction results are used by a transaction alert mechanism 404 which may provide output to the credit card transaction systems.
Exemplary training method
[0016] FIG. 5 is a block diagram of an example method of training carried out at a training engine such as any of the training engines of FIGs. 1 to 4.
[0017] A set of variables describing an event is received (block 500). For example, these variables are from historical data about past events and their outcomes. The variables received at the training engine may be received from a data store such as any of the data stores of FIGs. 1 to 4. Also received at the training engine is information about an outcome of the event (block 501).
[0018] A plurality of features describing or associated with events are pre-specified and for each of these features one or more variables can exist. For example, in the case of internet advertising, an example of a feature may be a time of day of a search query input by a user and resulting in display of an advertisement. Each variable is considered as having an associated weight and information about those weights is learnt during the training process. The weights are used to control how much influence each variable may have on the prediction to be made. Belief about each weight is modeled using any suitable distribution such as a Gaussian distribution and statistics are used to describe those distributions. For example, a mean and a standard deviation are used to describe a Gaussian distribution representing belief about a given weight. However, it is not essential to use a Gaussian distribution; other types of distribution may be used. Also, other statistics may be used instead of or in addition to the mean and standard deviation.
[0019] For each variable received for the given event, the training engine accesses statistics describing belief about a weight for the variable (block 502). For example, if the training process has not encountered the particular variables before, the statistics are given default, initial values. Otherwise, the statistics are accessed from the data store.
[0020] The statistics are then updated on the basis of the received information and using a Bayesian update process (block 503). An example of a suitable Bayesian update process is described in more detail below. However, it is not essential to use that exact update process; any suitable Bayesian update process may be used.
[0021] The updated statistics are stored (block 504), for example in a data store such as any of those of FIGs. 1 to 4. A decision is then made by the training engine as to whether to carry out pruning (block 505). The pruning process involves discarding some of the statistics because it is typically not practical to store all of them due to the huge amounts of data involved (for example, terabytes of information). The pruning process may be carried out at specified time intervals, or when memory availability is running low, or when any combination of these or other conditions occurs. If the decision is made not to carry out pruning, then training continues for another set of variables associated with another observed event. For example, in the field of internet advertising, hundreds of millions of advertisements may be shown in any 24 hour period.
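By way of illustration only, blocks 502 to 504 can be sketched in Python as follows. The update shown is a standard probit assumed-density-filtering step and is an assumption, since the update rule itself appears only as figures in this text. The names `Belief`, `train_step` and the default prior values are hypothetical. The function `w` uses the textbook truncated-Gaussian form v(t)·(v(t) + t), which keeps variances positive; this sign differs from the formula as extracted later in this description and should be treated as an assumption too.

```python
import math
from dataclasses import dataclass


@dataclass
class Belief:
    """Gaussian belief about one weight (defaults are assumed prior values)."""
    mean: float = 0.0
    var: float = 1.0


def pdf(t: float) -> float:
    """Standard normal density, written N(t) in the text."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def cdf(t: float) -> float:
    """Standard normal cumulative distribution, written F(t) in the text."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def v(t: float) -> float:
    # Not numerically safe for very negative t, where cdf(t) underflows.
    return pdf(t) / cdf(t)


def w(t: float) -> float:
    # Assumption: textbook form v(t) * (v(t) + t), so that 0 < w(t) < 1.
    return v(t) * (v(t) + t)


def train_step(store, x, clicked, beta=1.0):
    """Update the beliefs in `store` for one observed event.

    store:   dict mapping variable name -> Belief
    x:       dict mapping variable name -> value (1.0 for an indicator that is on)
    clicked: observed outcome of the event
    """
    y = 1.0 if clicked else -1.0
    for name in x:  # block 502: unseen variables get default statistics
        store.setdefault(name, Belief())
    # C in the text; x*x equals x for 0/1 indicator variables.
    c = beta ** 2 + sum(xi * xi * store[n].var for n, xi in x.items())
    scale = math.sqrt(c)
    t = y * sum(xi * store[n].mean for n, xi in x.items()) / scale
    for n, xi in x.items():  # blocks 503 and 504: update and store
        b = store[n]
        old_var = b.var
        b.mean += y * xi * old_var / scale * v(t)
        b.var = old_var * (1.0 - xi * xi * old_var / c * w(t))
    return store
```

For instance, `train_step({}, {"hour=14": 1.0, "keyword=shoes": 1.0}, clicked=True)` would raise both means above zero and shrink both variances, which matches the qualitative behavior described for the training process.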
[0022] If the pruning process occurs then statistics are discarded (block 506) for some of the weights on the basis of a pruning decision process which is described in more detail below. If the training process is to end (block 507) the remaining statistics are stored (block 508) otherwise the training process repeats for another set of variables describing another observed event.
[0023] The training process may be carried out offline, or during operation of the prediction process to predict event outcomes. A combination of offline training and online training may also be used.
[0024] It is also possible for the training process to be carried out using indicator variables as opposed to general variables taking real values. For example, there could be twenty four indicator variables for the time of day feature, one indicator variable for each hour of the day. In this case, only one indicator variable may be "on" for a given event because the event occurs at some point during only one hour of the day. When indicator variables are used, each indicator variable is considered as having an associated weight and information about those weights is learnt during the training process as described above with reference to FIG. 5.
An example prediction method
[0025] Given a proposed event it is possible to predict an outcome for that event as now described with reference to FIG. 6. The prediction engine receives a set of variables for the proposed event (block 600). The prediction engine accesses, for each variable, stored statistics describing belief about values of a weight (block 601). For example, this information is accessed from a data store such as any of those data stores shown in FIGs. 1 to 4. The stored statistics have been formed during the training process or, if unavailable, are initialized to default values. The statistics of the weights are combined, for example and not exclusively in a way that may be consistent with a linear combination of the weights (block 602), and are then mapped to a number representing the probability that the proposed event will have a specified outcome (block 603). The mapping process may comprise using any suitable function. A non-exhaustive list of examples is: inverse probit function, logit function or other link function. An inverse probit function and a logit function are examples of link functions.
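As a minimal sketch of blocks 600 to 603, the following Python function combines the stored means and standard deviations linearly and maps the result through the cumulative normal distribution (the inverse probit function). The normalization by the square root of the summed variances plus β² is an assumption suggested by the biasing-mean equation later in this description; the function name `predict`, the default prior and the variable names are hypothetical.

```python
import math


def predict(stats, x, beta=1.0, prior=(0.0, 1.0)):
    """Return the predicted probability of the specified outcome.

    stats: dict mapping variable name -> (mean, standard deviation)
    x:     dict mapping variable name -> value (1.0 for an indicator that is on)
    prior: default (mean, standard deviation) used when no statistics are stored
    """
    mean_sum = 0.0
    var_sum = beta ** 2
    for name, xi in x.items():
        mu, sigma = stats.get(name, prior)  # block 601, with default values
        mean_sum += xi * mu                 # block 602: linear combination
        var_sum += (xi * sigma) ** 2        # equals sigma**2 for 0/1 indicators
    z = mean_sum / math.sqrt(var_sum)
    # Block 603: cumulative normal distribution as the mapping function.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))
```

With no stored statistics and a zero-mean prior the function returns 0.5, and a positive learnt mean moves the probability above 0.5. A logit-style mapping could be substituted in the last line without changing the rest of the sketch.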
[0026] The probability information for the proposed event is then stored (block 604). The probability information may then be used in any suitable manner to control a system. The method of FIG. 6 may also be used with indicator variables in place of the general variables taking real values.
[0027] For example, in the case of an internet advertising system, probability information for a proposed advertisement being clicked is accessed (FIG. 7, block 700), a bid is received from an advertiser for the advertisement (block 701), and a price for the advertisement (should it be clicked) is calculated on the basis of the bid and the probability information (block 702) and possibly other information. The price is then stored and the advertiser billed as appropriate (block 703).
[0028] In another example, the probability information may relate to an internet advertisement being clicked and that click resulting in a sale or other successful outcome for the advertiser. This is referred to as a successful conversion of the internet advertisement into a sale or other successful outcome for the advertiser. In this case the process of FIG. 7 is similar and the price is calculated on the basis of the bid and the probability of successful conversion.
[0029] In another example (see FIG. 8) the probability information relates to whether a proposed email is spam or not. The probability information is accessed (block 800) by the anti-spam system and compared with one or more specified thresholds (block 801). The anti-spam system then blocks the email, alerts a user or allows the email on the basis of the comparison (block 802).
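The comparison of blocks 801 and 802 can be illustrated with a short Python sketch. The two threshold values and the function name `filter_action` are hypothetical; the description only states that one or more specified thresholds are used.

```python
def filter_action(p_spam, alert_threshold=0.5, block_threshold=0.9):
    """Map a predicted spam probability to one of three actions.

    The thresholds are illustrative assumptions, not values from the text.
    """
    if p_spam >= block_threshold:
        return "block"
    if p_spam >= alert_threshold:
        return "alert"
    return "allow"
```

The same pattern would apply to the credit card fraud example, with the actions replaced by blocking the transaction, triggering an alert or allowing the transaction.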
[0030] In another example (see FIG. 9) the probability information relates to whether a credit card transaction is fraudulent or not. The probability information is accessed (block 900) and compared with one or more specified thresholds (block 901). The anti-fraud system then blocks the transaction, allows the transaction and/or triggers alerts on the basis of this comparison (block 902).
[0031] As mentioned above the methods described herein comprise modeling belief about weights for variables describing factors relating to an event. Any suitable model may be used. For example, a probability distribution is used to model the belief. A bell-curve belief distribution such as a Gaussian distribution may be used, or any other suitable probability distribution, for example a bimodal or skewed distribution.
[0032] Statistics describing the distribution are used in the models as mentioned above. For example, in the case that a Gaussian distribution is used, its mean μ and standard deviation σ may be selected.
[0033] In the case that a Gaussian distribution is used, for example, to model belief about a value of a weight, the area under the distribution curve within a certain range corresponds to the belief that the weight value will lie in that range. As the prediction system learns more about a weight the standard deviation of the distribution tends to become smaller, more tightly bracketing the system's belief about the value of that weight.
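To make the area-under-the-curve interpretation concrete, the following Python sketch computes the belief that a weight lies in a given range from the mean and standard deviation of its Gaussian belief. The function name `belief_in_range` is hypothetical; `statistics.NormalDist` is part of the Python standard library.

```python
from statistics import NormalDist


def belief_in_range(mean, std, low, high):
    """Belief that the weight lies in [low, high] under a Gaussian belief."""
    dist = NormalDist(mean, std)
    return dist.cdf(high) - dist.cdf(low)
```

For a weight with mean 0, the belief that it lies between -1 and 1 is about 0.68 when the standard deviation is 1.0 and about 0.95 when the standard deviation has shrunk to 0.5, which illustrates how a smaller standard deviation brackets the belief more tightly.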
Example of update mechanism
[0034] As mentioned above, the update mechanism may use techniques based on Bayes' law. In the case of an event comprising presentation of an advertisement which is clicked, then an example update rule is as follows:
Figure imgf000011_0001
Figure imgf000012_0001
[0035] In the case of an event comprising presentation of an advertisement which is not clicked, then an example update rule is as follows:
Figure imgf000012_0002
[0036] In these equations C is given by:
C = \sum_{i=1} \sigma_i^2 + \beta^2
[0037] In some embodiments the value of x in the above update equations is either 0 or 1 depending on whether an indicator variable is "on" or not as mentioned above. That is, in some embodiments, indicator variables are grouped into N groups with one group per feature. For example, an example feature may be the age of an end user (advertisement viewer, email receiver, credit card transaction party etc.). In this case a plurality of indicator variables for the feature may be age ranges, for example, 0 to 9, 10 to 19, 20 to 29, 30 to 39 etc. However, for a given event only one of the age ranges may be on. That is, an end user's age is only present in one of the bins. In this case 0 and 1 may be used to represent whether an indicator variable is on or not. By using groups of indicator variables in this way it is possible to reduce processing and memory requirements, which is especially important in many applications where the quantities of data to be analyzed are huge. However, it is not essential to use groups of indicator variables where only one indicator variable may be on in any one group. In this case x in the above equations may have values other than 0 or 1.
[0038] In these equations, the only unknown is β², which is the variance of the feedback around the weight of each variable. β² is thus a configurable parameter and for example is set to 1. The functions v and w are given by:
v(t) = N(t) / F(t)
w(t) = v(t) * (v(t) - t)
[0039] Where the symbols N and F represent the density function of the Gaussian distribution and the cumulative distribution function of the Gaussian, respectively. The symbol t is simply an argument to the functions. Any suitable numerical or analytic methods can be used to evaluate these functions such as those described in Press et al., Numerical Recipes in C: the Art of Scientific Computing (2nd. Ed.), Cambridge, Cambridge University Press, ISBN 0-521-43108-5.
[0040] These update equations can be thought of as Bayesian update equations. They receive a set of variables (which may be either indicator variables or general variables taking real values) describing an observed event together with event outcome information. The equations update the values of the mean and standard deviation for each weight in light of the data, assuming that the posterior distribution over the weights is again Gaussian. With a single pass over the training data this procedure is referred to as Gaussian density filtering and more generally as assumed density filtering (ADF). It is also possible to use expectation propagation (EP) whereby ADF is iterated to convergence. Use of Expectation Propagation is described in detail in "A family of algorithms for approximate Bayesian inference", 2001, Thomas Minka, MIT PhD thesis. This may give a more exact solution but requires more computational resources.
[0041] The statistics (mean and standard deviation) may be stored in any suitable manner, for example using vectors. Learning the distribution for observed data over such a vector of statistics for the weights is a computationally difficult task and the assumed density filtering technique enables a solution to be obtained.
[0042] Given a value of the mean and standard deviation for each weight, the predicted probability of outcome A for a given event is given by:
P(A | event) =
Figure imgf000013_0001
[0043] The sums are over all the features weighted by feature values for the given event. The function Φ(x) is the cumulative normal distribution function which is also known as the inverse probit function. However, it is also possible to use other mapping functions Φ(x) here such as a logit function or other link function.
[0044] For example, given a known set of weights a prediction for a particular proposed event may be made by adding the weights of all the variables for the event. The resulting sum is a real number. An inverse probit function may be used to map this number to a probability between 0.0 and 1.0.
[0045] Since many of the features used in the prediction process may take very many values (variables) the methods described herein are arranged to keep track of only those weights which actually affect the prediction. As mentioned above, weights are initialized to a common prior and pruning is carried out at intervals to eliminate those weight parameters that have remained close to the prior. This is now described in more detail with reference to FIG. 10.
[0046] FIG. 10 is a block diagram of an example method of setting initial values for weight statistics and also of pruning. This method may be carried out as part of the training process of FIG. 5 for example.
[0047] During the training process, if the training engine is presented with variables for an event where it has not previously seen those variables, it sets initial values of weight statistics for those unseen variables (block 1000). These initial values may be referred to as the prior. In some examples, the means are all initialized to 0.0 except for a "dummy" mean μ0 which is set to a specified value in order to provide a bias (block 1001). For example, this dummy or biasing mean is set such that the a-priori prediction probability is a specified value such as 0.02 = 2% or any other suitable value. In the case of internet "paid search" advertising, where one might assume that around 2% of all displayed adverts are clicked, the a-priori prediction probability is appropriately 2%. However, this biasing mean and an associated biasing variance may be set at other values depending on the particular application, and can be learnt from a separate set of training data. When a previously unseen variable is introduced, this may inappropriately influence the prediction results. The biasing mean may be used to prevent or reduce the effects of this. The following equation may be used to determine an appropriate initial value for the biasing mean.
\Phi^{-1}\big(p(A \mid \text{event})\big) \cdot \sqrt{\sum_i x_i \sigma_i^2 + \beta^2} = \mu_{\text{bias}}
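The biasing mean can be computed numerically as in the following Python sketch, which assumes that Φ is the standard cumulative normal distribution so that its inverse is available as `statistics.NormalDist().inv_cdf`. The function name `bias_mean` and the parameter `sum_var` (the sum of the prior variances of the variables that are on) are hypothetical.

```python
import math
from statistics import NormalDist


def bias_mean(p_prior, sum_var, beta=1.0):
    """Biasing mean that yields the a-priori prediction probability p_prior.

    p_prior: desired a-priori probability, strictly between 0 and 1
    sum_var: sum of the prior variances of the variables that are on
    """
    return NormalDist().inv_cdf(p_prior) * math.sqrt(sum_var + beta ** 2)
```

For example, with an a-priori click probability of 2%, a variance sum of 1.0 and β² set to 1, the biasing mean is roughly -2.9. Dividing that value by the same square root and applying the cumulative normal distribution recovers the 2% probability.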
[0048] In some examples, where indicator variables are used, the biasing mean and variance may be associated with an indicator variable which is always on and which may be referred to as a bias indicator variable. As mentioned above, the biasing mean and variance may be learnt. Since all observations help in this learning process it is relatively fast.
[0049] The standard deviation values for previously unseen variables are distributed equally so that for example Σ_i σ_i² = 1.0. Other values for the sum of the variances can be chosen by appropriately tuning on a separate set of data during training time. For example, different values of σ_i² may lead to a slightly different learning behavior. Larger variances tend to result in faster adaptation and smaller variances in more conservative updates. The variances may be chosen differently for different variables.
[0050] The training engine proceeds to update the statistics during the training process (block 1002) as described above. If the pruning process is entered, then, for a given variable, the weight statistics are reset to their initial values (re-initialized) and an assessment is made about the impact of this reset on the prediction performance (block 1003). For example, in some embodiments this is achieved by computing a difference Δ_i as follows:
Figure imgf000015_0001
[0051] If this difference is less than a specified value such as 0.01% then the weight statistics for this variable are discarded (re-initialized).
[0052] In another embodiment a Kullback-Leibler divergence may be used to make this assessment. In this case the following equation is used where p is the first term in the difference calculation above and q is the second term in the difference calculation above.
KL(p, q) = p \log\frac{p}{q} + (1 - p) \log\frac{1 - p}{1 - q}
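A minimal Python sketch of the two impact assessments follows. It assumes that p is the predicted probability using the current weight statistics and q is the predicted probability after the statistics for the variable have been reset, since the difference equation itself is not reproduced in this text. The function names and the Kullback-Leibler threshold are hypothetical; the 0.01% difference threshold is the example value given above.

```python
import math


def kl_bernoulli(p, q):
    """Kullback-Leibler divergence between two binary outcome probabilities.

    Both p and q must lie strictly between 0 and 1.
    """
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def prune_by_difference(p, q, threshold=0.0001):
    """Discard the statistics if the reset changes the prediction by < 0.01%."""
    return abs(p - q) < threshold


def prune_by_kl(p, q, threshold=1e-6):
    """Discard the statistics if the divergence is below an assumed threshold."""
    return kl_bernoulli(p, q) < threshold
```

A variable whose learnt statistics barely move the prediction, for instance from 0.02000 to 0.02001, would be pruned under either test, whereas one that shifts the prediction from 0.02 to 0.05 would be kept.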
[0053] The pruning process then reverts to the previous weight statistics or continues with the reset values depending on the impact assessment (block 1004). An optional check for memory availability is made (block 1005), for example if the pruning process is carried out only until memory availability is sufficient to continue the training process. The pruning process then repeats for another variable (block 1006).
[0054] The methods described above with reference to FIG. 10 may also be used with indicator variables in place of the general variables taking real values.
[0055] As mentioned above, a plurality of specified features are used during the training and prediction process. The particular features chosen depend on the particular application concerned whether it be internet advertising, credit card fraud detection or other applications. In addition, the features may be selected by making offline analysis of the training data in order to select those features which are most effective for use in the prediction process.
[0056] In some embodiments the event prediction system is used in the field of internet advertising. For example, it may be used to predict not only whether a displayed advertisement will be clicked or not, but also whether any click is likely to result in a successful conversion for the advertiser. In this case the probability that a conversion will occur given a proposed event X may be given as follows:
P(conversion = True | X)
= P(conversion = True | click = True, X) P(click = True | X) + P(conversion = True | click = False, X) P(click = False | X)
= P(conversion = True | click = True, X) P(click = True | X)
In the above, the last line follows from the line before it since P(conversion = True | click = False, X) = 0, i.e., there can only be a conversion if there was a click.
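The decomposition above can be written as a short Python sketch. The function name `p_conversion` is hypothetical, and the two input probabilities are assumed to come from two separately trained predictors, one for clicks and one for conversions given a click.

```python
def p_conversion(p_click, p_conv_given_click, p_conv_given_no_click=0.0):
    """Probability of a conversion for a proposed event X.

    The default of 0.0 for the last argument encodes the statement that a
    conversion can only occur if there was a click.
    """
    return (p_conv_given_click * p_click
            + p_conv_given_no_click * (1.0 - p_click))
```

With a predicted click probability of 0.02 and a conversion probability of 0.1 given a click, the predicted conversion probability is 0.002.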
[0057] In this case the methods described herein may be used to predict the probability that a click will occur P(click=T|X) for a proposed advertisement. The methods described herein may also be used to predict the probability that a conversion will occur given a click. In this case training data comprising information about clicks that have resulted in successful conversions is required. In this way the probability of a successful conversion may be predicted.
Exemplary Computing-Based Device
[0058] FIG. 11 illustrates various components of an exemplary computing-based device 1100 which may be implemented as any form of a computing and/or electronic device, and in which embodiments of an event prediction system may be implemented.
[0059] The computing-based device 1100 comprises one or more inputs 1102 which are of any suitable type for receiving media content, Internet Protocol (IP) input, information about email, information about internet advertisements, information about credit card transactions, information about events whose outcomes are to be predicted etc. Also provided is an output 1103 for providing output comprising at least prediction results to another system for controlling that system.
[0060] Computing-based device 1100 also comprises one or more processors 1101 which may be microprocessors, controllers or any other suitable type of processors for processing computer executable instructions to control the operation of the device in order to predict outcomes of events. Platform software comprising an operating system 1105 or any other suitable platform software may be provided at the computing-based device to enable application software 1106 to be executed on the device.
[0061] The computer executable instructions may be provided using any computer-readable media, such as memory 1107. The memory is of any suitable type such as random access memory (RAM), a disk storage device of any type such as a magnetic or optical storage device, a hard disk drive, or a CD, DVD or other disc drive. Flash memory, EPROM or EEPROM may also be used.
[0062] A display interface 1104 may be provided such as an audio and/or video output to a display system integral with or in communication with the computing-based device. The display system may provide a graphical user interface, or other user interface of any suitable type although this is not essential.
[0063] The term 'computer' is used herein to refer to any device with processing capability such that it can execute instructions. Those skilled in the art will realize that such processing capabilities are incorporated into many different devices and therefore the term 'computer' includes PCs, servers, mobile telephones, personal digital assistants and many other devices.
[0064] The methods described herein may be performed by software in machine readable form on a storage medium. The software can be suitable for execution on a parallel processor or a serial processor such that the method steps may be carried out in any suitable order, or simultaneously.
[0065] This acknowledges that software can be a valuable, separately tradable commodity. It is intended to encompass software, which runs on or controls "dumb" or standard hardware, to carry out the desired functions. It is also intended to encompass software which "describes" or defines the configuration of hardware, such as HDL (hardware description language) software, as is used for designing silicon chips, or for configuring universal programmable chips, to carry out desired functions. [0066] Those skilled in the art will realize that storage devices utilized to store program instructions can be distributed across a network. For example, a remote computer may store an example of the process described as software. A local or terminal computer may access the remote computer and download a part or all of the software to run the program. Alternatively, the local computer may download pieces of the software as needed, or execute some software instructions at the local terminal and some at the remote computer (or computer network). Those skilled in the art will also realize that by utilizing conventional techniques known to those skilled in the art that all, or a portion of the software instructions may be carried out by a dedicated circuit, such as a DSP, programmable logic array, or the like. [0067] Any range or device value given herein may be extended or altered without losing the effect sought, as will be apparent to the skilled person.
[0068] It will be understood that the benefits and advantages described above may relate to one embodiment or may relate to several embodiments. It will further be understood that reference to 'an' item refers to one or more of those items.
[0069] The steps of the methods described herein may be carried out in any suitable order, or simultaneously where appropriate. Additionally, individual blocks may be deleted from any of the methods without departing from the spirit and scope of the subject matter described herein. Aspects of any of the examples described above may be combined with aspects of any of the other examples described to form further examples without losing the effect sought.
[0070] It will be understood that the above description of a preferred embodiment is given by way of example only and that various modifications may be made by those skilled in the art. The above specification, examples and data provide a complete description of the structure and use of exemplary embodiments of the invention. Although various embodiments of the invention have been described above with a certain degree of particularity, or with reference to one or more individual embodiments, those skilled in the art could make numerous alterations to the disclosed embodiments without departing from the spirit or scope of this invention.

Claims

1 . A method of predicting the outcome of a proposed event comprising: receiving (600) a plurality of variables describing the proposed event; for each variable, accessing (601 ) stored statistics describing belief about values of a weight, the stored statistics having been learnt using a machine learning process comprising assumed density filtering; combining (602) the statistics ; mapping (603) the combined statistics into a number representing the probability of the proposed event having a specified outcome by using a link function; and storing (604) the probability information for the proposed event.
2. A method as claimed in claim 1 which further comprises using the probability information to control a system selected from any of: an internet advertising system, a credit card fraud detection system, an email filtering system, a credit scoring system, a search engine, a binary classification system and an information filtering system.
3. A method as claimed in claim 1 wherein the step of receiving variables comprises receiving indicator variables where each indicator variable may take only one of two possible values to indicate whether it is on.
4. A method as claimed in claim 3 wherein the step of receiving the indicator variables comprises receiving indicator variables, each indicator variable being a member of a group and each group being associated with a specified feature from a plurality of specified features describing events of which the proposed event is an instance.
5. A method as claimed in claim 4 wherein the step of receiving the proposed indicator variables comprises receiving information about indicator variables that are on and where only one indicator variable may be on per group.
6. A method as claimed in claim 1 which further comprises learning the stored statistics using a machine learning process.
7. A method as claimed in claim 6 which further comprises updating the statistics in the light of observed data and using a Gaussian density filtering process.
8. A method as claimed in claim 6 which further comprises carrying out a pruning process in order to discard at least some of the stored statistics.
9. A method as claimed in claim 8 wherein the pruning process comprises assessing, for a particular variable, how much influence those stored statistics have on accuracy of the probability information.
10. A method as claimed in claim 6 which further comprises, for previously unseen variables, initializing statistics to default values.
11. A method of predicting the outcome of a proposed event comprising: carrying out a training process using assumed density filtering in order to learn statistics describing belief about values of weights; receiving (600) a plurality of variables describing the proposed event; for each variable, accessing (601) statistics from the training process describing belief about values of a weight; combining (602) the statistics; mapping (603) the combined statistics into a number representing the probability of the proposed event having a specified outcome by using a link function; and storing (604) the probability information for the proposed event.
12. A method as claimed in claim 11 which further comprises using the probability information to control a system selected from any of: an internet advertising system, a credit card fraud detection system, an email filtering system, a credit scoring system, a search engine, a binary classification system and an information filtering system.
13. A method as claimed in claim 11 wherein the step of receiving the variables comprises receiving indicator variables where each indicator variable may take only one of two possible values to indicate whether it is on.
14. A method as claimed in claim 13 wherein the step of receiving the indicator variables comprises receiving indicator variables, each indicator variable being a member of a group and each group being associated with a specified feature from a plurality of specified features describing events of which the proposed event is an instance.
15. A method as claimed in claim 11 wherein the training process comprises a pruning process whereby at least some of the learnt statistics are discarded on the basis of an assessment of the impact of discarding those statistics on accuracy of the probability information.
16. A method as claimed in claim 11 wherein the training process comprises using Gaussian density filtering.
17. A method as claimed in claim 11 wherein the training process comprises using expectation propagation.
18. A method as claimed in claim 11 wherein the proposed event is display of an internet advertisement and wherein the probability information is related to the probability that if a proposed internet advertisement is clicked, that a conversion will result for an associated advertiser.
19. A method as claimed in claim 11 wherein the proposed event is display of an internet advertisement and wherein the probability information is related to the probability that a proposed internet advertisement will be clicked.
20. One or more device-readable media with device-executable instructions for performing steps comprising: receiving (600) a plurality of variables describing a proposed event; for each variable, accessing (601) stored statistics describing belief about values of a weight, the stored statistics having been learnt using a machine learning process comprising assumed density filtering; combining (602) the statistics; mapping (603) the combined statistics into a number representing the probability of the proposed event having a specified outcome by using a link function; and storing (604) the probability information for the proposed event.
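
The steps recited in claims 1, 7 and 10 can be illustrated with a minimal sketch. It is not taken from the claims, which do not fix any particular formulas. The sketch assumes that the stored statistics for each weight are a Gaussian mean and variance, that the statistics are combined by summation, that the link function is a probit (cumulative normal) link, and that learning uses a standard Gaussian density filtering update for a probit likelihood. The names BETA, PRIOR, predict and update are hypothetical and chosen only for illustration.

```python
import math

# Assumptions for illustration only; none of these values come from the claims.
BETA = 1.0          # assumed noise scale of the probit link
PRIOR = (0.0, 1.0)  # assumed default (mean, variance) for previously unseen variables


def normal_pdf(t):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def normal_cdf(t):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def combine(stats, active):
    """Sum the stored means and variances of the indicator variables that are on."""
    mean = sum(stats.get(name, PRIOR)[0] for name in active)
    var = BETA ** 2 + sum(stats.get(name, PRIOR)[1] for name in active)
    return mean, var


def predict(stats, active):
    """Map the combined statistics to a probability through the probit link."""
    mean, var = combine(stats, active)
    return normal_cdf(mean / math.sqrt(var))


def update(stats, active, outcome):
    """One Gaussian density filtering step; outcome is +1 or -1.

    Each active weight's Gaussian belief is replaced by the Gaussian that
    matches the mean and variance of the posterior under a probit likelihood.
    Note: for very negative t, normal_cdf(t) can underflow in this simple sketch.
    """
    mean, var = combine(stats, active)
    sd = math.sqrt(var)
    t = outcome * mean / sd
    v = normal_pdf(t) / normal_cdf(t)  # additive correction for the mean
    w = v * (v + t)                    # multiplicative correction for the variance
    for name in active:
        m, s2 = stats.get(name, PRIOR)
        stats[name] = (m + outcome * (s2 / sd) * v, s2 * (1.0 - (s2 / var) * w))


if __name__ == "__main__":
    stats = {}                             # stored statistics, keyed by indicator variable
    event = ["ad:42", "position:top"]      # hypothetical indicator variables that are on
    print(predict(stats, event))           # prediction from default statistics
    update(stats, event, +1)               # observe the specified outcome once
    print(predict(stats, event))           # prediction after one update
```

Storing beliefs in a dictionary keyed by indicator variable means that only the variables that are on for a given event are read or updated, and unseen variables fall back to the default statistics.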
PCT/US2008/072245 2007-08-08 2008-08-05 Event prediction WO2009020976A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP08797215A EP2176787A4 (en) 2007-08-08 2008-08-05 Event prediction

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/835,985 2007-08-08
US11/835,985 US20090043593A1 (en) 2007-08-08 2007-08-08 Event Prediction

Publications (1)

Publication Number Publication Date
WO2009020976A1 (en) 2009-02-12

Family

ID=40341687

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/072245 WO2009020976A1 (en) 2007-08-08 2008-08-05 Event prediction

Country Status (3)

Country Link
US (1) US20090043593A1 (en)
EP (1) EP2176787A4 (en)
WO (1) WO2009020976A1 (en)

Families Citing this family (70)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8635103B1 (en) * 2008-03-11 2014-01-21 Google Inc. Contextual targeting prediction
US20100049586A1 (en) * 2008-08-21 2010-02-25 Yahoo! Inc. Method for determining an advertising slate based on an expected utility
US9547865B2 (en) * 2009-03-30 2017-01-17 Ebay Inc. System and method for providing advertising server optimization for online computer users
US9495460B2 (en) 2009-05-27 2016-11-15 Microsoft Technology Licensing, Llc Merging search results
US9841282B2 (en) * 2009-07-27 2017-12-12 Visa U.S.A. Inc. Successive offer communications with an offer recipient
US9443253B2 (en) 2009-07-27 2016-09-13 Visa International Service Association Systems and methods to provide and adjust offers
US10546332B2 (en) 2010-09-21 2020-01-28 Visa International Service Association Systems and methods to program operations for interaction with users
US20110029367A1 (en) 2009-07-29 2011-02-03 Visa U.S.A. Inc. Systems and Methods to Generate Transactions According to Account Features
US20110035278A1 (en) 2009-08-04 2011-02-10 Visa U.S.A. Inc. Systems and Methods for Closing the Loop between Online Activities and Offline Purchases
US20110035280A1 (en) 2009-08-04 2011-02-10 Visa U.S.A. Inc. Systems and Methods for Targeted Advertisement Delivery
WO2011019759A2 (en) * 2009-08-10 2011-02-17 Visa U.S.A. Inc. Systems and methods for targeting offers
US9031860B2 (en) 2009-10-09 2015-05-12 Visa U.S.A. Inc. Systems and methods to aggregate demand
US9342835B2 (en) 2009-10-09 2016-05-17 Visa U.S.A Systems and methods to deliver targeted advertisements to audience
US8595058B2 (en) 2009-10-15 2013-11-26 Visa U.S.A. Systems and methods to match identifiers
US20110093324A1 (en) 2009-10-19 2011-04-21 Visa U.S.A. Inc. Systems and Methods to Provide Intelligent Analytics to Cardholders and Merchants
US8676639B2 (en) 2009-10-29 2014-03-18 Visa International Service Association System and method for promotion processing and authorization
US8626705B2 (en) 2009-11-05 2014-01-07 Visa International Service Association Transaction aggregator for closed processing
US20110125565A1 (en) 2009-11-24 2011-05-26 Visa U.S.A. Inc. Systems and Methods for Multi-Channel Offer Redemption
US8738418B2 (en) 2010-03-19 2014-05-27 Visa U.S.A. Inc. Systems and methods to enhance search data with transaction based data
US8639567B2 (en) 2010-03-19 2014-01-28 Visa U.S.A. Inc. Systems and methods to identify differences in spending patterns
US9697520B2 (en) 2010-03-22 2017-07-04 Visa U.S.A. Inc. Merchant configured advertised incentives funded through statement credits
US9261375B2 (en) 2010-04-01 2016-02-16 International Business Machines Corporation Anomaly detection for road user charging systems
US9471926B2 (en) 2010-04-23 2016-10-18 Visa U.S.A. Inc. Systems and methods to provide offers to travelers
US8359274B2 (en) 2010-06-04 2013-01-22 Visa International Service Association Systems and methods to provide messages in real-time with transaction processing
US8265778B2 (en) 2010-06-17 2012-09-11 Microsoft Corporation Event prediction using hierarchical event features
US8904149B2 (en) 2010-06-24 2014-12-02 Microsoft Corporation Parallelization of online learning algorithms
US8781896B2 (en) 2010-06-29 2014-07-15 Visa International Service Association Systems and methods to optimize media presentations
US9760905B2 (en) 2010-08-02 2017-09-12 Visa International Service Association Systems and methods to optimize media presentations using a camera
US9972021B2 (en) 2010-08-06 2018-05-15 Visa International Service Association Systems and methods to rank and select triggers for real-time offers
US20120053995A1 (en) * 2010-08-31 2012-03-01 D Albis John Analyzing performance and setting strategic targets
US9679299B2 (en) 2010-09-03 2017-06-13 Visa International Service Association Systems and methods to provide real-time offers via a cooperative database
US10055745B2 (en) 2010-09-21 2018-08-21 Visa International Service Association Systems and methods to modify interaction rules during run time
US9477967B2 (en) 2010-09-21 2016-10-25 Visa International Service Association Systems and methods to process an offer campaign based on ineligibility
US10318877B2 (en) 2010-10-19 2019-06-11 International Business Machines Corporation Cohort-based prediction of a future event
US9558502B2 (en) 2010-11-04 2017-01-31 Visa International Service Association Systems and methods to reward user interactions
US10007915B2 (en) 2011-01-24 2018-06-26 Visa International Service Association Systems and methods to facilitate loyalty reward transactions
US8370319B1 (en) * 2011-03-08 2013-02-05 A9.Com, Inc. Determining search query specificity
US10438299B2 (en) 2011-03-15 2019-10-08 Visa International Service Association Systems and methods to combine transaction terminal location data and social networking check-in
WO2012162485A2 (en) * 2011-05-26 2012-11-29 Causata, Inc. Real-time adaptive binning
WO2013012898A2 (en) 2011-07-19 2013-01-24 Causata Inc. Distributed scalable incrementally updated models in decisioning systems
US10223707B2 (en) 2011-08-19 2019-03-05 Visa International Service Association Systems and methods to communicate offer options via messaging in real time with processing of payment transaction
US9466075B2 (en) 2011-09-20 2016-10-11 Visa International Service Association Systems and methods to process referrals in offer campaigns
US8924318B2 (en) 2011-09-28 2014-12-30 Nice Systems Technologies Uk Limited Online asynchronous reinforcement learning from concurrent customer histories
US8914314B2 (en) * 2011-09-28 2014-12-16 Nice Systems Technologies Uk Limited Online temporal difference learning from incomplete customer interaction histories
US10380617B2 (en) 2011-09-29 2019-08-13 Visa International Service Association Systems and methods to provide a user interface to control an offer campaign
US10290018B2 (en) 2011-11-09 2019-05-14 Visa International Service Association Systems and methods to communicate with users via social networking sites
US10497022B2 (en) 2012-01-20 2019-12-03 Visa International Service Association Systems and methods to present and process offers
US10672018B2 (en) 2012-03-07 2020-06-02 Visa International Service Association Systems and methods to process offers via mobile devices
US9092566B2 (en) 2012-04-20 2015-07-28 International Drug Development Institute Methods for central monitoring of research trials
US10387911B1 (en) 2012-06-01 2019-08-20 Integral Ad Science, Inc. Systems, methods, and media for detecting suspicious activity
US8868525B2 (en) 2012-08-24 2014-10-21 Facebook, Inc. Distributed information synchronization
US9208189B2 (en) * 2012-08-24 2015-12-08 Facebook, Inc. Distributed request processing
US9705829B2 (en) 2012-12-07 2017-07-11 Linkedin Corporation Communication systems and methods
US10360627B2 (en) 2012-12-13 2019-07-23 Visa International Service Association Systems and methods to provide account features via web based user interfaces
BR112015021758B1 (en) * 2013-03-06 2022-11-16 Arthur J. Zito Jr MULTIMEDIA PRESENTATION SYSTEMS, METHODS FOR DISPLAYING A MULTIMEDIA PRESENTATION, MULTIMEDIA PRESENTATION DEVICE AND HARDWARE FOR PRESENTING PERCEPTABLE STIMULUS TO A HUMAN OR CREATURE SPECTATOR
US9727882B1 (en) * 2013-06-21 2017-08-08 Amazon Technologies, Inc. Predicting and classifying network activity events
US10489754B2 (en) 2013-11-11 2019-11-26 Visa International Service Association Systems and methods to facilitate the redemption of offer benefits in a form of third party statement credits
US10419379B2 (en) 2014-04-07 2019-09-17 Visa International Service Association Systems and methods to program a computing system to process related events via workflows configured using a graphical user interface
US10354268B2 (en) 2014-05-15 2019-07-16 Visa International Service Association Systems and methods to organize and consolidate data for improved data storage and processing
US10650398B2 (en) 2014-06-16 2020-05-12 Visa International Service Association Communication systems and methods to transmit data among a plurality of computing systems in processing benefit redemption
US10438226B2 (en) 2014-07-23 2019-10-08 Visa International Service Association Systems and methods of using a communication network to coordinate processing among a plurality of separate computing systems
US11210669B2 (en) 2014-10-24 2021-12-28 Visa International Service Association Systems and methods to set up an operation at a computer system connected with a plurality of computer systems via a computer network using a round trip communication of an identifier of the operation
TWI549076B (en) * 2014-11-06 2016-09-11 宏碁股份有限公司 Electronic devices and service management methods thereof
US9691085B2 (en) 2015-04-30 2017-06-27 Visa International Service Association Systems and methods of natural language processing and statistical analysis to identify matching categories
US9443002B1 (en) * 2015-07-10 2016-09-13 Grand Rapids, Inc. Dynamic data analysis and selection for determining outcomes associated with domain specific probabilistic data sets
US10467654B2 (en) * 2015-09-04 2019-11-05 Oracle International Corporation Forecasting customer channel choice using cross-channel loyalty
US11341446B2 (en) 2016-06-14 2022-05-24 International Business Machines Corporation Personalized behavior-driven dynamic risk management with constrained service capacity
US20180336561A1 (en) * 2017-05-17 2018-11-22 Mastercard International Incorporated Spend-profile based transaction value limits for pin-less contactless payment-card authorizations
US10521725B2 (en) 2017-06-12 2019-12-31 Vicarious Fpc, Inc. Systems and methods for event prediction using schema networks
CN108229964B (en) * 2017-12-25 2021-04-02 同济大学 Transaction behavior profile construction and authentication method, system, medium and equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020016699A1 (en) * 2000-05-26 2002-02-07 Clive Hoggart Method and apparatus for predicting whether a specified event will occur after a specified trigger event has occurred
EP1197899A1 (en) * 2000-05-26 2002-04-17 Ncr International Inc. Method and apparatus for determining one or more statistical estimators of customer behaviour
US6907566B1 (en) * 1999-04-02 2005-06-14 Overture Services, Inc. Method and system for optimum placement of advertisements on a webpage
US7050868B1 (en) * 2005-01-24 2006-05-23 Microsoft Corporation Bayesian scoring

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1205877A1 (en) * 2000-11-14 2002-05-15 Honda R&D Europe (Deutschland) GmbH Approximate fitness functions
US7424409B2 (en) * 2001-02-20 2008-09-09 Context-Based 4 Casting (C-B4) Ltd. Stochastic modeling of time distributed sequences
US7392199B2 (en) * 2001-05-01 2008-06-24 Quest Diagnostics Investments Incorporated Diagnosing inapparent diseases from common clinical tests using Bayesian analysis
US7490071B2 (en) * 2003-08-29 2009-02-10 Oracle Corporation Support vector machines processing system
US7565370B2 (en) * 2003-08-29 2009-07-21 Oracle International Corporation Support Vector Machines in a relational database management system
US7223234B2 (en) * 2004-07-10 2007-05-29 Monitrix, Inc. Apparatus for determining association variables
US20060248035A1 (en) * 2005-04-27 2006-11-02 Sam Gendler System and method for search advertising
US7505866B2 (en) * 2006-05-22 2009-03-17 The University Of Kansas Method of classifying data using shallow feature selection
US20080033810A1 (en) * 2006-08-02 2008-02-07 Yahoo! Inc. System and method for forecasting the performance of advertisements using fuzzy systems
US7774227B2 (en) * 2007-02-23 2010-08-10 Saama Technologies, Inc. Method and system utilizing online analytical processing (OLAP) for making predictions about business locations
US20080249832A1 (en) * 2007-04-04 2008-10-09 Microsoft Corporation Estimating expected performance of advertisements

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2176787A4 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10956944B1 (en) * 2009-02-27 2021-03-23 Google Llc Generating a proposed bid
US11823236B1 (en) 2009-02-27 2023-11-21 Google Llc Generating a proposed bid
US20200118162A1 (en) * 2018-10-15 2020-04-16 Affle (India) Limited Method and system for application installation and detection of fraud in advertisement

Also Published As

Publication number Publication date
EP2176787A1 (en) 2010-04-21
US20090043593A1 (en) 2009-02-12
EP2176787A4 (en) 2012-10-17

Similar Documents

Publication Publication Date Title
US20090043593A1 (en) Event Prediction
US8417650B2 (en) Event prediction in dynamic environments
US20200151628A1 (en) Adaptive Fraud Detection
US8781915B2 (en) Recommending items to users utilizing a bi-linear collaborative filtering model
US8831754B2 (en) Event prediction using hierarchical event features
US11539716B2 (en) Online user behavior analysis service backed by deep learning models trained on shared digital information
US20080288328A1 (en) Content advertising performance optimization system and method
US20160189201A1 (en) Enhanced targeted advertising system
US11809577B2 (en) Application of trained artificial intelligence processes to encrypted data within a distributed computing environment
WO2020181907A1 (en) Decision-making optimization method and apparatus
US10552863B1 (en) Machine learning approach for causal effect estimation
US10685374B2 (en) Exploration for search advertising
US11797840B2 (en) Machine learning based approach for identification of extremely rare events in high-dimensional space
US11263660B2 (en) Attribution of response to multiple channels
US10699203B1 (en) Uplift modeling with importance weighting
KR102174608B1 (en) Apparatus for predicting loan defaults based on machine learning and method thereof
CN112269942B (en) Method, device and system for recommending object and electronic equipment
US11669759B2 (en) Entity resource recommendation system based on interaction vectorization
US20210406931A1 (en) Contextual marketing system based on predictive modeling of users of a system and/or service
CN112541669A (en) Risk identification method, system and device
WO2022060709A1 (en) Discriminative machine learning system for optimization of multiple objectives
Koren et al. Dynamic creative optimization in Verizon media native advertising
US11568289B2 (en) Entity recognition system based on interaction vectorization
Motte Mathematical models for large populations, behavioral economics, and targeted advertising
US20230145924A1 (en) System and method for detecting a fraudulent activity on a digital platform

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 08797215; Country of ref document: EP; Kind code of ref document: A1)
REEP Request for entry into the european phase (Ref document number: 2008797215; Country of ref document: EP)
WWE Wipo information: entry into national phase (Ref document number: 2008797215; Country of ref document: EP)
NENP Non-entry into the national phase (Ref country code: DE)