US20140222744A1 - Applying Data Regression and Pattern Mining to Predict Future Demand - Google Patents
- Publication number
- US20140222744A1 (application US 14/252,180)
- Authority
- US
- United States
- Prior art keywords
- pattern
- pattern frequency
- predicted
- frequency value
- processing device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
Definitions
- the present invention relates in general to the field of database analysis.
- the present invention relates to a system and method for database pattern mining operations for generating and evaluating association rules contained in database records.
- data mining is used to discover association relationships in a database by identifying frequently occurring patterns in the database.
- association relationships or rules may be applied to extract useful information from large databases in a variety of fields, including selective marketing, market analysis and management applications (such as target marketing, customer relation management, market basket analysis, cross selling, market segmentation), risk analysis and management applications (such as forecasting, customer retention, improved underwriting, quality control, competitive analysis), fraud detection and management applications and other applications (such as text mining (news group, email, documents), stream data mining, web mining, DNA data analysis, etc.).
- association rules have been applied to model and emulate consumer purchasing activities. Association rules describe how often items are purchased together. For example, an association rule, “laptop → speaker (80%),” states that four out of five customers that bought a laptop computer also bought speakers.
- the first step in generating association rules is to review a database of transactions to identify meaningful patterns (referred to as frequent patterns, frequent sets or frequent itemsets) in a transaction database, such as significant purchase patterns that appear as common patterns recurring among a plurality of customers.
- typically, this is done by using thresholds such as support and confidence parameters, or other guides to the data mining process.
- These guides are used to discover frequent patterns, i.e., all sets of itemsets that have transaction support above a predetermined minimum support S and confidence C threshold.
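The support and confidence measures mentioned above can be illustrated with a minimal sketch. The function names and the toy order data are assumptions made for illustration only; they are not taken from the patent.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent."""
    antecedent = frozenset(antecedent)
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, antecedent | frozenset(consequent)) / base


# Hypothetical toy data: five laptop orders, four of which include speakers.
orders = [
    {"laptop", "speaker"},
    {"laptop", "speaker", "mouse"},
    {"laptop", "speaker"},
    {"laptop", "speaker", "bag"},
    {"laptop"},
    {"mouse"},
]

rule_confidence = confidence(orders, {"laptop"}, {"speaker"})  # roughly 0.8
rule_support = support(orders, {"laptop", "speaker"})          # 4 of 6 orders
```

With this toy data the rule "laptop → speaker" has a confidence of about 80%, matching the four-out-of-five reading given in the text.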
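- Various techniques have been proposed to assist with identifying frequent patterns in transaction databases, including using “Apriori” algorithms to generate and test candidate sets, such as described by R. Agrawal et al., “Mining Association Rules Between Sets of Items in Large Databases,” Proceedings of ACM SIGMOD Int'l Conf. on Management of Data, pp. 207-216 (1993).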
- the association rules are generated by constructing the power set (set of all subsets) of the identified frequent sets, and then generating rules from each of the elements of the power set. For each rule, its meaningfulness (i.e., support, confidence, lift, etc.) is calculated and examined to see if it meets the required thresholds. For example, if a frequent pattern {A, B, C} is extracted (meaning that this set occurs more frequently than the minimum support S threshold in the set of transactions), then several rules can be generated from this set, such as {A, B} → {C}.
- a system and method are provided for generating more meaningful frequent set data by analyzing frequent pattern data over time to predict frequent pattern trends.
- frequent pattern trends may be derived by using frequent pattern generation techniques over discrete time slices of transaction data, and then processing the results using numerical calculation techniques, such as least-squares approximation or other higher order interpolation techniques, to extract trend information.
- the pattern mining review of the present invention may use regression techniques to analyze the change in frequency of patterns to predict future behavior by projecting the regression to calculate the expected value of a recommendation rule.
- frequent itemset information is accumulated on a constant time interval (week1, week2, week3, etc.) and is used in a regression analysis to make a prediction about future demand.
- FIG. 1 depicts an exemplary system for mining association rules from a transaction database.
- FIG. 2 is an exemplary chart comparison of demand prediction using an average pattern frequency technique and a data regression technique.
- FIG. 3 is a flowchart that schematically illustrates a process for applying data regression and pattern mining to predict future demand.
- An efficient database mining method and apparatus are described for processing frequent patterns from transaction databases by programmatically computing the trend of each pattern frequency over time to provide more accurate frequency prediction for use with generating and evaluating association rules. While various details are set forth in the following description, it will be appreciated that the present invention may be practiced without these specific details. In addition, selected aspects are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention. Some portions of the detailed descriptions provided herein are presented in terms of algorithms or operations on data within a computer memory. Such descriptions and representations are used by those skilled in the data processing arts to describe and convey the substance of their work to others skilled in the art.
- an algorithm refers to a self-consistent sequence of steps leading to a desired result, where a “step” refers to a manipulation of physical quantities which may, though need not necessarily, take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It is common usage to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. These and similar terms may be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities.
- the system 10 comprises a data processing engine or system 30 coupled to a database 11.
- the system 10 also includes an input device 20 where at least one condition of the association rules to be mined is input by a user.
- the input device 20 is used to input the conditions (i.e., support, confidence, lift, etc.) for the association rule to be mined.
- the output device may be a local display or printer device, or may be a remotely connected computer system, such as a client computer device or network-connected computer device.
- the system 10 (e.g., a private wide area network (WAN) or the Internet) includes a central server computer system 30 and one or more networked client or server computer systems that are connected to the network as an input device 20 and/or an output device 40.
- Communication between central server computer system 30 and the networked computer systems typically occurs over a network, such as a public switched telephone network over asymmetric digital subscriber line (ADSL) telephone lines or high-bandwidth trunks, for example, communications channels providing T1 or OC3 service.
- Networked client computer system(s) typically access central server computer system 30 through a service provider, such as an internet service provider (“ISP”) by executing application specific software, commonly referred to as a browser, on the networked client computer systems.
- an attribute mapper 32 may be included for mapping a first data set to a second, highly granular data set as described more fully in U.S. patent application Ser. No. 10/870,360 (entitled “Attribute Based Association Rule Mining”), which is assigned to Trilogy Development Group and is hereby incorporated by reference in its entirety.
- a frequent pattern generator 34 is included for identifying frequent patterns occurring in the database 11.
- the frequent pattern generator 34 may use FPGrowth techniques to identify frequent patterns in the transaction data 12 stored in the database 11 meeting the minimum support count input by the user.
- a rule generator 36 is included for generating association rules from the frequent pattern information, and an output device 40 is also provided for outputting the mined association rules.
- the database 11 may be connected to the attribute mapper 32, frequent pattern generator 34 and/or rule generator 36.
- transaction data 12 from the database 11 may be transformed by the attribute mapper 32, passed directly to the frequent pattern generator 34 for processing to identify frequent patterns, and then passed to the rule generator 36 for rule generation.
- the attribute mapper 32 is provided for transforming generic item descriptors in the transaction database to provide more detailed item description information concerning various product attributes and/or qualities for the item. For example, part number information may be mapped into more granular product or attribute information identifying specific features of the product, where the specific product or attribute information may be presented as native values.
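As a rough illustration of the attribute mapping idea, the sketch below replaces part numbers with "attribute=value" items before mining. The catalog contents, the part numbers, and the function name `map_transaction` are all hypothetical; the referenced application Ser. No. 10/870,360 may define the mapping quite differently.

```python
from typing import Dict, List, Union

AttributeValue = Union[str, int]

# Hypothetical catalog: part number -> product attributes.
CATALOG: Dict[str, Dict[str, AttributeValue]] = {
    "PN-1001": {"type": "desktop", "processor": "A", "ram_gb": 8},
    "PN-1002": {"type": "desktop", "processor": "B", "ram_gb": 16},
}


def map_transaction(part_numbers: List[str],
                    catalog: Dict[str, Dict[str, AttributeValue]]) -> List[str]:
    """Expand each known part number into granular 'attribute=value' items.

    Part numbers missing from the catalog are passed through unchanged.
    """
    items: List[str] = []
    for part_number in part_numbers:
        attributes = catalog.get(part_number)
        if attributes is None:
            items.append(part_number)
            continue
        # Sort by attribute name so the output order is deterministic.
        items.extend(f"{name}={value}" for name, value in sorted(attributes.items()))
    return items


mapped = map_transaction(["PN-1001", "PN-9999"], CATALOG)
# mapped == ["processor=A", "ram_gb=8", "type=desktop", "PN-9999"]
```

Mining over the mapped items lets a pattern such as (type=desktop, processor=A) surface even when many distinct part numbers share those attributes.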
- in the frequent pattern generator 34, all of the frequent patterns from the transaction data 12 in the database 11 are compiled, and the support of each frequent pattern may be obtained.
- in the rule generator 36, at least one association rule is derived by using the frequent pattern information provided by the frequent pattern generator 34.
- a broad variety of efficient algorithms for mining association rules have been developed in recent years, including algorithms based on the level-wise Apriori framework, TreeProjection and FPGrowth algorithms.
- a selected embodiment of the present invention balances the accuracy and confidence requirements with a frequent pattern generator module 34 that uses standard approaches to pattern frequency (such as pairwise association rule mining, Apriori, or the FP-growth algorithm) against smaller time slices of transaction data to programmatically compute the trend of each pattern's frequency using a variety of numerical calculation techniques, such as least-squares approximation or other higher order interpolation techniques.
- the future expected pattern frequency is computed by extrapolating the computed trend line into the future to yield a more accurate frequency prediction than the standard averaged frequency technique.
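For the linear case, the trend line and its extrapolation can be written out as follows. The symbols are assumptions introduced here for illustration: $t_i$ is the index of the $i$-th time slice, $f_i$ the pattern frequency measured in that slice, and $n$ the number of slices.

```latex
% Least-squares line f(t) = a + b t through the points (t_i, f_i), i = 1..n
\bar{t} = \frac{1}{n}\sum_{i=1}^{n} t_i, \qquad
\bar{f} = \frac{1}{n}\sum_{i=1}^{n} f_i

b = \frac{\sum_{i=1}^{n} (t_i - \bar{t})(f_i - \bar{f})}
         {\sum_{i=1}^{n} (t_i - \bar{t})^2}, \qquad
a = \bar{f} - b\,\bar{t}

% Extrapolated prediction for the next slice t_{n+1}
\hat{f}(t_{n+1}) = a + b\,t_{n+1} = \bar{f} + b\,(t_{n+1} - \bar{t})
```

The averaging technique corresponds to using $\bar{f}$ alone as the prediction, which ignores the slope $b$ and therefore cannot reflect a rising or falling trend.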
- Columns B-E show the pattern frequency of each processor per week.
- Column F shows the average pattern frequency which, for simplicity in this example, assumes that the number of Desktops sold in each week is constant.
- Column G shows the result of applying a linear least-squares approximation to the pattern frequencies and extrapolating to the fifth week.
- (Desktop, Processor A) is a pattern that is declining in frequency
- (Desktop, Processor B) is an emerging pattern.
- the improved accuracy of the frequency prediction technique of the present invention is readily demonstrated by charting the example comparison data, such as in FIG. 2, which depicts an exemplary chart comparison of demand prediction using an average pattern frequency technique and a data regression technique.
- as shown in FIG. 2, the dashed trend line 202 for Processor A is computed on the basis of the first four weeks of pattern frequency data for Processor A (plotted at single line 204) and is used to predict a pattern frequency of 30% at Week 5. If, instead, an averaging technique were used, the calculated value of 48.25% would result from the average of the first four weeks of pattern frequency data for Processor A (plotted at single line 204).
- the dotted trend line 206 for Processor B is computed on the basis of the first four weeks of pattern frequency data for Processor B (plotted at double line 208) and is used to predict a pattern frequency of 70% at Week 5, as compared to a calculated value of 51.75% based on the average of the first four weeks of pattern frequency data for Processor B (plotted at double line 208).
- the predicted pattern frequency values paint a much different and more accurate picture of the pattern frequency trends (70% for Processor B as compared to 30% for Processor A) than is provided by using averaging techniques (51.75% for Processor B as compared to 48.25% for Processor A).
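The comparison between averaging and trend extrapolation can be reproduced with a small sketch. The weekly frequencies below are hypothetical: the source table is not available in this text, so the values were chosen only so that the four-week averages (48.25% and 51.75%) and the Week 5 extrapolations (30% and 70%) agree with the figures quoted above. The helper names are likewise assumptions.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the linear least-squares fit."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two points and equal-length inputs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate(xs: Sequence[float], ys: Sequence[float], x_next: float) -> float:
    """Project the fitted trend line to x_next."""
    slope, intercept = fit_line(xs, ys)
    return intercept + slope * x_next


weeks = [1, 2, 3, 4]
# HYPOTHETICAL weekly pattern frequencies in percent (not the patent's table).
processor_a = [60.0, 51.0, 44.0, 38.0]   # declining pattern
processor_b = [40.0, 49.0, 56.0, 62.0]   # emerging pattern

average_a = sum(processor_a) / len(processor_a)    # 48.25
average_b = sum(processor_b) / len(processor_b)    # 51.75
predicted_a = extrapolate(weeks, processor_a, 5)   # about 30
predicted_b = extrapolate(weeks, processor_b, 5)   # about 70
```

The averages differ by only a few points, while the extrapolated values separate the declining pattern from the emerging one.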
- predicted values may be capped so that any predicted frequency of less than 0% is changed to 0%, and any predicted frequency of over 100% is changed to 100%.
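A steep trend line can project outside the meaningful range, so the capping step can be sketched as a simple clamp. The function name and default bounds are assumptions; the text only specifies the 0% and 100% limits.

```python
def cap_frequency(value: float, lower: float = 0.0, upper: float = 100.0) -> float:
    """Clamp a predicted pattern frequency (in percent) to [lower, upper]."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return max(lower, min(upper, value))


capped_low = cap_frequency(-12.5)    # 0.0
capped_high = cap_frequency(130.0)   # 100.0
unchanged = cap_frequency(42.0)      # 42.0
```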
- frequent patterns or itemsets may be constructed using database mining techniques to find interesting patterns from databases, such as association rules, correlations, sequences, episodes, classifiers, clusters and the like.
- the task of discovering and evaluating frequent patterns in a database of items is quite challenging, given that the search space is exponential in the number of items occurring in the database.
- the present invention discloses techniques for discovering more meaningful pattern frequency information by, for example, accumulating frequent itemset information on a constant or predetermined time interval and then using the information at this aggregate level in a regression analysis to make a prediction about future demand. The projected or predicted values may then be used to calculate or quantify an expected value of a recommendation rule that is based on the subject pattern being forecast.
- the database pattern mining may be implemented with a data processing system that processes transaction database information to provide a frequent set with attribute-based items identifying the purchased product, and to more efficiently generate association rules from the generated frequent set.
- data processing may be performed on computer system 10 (see FIG. 1), which may be found in many forms, including, for example, mainframes, minicomputers, workstations, servers, personal computers, internet terminals, notebooks, wireless or mobile computing devices (including personal digital assistants), embedded systems and other information handling systems, which are designed to provide computing power to one or more users, either locally or remotely.
- a computer system 10 includes one or more microprocessor or central processing units (CPU) 38, mass storage memory 11 and local RAM memory 31.
- the processor 38, in one embodiment, is a 32-bit or 64-bit microprocessor manufactured by Motorola (such as the 680X0 processor), by Intel (such as the 80X86 or Pentium processor), or by IBM. However, any other suitable single or multiple microprocessors or microcomputers may be utilized. Computer programs and data are generally stored as instructions and data in mass storage 11 until loaded into main memory 31 for execution. Main memory 31 may be comprised of dynamic random access memory (DRAM).
- the CPU 38 may be connected directly (or through an interface or bus) to a variety of peripheral and system components, such as a hard disk drive, cache memory, traditional I/O devices (such as display monitors, mouse-type input devices, floppy disk drives, speaker systems, keyboards, hard drive, CD-ROM drive, modems, printers), network interfaces, terminal devices, televisions, sound devices, voice recognition devices, electronic pen devices, and mass storage devices such as tape drives, hard disks, compact disk (“CD”) drives, digital versatile disk (“DVD”) drives, and magneto-optical drives.
- the peripheral devices usually communicate with the processor over one or more buses and/or bridges.
- an exemplary flow methodology 300 is illustrated for predicting or forecasting pattern frequency information by calculating an approximation curve based on historical pattern frequency information.
- the depicted methodology determines the single item count for each item in a transaction database (step 301), divides this information into predetermined date ranges for purposes of determining the pattern count values in each date range (step 303), and then calculates the predicted pattern frequency information based on historical pattern frequency data (step 305). While a variety of different calculation techniques may be used to calculate the predicted pattern frequency information, a selected embodiment performs a regression analysis (such as a least-squares approximation or other higher order interpolation technique). As will be appreciated, the methodology illustrated in FIG. 3 shows the steps for generating pattern frequency information within predetermined date ranges and for generating a forecast or prediction of a future pattern frequency value that may be used to evaluate and generate association rules from the items in the frequent set. While the methodology of the present invention may be thought of as performing the identified sequence of steps in the order depicted in FIG. 3, the steps may also be performed in parallel, in a different order, or as independent operations that use historical pattern frequency data to make predictions about future demand for purposes of generating association rules therefrom.
- the description of the illustrative method 300 can begin at step 301 where the item count for each item in a transaction database is determined. This count information may be obtained by incrementing a count value (step 308) for each item I (step 306) in each transaction T (step 304) in the transaction database. The item count incrementation step 308 is repeated for all the items in a transaction (negative outcome from decision 310), and for all transactions in the database (negative outcome from decision 312), until the total count for each item in the transaction database is obtained (affirmative outcome from decisions 310, 312).
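The item counting loop of step 301 can be sketched as two nested loops. Counting an item at most once per transaction is an assumption made here (the text does not say how repeated items within one transaction are treated), and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def count_items(transactions: Iterable[List[str]]) -> Counter:
    """Count, for each item, the number of transactions that contain it."""
    counts: Counter = Counter()
    for transaction in transactions:      # each transaction T (step 304)
        for item in set(transaction):     # each item I (step 306), once per T
            counts[item] += 1             # increment the count (step 308)
    return counts


# Hypothetical transactions for illustration.
item_counts = count_items([
    ["desktop", "cpu_a"],
    ["desktop", "cpu_b", "cpu_b"],
    ["mouse"],
])
# item_counts["desktop"] == 2, item_counts["cpu_b"] == 1
```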
- the pattern count for each predetermined date range may be determined at step 303.
- the transaction database may be divided or parsed into predetermined date ranges at step 314.
- item pairs (I, J) having a single item count that meets a minimum support threshold may be counted.
- the item pair count information may be obtained by incrementing an item pair count value (step 318) for each item pair (I, J) (step 316) in each transaction T in the date range D (step 314).
- the item pair count incrementation step 318 is repeated for all the item pairs in a transaction (negative outcome from decision 320), and for all transactions in the date range (negative outcome from decision 322), until the total count for each item pair in the transaction date range is obtained (affirmative outcome from decisions 320, 322).
- the item pair counting process is repeated for each date range in the transaction database (negative outcome to decision 324) by incrementing the date range value (step 326), until all date ranges have been processed (affirmative outcome to decision 324).
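A minimal sketch of the per-date-range pair counting in step 303 follows. It assumes transactions are given as (date, items) tuples, that date ranges are inclusive (start, end) pairs, and that the minimum support threshold is an absolute count applied to the single item counts; these representations and all names are illustrative assumptions.

```python
from collections import Counter
from datetime import date
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Transaction = Tuple[date, Sequence[str]]
DateRange = Tuple[date, date]


def count_pairs_by_range(transactions: List[Transaction],
                         ranges: List[DateRange],
                         item_counts: Dict[str, int],
                         min_support: int) -> List[Counter]:
    """Return one Counter of item-pair counts per date range."""
    # Only items whose single item count meets the threshold form pairs.
    frequent = {item for item, n in item_counts.items() if n >= min_support}
    results: List[Counter] = []
    for start, end in ranges:                      # each date range D
        pair_counts: Counter = Counter()
        for day, items in transactions:            # each transaction T in D
            if not (start <= day <= end):
                continue
            kept = sorted(set(items) & frequent)   # sorted so (I, J) is canonical
            for pair in combinations(kept, 2):     # each item pair (step 316)
                pair_counts[pair] += 1             # increment (step 318)
        results.append(pair_counts)
    return results


# Hypothetical data: two one-week ranges.
txns = [
    (date(2024, 1, 1), ["desktop", "cpu_a"]),
    (date(2024, 1, 3), ["desktop", "cpu_b"]),
    (date(2024, 1, 9), ["desktop", "cpu_b", "mouse"]),
]
week_ranges = [(date(2024, 1, 1), date(2024, 1, 7)),
               (date(2024, 1, 8), date(2024, 1, 14))]
single_counts = Counter(item for _, items in txns for item in set(items))
weekly_pairs = count_pairs_by_range(txns, week_ranges, single_counts, min_support=2)
```

Here "cpu_a" and "mouse" each appear only once, so with a threshold of 2 only the pair ("cpu_b", "desktop") is counted in either week.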
- the pattern frequency for each item pair (I, J) in each date range is used to calculate a predicted pattern frequency at step 305.
- each item pair in the transaction database meeting a minimum support threshold (step 328) is used to calculate a pattern frequency value for the item pair in each date range D (step 330), thereby generating historical pattern frequency data.
- This historical data may be processed to generate an approximation curve, such as by using interpolation techniques to derive a trend line based on the historical pattern frequency data and date range information (step 332).
- a predicted value of the pattern frequency for a given item pair is obtained (step 334).
- Additional processing may be performed when calculating the predicted pattern frequency, such as truncating or capping the predicted value to a predetermined range of values (e.g., 0%-100%) to address situations where the calculated predicted pattern frequency value exceeds the predetermined range of values.
- the prediction calculation process is repeated for each item pair in the transaction database (negative outcome to decision 338) until a pattern frequency forecast is calculated for each item pair (affirmative outcome to decision 338), at which time the process is finished (step 340).
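Putting the pieces of step 305 together, the sketch below forecasts the next-period frequency for every item pair. Several choices here are assumptions rather than details from the text: the pattern frequency is taken as the pair count divided by the number of transactions in the date range (as a percentage), the support threshold is applied to the pair's total count, the trend is a straight line, and the result is capped to 0-100%. The counts are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def forecast_pairs(pair_counts: List[Dict[Pair, int]],
                   totals: List[int],
                   min_support: int) -> Dict[Pair, float]:
    """Predict the next-range frequency (percent) for each supported pair."""
    n = len(pair_counts)
    if n < 2 or n != len(totals):
        raise ValueError("need at least two date ranges with matching totals")
    xs = list(range(1, n + 1))
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    all_pairs = set().union(*[set(counts) for counts in pair_counts])
    forecasts: Dict[Pair, float] = {}
    for pair in all_pairs:                                   # until decision 338
        if sum(c.get(pair, 0) for c in pair_counts) < min_support:
            continue                                         # step 328
        freqs = [100.0 * c.get(pair, 0) / t if t else 0.0    # step 330
                 for c, t in zip(pair_counts, totals)]
        mean_f = sum(freqs) / n
        slope = sum((x - mean_x) * (f - mean_f)              # step 332
                    for x, f in zip(xs, freqs)) / sxx
        predicted = mean_f + slope * ((n + 1) - mean_x)      # step 334
        forecasts[pair] = max(0.0, min(100.0, predicted))    # cap to 0-100%
    return forecasts


# HYPOTHETICAL counts: 100 desktop orders per week over four weeks.
history = [
    {("desktop", "cpu_a"): 60, ("desktop", "cpu_b"): 40},
    {("desktop", "cpu_a"): 51, ("desktop", "cpu_b"): 49},
    {("desktop", "cpu_a"): 44, ("desktop", "cpu_b"): 56, ("desktop", "mouse"): 1},
    {("desktop", "cpu_a"): 38, ("desktop", "cpu_b"): 62},
]
forecasts = forecast_pairs(history, [100, 100, 100, 100], min_support=5)
```

With these numbers the forecast for ("desktop", "cpu_a") is about 30% and for ("desktop", "cpu_b") about 70%, while the rarely seen ("desktop", "mouse") pair is skipped by the support check.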
- a computer-based methodology and system are provided for mining patterns from a transaction database.
- a first pattern in a transaction database is identified that meets a minimum support threshold requirement.
- a pattern frequency value for the first pattern is measured over a plurality of predetermined time intervals (e.g., a plurality of recent time intervals or constant time intervals) based on the number of times the first pattern occurs in the predetermined time intervals, and the measured pattern frequency values for the first pattern are then processed to calculate a predicted pattern frequency value for the first pattern.
- the processing of the pattern frequency values to calculate a predicted pattern frequency value can be done in any desired way, including but not limited to applying a linear least-squares approximation to the pattern frequency values and extrapolating to the predicted pattern frequency value for the first pattern.
- the prediction may be accomplished by computing a trend line based on the pattern frequency values and extrapolating the trend line to calculate the predicted frequency value for the first pattern.
- Yet another technique for processing of pattern frequency values is to use a regression analysis to calculate the predicted frequency value for the first pattern.
- the predicted pattern frequency value may be capped to an upper (and/or lower) limit to prevent the predicted pattern frequency value from exceeding the upper (and/or lower) limit.
- the predicted pattern frequency value may be used to calculate an expected value of a recommendation rule that is based on the first pattern.
- the methods and systems for applying data regression and pattern mining to predict future demand as shown and described herein may be implemented in software stored on a computer-readable medium and executed as a computer program on a general purpose or special purpose computer to perform certain tasks.
- the software discussed herein may include script, batch, or other executable files.
- the software may be stored on a machine-readable or computer-readable storage medium, and is otherwise available to direct the operation of the computer system as described herein and claimed below.
- the software uses instructions and data stored in a local or database memory to implement the data regression and pattern mining techniques so as to improve the ability to predict the future pattern frequency for purposes of forecasting demand.
- the local or database memory used for storing firmware or hardware modules in accordance with an embodiment of the invention may also include a semiconductor-based memory, which may be permanently, removably or remotely coupled to a microprocessor system.
- Other new and various types of computer-readable storage media may be used to store the modules discussed herein.
- those skilled in the art will recognize that the separation of functionality into modules is for illustrative purposes. Alternative embodiments may merge the functionality of multiple software modules into a single module or may impose an alternate decomposition of functionality of modules. For example, a software module for calling sub-modules may be decomposed so that each sub-module performs its function and passes control directly to another sub-module.
- the computer-based data processing system described above is for purposes of example only, and may be implemented in any type of computer system or programming or processing environment, or in a computer program, alone or in conjunction with hardware. It is contemplated that the present invention may be run on a stand-alone computer system, or may be run from a server computer system that can be accessed by a plurality of client computer systems interconnected over an intranet network, or that is accessible to clients over the Internet. In addition, many embodiments of the present invention have application to a wide range of industries including the following: computer hardware and software manufacturing and sales, professional services, financial services, automotive sales and manufacturing, telecommunications sales and manufacturing, medical and pharmaceutical sales and manufacturing, and construction industries.
Abstract
A data processing system processes transaction database information to predict future demand using data regression techniques to extract trend line information from historical pattern frequency values. By extrapolating the trend line, a predicted pattern frequency value may be calculated. By applying regression techniques (such as least-squares approximation), the trend line information may be extracted and projected to predict the future pattern frequency which may be applied to calculate the expected value of a recommendation rule.
Description
- This application claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Application No. 60/704,575, filed Aug. 2, 2005, entitled “Applying Data Regression and Pattern Mining to Predict Future Demand,” the entirety of which is incorporated herein by reference.
- 1. Field of the Invention
- The present invention relates in general to the field of database analysis. In one aspect, the present invention relates to a system and method for database pattern mining operations for generating and evaluating association rules contained in database records.
- 2. Description of the Related Art
- The ability of modern computers to assemble, record and analyze enormous amounts of data has created a field of database analysis referred to as data mining. Data mining is used to discover association relationships in a database by identifying frequently occurring patterns in the database. These association relationships or rules may be applied to extract useful information from large databases in a variety of fields, including selective marketing, market analysis and management applications (such as target marketing, customer relation management, market basket analysis, cross selling, market segmentation), risk analysis and management applications (such as forecasting, customer retention, improved underwriting, quality control, competitive analysis), fraud detection and management applications and other applications (such as text mining (news group, email, documents), stream data mining, web mining, DNA data analysis, etc.). For example, association rules have been applied to model and emulate consumer purchasing activities. Association rules describe how often items are purchased together. For example, an association rule, “laptop → speaker (80%),” states that four out of five customers that bought a laptop computer also bought speakers.
- The first step in generating association rules is to review a database of transactions to identify meaningful patterns (referred to as frequent patterns, frequent sets or frequent itemsets) in a transaction database, such as significant purchase patterns that appear as common patterns recurring among a plurality of customers. Typically, this is done by using thresholds such as support and confidence parameters, or other guides to the data mining process. These guides are used to discover frequent patterns, i.e., all sets of itemsets that have transaction support above a predetermined minimum support S and confidence C threshold. Various techniques have been proposed to assist with identifying frequent patterns in transaction databases, including using “Apriori” algorithms to generate and test candidate sets, such as described by R. Agrawal et al., “Mining Association Rules Between Sets of Items in Large Databases,” Proceedings of ACM SIGMOD Int'l Conf. on Management of Data, pp. 207-216 (1993). However, candidate set generation is costly in terms of computational resources consumed, especially when there are prolific patterns or long patterns in the database and when multiple passes through potentially large candidate sets are required. Other techniques (such as described by J. Han et al., “Mining Frequent Patterns Without Candidate Generation,” Proceedings of ACM SIGMOD Int'l Conf. on Management of Data, pp. 1-12 (2000)) attempt to overcome these limitations by using a frequent pattern tree (FPTree) data structure to mine frequent patterns without candidate set generation (a process referred to as FPGrowth). With the FPGrowth approach, frequency pattern information is stored in a compact memory structure.
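The generate-and-test idea behind Apriori can be sketched in a few lines. This is a simplified, unoptimized illustration of level-wise candidate generation with an absolute minimum count, not the algorithm as published by Agrawal et al. and not the FPGrowth approach; the function name and toy baskets are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def apriori(transactions: Iterable[Iterable[str]],
            min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset contained in at least `min_count` transactions."""
    baskets: List[FrozenSet[str]] = [frozenset(t) for t in transactions]

    def count(candidates: Set[FrozenSet[str]]) -> Dict[FrozenSet[str], int]:
        # One pass over the baskets per level: the costly "test" step.
        counted = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        return {c: n for c, n in counted.items() if n >= min_count}

    level = count({frozenset([item]) for b in baskets for item in b})
    frequent = dict(level)
    k = 2
    while level:
        previous = set(level)
        # Generate: join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in previous for s in combinations(c, k - 1))}
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical baskets for illustration.
baskets = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
frequent_sets = apriori(baskets, min_count=3)
```

With a minimum count of 3, the three single items and the three pairs qualify, but {A, B, C} (seen in only two baskets) does not. The repeated passes over the data at each level illustrate why candidate generation becomes expensive for long or prolific patterns.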
- Once the frequent sets are identified, the association rules are generated by constructing the power set (set of all subsets) of the identified frequent sets, and then generating rules from each of the elements of the power set. For each rule, its meaningfulness (i.e., support, confidence, lift, etc.) is calculated and examined to see if it meets the required thresholds. For example, if a frequent pattern {A, B, C} is extracted—meaning that this set occurs more frequently than the minimum support S threshold in the set of transactions—then several rules can be generated from this set:
-
-
-
- {A, B} → {C}, etc., where a rule A → B indicates that “Product A is often purchased together with Product B,” meaning that there is an association between the sales of Products A and B. Such rules can be useful for decisions concerning product pricing, product placement, promotions, store layout and many other decisions.
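- As a hedged illustration of how rules might be generated from a frequent set such as {A, B, C}, the sketch below treats every non-empty proper subset as a candidate antecedent and keeps the rules whose confidence meets a threshold. The function name `rules_from_itemset`, the `support_count` mapping and the sample counts are hypothetical; only confidence is evaluated here, although lift or other measures could be checked the same way.

```python
from itertools import combinations

def rules_from_itemset(itemset, support_count, min_confidence):
    """Return (antecedent, consequent, confidence) for each way of splitting
    itemset into two non-empty parts that meets min_confidence.

    support_count maps frozensets to the number of transactions that
    contain them, and must cover itemset and all of its subsets."""
    items = frozenset(itemset)
    rules = []
    for size in range(1, len(items)):
        for antecedent in combinations(sorted(items), size):
            antecedent = frozenset(antecedent)
            consequent = items - antecedent
            # confidence(X -> Y) = support(X and Y together) / support(X)
            confidence = support_count[items] / support_count[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules
```

With made-up counts in which {A, B} occurs in 4 transactions and {A, B, C} in 3, the rule {A, B} → {C} would have a confidence of 3/4.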
- Conventional approaches for generating frequent patterns (e.g., with standard market basket analysis techniques) look at the frequency of item patterns in orders, but do not attempt to determine if patterns are becoming more or less frequent over time. Using shorter and more recent time periods for determining pattern frequency generally increases the weighting of recent pattern frequency, but typically lowers the statistical significance of the data. Conversely, using longer time periods for determining pattern frequency yields more statistical confidence in the data, but decreases the accuracy due to the inclusion of older pattern frequency data. Accordingly, a need exists for methods and/or apparatuses for improving the generation and analysis of frequent patterns for use in data mining. There is also a need for improving pattern mining processes to better predict future demand. In addition, there is a need for methods and/or apparatuses for efficiently generating future expected pattern frequency information. Further limitations and disadvantages of conventional systems will become apparent to one of skill in the art after reviewing the remainder of the present application with reference to the drawings and detailed description which follow.
- In accordance with one or more embodiments of the present invention, a system and method are provided for generating more meaningful frequent set data by analyzing frequent pattern data over time to predict frequent pattern trends. In a selected embodiment, frequent pattern trends may be derived by using frequent pattern generation techniques over discrete time slices of transaction data, and then processing the results using numerical calculation techniques, such as least-squares approximation or other higher order interpolation techniques, to extract trend information. By extrapolating the computed trend information into the future, a more accurate frequency prediction is obtained than can be provided by standard averaged frequency techniques. In addition, more accurate predictions may be obtained by focusing the pattern mining review on more recent time slices, due to the increased relevance of recent data. In addition or in the alternative, the pattern mining review of the present invention may use regression techniques to analyze the change in frequency of patterns to predict future behavior by projecting the regression to calculate the expected value of a recommendation rule. In accordance with another embodiment of the present invention, frequent itemset information is accumulated on a constant time interval (week1, week2, week3, etc.) and is used in a regression analysis to make a prediction about future demand.
- The objects, advantages and other novel features of the present invention will be apparent from the following detailed description when read in conjunction with the appended claims and attached drawings.
- FIG. 1 depicts an exemplary system for mining association rules from a transaction database.
- FIG. 2 is an exemplary chart comparison of demand prediction using an average pattern frequency technique and a data regression technique.
- FIG. 3 is a flowchart that schematically illustrates a process for applying data regression and pattern mining to predict future demand.
- An efficient database mining method and apparatus are described for processing frequent patterns from transaction databases by programmatically computing the trend of each pattern frequency over time to provide more accurate frequency prediction for use with generating and evaluating association rules. While various details are set forth in the following description, it will be appreciated that the present invention may be practiced without these specific details. In addition, selected aspects are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention. Some portions of the detailed descriptions provided herein are presented in terms of algorithms or operations on data within a computer memory. Such descriptions and representations are used by those skilled in the data processing arts to describe and convey the substance of their work to others skilled in the art. In general, an algorithm refers to a self-consistent sequence of steps leading to a desired result, where a “step” refers to a manipulation of physical quantities which may, though need not necessarily, take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It is common usage to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. These and similar terms may be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities.
Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions using terms such as processing, computing, calculating, determining, displaying or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and/or transforms data represented as physical, electronic and/or magnetic quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
- Referring now to FIG. 1, a block diagram depicts an exemplary system for mining attribute-based association rules from frequent patterns identified in a transaction database. In FIG. 1, the system 10 comprises a data processing engine or system 30 coupled to a database 11. The system 10 also includes an input device 20 where at least one condition of the association rules to be mined is input by a user. For example, the input device 20 is used to input the conditions (i.e., support, confidence, lift, etc.) for the association rule to be mined. The output device may be a local display or printer device, or may be a remotely connected computer system, such as a client computer device or network-connected computer device. In a selected embodiment, the system 10 (e.g., a private wide area network (WAN) or the Internet) includes a central server computer system 30 and one or more networked client or server computer systems that are connected to the network as an input device 20 and/or an output device 40. Communication between central server computer system 30 and the networked computer systems typically occurs over a network, such as a public switched telephone network over asymmetric digital subscriber line (ADSL) telephone lines or high-bandwidth trunks, for example, communications channels providing T1 or OC3 service. Networked client computer system(s) typically access central server computer system 30 through a service provider, such as an internet service provider (“ISP”), by executing application specific software, commonly referred to as a browser, on the networked client computer systems.
- In the
data processing system 30, an attribute mapper 32 may be included for mapping a first data set to a second, highly granular data set as described more fully in U.S. patent application Ser. No. 10/870,360 (entitled “Attribute Based Association Rule Mining”), which is assigned to Trilogy Development Group and is hereby incorporated by reference in its entirety. In addition, a frequent pattern generator 34 is included for identifying frequent patterns occurring in the database 11. For example, the frequent pattern generator 34 may use FPGrowth techniques to identify frequent patterns in the transaction data 12 stored in the database 11 meeting the minimum support count input by the user. A rule generator 36 is included for generating association rules from the frequent pattern information, and an output device 40 is also provided for outputting the mined association rules. The database 11 may be connected to the attribute mapper 32, frequent pattern generator 34 and/or rule generator 36. Alternatively, transaction data 12 from the database 11 may be transformed by the attribute mapper 32, passed directly to the frequent pattern generator 34 for processing to identify frequent patterns, and then passed to the rule generator 36 for rule generation.
- The
attribute mapper 32 is provided for transforming generic item descriptors in the transaction database to provide more detailed item description information concerning various product attributes and/or qualities for the item. For example, part number information may be mapped into more granular product or attribute information identifying specific features of the product, where the specific product or attribute information may be presented as native values. At the frequent pattern generator 34, all of the frequent patterns from the transaction data 12 in the database 11 are compiled, and the support of each frequent pattern may be obtained. At the rule generator 36, at least one association rule is derived by using the frequent pattern information provided by the frequent pattern generator 34. A broad variety of efficient algorithms for mining association rules have been developed in recent years, including algorithms based on the level-wise Apriori framework, TreeProjection and FPGrowth algorithms.
- Referring specifically to the mining of frequent pattern information, it will be appreciated that conventional market basket analysis techniques for mining frequent patterns look at the frequency of item patterns in orders, but do not attempt to determine if patterns are becoming more or less frequent over time. Using shorter and more recent time periods for determining pattern frequency generally increases the weighting of recent pattern frequency, but typically lowers the statistical significance of the data. Conversely, using longer time periods for determining pattern frequency yields more statistical confidence in the data, but decreases the accuracy due to the inclusion of older pattern frequency data. A selected embodiment of the present invention balances the accuracy and confidence requirements with a frequent
pattern generator module 34 that uses standard approaches to pattern frequency (such as pairwise association rule mining, Apriori, or the FP-growth algorithm) against smaller time slices of transaction data to programmatically compute the trend of each pattern's frequency using a variety of numerical calculation techniques, such as least-squares approximation or other higher order interpolation techniques. The future expected pattern frequency is computed by extrapolating the computed trend line into the future to yield a more accurate frequency prediction than the standard averaged frequency technique.
- An example illustration of the advantages of the frequency prediction approach of the present invention over conventional approaches is provided in the following table, which contrasts the predicted frequency with the average frequency of a particular pattern (in this case, a processor for a desktop computer).
A | B | C | D | E | F | G |
---|---|---|---|---|---|---|
Item | Week 1 | Week 2 | Week 3 | Week 4 | Average | Predicted |
Processor A | 60% | 48% | 50% | 35% | 48.25% | 30% |
Processor B | 40% | 52% | 50% | 65% | 51.75% | 70% |
- Columns B-E show the pattern frequency of each processor per week. Column F shows the average pattern frequency which, for simplicity in this example, assumes that the number of Desktops sold in each week is constant. Column G shows the result of applying a linear least-squares approximation to the pattern frequencies and extrapolating to the fifth week. In this case, (Desktop, Processor A) is a pattern that is declining in frequency, while (Desktop, Processor B) is an emerging pattern. The improved accuracy of the frequency prediction technique of the present invention is readily demonstrated by charting the example comparison data, such as depicted in
FIG. 2, which depicts an exemplary chart comparison of demand prediction using an average pattern frequency technique and a data regression technique. As shown in FIG. 2, the dashed trend line 202 for Processor A is computed on the basis of the first four weeks of pattern frequency data for Processor A (plotted at single line 204) and is used to predict a pattern frequency of 30% at Week 5. If, instead, an averaging technique were used, the calculated value of 48.25% would result from the average of the first four weeks of pattern frequency data for Processor A (plotted at single line 204).
- Similarly, the
dotted trend line 206 for Processor B is computed on the basis of the first four weeks of pattern frequency data for Processor B (plotted at double line 208) and is used to predict a pattern frequency of 70% at Week 5, as compared to a calculated value of 51.75% based on the average of the first four weeks of pattern frequency data for Processor B (plotted at double line 208). As this example illustrates, the predicted pattern frequency values paint a much different and more accurate picture of the pattern frequency trends (70% for Processor B as compared to 30% for Processor A) than is provided by using averaging techniques (51.75% for Processor B as compared to 48.25% for Processor A).
- Depending on the extrapolation technique used, there may be predicted values that require additional post-processing. For example, with relatively rapid changes in pattern frequency, predicted pattern frequencies may fall outside of the range of 0%-100%. To address this situation, the predicted values may be capped so that any predicted frequency of less than 0% is changed to 0%, and any predicted frequency of over 100% is changed to 100%.
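- The Week 5 figures in the table can be checked with a short sketch of a linear least-squares fit followed by extrapolation and capping. The function names `linear_fit` and `predict_next` are illustrative assumptions, the weeks are numbered 1 through 4 on the x-axis, and only the straight-line case is shown (higher order techniques are not covered).

```python
def linear_fit(ys):
    """Ordinary least-squares line through (1, ys[0]), (2, ys[1]), ...
    Returns (slope, intercept). Needs at least two observations."""
    n = len(ys)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict_next(ys, lower=0.0, upper=100.0):
    """Extrapolate the fitted line one interval ahead and cap the
    result to the range [lower, upper]."""
    slope, intercept = linear_fit(ys)
    raw = slope * (len(ys) + 1) + intercept
    return min(upper, max(lower, raw))

processor_a = [60, 48, 50, 35]  # weekly pattern frequencies, in percent
processor_b = [40, 52, 50, 65]
average_a = sum(processor_a) / len(processor_a)  # 48.25, column F
predicted_a = predict_next(processor_a)          # about 30, column G
predicted_b = predict_next(processor_b)          # about 70, column G
```

For Processor A the fitted slope is -7.3 percentage points per week, so the line that passes through the mean (2.5, 48.25) reaches 48.25 - 7.3 × 2.5 = 30 at Week 5.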
- As will be appreciated, frequent patterns or itemsets may be constructed using database mining techniques to find interesting patterns from databases, such as association rules, correlations, sequences, episodes, classifiers, clusters and the like. The task of discovering and evaluating frequent patterns in a database of items is quite challenging, given that the search space is exponential in the number of items occurring in the database. The present invention discloses techniques for discovering more meaningful pattern frequency information by, for example, accumulating frequent itemset information on a constant or predetermined time interval and then using the information at this aggregate level in a regression analysis to make a prediction about future demand. The projected or predicted values may then be used to calculate or quantify an expected value of a recommendation rule that is based on the subject pattern being forecast.
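- One way to accumulate pattern information on a constant time interval is sketched below. The helper names, the seven-day interval and the choice of denominator (all transactions in the interval) are assumptions made for illustration; the table above instead measures frequency relative to the number of Desktops sold.

```python
from collections import Counter
from datetime import date

def interval_index(day, start, days_per_interval=7):
    """Map a calendar date to a zero-based interval number."""
    return (day - start).days // days_per_interval

def pattern_frequency_by_interval(transactions, pattern, start,
                                  days_per_interval=7):
    """transactions: iterable of (date, items) pairs.
    Returns {interval: percent of that interval's transactions
    containing every item of pattern}."""
    pattern = frozenset(pattern)
    totals, hits = Counter(), Counter()
    for day, items in transactions:
        k = interval_index(day, start, days_per_interval)
        totals[k] += 1
        if pattern <= set(items):
            hits[k] += 1
    return {k: 100.0 * hits[k] / totals[k] for k in sorted(totals)}
```

The resulting per-interval percentages form the historical series that a regression can then be fitted to.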
- In an exemplary embodiment, the database pattern mining may be implemented with a data processing system that processes transaction database information to provide a frequent set with attribute-based items identifying the purchased product, and to more efficiently generate association rules from the generated frequent set. For example, data processing may be performed on computer system 10 (see
FIG. 1) which may be found in many forms including, for example, mainframes, minicomputers, workstations, servers, personal computers, internet terminals, notebooks, wireless or mobile computing devices (including personal digital assistants), embedded systems and other information handling systems, which are designed to provide computing power to one or more users, either locally or remotely. A computer system 10 includes one or more microprocessor or central processing units (CPU) 38, mass storage memory 11 and local RAM memory 31. The processor 38, in one embodiment, is a 32-bit or 64-bit microprocessor manufactured by Motorola (such as the 680X0 processor), by Intel (such as the 80X86 or Pentium processor), or by IBM. However, any other suitable single or multiple microprocessors or microcomputers may be utilized. Computer programs and data are generally stored as instructions and data in mass storage 11 until loaded into main memory 31 for execution. Main memory 31 may be comprised of dynamic random access memory (DRAM). The CPU 38 may be connected directly (or through an interface or bus) to a variety of peripheral and system components, such as a hard disk drive, cache memory, traditional I/O devices (such as display monitors, mouse-type input devices, floppy disk drives, speaker systems, keyboards, hard drive, CD-ROM drive, modems, printers), network interfaces, terminal devices, televisions, sound devices, voice recognition devices, electronic pen devices, and mass storage devices such as tape drives, hard disks, compact disk (“CD”) drives, digital versatile disk (“DVD”) drives, and magneto-optical drives. The peripheral devices usually communicate with the processor over one or more buses and/or bridges. The foregoing components and devices are used as examples for the sake of conceptual clarity, and various configuration modifications are common.
- Turning now to
FIG. 3, an exemplary flow methodology 300 is illustrated for predicting or forecasting pattern frequency information by calculating an approximation curve based on historical pattern frequency information. Generally speaking, the depicted methodology determines the single item count for each item in a transaction database (step 301), divides this information into predetermined date ranges for purposes of determining the pattern count values in each date range (step 303), and then calculates the predicted pattern frequency information based on historical pattern frequency data (step 305). While a variety of different calculation techniques may be used to calculate the predicted pattern frequency information, a selected embodiment performs a regression analysis (such as a least-squares approximation or other higher order interpolation technique). As will be appreciated, the methodology illustrated in FIG. 3 shows the steps for generating pattern frequency information within predetermined date ranges and for generating a forecast or prediction of a future pattern frequency value that may be used to evaluate and generate association rules from the items in the frequent set. While the methodology of the present invention may be thought of as performing the identified sequence of steps in the order depicted in FIG. 3, the steps may also be performed in parallel, in a different order, or as independent operations that use historical pattern frequency data to make predictions about future demand for purposes of generating association rules therefrom.
- The description of the
illustrative method 300 can begin at step 301 where the item count for each item in a transaction database is determined. This count information may be obtained by incrementing a count value (step 308) for each item I (step 306) in each transaction T (step 304) in the transaction database. The item count incrementation step 308 is repeated for all the items in a transaction (negative outcome from decision 310), and for all transactions in the database (negative outcome from decision 312), until the total count for each item in the transaction database is obtained (affirmative outcome from decisions 310, 312).
- With the item count established, the pattern count for each predetermined date range may be determined at
step 303. In particular, the transaction database may be divided or parsed into predetermined date ranges at step 314. For each transaction in a given date range D, item pairs (I, J) having a single item count that meets a minimum support threshold (step 316) may be counted. The item pair count information may be obtained by incrementing an item pair count value (step 318) for each item pair (I, J) (step 316) in each transaction T in the date range D (step 314). The item pair count incrementation step 318 is repeated for all the item pairs in a transaction (negative outcome from decision 320), and for all transactions in the date range (negative outcome from decision 322), until the total count for each item pair in the transaction date range is obtained (affirmative outcome from decisions 320, 322). The item pair counting process is repeated for each date range in the transaction database (negative outcome to decision 324) by incrementing the date range value (step 326), until all date ranges have been processed (affirmative outcome to decision 324).
- With the pattern count for each date range established, the pattern frequency for each item pair (I, J) in each date range is used to calculate a predicted pattern frequency at
step 305. In particular, each item pair in the transaction database meeting a minimum support threshold (step 328) is used to calculate a pattern frequency value for the item pair in each date range D (step 330), thereby generating historical pattern frequency data. This historical data may be processed to generate an approximation curve, such as by using interpolation techniques to derive a trend line based on the historical pattern frequency data and date range information (step 332). By extending or extrapolating the trend line to a forecasted or future time range, a predicted value of the pattern frequency for a given item pair is obtained (step 334). Additional processing may be performed when calculating the predicted pattern frequency, such as truncating or capping the predicted value to a predetermined range of values (e.g., 0%-100%) to address situations where the calculated predicted pattern frequency value exceeds the predetermined range of values. The prediction calculation process is repeated for each item pair in the transaction database (negative outcome to decision 338) until a pattern frequency forecast is calculated for each item pair (affirmative outcome to decision 338), at which time the process is finished (step 340). - In accordance with selected embodiments of the present invention, a computer-based methodology and system are provided for mining patterns from a transaction database. As a preliminary step, a first pattern in a transaction database is identified that meets a minimum support threshold requirement. Next, a pattern frequency value for the first pattern is measured over a plurality of predetermined time intervals (e.g., a plurality of recent time intervals or constant time intervals) based on the number of times the first pattern occurs in the predetermined time intervals, and the measured pattern frequency values for the first pattern are then processed to calculate a predicted pattern frequency value for the first pattern. 
The processing of the pattern frequency values to calculate a predicted pattern frequency value can be done in any desired way, including but not limited to applying a linear least-squares approximation to the pattern frequency values and extrapolating to the predicted pattern frequency value for the first pattern. Alternatively, the prediction may be accomplished by computing a trend line based on the pattern frequency values and extrapolating the trend line to calculate the predicted frequency value for the first pattern. Yet another technique for processing of pattern frequency values is to use a regression analysis to calculate the predicted frequency value for the first pattern. To prevent unreasonable predictions, the predicted pattern frequency value may be capped to an upper (and/or lower) limit to prevent the predicted pattern frequency value from exceeding the upper (and/or lower) limit. However calculated, the predicted pattern frequency value may be used to calculate an expected value of a recommendation rule that is based on the first pattern.
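- The three stages of FIG. 3 (steps 301, 303 and 305) can be put together in a compact sketch. Everything named here is an assumption for illustration: the function `predict_pair_frequencies`, the input shape (each transaction already tagged with a zero-based date range index), the use of an absolute count as the minimum support threshold, and frequency measured as a percentage of the transactions in each date range. A linear least-squares trend line stands in for the approximation curve.

```python
from collections import Counter, defaultdict
from itertools import combinations

def predict_pair_frequencies(transactions, num_ranges, min_support):
    """transactions: list of (range_index, items), with range_index in
    0..num_ranges-1 and num_ranges >= 2. Returns {(item_i, item_j):
    predicted frequency in percent for the next date range}."""
    # Step 301: single-item counts over the whole transaction database.
    item_count = Counter(
        item for _, items in transactions for item in set(items)
    )
    # Step 303: per-date-range counts of pairs built from supported items.
    pair_count = defaultdict(lambda: [0] * num_ranges)
    range_total = [0] * num_ranges
    for d, items in transactions:
        range_total[d] += 1
        kept = sorted(i for i in set(items) if item_count[i] >= min_support)
        for pair in combinations(kept, 2):
            pair_count[pair][d] += 1
    # Step 305: historical frequencies, trend line, extrapolation, capping.
    xs = list(range(1, num_ranges + 1))
    mean_x = sum(xs) / num_ranges
    sxx = sum((x - mean_x) ** 2 for x in xs)
    predictions = {}
    for pair, counts in pair_count.items():
        if sum(counts) < min_support:
            continue  # the pair itself does not meet minimum support
        freqs = [100.0 * c / t if t else 0.0
                 for c, t in zip(counts, range_total)]
        mean_y = sum(freqs) / num_ranges
        slope = sum((x - mean_x) * (y - mean_y)
                    for x, y in zip(xs, freqs)) / sxx
        raw = mean_y + slope * (num_ranges + 1 - mean_x)
        predictions[pair] = min(100.0, max(0.0, raw))
    return predictions
```

Pairs are keyed as alphabetically sorted tuples, so a query such as `("desktop", "processor_a")` must use the same ordering.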
- As set forth above, the methods and systems for applying data regression and pattern mining to predict future demand as shown and described herein may be implemented in software stored on a computer-readable medium and executed as a computer program on a general purpose or special purpose computer to perform certain tasks. The software discussed herein may include script, batch, or other executable files. The software may be stored on a machine-readable or computer-readable storage medium, and is otherwise available to direct the operation of the computer system as described herein and claimed below. In one embodiment, the software uses instructions and data stored in a local or database memory to implement the data regression and pattern mining techniques so as to improve the ability to predict the future pattern frequency for purposes of forecasting demand. The local or database memory used for storing firmware or hardware modules in accordance with an embodiment of the invention may also include a semiconductor-based memory, which may be permanently, removably or remotely coupled to a microprocessor system. Other new and various types of computer-readable storage media may be used to store the modules discussed herein. Additionally, those skilled in the art will recognize that the separation of functionality into modules is for illustrative purposes. Alternative embodiments may merge the functionality of multiple software modules into a single module or may impose an alternate decomposition of functionality of modules. For example, a software module for calling sub-modules may be decomposed so that each sub-module performs its function and passes control directly to another sub-module. The computer-based data processing system described above is for purposes of example only, and may be implemented in any type of computer system or programming or processing environment, or in a computer program, alone or in conjunction with hardware. 
It is contemplated that the present invention may be run on a stand-alone computer system, or may be run from a server computer system that can be accessed by a plurality of client computer systems interconnected over an intranet network, or that is accessible to clients over the Internet. In addition, many embodiments of the present invention have application to a wide range of industries including the following: computer hardware and software manufacturing and sales, professional services, financial services, automotive sales and manufacturing, telecommunications sales and manufacturing, medical and pharmaceutical sales and manufacturing, and construction industries.
- Although the present invention has been described in detail, it is not intended to limit the invention to the particular form set forth, but on the contrary, is intended to cover such alternatives, modifications and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims so that those skilled in the art should understand that they can make various changes, substitutions and alterations without departing from the spirit and scope of the invention in its broadest form.
Claims (17)
1-20. (canceled)
21. A computer-based method of mining one or more patterns from a transaction database, comprising:
for each of a plurality of predetermined time intervals, measuring a pattern frequency value for a first pattern in a transaction database based on how many times the first pattern occurs in said predetermined time interval; and
processing the pattern frequency values for the first pattern to calculate a predicted pattern frequency value for the first pattern.
22. The method of claim 21, where processing the pattern frequency values comprises applying a linear least-squares approximation to the pattern frequency values and extrapolating to the predicted pattern frequency value for the first pattern.
23. The method of claim 21, where processing the pattern frequency values comprises computing a trend line based on the pattern frequency values and extrapolating the trend line to calculate the predicted pattern frequency value for the first pattern.
24. The method of claim 21, where processing the pattern frequency values comprises using a regression analysis to calculate the predicted pattern frequency value for the first pattern.
25. The method of claim 21, further comprising capping the predicted pattern frequency value to an upper limit to prevent the predicted pattern frequency value from exceeding the upper limit.
26. The method of claim 21, further comprising capping the predicted pattern frequency value to a lower limit to prevent the predicted pattern frequency value from going below the lower limit.
27. The method of claim 21, further comprising using the predicted pattern frequency value to calculate an expected value of a recommendation rule that is based on the first pattern.
28. The method of claim 21, where the plurality of predetermined time intervals comprises a plurality of recent time intervals.
29. The method of claim 21, where the plurality of predetermined time intervals comprises a plurality of constant time intervals.
30. An article of manufacture having at least one recordable medium having stored thereon executable instructions and data which, when executed by at least one processing device, cause the at least one processing device to:
measure, for each of a plurality of predetermined time intervals, a pattern frequency value for a first pattern in a transaction database based on how many times the first pattern occurs in the predetermined time interval; and
process the pattern frequency values for the first pattern to calculate a predicted pattern frequency value for the first pattern.
31. The article of manufacture of claim 30, wherein the processing device processes the pattern frequency values by applying a linear least-squares approximation to the pattern frequency values and extrapolating to the predicted pattern frequency value for the first pattern.
32. The article of manufacture of claim 30, wherein the processing device processes the pattern frequency values by computing a trend line based on the pattern frequency values and extrapolating the trend line to calculate the predicted pattern frequency value for the first pattern.
33. The article of manufacture of claim 30, wherein the processing device processes the pattern frequency values by using a regression analysis to calculate the predicted pattern frequency value for the first pattern.
34. The article of manufacture of claim 30, wherein the executable instructions and data, when executed by at least one processing device, cause the at least one processing device to cap the predicted pattern frequency value to an upper limit to prevent the predicted pattern frequency value from exceeding the upper limit.
35. The article of manufacture of claim 30, wherein the executable instructions and data, when executed by at least one processing device, cause the at least one processing device to cap the predicted pattern frequency value to a lower limit to prevent the predicted pattern frequency value from going below the lower limit.
36. The article of manufacture of claim 30, wherein the executable instructions and data, when executed by at least one processing device, cause the at least one processing device to use the predicted pattern frequency value to calculate an expected value of a recommendation rule that is based on the first pattern.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/252,180 US20140222744A1 (en) | 2005-08-02 | 2014-04-14 | Applying Data Regression and Pattern Mining to Predict Future Demand |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US70457505P | 2005-08-02 | 2005-08-02 | |
US11/460,401 US8700607B2 (en) | 2005-08-02 | 2006-07-27 | Applying data regression and pattern mining to predict future demand |
US14/252,180 US20140222744A1 (en) | 2005-08-02 | 2014-04-14 | Applying Data Regression and Pattern Mining to Predict Future Demand |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/460,401 Continuation US8700607B2 (en) | 2005-08-02 | 2006-07-27 | Applying data regression and pattern mining to predict future demand |
Publications (1)
Publication Number | Publication Date |
---|---|
US20140222744A1 (en) | 2014-08-07 |
Family
ID=37718762
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/460,401 Active 2031-09-11 US8700607B2 (en) | 2005-08-02 | 2006-07-27 | Applying data regression and pattern mining to predict future demand |
US14/252,180 Abandoned US20140222744A1 (en) | 2005-08-02 | 2014-04-14 | Applying Data Regression and Pattern Mining to Predict Future Demand |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/460,401 Active 2031-09-11 US8700607B2 (en) | 2005-08-02 | 2006-07-27 | Applying data regression and pattern mining to predict future demand |
Country Status (1)
Country | Link |
---|---|
US (2) | US8700607B2 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170063962A1 (en) * | 2015-08-27 | 2017-03-02 | International Business Machines Corporation | Data transfer target applications through content analysis |
US20170300554A1 (en) * | 2016-04-14 | 2017-10-19 | Ge Aviation Systems Llc | Systems and methods for providing data exploration techniques |
US10572836B2 (en) | 2015-10-15 | 2020-02-25 | International Business Machines Corporation | Automatic time interval metadata determination for business intelligence and predictive analytics |
US11016730B2 (en) | 2016-07-28 | 2021-05-25 | International Business Machines Corporation | Transforming a transactional data set to generate forecasting and prediction insights |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070185867A1 (en) * | 2006-02-03 | 2007-08-09 | Matteo Maga | Statistical modeling methods for determining customer distribution by churn probability within a customer population |
US8290913B2 (en) * | 2007-12-31 | 2012-10-16 | Teradata Us, Inc. | Techniques for multi-variable analysis at an aggregate level |
US20100241486A1 (en) * | 2009-03-18 | 2010-09-23 | Yahoo! Inc. | Reducing revenue risk in advertisement allocation |
AU2013267037B2 (en) * | 2009-05-04 | 2016-01-28 | Visa International Service Association | Frequency-based transaction prediction and processing |
WO2010129563A2 (en) | 2009-05-04 | 2010-11-11 | Visa International Service Association | Determining targeted incentives based on consumer transaction history |
US20110087616A1 (en) * | 2009-10-09 | 2011-04-14 | International Business Machines Corporation | System and method for creating a graphical representation of portfolio risk |
JP5528292B2 (en) * | 2010-10-14 | 2014-06-25 | インターナショナル・ビジネス・マシーンズ・コーポレーション | System, method and program for extracting meaningful frequent itemsets |
US8862638B2 (en) * | 2011-02-28 | 2014-10-14 | Red Hat, Inc. | Interpolation data template to normalize analytic runs |
US20130073518A1 (en) * | 2011-09-20 | 2013-03-21 | Manish Srivastava | Integrated transactional and data warehouse business intelligence analysis solution |
US9495702B2 (en) | 2011-09-20 | 2016-11-15 | Oracle International Corporation | Dynamic auction monitor with graphic interpretive data change indicators |
US20130204657A1 (en) * | 2012-02-03 | 2013-08-08 | Microsoft Corporation | Filtering redundant consumer transaction rules |
US10061822B2 (en) * | 2013-07-26 | 2018-08-28 | Genesys Telecommunications Laboratories, Inc. | System and method for discovering and exploring concepts and root causes of events |
US9971764B2 (en) | 2013-07-26 | 2018-05-15 | Genesys Telecommunications Laboratories, Inc. | System and method for discovering and exploring concepts |
US10380204B1 (en) | 2014-02-12 | 2019-08-13 | Pinterest, Inc. | Visual search |
JP6223889B2 (en) * | 2014-03-31 | 2017-11-01 | 株式会社東芝 | Pattern discovery apparatus and program |
US20160092893A1 (en) * | 2014-09-29 | 2016-03-31 | Ebay Inc. | System, method, and apparatus for predicting item characteristic popularity |
US10067981B2 (en) * | 2014-11-21 | 2018-09-04 | Sap Se | Intelligent memory block replacement |
US9672495B2 (en) * | 2014-12-23 | 2017-06-06 | Sap Se | Enhancing frequent itemset mining |
US10614394B2 (en) | 2015-11-09 | 2020-04-07 | Dell Products, L.P. | Data analytics model selection through champion challenger mechanism |
US20170345112A1 (en) * | 2016-05-25 | 2017-11-30 | Tyco Fire & Security Gmbh | Dynamic Threat Analysis Engine for Mobile Users |
US20200295970A1 (en) * | 2017-10-23 | 2020-09-17 | Nokia Solutions And Networks Oy | Method and system for automatic selection of virtual network functions (vnf) in a communication network |
CN109635003B (en) * | 2018-12-07 | 2021-03-16 | 南京华苏科技有限公司 | Multi-data-source-based community population information association method |
CN110400213A (en) * | 2019-07-26 | 2019-11-01 | 中国工商银行股份有限公司 | Data processing method and device and electronic equipment and readable medium |
CN111026956B (en) * | 2019-11-20 | 2021-03-23 | 拉扎斯网络科技(上海)有限公司 | Data list processing method and device, electronic equipment and computer storage medium |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5615109A (en) * | 1995-05-24 | 1997-03-25 | Eder; Jeff | Method of and system for generating feasible, profit maximizing requisition sets |
US6032125A (en) * | 1996-11-07 | 2000-02-29 | Fujitsu Limited | Demand forecasting method, demand forecasting system, and recording medium |
US6189005B1 (en) * | 1998-08-21 | 2001-02-13 | International Business Machines Corporation | System and method for mining surprising temporal patterns |
US20020174006A1 (en) * | 2001-05-17 | 2002-11-21 | Rugge Robert D. | Cash flow forecasting |
US20030200189A1 (en) * | 2002-04-19 | 2003-10-23 | Computer Associates Think, Inc. | Automatic neural-net model generation and maintenance |
US20040034612A1 (en) * | 2002-03-22 | 2004-02-19 | Nick Mathewson | Support vector machines for prediction and classification in supply chain management and other applications |
US7062447B1 (en) * | 2000-12-20 | 2006-06-13 | Demandtec, Inc. | Imputed variable generator |
US20060265429A1 (en) * | 2005-05-17 | 2006-11-23 | Travelocity.Com Lp | Systems, methods, and computer program products for optimizing communications with selected product providers and users by identifying trends in transactions between product providers and users |
US8635328B2 (en) * | 2002-10-31 | 2014-01-21 | International Business Machines Corporation | Determining time varying thresholds for monitored metrics |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6134555A (en) * | 1997-03-10 | 2000-10-17 | International Business Machines Corporation | Dimension reduction using association rules for data mining application |
US6820070B2 (en) * | 2000-06-07 | 2004-11-16 | Insyst Ltd. | Method and tool for data mining in automatic decision making systems |
US20020152305A1 (en) * | 2000-03-03 | 2002-10-17 | Jackson Gregory J. | Systems and methods for resource utilization analysis in information management environments |
JP2002278761A (en) * | 2001-03-16 | 2002-09-27 | Hitachi Ltd | Method and system for extracting correlation rule including negative item |
US6829608B2 (en) * | 2001-07-30 | 2004-12-07 | International Business Machines Corporation | Systems and methods for discovering mutual dependence patterns |
US20030217055A1 (en) * | 2002-05-20 | 2003-11-20 | Chang-Huang Lee | Efficient incremental method for data mining of a database |
JP3701633B2 (en) * | 2002-06-21 | 2005-10-05 | 株式会社日立製作所 | Item pattern extraction method, network system, and processing apparatus across multiple databases |
US20040049504A1 (en) * | 2002-09-06 | 2004-03-11 | International Business Machines Corporation | System and method for exploring mining spaces with multiple attributes |
US7418430B2 (en) * | 2003-07-28 | 2008-08-26 | Microsoft Corporation | Dynamic standardization for scoring linear regressions in decision trees |
- 2006-07-27: US application US11/460,401, patent US8700607B2 (Active)
- 2014-04-14: US application US14/252,180, publication US20140222744A1 (Abandoned)
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5615109A (en) * | 1995-05-24 | 1997-03-25 | Eder; Jeff | Method of and system for generating feasible, profit maximizing requisition sets |
US6032125A (en) * | 1996-11-07 | 2000-02-29 | Fujitsu Limited | Demand forecasting method, demand forecasting system, and recording medium |
US6189005B1 (en) * | 1998-08-21 | 2001-02-13 | International Business Machines Corporation | System and method for mining surprising temporal patterns |
US7062447B1 (en) * | 2000-12-20 | 2006-06-13 | Demandtec, Inc. | Imputed variable generator |
US20020174006A1 (en) * | 2001-05-17 | 2002-11-21 | Rugge Robert D. | Cash flow forecasting |
US20040034612A1 (en) * | 2002-03-22 | 2004-02-19 | Nick Mathewson | Support vector machines for prediction and classification in supply chain management and other applications |
US20030200189A1 (en) * | 2002-04-19 | 2003-10-23 | Computer Associates Think, Inc. | Automatic neural-net model generation and maintenance |
US8635328B2 (en) * | 2002-10-31 | 2014-01-21 | International Business Machines Corporation | Determining time varying thresholds for monitored metrics |
US20060265429A1 (en) * | 2005-05-17 | 2006-11-23 | Travelocity.Com Lp | Systems, methods, and computer program products for optimizing communications with selected product providers and users by identifying trends in transactions between product providers and users |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170063962A1 (en) * | 2015-08-27 | 2017-03-02 | International Business Machines Corporation | Data transfer target applications through content analysis |
US20170060355A1 (en) * | 2015-08-27 | 2017-03-02 | International Business Machines Corporation | Data transfer target applications through content analysis |
US10013146B2 (en) * | 2015-08-27 | 2018-07-03 | International Business Machines Corporation | Data transfer target applications through content analysis |
US10048838B2 (en) * | 2015-08-27 | 2018-08-14 | International Business Machines Corporation | Data transfer target applications through content analysis |
US10430033B2 (en) * | 2015-08-27 | 2019-10-01 | International Business Machines Corporation | Data transfer target applications through content analysis |
US10430034B2 (en) * | 2015-08-27 | 2019-10-01 | International Business Machines Corporation | Data transfer target applications through content analysis |
US10572836B2 (en) | 2015-10-15 | 2020-02-25 | International Business Machines Corporation | Automatic time interval metadata determination for business intelligence and predictive analytics |
US10572837B2 (en) | 2015-10-15 | 2020-02-25 | International Business Machines Corporation | Automatic time interval metadata determination for business intelligence and predictive analytics |
US20170300554A1 (en) * | 2016-04-14 | 2017-10-19 | Ge Aviation Systems Llc | Systems and methods for providing data exploration techniques |
US11301493B2 (en) * | 2016-04-14 | 2022-04-12 | Ge Aviation Systems Llc | Systems and methods for providing data exploration techniques |
US11016730B2 (en) | 2016-07-28 | 2021-05-25 | International Business Machines Corporation | Transforming a transactional data set to generate forecasting and prediction insights |
Also Published As
Publication number | Publication date |
---|---|
US8700607B2 (en) | 2014-04-15 |
US20070033185A1 (en) | 2007-02-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8700607B2 (en) | Applying data regression and pattern mining to predict future demand | |
US11501174B2 (en) | System and method for efficiently generating association rules using scaled lift threshold values to subsume association rules | |
US11816722B1 (en) | Scoring recommendations and explanations with a probabilistic user model | |
US7433879B1 (en) | Attribute based association rule mining | |
CN110837931B (en) | Customer churn prediction method, device and storage medium | |
US7698170B1 (en) | Retail recommendation domain model | |
US6567936B1 (en) | Data clustering using error-tolerant frequent item sets | |
US11250444B2 (en) | Identifying and labeling fraudulent store return activities | |
US6490582B1 (en) | Iterative validation and sampling-based clustering using error-tolerant frequent item sets | |
KR20160121806A (en) | Determining a temporary transaction limit | |
US9009091B2 (en) | Data classification tool using dynamic attribute weights and intervals of variation about static weights determined by conditional entropy of attribute descriptors | |
JP2009520266A (en) | Data independent relevance assessment using cognitive concept relationships | |
CN111966886A (en) | Object recommendation method, object recommendation device, electronic equipment and storage medium | |
JP5061999B2 (en) | Analysis apparatus, analysis method, and analysis program | |
CN115545886A (en) | Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium | |
Liu et al. | Extracting, ranking, and evaluating quality features of web services through user review sentiment analysis | |
Khder et al. | The impact of implementing data mining in business intelligence | |
CN111179051A (en) | Financial target customer determination method and device and electronic equipment | |
CN110599281A (en) | Method and device for determining target shop | |
CN110807687A (en) | Object data processing method, device, computing equipment and medium | |
JP2003323601A (en) | Predicting device with reliability scale | |
CN113902543A (en) | Resource quota adjusting method and device and electronic equipment | |
US9830325B1 (en) | Determining a likelihood that two entities are the same | |
US20230162214A1 (en) | System and method for predicting impact on consumer spending using machine learning | |
US20050114277A1 (en) | Method, system and program product for evaluating a data mining algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION |