CA2655735A1 - Data profiling - Google Patents
Data profiling
- Publication number
- CA2655735A1
- Authority
- CA
- Canada
- Prior art keywords
- data
- field
- values
- fields
- records
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2365—Ensuring data consistency and integrity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24542—Plan optimisation
- G06F16/24544—Join order optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
- G06F16/24554—Unary operations; Data partitioning operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
- G06F16/24554—Unary operations; Data partitioning operations
- G06F16/24556—Aggregation; Duplicate elimination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/252—Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/40—Data acquisition and logging
Abstract
Processing data includes profiling data from a data source, including reading the data from the data source, computing summary data characterizing the data while reading the data, and storing profile information that is based on the summary data. The data is then processed from the data source. This processing includes accessing the stored profile information and processing the data according to the accessed profile information.
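The abstract describes two phases: a profiling pass that computes summary data while reading the source and stores profile information, followed by a processing pass driven by that stored profile. The minimal Python sketch below illustrates this flow. It assumes records are in-memory dicts of string field values. The function names, the `min_count` threshold, and the rule that a value seen fewer than `min_count` times is invalid are all hypothetical, chosen only to echo claims 8, 9, 14 and 15. This is not the patented implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Record = Dict[str, str]
Profile = Dict[str, Counter]


def profile_records(records: Iterable[Record]) -> Profile:
    """Profile in a single pass: count occurrences of each distinct value per field."""
    profile: Dict[str, Counter] = defaultdict(Counter)
    for record in records:
        for field, value in record.items():
            profile[field][value] += 1
    return dict(profile)


def field_statistics(counts: Counter) -> Dict[str, object]:
    """Summarize one field's value counts (total, distinct, most frequent value)."""
    return {
        "total": sum(counts.values()),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


def derive_validation_spec(profile: Profile, min_count: int = 2) -> Dict[str, Set[str]]:
    """Hypothetical rule: a value is valid if it occurred at least min_count times."""
    return {
        field: {value for value, n in counts.items() if n >= min_count}
        for field, counts in profile.items()
    }


def process_records(
    records: Iterable[Record], spec: Dict[str, Set[str]]
) -> Tuple[List[Record], List[Record]]:
    """Second pass: split records into valid and invalid using the stored spec."""
    valid: List[Record] = []
    invalid: List[Record] = []
    for record in records:
        ok = all(record.get(field) in allowed for field, allowed in spec.items())
        (valid if ok else invalid).append(record)
    return valid, invalid


data = [
    {"state": "MA", "type": "A"},
    {"state": "MA", "type": "B"},
    {"state": "NY", "type": "A"},
    {"state": "MX", "type": "A"},  # appears only once, so it fails the rule
    {"state": "NY", "type": "B"},
]
profile = profile_records(data)
spec = derive_validation_spec(profile, min_count=2)
valid, invalid = process_records(data, spec)
print(field_statistics(profile["state"]))  # {'total': 5, 'distinct': 3, 'most_common': 'MA'}
print(invalid)  # [{'state': 'MX', 'type': 'A'}]
```

The profile is kept separate from the data, so the same stored profile could drive other processing steps, such as the transformations in claims 16 and 17, without re-reading the source to recompute it.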
Claims (82)
1. A method for processing data including:
profiling data from a data source, including reading the data from the data source, computing summary data characterizing the data while reading the data, and storing profile information that is based on the summary data; and processing the data from the data source, including accessing the stored profile information and processing the data according to the accessed profile information.
2. The method of claim 1 wherein processing the data from the data source further includes reading the data from the data source.
3. The method of claim 1 wherein profiling the data is performed without maintaining a copy of the data outside the data source.
4. The method of claim 3 wherein the data includes variable record structure records with at least one of a conditional field and a variable number of fields.
5. The method of claim 4 wherein computing summary data characterizing the data while reading the data includes interpreting the variable record structure records while computing summary data characterizing the data.
6. The method of claim 1 wherein the data source includes a data storage system.
7. The method of claim 6 wherein the data storage system includes a database system.
8. The method of claim 1 wherein computing the summary data includes counting a number of occurrences for each of a set of distinct values for a field.
9. The method of claim 8 wherein storing profile information includes storing statistics for the field based on the counted number of occurrences for said field.
10. The method of claim 1 further including maintaining a metadata store that contains metadata related to the data source.
11. The method of claim 10 wherein storing the profile information includes updating the metadata related to the data source.
12. The method of claim 10 wherein profiling the data and processing the data each make use of metadata for the data source.
13. The method of claim 1 wherein profiling data from the data source further includes determining a format specification based on the profile information.
14. The method of claim 1 wherein profiling data from the data source further includes determining a validation specification based on the profile information.
15. The method of claim 14 wherein processing the data includes identifying invalid records in the data based on the validation specification.
16. The method of claim 1 wherein profiling data from the data source further includes specifying data transformation instructions based on the profile information.
17. The method of claim 16 wherein processing the data includes applying the transformation instructions to the data.
18. The method of claim 1 wherein processing the data includes importing the data into a data storage subsystem.
19. The method of claim 18 wherein processing the data includes validating the data prior to importing the data into a data storage subsystem.
20. The method of claim 19 wherein validating the data includes comparing characteristics of the data to reference characteristics for said data.
21. The method of claim 20 wherein the reference characteristics include statistical properties of the data.
22. The method of claim 1 wherein profiling the data includes profiling said data in parallel, including partitioning the data into parts and processing the parts using separate ones of a first set of parallel components.
23. The method of claim 22 wherein profiling the data in parallel further includes computing the summary data for different fields of the data using separate ones of a second set of parallel components.
24. The method of claim 23 wherein profiling the data in parallel further includes repartitioning outputs of the first set of parallel components to form inputs for the second set of parallel components.
25. The method of claim 22 wherein profiling the data in parallel includes reading the data from a parallel data source, each part of the parallel data source being processed by a different one of the first set of parallel components.
26. A method for processing data including:
profiling data from a data source, including reading the data from the data source, computing summary data characterizing the data while reading the data, and storing profile information that is based on the summary data; wherein profiling the data includes profiling said data in parallel, including partitioning the data into parts and processing the parts using separate ones of a first set of parallel components.
27. Software including instructions adapted to perform all the method steps of any of claims 1 through 26 when executed on a data processing system.
28. The software of claim 27 embodied on a computer-readable medium.
29. A data processing system including:
a profiling module configured to read data from a data source, to compute summary data characterizing the data while reading the data, and to store profile information that is based on the summary data; and a processing module configured to access the stored profile information and to process the data from the data source according to the accessed profile information.
30. A data processing system including:
means for profiling data from a data source, including means for reading the data from the data source, means for computing summary data characterizing the data while reading the data, and means for storing profile information that is based on the summary data; and means for processing the data from the data source, including means for accessing the stored profile information and means for processing the data according to the accessed profile information.
31. A method for processing data including:
accepting information characterizing values of a first field in records of a first data source and information characterizing values of a second field in records of a second data source;
computing quantities characterizing a relationship between the first field and the second field based on the accepted information; and presenting information relating the first field and the second field.
32. The method of claim 31 wherein presenting the information includes presenting the information to a user.
33. The method of claim 31 wherein the first data source and the second data source are the same data source.
34. The method of claim 31 wherein at least one of the first data source and the second data source includes a database table.
35. The method of claim 31 wherein the quantities characterizing the relationship include quantities characterizing joint characteristics of the values of the first field and of the second field.
36. The method of claim 35 wherein the information characterizing the values of the first field includes information characterizing a distribution of values of said first field.
37. The method of claim 36 wherein the information characterizing the distribution of values of the first field includes a plurality of data records, each associating a different value and a corresponding number of occurrences of said value in the first field in the first data source.
38. The method of claim 36 wherein the information characterizing the values of the second field includes information characterizing a distribution of values of said second field.
39. The method of claim 38 wherein computing the quantities characterizing the joint characteristics includes processing the information characterizing the distribution of values of the first field and of the second field to compute quantities related to a plurality of categories of co-occurrence of values.
40. The method of claim 39 wherein the information characterizing the distribution of values of the first field and of the second field includes a plurality of data records, each associating a different value and a corresponding number of occurrences of said value and wherein processing the information characterizing said distributions of values includes computing information characterizing a distribution of values in a join of the first data source and the second data source on the first field and the second field, respectively.
41. The method of claim 39 wherein the quantities related to the plurality of categories of co-occurrence of values includes a plurality of data records, each associated with one of the categories of co-occurrence and including a number of unique values of the first and the second fields in said category.
42. The method of claim 35 wherein computing the quantities characterizing the joint characteristics of the values of the first field and the second field includes computing information characterizing a distribution of values in a join of the first data source and the second data source using the first field and the second field, respectively.
43. The method of claim 35 wherein computing the quantities characterizing the joint characteristics of the values of the first field and the second field includes computing quantities related to a plurality of categories of co-occurrence of values.
44. The method of claim 42 wherein the categories of co-occurrence of values includes values that occur at least once in one of the first field and the second field but not in the other of said fields.
45. The method of claim 42 wherein the categories of co-occurrence of values includes values that occur exactly once in each of the first field and the second field.
46. The method of claim 42 wherein the categories of co-occurrence of values includes values that occur exactly once in one of the first field and the second field and more than once in the other of said fields.
47. The method of claim 42 wherein the categories of co-occurrence of values includes values that occur more than once in each of the first field and the second field.
48. The method of claim 35 wherein the steps of accepting information characterizing values and computing quantities characterizing joint characteristics of the values are repeated for a plurality of pairs of first and second fields.
49. The method of claim 48 wherein each of the plurality of pairs of fields has a unique identifier that is included with values in the pairs of fields to compute the quantities characterizing the joint characteristics of the values.
50. The method of claim 48 further including presenting information relating the fields of one or more of the plurality of pairs of fields.
51. The method of claim 50 wherein presenting the information relating the fields of one or more of the plurality of pairs of fields includes identifying fields as candidate fields of one of a plurality of types of relationships of fields.
52. The method of claim 51 wherein the plurality of types of relationships of fields includes a primary key and foreign key relationship.
53. The method of claim 51 wherein the plurality of types of relationships of fields includes a common domain relationship.
54. The method of claim 31 wherein computing the quantities includes computing said quantities based on logical values that are converted from literal values of the first field and of the second field.
55. The method of claim 37 wherein computing the quantities includes computing said quantities in parallel, including partitioning the data records into parts and processing the parts using separate ones of a set of parallel components.
56. The method of claim 55 wherein the parts are based on values of the first field and of the second field.
57. The method of claim 56 wherein data records having the same value are in the same part.
58. Software including instructions adapted to perform all the method steps of any of claims 31 through 57 when executed on a data processing system.
59. The software of claim 58 embodied on a computer-readable medium.
60. A system for processing data including:
a values processing module configured to accept information characterizing values of a first field in records of a first data source and information characterizing values of a second field in records of a second data source;
a relationship processing module configured to compute quantities characterizing a relationship between the first field and the second field based on the accepted information;
an interface configured to present information relating the first field and the second field.
61. A system for processing data including:
means for accepting information characterizing values of a first field in records of a first data source and information characterizing values of a second field in records of a second data source;
means for computing quantities characterizing a relationship between the first field and the second field based on the accepted information;
means for presenting information relating the first field and the second field.
62. A method for processing data including:
identifying a plurality of subsets of fields of data records of a data source;
determining co-occurrence statistics for each of the plurality of subsets; and identifying one or more of the plurality of subsets as having a functional relationship among the fields of the identified subset.
63. The method of claim 62 wherein at least one of the subsets of fields is a subset of two fields.
64. The method of claim 62 wherein identifying one or more of the plurality of subsets as having a functional relationship among the fields of the identified subset includes identifying one or more of the plurality of subsets as having one of a plurality of possible predetermined functional relationships.
65. The method of claim 62 wherein determining the co-occurrence statistics includes forming data elements each identifying a pair of fields and identifying a pair of values occurring in the pair of fields in one of the data records.
66. The method of claim 62 wherein determining the co-occurrence statistics includes:
partitioning the data records into parts, the data records having a first field and a second field;
determining a quantity based on a distribution of values that occur in the second field of one or more records in a first of the parts, the one or more records having a common value occurring in a first field of the one or more records; and combining the quantity with other quantities from records in other of the parts to generate a total quantity.
67. The method of claim 66 wherein identifying one or more of the plurality of subsets as having a functional relationship among the fields of the identified subset includes identifying a functional relationship between the first and second fields based on the total quantity.
68. The method of claim 66 wherein the parts are based on values of the first field and of the second field.
69. The method of claim 66 wherein the parts are processed using separate ones of a set of parallel components.
70. The method of claim 62 wherein identifying one or more of the plurality of subsets as having a functional relationship among the fields of the identified subset includes determining a degree of match to said functional relationship.
71. The method of claim 70 wherein the degree of match includes a number of exceptional records that are not consistent with said functional relationship.
72. The method of claim 62 wherein the functional relationship includes a mapping of at least some of the values of a first field onto at least some of the values of a second field.
73. The method of claim 72 wherein the mapping is a many-to-one mapping.
74. The method of claim 72 wherein the mapping is a one-to-many mapping.
75. The method of claim 72 wherein the mapping is a one-to-one mapping.
76. The method of claim 62 further including filtering the plurality of subsets based on information characterizing values in fields of the plurality of subsets.
77. The method of claim 62 wherein the data records include records of a database table.
78. The method of claim 77 wherein the data records include records of a plurality of database tables.
79. Software including instructions adapted to perform all the method steps of any of claims 62 through 78 when executed on a data processing system.
80. The software of claim 79 embodied on a computer-readable medium.
81. A system for processing data including:
an identification processing module configured to identify a plurality of subsets of fields of data records of a data source;
a statistics processing module configured to determine co-occurrence statistics for each of the plurality of subsets; and a functional relationship processing module configured to identify one or more of the plurality of subsets as having a functional relationship among the fields of the identified subset.
82. A system for processing data including:
means for identifying a plurality of subsets of fields of data records of a data source;
means for determining co-occurrence statistics for each of the plurality of subsets; and means for identifying one or more of the plurality of subsets as having a functional relationship among the fields of the identified subset.
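Claims 37 and 39 to 47 describe comparing two fields using only their value-count records. Each distinct value is sorted into a category of co-occurrence, and the distribution of values in a join of the two sources is derived. The Python sketch below assumes the value counts for each field are already available as `collections.Counter` objects. The category labels and the primary-key heuristic in `is_key_candidate` are hypothetical. They only illustrate the kind of relationship detection described in claims 51 and 52.

```python
from collections import Counter
from typing import Dict


def co_occurrence_categories(left: Counter, right: Counter) -> Dict[str, Dict[str, int]]:
    """Bucket each distinct value by how often it occurs in each field.

    For each category, report the number of distinct values and the number of
    rows those values would contribute to an inner join on the two fields.
    """
    stats: Dict[str, Dict[str, int]] = {}
    for value in set(left) | set(right):
        n_left, n_right = left.get(value, 0), right.get(value, 0)
        if n_right == 0:
            category = "left_only"
        elif n_left == 0:
            category = "right_only"
        elif n_left == 1 and n_right == 1:
            category = "1:1"
        elif n_left == 1:
            category = "1:N"
        elif n_right == 1:
            category = "N:1"
        else:
            category = "N:N"
        bucket = stats.setdefault(category, {"distinct_values": 0, "join_rows": 0})
        bucket["distinct_values"] += 1
        bucket["join_rows"] += n_left * n_right  # join size depends only on the counts
    return stats


def is_key_candidate(left: Counter, right: Counter) -> bool:
    """Hypothetical heuristic: left looks like a primary key referenced by right
    when every left value is unique and every right value appears in left."""
    return all(n == 1 for n in left.values()) and set(right) <= set(left)


customers = Counter({"c1": 1, "c2": 1, "c3": 1})  # customer_id in a customers table
orders = Counter({"c1": 3, "c2": 1, "c9": 1})     # customer_id in an orders table
stats = co_occurrence_categories(customers, orders)
join_size = sum(b["join_rows"] for b in stats.values())
print(join_size)                             # 4
print(is_key_candidate(customers, orders))  # False: "c9" has no matching customer
```

The join size is computed without performing the join: for each shared value, the number of joined rows is the product of its counts in the two fields.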
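Claims 62 to 75 describe identifying functional relationships among fields from co-occurrence statistics. Claim 66 adds three steps: partitioning the records, computing a quantity within each part, and combining the per-part quantities into a total. The Python sketch below assumes in-memory records. It treats the quantity as the number of exceptional records (claim 71): records whose right-hand value differs from the most common one for their left-hand value. Because partitioning is done on the left-hand field's value (compare claim 68), every record with a given left-hand value lands in the same part, so the per-part counts can simply be summed. The parts are processed sequentially here but could be assigned to separate parallel components. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def partition_by_field(records: Sequence[Record], field: str, n_parts: int) -> List[List[Record]]:
    """Hash-partition records on one field so equal values land in the same part."""
    parts: List[List[Record]] = [[] for _ in range(n_parts)]
    for record in records:
        parts[hash(record[field]) % n_parts].append(record)
    return parts


def exceptions_in_part(part: List[Record], lhs: str, rhs: str) -> int:
    """Count records in one part that break the mapping lhs -> rhs."""
    rhs_counts: Dict[str, Counter] = defaultdict(Counter)
    for record in part:
        rhs_counts[record[lhs]][record[rhs]] += 1
    # Records that do not carry the most common rhs value for their lhs value.
    return sum(sum(c.values()) - c.most_common(1)[0][1] for c in rhs_counts.values())


def functional_dependencies(
    records: Sequence[Record], fields: Sequence[str], n_parts: int = 4, max_exceptions: int = 0
) -> List[Tuple[str, str, int]]:
    """Return (lhs, rhs, exceptions) for every ordered field pair within tolerance."""
    found = []
    for lhs, rhs in permutations(fields, 2):
        parts = partition_by_field(records, lhs, n_parts)
        total = sum(exceptions_in_part(part, lhs, rhs) for part in parts)
        if total <= max_exceptions:
            found.append((lhs, rhs, total))
    return found


rows = [
    {"zip": "02139", "city": "Cambridge", "state": "MA"},
    {"zip": "02139", "city": "Cambridge", "state": "MA"},
    {"zip": "10001", "city": "New York", "state": "NY"},
    {"zip": "02140", "city": "Cambridge", "state": "MA"},
]
print(functional_dependencies(rows, ["zip", "city", "state"]))
# [('zip', 'city', 0), ('zip', 'state', 0), ('city', 'state', 0), ('state', 'city', 0)]
```

With `max_exceptions=1`, the pair `('city', 'zip', 1)` is also reported, because one Cambridge record carries a different ZIP code. The exception count serves as a simple degree of match in the sense of claim 70.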
Applications Claiming Priority (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US50290803P | 2003-09-15 | 2003-09-15 | |
US60/502,908 | 2003-09-15 | ||
US51303803P | 2003-10-20 | 2003-10-20 | |
US60/513,038 | 2003-10-20 | ||
US53295603P | 2003-12-22 | 2003-12-22 | |
US60/532,956 | 2003-12-22 | ||
CA002538568A CA2538568C (en) | 2003-09-15 | 2004-09-15 | Data profiling |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002538568A Division CA2538568C (en) | 2003-09-15 | 2004-09-15 | Data profiling |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2655735A1 | 2005-03-31 |
CA2655735C | 2011-01-18 |
Family ID: 34381971
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002538568A Active CA2538568C (en) | 2003-09-15 | 2004-09-15 | Data profiling |
CA2655735A Active CA2655735C (en) | 2003-09-15 | 2004-09-15 | Data profiling |
CA2655731A Active CA2655731C (en) | 2003-09-15 | 2004-09-15 | Functional dependency data profiling |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002538568A Active CA2538568C (en) | 2003-09-15 | 2004-09-15 | Data profiling |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA2655731A Active CA2655731C (en) | 2003-09-15 | 2004-09-15 | Functional dependency data profiling |
Country Status (10)
Country | Link |
---|---|
US (5) | US7849075B2 (en) |
EP (3) | EP2261820A3 (en) |
JP (3) | JP5328099B2 (en) |
KR (4) | KR20090039803A (en) |
CN (1) | CN102982065B (en) |
AT (1) | ATE515746T1 (en) |
AU (3) | AU2004275334B9 (en) |
CA (3) | CA2538568C (en) |
HK (1) | HK1093568A1 (en) |
WO (1) | WO2005029369A2 (en) |
Families Citing this family (202)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
ATE515746T1 (en) | 2003-09-15 | 2011-07-15 | Ab Initio Technology Llc | DATA PROFILING |
US7653641B2 (en) * | 2004-05-04 | 2010-01-26 | Accruent, Inc. | Abstraction control solution |
US7349898B2 (en) * | 2004-06-04 | 2008-03-25 | Oracle International Corporation | Approximate and exact unary inclusion dependency discovery |
US7647293B2 (en) * | 2004-06-10 | 2010-01-12 | International Business Machines Corporation | Detecting correlation from data |
US7386566B2 (en) * | 2004-07-15 | 2008-06-10 | Microsoft Corporation | External metadata processing |
US8732004B1 (en) | 2004-09-22 | 2014-05-20 | Experian Information Solutions, Inc. | Automated analysis of data to generate prospect notifications based on trigger events |
US20060082581A1 (en) | 2004-10-14 | 2006-04-20 | Microsoft Corporation | Encoding for remoting graphics to decoder device |
US7852342B2 (en) | 2004-10-14 | 2010-12-14 | Microsoft Corporation | Remote client graphics rendering |
US7610264B2 (en) * | 2005-02-28 | 2009-10-27 | International Business Machines Corporation | Method and system for providing a learning optimizer for federated database systems |
EP1880266A4 (en) * | 2005-04-25 | 2010-09-08 | Invensys Sys Inc | Recording and tracing non-trending production data and events in an industrial process control environment |
US7836104B2 (en) * | 2005-06-03 | 2010-11-16 | Sap Ag | Demonstration tool for a business information enterprise system |
US7877350B2 (en) * | 2005-06-27 | 2011-01-25 | Ab Initio Technology Llc | Managing metadata for graph-based computations |
US20070006070A1 (en) * | 2005-06-30 | 2007-01-04 | International Business Machines Corporation | Joining units of work based on complexity metrics |
US8788464B1 (en) * | 2005-07-25 | 2014-07-22 | Lockheed Martin Corporation | Fast ingest, archive and retrieval systems, method and computer programs |
US20070033198A1 (en) * | 2005-08-02 | 2007-02-08 | Defries Anthony | Data representation architecture for media access |
US8527563B2 (en) | 2005-09-12 | 2013-09-03 | Microsoft Corporation | Remoting redirection layer for graphics device interface |
US20070073721A1 (en) * | 2005-09-23 | 2007-03-29 | Business Objects, S.A. | Apparatus and method for serviced data profiling operations |
US20070074176A1 (en) * | 2005-09-23 | 2007-03-29 | Business Objects, S.A. | Apparatus and method for parallel processing of data profiling information |
US8996586B2 (en) * | 2006-02-16 | 2015-03-31 | Callplex, Inc. | Virtual storage of portable media files |
US7873628B2 (en) * | 2006-03-23 | 2011-01-18 | Oracle International Corporation | Discovering functional dependencies by sampling relations |
US20070271259A1 (en) * | 2006-05-17 | 2007-11-22 | It Interactive Services Inc. | System and method for geographically focused crawling |
US7526486B2 (en) * | 2006-05-22 | 2009-04-28 | Initiate Systems, Inc. | Method and system for indexing information about entities with respect to hierarchies |
EP2030134A4 (en) | 2006-06-02 | 2010-06-23 | Initiate Systems Inc | A system and method for automatic weight generation for probabilistic matching |
US7711736B2 (en) * | 2006-06-21 | 2010-05-04 | Microsoft International Holdings B.V. | Detection of attributes in unstructured data |
US7698268B1 (en) | 2006-09-15 | 2010-04-13 | Initiate Systems, Inc. | Method and system for filtering false positives |
US7685093B1 (en) | 2006-09-15 | 2010-03-23 | Initiate Systems, Inc. | Method and system for comparing attributes such as business names |
US8356009B2 (en) | 2006-09-15 | 2013-01-15 | International Business Machines Corporation | Implementation defined segments for relational database systems |
US8700579B2 (en) * | 2006-09-18 | 2014-04-15 | Infobright Inc. | Method and system for data compression in a relational database |
US8266147B2 (en) * | 2006-09-18 | 2012-09-11 | Infobright, Inc. | Methods and systems for database organization |
US8762834B2 (en) * | 2006-09-29 | 2014-06-24 | Altova, Gmbh | User interface for defining a text file transformation |
US9846739B2 (en) | 2006-10-23 | 2017-12-19 | Fotonation Limited | Fast database matching |
US7809747B2 (en) * | 2006-10-23 | 2010-10-05 | Donald Martin Monro | Fuzzy database matching |
US20080097992A1 (en) * | 2006-10-23 | 2008-04-24 | Donald Martin Monro | Fast database matching |
US7774329B1 (en) | 2006-12-22 | 2010-08-10 | Amazon Technologies, Inc. | Cross-region data access in partitioned framework |
US8150870B1 (en) | 2006-12-22 | 2012-04-03 | Amazon Technologies, Inc. | Scalable partitioning in a multilayered data service framework |
US7613707B1 (en) * | 2006-12-22 | 2009-11-03 | Amazon Technologies, Inc. | Traffic migration in a multilayered data service framework |
CN101226523B (en) * | 2007-01-17 | 2012-09-05 | International Business Machines Corporation | Method and system for analyzing data general condition |
US8359339B2 (en) | 2007-02-05 | 2013-01-22 | International Business Machines Corporation | Graphical user interface for configuration of an algorithm for the matching of data records |
US20080195575A1 (en) * | 2007-02-12 | 2008-08-14 | Andreas Schiffler | Electronic data display management system and method |
US8515926B2 (en) * | 2007-03-22 | 2013-08-20 | International Business Machines Corporation | Processing related data from information sources |
US8429220B2 (en) | 2007-03-29 | 2013-04-23 | International Business Machines Corporation | Data exchange among data sources |
WO2008121700A1 (en) | 2007-03-29 | 2008-10-09 | Initiate Systems, Inc. | Method and system for managing entities |
US8423514B2 (en) | 2007-03-29 | 2013-04-16 | International Business Machines Corporation | Service provisioning |
WO2008121170A1 (en) | 2007-03-29 | 2008-10-09 | Initiate Systems, Inc. | Method and system for parsing languages |
US20120164613A1 (en) * | 2007-11-07 | 2012-06-28 | Jung Edward K Y | Determining a demographic characteristic based on computational user-health testing of a user interaction with advertiser-specified content |
US8069129B2 (en) | 2007-04-10 | 2011-11-29 | Ab Initio Technology Llc | Editing and compiling business rules |
US20090254588A1 (en) * | 2007-06-19 | 2009-10-08 | Zhong Li | Multi-Dimensional Data Merge |
US20110010214A1 (en) * | 2007-06-29 | 2011-01-13 | Carruth J Scott | Method and system for project management |
CN101689089B (en) * | 2007-07-12 | 2012-05-23 | Atmel Corporation | Two-dimensional touch panel |
US20090055828A1 (en) * | 2007-08-22 | 2009-02-26 | Mclaren Iain Douglas | Profile engine system and method |
KR101572599B1 (en) * | 2007-09-20 | 2015-11-27 | Ab Initio Technology Llc | Managing data flows in graph-based computations |
US9690820B1 (en) | 2007-09-27 | 2017-06-27 | Experian Information Solutions, Inc. | Database system for triggering event notifications based on updates to database records |
JP5306360B2 (en) * | 2007-09-28 | 2013-10-02 | International Business Machines Corporation | Method and system for analysis of systems for matching data records |
US8713434B2 (en) | 2007-09-28 | 2014-04-29 | International Business Machines Corporation | Indexing, relating and managing information about entities |
BRPI0817530B1 (en) | 2007-09-28 | 2020-02-04 | Initiate Systems Inc | method and system for processing data records in multiple languages and computer-readable storage media |
US8321914B2 (en) * | 2008-01-21 | 2012-11-27 | International Business Machines Corporation | System and method for verifying an attribute in records for procurement application |
US8224797B2 (en) * | 2008-03-04 | 2012-07-17 | International Business Machines Corporation | System and method for validating data record |
US8046385B2 (en) * | 2008-06-20 | 2011-10-25 | Ab Initio Technology Llc | Data quality tracking |
CN104679807B (en) | 2008-06-30 | 2018-06-05 | Ab Initio Technology Llc | Data logging in graph-based computations |
US8239389B2 (en) * | 2008-09-29 | 2012-08-07 | International Business Machines Corporation | Persisting external index data in a database |
JP5535230B2 (en) | 2008-10-23 | 2014-07-02 | Ab Initio Technology Llc | Fuzzy data manipulation |
KR20150042866A (en) * | 2008-12-02 | 2015-04-21 | Ab Initio Technology Llc | Mapping instances of a dataset within a data management system |
US8478706B2 (en) * | 2009-01-30 | 2013-07-02 | Ab Initio Technology Llc | Processing data using vector fields |
KR20150038758A (en) | 2009-02-13 | 2015-04-08 | Ab Initio Technology Llc | Managing task execution |
CN102395950B (en) * | 2009-02-13 | 2016-03-16 | Ab Initio Technology Llc | Communicating with data storage systems |
US8051060B1 (en) * | 2009-02-13 | 2011-11-01 | At&T Intellectual Property I, L.P. | Automatic detection of separators for compression |
US10102398B2 (en) | 2009-06-01 | 2018-10-16 | Ab Initio Technology Llc | Generating obfuscated data |
CN102460076B (en) * | 2009-06-10 | 2015-06-03 | Ab Initio Technology Llc | Generating test data |
JP2011008560A (en) * | 2009-06-26 | 2011-01-13 | Hitachi Ltd | Information management system |
US8205113B2 (en) | 2009-07-14 | 2012-06-19 | Ab Initio Technology Llc | Fault tolerant batch processing |
EP2478433A4 (en) * | 2009-09-16 | 2016-09-21 | Ab Initio Technology Llc | Mapping dataset elements |
US8683214B2 (en) * | 2009-09-17 | 2014-03-25 | Panasonic Corporation | Method and device that verifies application program modules |
US8700577B2 (en) * | 2009-12-07 | 2014-04-15 | Accenture Global Services Limited GmbH | Method and system for accelerated data quality enhancement |
US10845962B2 (en) | 2009-12-14 | 2020-11-24 | Ab Initio Technology Llc | Specifying user interface elements |
US9477369B2 (en) * | 2010-03-08 | 2016-10-25 | Salesforce.Com, Inc. | System, method and computer program product for displaying a record as part of a selected grouping of data |
US8205114B2 (en) | 2010-04-07 | 2012-06-19 | Verizon Patent And Licensing Inc. | Method and system for partitioning data files for efficient processing |
US8577094B2 (en) | 2010-04-09 | 2013-11-05 | Donald Martin Monro | Image template masking |
US8417727B2 (en) | 2010-06-14 | 2013-04-09 | Infobright Inc. | System and method for storing data in a relational database |
US8521748B2 (en) | 2010-06-14 | 2013-08-27 | Infobright Inc. | System and method for managing metadata in a relational database |
CA2801573C (en) | 2010-06-15 | 2018-08-14 | Ab Initio Technology Llc | Dynamically loading graph-based computations |
CN103080932B (en) | 2010-06-22 | 2016-08-31 | Ab Initio Technology Llc | Processing related datasets |
US8990165B2 (en) * | 2010-07-13 | 2015-03-24 | Hewlett-Packard Development Company, L.P. | Methods, apparatus and articles of manufacture to archive data |
US8515863B1 (en) * | 2010-09-01 | 2013-08-20 | Federal Home Loan Mortgage Corporation | Systems and methods for measuring data quality over time |
AU2011323773B2 (en) | 2010-10-25 | 2015-07-23 | Ab Initio Technology Llc | Managing data set objects in a dataflow graph that represents a computer program |
KR20120061308A (en) * | 2010-12-03 | 2012-06-13 | Samsung Electronics Co., Ltd. | Apparatus and method for db controlling in portable terminal |
US9418095B2 (en) | 2011-01-14 | 2016-08-16 | Ab Initio Technology Llc | Managing changes to collections of data |
JP6066927B2 (en) | 2011-01-28 | 2017-01-25 | Ab Initio Technology Llc | Generation of data pattern information |
US9021299B2 (en) | 2011-02-18 | 2015-04-28 | Ab Initio Technology Llc | Restarting processes |
US9116759B2 (en) | 2011-02-18 | 2015-08-25 | Ab Initio Technology Llc | Restarting data processing systems |
CN102893284B (en) * | 2011-03-15 | 2016-07-06 | Matsushita Electric Industrial Co., Ltd. | Manipulation monitoring system, managing device, protection control module and detection module |
US9558519B1 (en) | 2011-04-29 | 2017-01-31 | Consumerinfo.Com, Inc. | Exposing reporting cycle information |
US20120330880A1 (en) * | 2011-06-23 | 2012-12-27 | Microsoft Corporation | Synthetic data generation |
US9116934B2 (en) * | 2011-08-26 | 2015-08-25 | Qatar Foundation | Holistic database record repair |
US8782016B2 (en) * | 2011-08-26 | 2014-07-15 | Qatar Foundation | Database record repair |
US8863082B2 (en) * | 2011-09-07 | 2014-10-14 | Microsoft Corporation | Transformational context-aware data source management |
US8719271B2 (en) | 2011-10-06 | 2014-05-06 | International Business Machines Corporation | Accelerating data profiling process |
US9438656B2 (en) | 2012-01-11 | 2016-09-06 | International Business Machines Corporation | Triggering window conditions by streaming features of an operator graph |
US9430117B2 (en) * | 2012-01-11 | 2016-08-30 | International Business Machines Corporation | Triggering window conditions using exception handling |
US20130304712A1 (en) * | 2012-05-11 | 2013-11-14 | Theplatform For Media, Inc. | System and method for validation |
US9582553B2 (en) * | 2012-06-26 | 2017-02-28 | Sap Se | Systems and methods for analyzing existing data models |
US9633076B1 (en) * | 2012-10-15 | 2017-04-25 | Tableau Software Inc. | Blending and visualizing data from multiple data sources |
US10489360B2 (en) * | 2012-10-17 | 2019-11-26 | Ab Initio Technology Llc | Specifying and applying rules to data |
KR102129643B1 (en) * | 2012-10-22 | 2020-07-02 | Ab Initio Technology Llc | Profiling data with source tracking |
CN104756106B (en) * | 2012-10-22 | 2019-03-22 | Ab Initio Technology Llc | Characterizing data sources in a data storage system |
US10108521B2 (en) | 2012-11-16 | 2018-10-23 | Ab Initio Technology Llc | Dynamic component performance monitoring |
US9507682B2 (en) | 2012-11-16 | 2016-11-29 | Ab Initio Technology Llc | Dynamic graph performance monitoring |
US9703822B2 (en) | 2012-12-10 | 2017-07-11 | Ab Initio Technology Llc | System for transform generation |
EP2757467A1 (en) * | 2013-01-22 | 2014-07-23 | Siemens Aktiengesellschaft | Management apparatus and method for managing data elements of a version control system |
US9892026B2 (en) * | 2013-02-01 | 2018-02-13 | Ab Initio Technology Llc | Data records selection |
US9135280B2 (en) * | 2013-02-11 | 2015-09-15 | Oracle International Corporation | Grouping interdependent fields |
US9110949B2 (en) | 2013-02-11 | 2015-08-18 | Oracle International Corporation | Generating estimates for query optimization |
US9471545B2 (en) | 2013-02-11 | 2016-10-18 | Oracle International Corporation | Approximating value densities |
US9811233B2 (en) | 2013-02-12 | 2017-11-07 | Ab Initio Technology Llc | Building applications for configuring processes |
US10332010B2 (en) | 2013-02-19 | 2019-06-25 | Business Objects Software Ltd. | System and method for automatically suggesting rules for data stored in a table |
US9576036B2 (en) | 2013-03-15 | 2017-02-21 | International Business Machines Corporation | Self-analyzing data processing job to determine data quality issues |
KR101444249B1 (en) * | 2013-05-13 | 2014-09-26 | Atria Trading Co., Ltd. | Method, system and non-transitory computer-readable recording medium for providing information on securities lending and borrowing transaction, short selling or equity swap transaction |
CA2912420C (en) | 2013-05-17 | 2020-09-15 | Ab Initio Technology Llc | Managing memory and storage space for a data operation |
US20150032907A1 (en) * | 2013-07-26 | 2015-01-29 | Alcatel-Lucent Canada, Inc. | Universal adapter with context-bound translation for application adaptation layer |
WO2015027085A1 (en) | 2013-08-22 | 2015-02-26 | Genomoncology, Llc | Computer-based systems and methods for analyzing genomes based on discrete data structures corresponding to genetic variants therein |
EP3049913B1 (en) | 2013-09-27 | 2022-05-11 | Ab Initio Technology LLC | Evaluating rules applied to data |
US20150120224A1 (en) * | 2013-10-29 | 2015-04-30 | C3 Energy, Inc. | Systems and methods for processing data relating to energy usage |
WO2015085152A1 (en) | 2013-12-05 | 2015-06-11 | Ab Initio Technology Llc | Managing interfaces for dataflow graphs composed of sub-graphs |
WO2015095275A1 (en) | 2013-12-18 | 2015-06-25 | Ab Initio Technology Llc | Data generation |
US9529849B2 (en) | 2013-12-31 | 2016-12-27 | Sybase, Inc. | Online hash based optimizer statistics gathering in a database |
US11487732B2 (en) * | 2014-01-16 | 2022-11-01 | Ab Initio Technology Llc | Database key identification |
US9984173B2 (en) * | 2014-02-24 | 2018-05-29 | International Business Machines Corporation | Automated value analysis in legacy data |
EP3114578A1 (en) * | 2014-03-07 | 2017-01-11 | AB Initio Technology LLC | Managing data profiling operations related to data type |
WO2015138497A2 (en) | 2014-03-10 | 2015-09-17 | Interana, Inc. | Systems and methods for rapid data analysis |
US9846567B2 (en) | 2014-06-16 | 2017-12-19 | International Business Machines Corporation | Flash optimized columnar data layout and data access algorithms for big data query engines |
US9633058B2 (en) | 2014-06-16 | 2017-04-25 | International Business Machines Corporation | Predictive placement of columns during creation of a large database |
EP3742284A1 (en) | 2014-07-18 | 2020-11-25 | AB Initio Technology LLC | Managing lineage information |
JP6479966B2 (en) * | 2014-09-02 | 2019-03-06 | Ab Initio Technology Llc | Visual definition of a subset of components in a graph-based program through user interaction |
US9626393B2 (en) | 2014-09-10 | 2017-04-18 | Ab Initio Technology Llc | Conditional validation rules |
US9880818B2 (en) * | 2014-11-05 | 2018-01-30 | Ab Initio Technology Llc | Application testing |
US10055333B2 (en) | 2014-11-05 | 2018-08-21 | Ab Initio Technology Llc | Debugging a graph |
US10296507B2 (en) | 2015-02-12 | 2019-05-21 | Interana, Inc. | Methods for enhancing rapid data analysis |
US9952808B2 (en) | 2015-03-26 | 2018-04-24 | International Business Machines Corporation | File system block-level tiering and co-allocation |
CN104850590A (en) * | 2015-04-24 | 2015-08-19 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and device for generating metadata of structured data |
US11068647B2 (en) * | 2015-05-28 | 2021-07-20 | International Business Machines Corporation | Measuring transitions between visualizations |
KR101632073B1 (en) * | 2015-06-04 | 2016-06-20 | 장원중 | Method, device, system and non-transitory computer-readable recording medium for providing data profiling based on statistical analysis |
EP3278213A4 (en) | 2015-06-05 | 2019-01-30 | C3 IoT, Inc. | Systems, methods, and devices for an enterprise internet-of-things application development platform |
US9384203B1 (en) * | 2015-06-09 | 2016-07-05 | Palantir Technologies Inc. | Systems and methods for indexing and aggregating data records |
US10409802B2 (en) | 2015-06-12 | 2019-09-10 | Ab Initio Technology Llc | Data quality analysis |
US10241979B2 (en) * | 2015-07-21 | 2019-03-26 | Oracle International Corporation | Accelerated detection of matching patterns |
US9977805B1 (en) * | 2017-02-13 | 2018-05-22 | Sas Institute Inc. | Distributed data set indexing |
US10657134B2 (en) | 2015-08-05 | 2020-05-19 | Ab Initio Technology Llc | Selecting queries for execution on a stream of real-time data |
US10127264B1 (en) | 2015-09-17 | 2018-11-13 | Ab Initio Technology Llc | Techniques for automated data analysis |
US10607139B2 (en) | 2015-09-23 | 2020-03-31 | International Business Machines Corporation | Candidate visualization techniques for use with genetic algorithms |
US10140337B2 (en) * | 2015-10-30 | 2018-11-27 | Sap Se | Fuzzy join key |
CN108351898B (en) * | 2015-10-30 | 2021-10-08 | Acxiom Corporation | Automated interpretation for structured multi-field file layout |
US11410230B1 (en) | 2015-11-17 | 2022-08-09 | Consumerinfo.Com, Inc. | Realtime access and control of secure regulated data |
US10757154B1 (en) | 2015-11-24 | 2020-08-25 | Experian Information Solutions, Inc. | Real-time event-based notification system |
US10459730B2 (en) * | 2016-02-26 | 2019-10-29 | Hitachi, Ltd. | Analysis system and analysis method for executing analysis process with at least portions of time series data and analysis data as input data |
US10685035B2 (en) | 2016-06-30 | 2020-06-16 | International Business Machines Corporation | Determining a collection of data visualizations |
US10423387B2 (en) | 2016-08-23 | 2019-09-24 | Interana, Inc. | Methods for highly efficient data sharding |
US10146835B2 (en) | 2016-08-23 | 2018-12-04 | Interana, Inc. | Methods for stratified sampling-based query execution |
US11860940B1 (en) | 2016-09-26 | 2024-01-02 | Splunk Inc. | Identifying buckets for query execution using a catalog of buckets |
US10353965B2 (en) | 2016-09-26 | 2019-07-16 | Splunk Inc. | Data fabric service system architecture |
US11620336B1 (en) | 2016-09-26 | 2023-04-04 | Splunk Inc. | Managing and storing buckets to a remote shared storage system based on a collective bucket size |
US11093703B2 (en) * | 2016-09-29 | 2021-08-17 | Google Llc | Generating charts from data in a data table |
US9633078B1 (en) | 2016-09-30 | 2017-04-25 | Semmle Limited | Generating identifiers for tuples of recursively defined relations |
US9720961B1 (en) | 2016-09-30 | 2017-08-01 | Semmle Limited | Algebraic data types for database query languages |
JP7170638B2 (en) | 2016-12-01 | 2022-11-14 | Ab Initio Technology Llc | Generating, Accessing, and Displaying Lineage Metadata |
US10650050B2 (en) | 2016-12-06 | 2020-05-12 | Microsoft Technology Licensing, Llc | Synthesizing mapping relationships using table corpus |
US10936555B2 (en) * | 2016-12-22 | 2021-03-02 | Sap Se | Automated query compliance analysis |
US10565173B2 (en) * | 2017-02-10 | 2020-02-18 | Wipro Limited | Method and system for assessing quality of incremental heterogeneous data |
US10514993B2 (en) | 2017-02-14 | 2019-12-24 | Google Llc | Analyzing large-scale data processing jobs |
CN107220283B (en) * | 2017-04-21 | 2019-11-08 | Neusoft Corporation | Data processing method, device, storage medium and electronic equipment |
US9934287B1 (en) * | 2017-07-25 | 2018-04-03 | Capital One Services, Llc | Systems and methods for expedited large file processing |
US20200050612A1 (en) * | 2017-07-31 | 2020-02-13 | Splunk Inc. | Supporting additional query languages through distributed execution of query engines |
US11921672B2 (en) | 2017-07-31 | 2024-03-05 | Splunk Inc. | Query execution at a remote heterogeneous data store of a data fabric service |
US20200065303A1 (en) * | 2017-07-31 | 2020-02-27 | Splunk Inc. | Addressing memory limits for partition tracking among worker nodes |
US11423083B2 (en) | 2017-10-27 | 2022-08-23 | Ab Initio Technology Llc | Transforming a specification into a persistent computer program |
US11055074B2 (en) * | 2017-11-13 | 2021-07-06 | Ab Initio Technology Llc | Key-based logging for processing of structured data items with executable logic |
EP3743820A1 (en) * | 2018-01-25 | 2020-12-02 | Ab Initio Technology LLC | Techniques for integrating validation results in data profiling and related systems and methods |
US11068540B2 (en) | 2018-01-25 | 2021-07-20 | Ab Initio Technology Llc | Techniques for integrating validation results in data profiling and related systems and methods |
US11334543B1 (en) | 2018-04-30 | 2022-05-17 | Splunk Inc. | Scalable bucket merging for a data intake and query system |
EP3575980A3 (en) | 2018-05-29 | 2020-03-04 | Accenture Global Solutions Limited | Intelligent data quality |
JP7464543B2 (en) * | 2018-07-19 | 2024-04-09 | Ab Initio Technology Llc | Publishing to a Data Warehouse |
US11080266B2 (en) * | 2018-07-30 | 2021-08-03 | Futurewei Technologies, Inc. | Graph functional dependency checking |
US20200074541A1 (en) | 2018-09-05 | 2020-03-05 | Consumerinfo.Com, Inc. | Generation of data structures based on categories of matched data items |
US11227065B2 (en) | 2018-11-06 | 2022-01-18 | Microsoft Technology Licensing, Llc | Static data masking |
US11423009B2 (en) * | 2019-05-29 | 2022-08-23 | ThinkData Works, Inc. | System and method to prevent formation of dark data |
US11704494B2 (en) * | 2019-05-31 | 2023-07-18 | Ab Initio Technology Llc | Discovering a semantic meaning of data fields from profile data of the data fields |
US11153400B1 (en) | 2019-06-04 | 2021-10-19 | Thomas Layne Bascom | Federation broker system and method for coordinating discovery, interoperability, connections and correspondence among networked resources |
CN111143433A (en) * | 2019-12-10 | 2020-05-12 | Ping An Property & Casualty Insurance Company of China, Ltd. | Method and device for counting data of data bins |
KR102365910B1 (en) * | 2019-12-31 | 2022-02-22 | Catholic Kwandong University Industry-Academic Cooperation Foundation | Data profiling method and data profiling system using attribute value quality index |
FR3105844A1 (en) * | 2019-12-31 | 2021-07-02 | Bull Sas | PROCESS AND SYSTEM FOR IDENTIFYING RELEVANT VARIABLES |
US11922222B1 (en) | 2020-01-30 | 2024-03-05 | Splunk Inc. | Generating a modified component for a data intake and query system using an isolated execution environment image |
US11200215B2 (en) * | 2020-01-30 | 2021-12-14 | International Business Machines Corporation | Data quality evaluation |
US11321340B1 (en) | 2020-03-31 | 2022-05-03 | Wells Fargo Bank, N.A. | Metadata extraction from big data sources |
US11556563B2 (en) * | 2020-06-12 | 2023-01-17 | Oracle International Corporation | Data stream processing |
US11403268B2 (en) * | 2020-08-06 | 2022-08-02 | Sap Se | Predicting types of records based on amount values of records |
US11704313B1 (en) | 2020-10-19 | 2023-07-18 | Splunk Inc. | Parallel branch operation using intermediary nodes |
KR102265937B1 (en) * | 2020-12-21 | 2021-06-17 | Mobigen Co., Ltd. | Method for analyzing sequence data and apparatus thereof |
US11847390B2 (en) | 2021-01-05 | 2023-12-19 | Capital One Services, Llc | Generation of synthetic data using agent-based simulations |
US20220215243A1 (en) * | 2021-01-05 | 2022-07-07 | Capital One Services, Llc | Risk-Reliability Framework for Evaluating Synthetic Data Models |
US11537594B2 (en) | 2021-02-05 | 2022-12-27 | Oracle International Corporation | Approximate estimation of number of distinct keys in a multiset using a sample |
CN112925792B (en) * | 2021-03-26 | 2024-01-05 | Beijing Zhongjing Huizhong Technology Co., Ltd. | Data storage control method, device, computing equipment and medium |
CN113656430B (en) * | 2021-08-12 | 2024-02-27 | Shanghai 2345 Network Technology Co., Ltd. | Control method and device for automatic expansion of batch table data |
KR102437098B1 (en) | 2022-04-15 | 2022-08-25 | 이찬영 | Method and apparatus for determining error data based on artificial intelligence |
US11907051B1 (en) | 2022-09-07 | 2024-02-20 | International Business Machines Corporation | Correcting invalid zero value for data monitoring |
Family Cites Families (73)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2760794B2 (en) * | 1988-01-29 | 1998-06-04 | Hitachi, Ltd. | Database processing method and apparatus |
US5179643A (en) * | 1988-12-23 | 1993-01-12 | Hitachi, Ltd. | Method of multi-dimensional analysis and display for a large volume of record information items and a system therefor |
JPH032938A (en) | 1989-05-31 | 1991-01-09 | Hitachi Ltd | Data base processing method |
JPH04152440A (en) * | 1990-10-17 | 1992-05-26 | Hitachi Ltd | Intelligent inquiry processing method |
FR2698465B1 (en) | 1992-11-20 | 1995-01-13 | Bull Sa | Method for extracting statistics profiles, use of statistics created by the method. |
US5742806A (en) * | 1994-01-31 | 1998-04-21 | Sun Microsystems, Inc. | Apparatus and method for decomposing database queries for database management system including multiprocessor digital data processing system |
JP3519126B2 (en) | 1994-07-14 | 2004-04-12 | Ricoh Co., Ltd. | Automatic layout system |
US5842200A (en) * | 1995-03-31 | 1998-11-24 | International Business Machines Corporation | System and method for parallel mining of association rules in databases |
US6601048B1 (en) * | 1997-09-12 | 2003-07-29 | Mci Communications Corporation | System and method for detecting and managing fraud |
US5966072A (en) | 1996-07-02 | 1999-10-12 | Ab Initio Software Corporation | Executing computations expressed as graphs |
US5778373A (en) | 1996-07-15 | 1998-07-07 | At&T Corp | Integration of an information server database schema by generating a translation map from exemplary files |
US6138123A (en) * | 1996-07-25 | 2000-10-24 | Rathbun; Kyle R. | Method for creating and using parallel data structures |
JPH1055367A (en) | 1996-08-09 | 1998-02-24 | Hitachi Ltd | Data utilization system |
US5845285A (en) * | 1997-01-07 | 1998-12-01 | Klein; Laurence C. | Computer system and method of data analysis |
US5987453A (en) | 1997-04-07 | 1999-11-16 | Informix Software, Inc. | Method and apparatus for performing a join query in a database system |
US6134560A (en) * | 1997-12-16 | 2000-10-17 | Kliebhan; Daniel F. | Method and apparatus for merging telephone switching office databases |
US6826556B1 (en) * | 1998-10-02 | 2004-11-30 | Ncr Corporation | Techniques for deploying analytic models in a parallel |
US6959300B1 (en) * | 1998-12-10 | 2005-10-25 | At&T Corp. | Data compression method and apparatus |
US6343294B1 (en) | 1998-12-15 | 2002-01-29 | International Business Machines Corporation | Data file editor for multiple data subsets |
JP4037001B2 (en) * | 1999-02-23 | 2008-01-23 | Mitsubishi Electric Corporation | Database creation device and database search device |
US6741995B1 (en) * | 1999-03-23 | 2004-05-25 | Metaedge Corporation | Method for dynamically creating a profile |
US6430539B1 (en) * | 1999-05-06 | 2002-08-06 | Hnc Software | Predictive modeling of consumer financial behavior |
US6163774A (en) | 1999-05-24 | 2000-12-19 | Platinum Technology Ip, Inc. | Method and apparatus for simplified and flexible selection of aggregate and cross product levels for a data warehouse |
US6801938B1 (en) * | 1999-06-18 | 2004-10-05 | Torrent Systems, Inc. | Segmentation and processing of continuous data streams using transactional semantics |
JP4600847B2 (en) | 1999-06-18 | 2010-12-22 | International Business Machines Corporation | Segmentation and processing of continuous data streams using transaction semantics |
JP3318834B2 (en) | 1999-07-30 | 2002-08-26 | Mitsubishi Electric Corporation | Data file system and data retrieval method |
JP3567861B2 (en) | 2000-07-07 | 2004-09-22 | Nippon Telegraph and Telephone Corporation | Information source location estimation method and apparatus, and storage medium storing information source location estimation program |
JP4366845B2 (en) * | 2000-07-24 | 2009-11-18 | Sony Corporation | Data processing apparatus, data processing method, and program providing medium |
US6788302B1 (en) * | 2000-08-03 | 2004-09-07 | International Business Machines Corporation | Partitioning and load balancing graphical shape data for parallel applications |
US20020073138A1 (en) * | 2000-12-08 | 2002-06-13 | Gilbert Eric S. | De-identification and linkage of data records |
US6952693B2 (en) * | 2001-02-23 | 2005-10-04 | Ran Wolff | Distributed mining of association rules |
US20020161778A1 (en) * | 2001-02-24 | 2002-10-31 | Core Integration Partners, Inc. | Method and system of data warehousing and building business intelligence using a data storage model |
US20020120602A1 (en) * | 2001-02-28 | 2002-08-29 | Ross Overbeek | System, method and computer program product for simultaneous analysis of multiple genomes |
JP2002269114A (en) * | 2001-03-14 | 2002-09-20 | Kousaku Ookubo | Knowledge database, and method for constructing knowledge database |
US20030033138A1 (en) * | 2001-07-26 | 2003-02-13 | Srinivas Bangalore | Method for partitioning a data set into frequency vectors for clustering |
US7130852B2 (en) * | 2001-07-27 | 2006-10-31 | Silicon Valley Bank | Internal security system for a relational database system |
AU2002355530A1 (en) * | 2001-08-03 | 2003-02-24 | John Allen Ananian | Personalized interactive digital catalog profiling |
US6801903B2 (en) | 2001-10-12 | 2004-10-05 | Ncr Corporation | Collecting statistics in a database system |
US20030140027A1 (en) * | 2001-12-12 | 2003-07-24 | Jeffrey Huttel | Universal Programming Interface to Knowledge Management (UPIKM) database system with integrated XML interface |
US7813937B1 (en) * | 2002-02-15 | 2010-10-12 | Fair Isaac Corporation | Consistency modeling of healthcare claims to detect fraud and abuse |
US7031969B2 (en) | 2002-02-20 | 2006-04-18 | Lawrence Technologies, Llc | System and method for identifying relationships between database records |
WO2003081391A2 (en) * | 2002-03-19 | 2003-10-02 | Mapinfo Corporation | Location based service provider |
US20040083199A1 (en) * | 2002-08-07 | 2004-04-29 | Govindugari Diwakar R. | Method and architecture for data transformation, normalization, profiling, cleansing and validation |
US6657568B1 (en) | 2002-08-27 | 2003-12-02 | Fmr Corp. | Data packing for real-time streaming |
US7047230B2 (en) * | 2002-09-09 | 2006-05-16 | Lucent Technologies Inc. | Distinct sampling system and a method of distinct sampling for optimizing distinct value query estimates |
WO2004036461A2 (en) * | 2002-10-14 | 2004-04-29 | Battelle Memorial Institute | Information reservoir |
US7698163B2 (en) * | 2002-11-22 | 2010-04-13 | Accenture Global Services Gmbh | Multi-dimensional segmentation for use in a customer interaction |
US7403942B1 (en) * | 2003-02-04 | 2008-07-22 | Seisint, Inc. | Method and system for processing data records |
US7117222B2 (en) * | 2003-03-13 | 2006-10-03 | International Business Machines Corporation | Pre-formatted column-level caching to improve client performance |
US7433861B2 (en) * | 2003-03-13 | 2008-10-07 | International Business Machines Corporation | Byte-code representations of actual data to reduce network traffic in database transactions |
US20040249810A1 (en) * | 2003-06-03 | 2004-12-09 | Microsoft Corporation | Small group sampling of data for use in query processing |
GB0314591D0 (en) | 2003-06-21 | 2003-07-30 | Ibm | Profiling data in a data store |
US7426520B2 (en) | 2003-09-10 | 2008-09-16 | Exeros, Inc. | Method and apparatus for semantic discovery and mapping between data sources |
ATE515746T1 (en) * | 2003-09-15 | 2011-07-15 | Ab Initio Technology Llc | DATA PROFILING |
US7587394B2 (en) * | 2003-09-23 | 2009-09-08 | International Business Machines Corporation | Methods and apparatus for query rewrite with auxiliary attributes in query processing operations |
US7149736B2 (en) | 2003-09-26 | 2006-12-12 | Microsoft Corporation | Maintaining time-sorted aggregation records representing aggregations of values from multiple database records using multiple partitions |
US7698345B2 (en) | 2003-10-21 | 2010-04-13 | The Nielsen Company (Us), Llc | Methods and apparatus for fusing databases |
US20050177578A1 (en) | 2004-02-10 | 2005-08-11 | Chen Yao-Ching S. | Efficient type annontation of XML schema-validated XML documents without schema validation |
US7376656B2 (en) * | 2004-02-10 | 2008-05-20 | Microsoft Corporation | System and method for providing user defined aggregates in a database system |
US8447743B2 (en) * | 2004-08-17 | 2013-05-21 | International Business Machines Corporation | Techniques for processing database queries including user-defined functions |
US7774346B2 (en) | 2005-08-26 | 2010-08-10 | Oracle International Corporation | Indexes that are based on bitmap values and that use summary bitmap values |
US20070073721A1 (en) * | 2005-09-23 | 2007-03-29 | Business Objects, S.A. | Apparatus and method for serviced data profiling operations |
US8271452B2 (en) | 2006-06-12 | 2012-09-18 | Rainstor Limited | Method, system, and database archive for enhancing database archiving |
US8412713B2 (en) | 2007-03-06 | 2013-04-02 | Mcafee, Inc. | Set function calculation in a database |
US7912867B2 (en) * | 2008-02-25 | 2011-03-22 | United Parcel Services Of America, Inc. | Systems and methods of profiling data for integration |
US9251212B2 (en) * | 2009-03-27 | 2016-02-02 | Business Objects Software Ltd. | Profiling in a massive parallel processing environment |
EP2478433A4 (en) | 2009-09-16 | 2016-09-21 | Ab Initio Technology Llc | Mapping dataset elements |
CA2779087C (en) | 2009-11-13 | 2019-08-20 | Ab Initio Technology Llc | Managing record format information |
US8396873B2 (en) * | 2010-03-10 | 2013-03-12 | Emc Corporation | Index searching using a bloom filter |
US8296274B2 (en) * | 2011-01-27 | 2012-10-23 | Leppard Andrew | Considering multiple lookups in bloom filter decision making |
JP6066927B2 (en) * | 2011-01-28 | 2017-01-25 | Ab Initio Technology Llc | Generation of data pattern information |
US8610605B2 (en) | 2011-06-17 | 2013-12-17 | Sap Ag | Method and system for data compression |
US8762396B2 (en) * | 2011-12-22 | 2014-06-24 | Sap Ag | Dynamic, hierarchical bloom filters for network data routing |
2004
- 2004-09-15 AT AT04784113T patent/ATE515746T1/en not_active IP Right Cessation
- 2004-09-15 US US10/941,373 patent/US7849075B2/en active Active
- 2004-09-15 JP JP2006526986A patent/JP5328099B2/en active Active
- 2004-09-15 EP EP20100009155 patent/EP2261820A3/en not_active Withdrawn
- 2004-09-15 WO PCT/US2004/030144 patent/WO2005029369A2/en active Search and Examination
- 2004-09-15 CA CA002538568A patent/CA2538568C/en active Active
- 2004-09-15 KR KR1020097003696A patent/KR20090039803A/en not_active Application Discontinuation
- 2004-09-15 KR KR1020077021526A patent/KR100922141B1/en active IP Right Grant
- 2004-09-15 US US10/941,402 patent/US8868580B2/en active Active
- 2004-09-15 EP EP04784113A patent/EP1676217B1/en active Active
- 2004-09-15 KR KR1020067005255A patent/KR100899850B1/en active IP Right Grant
- 2004-09-15 CA CA2655735A patent/CA2655735C/en active Active
- 2004-09-15 EP EP10009234.5A patent/EP2261821B1/en active Active
- 2004-09-15 CN CN201210367944.3A patent/CN102982065B/en active Active
- 2004-09-15 KR KR1020077021527A patent/KR101033179B1/en active IP Right Grant
- 2004-09-15 AU AU2004275334A patent/AU2004275334B9/en active Active
- 2004-09-15 CA CA2655731A patent/CA2655731C/en active Active
- 2004-09-15 US US10/941,401 patent/US7756873B2/en active Active
2006
- 2006-12-28 HK HK06114200.1A patent/HK1093568A1/en unknown
2009
- 2009-01-28 AU AU2009200293A patent/AU2009200293B2/en active Active
- 2009-01-28 AU AU2009200294A patent/AU2009200294A1/en not_active Abandoned
2010
- 2010-07-06 JP JP2010153800A patent/JP5372851B2/en active Active
- 2010-07-06 JP JP2010153799A patent/JP5372850B2/en active Active
2014
- 2014-10-20 US US14/519,030 patent/US9323802B2/en active Active
2016
- 2016-04-22 US US15/135,852 patent/US20160239532A1/en not_active Abandoned
Similar Documents
Publication | Title |
---|---|
CA2655735A1 (en) | Data profiling |
JP2007506191A5 (en) | |
US8019795B2 (en) | Data warehouse test automation framework |
CN102799634B (en) | Data storage method and device |
US7856416B2 (en) | Automated latent star schema discovery tool |
US9135296B2 (en) | System, method, and data structure for automatically generating database queries which are data model independent and cardinality independent |
GB2513472A (en) | Resolving similar entities from a database |
CN104765745B (en) | Loading data in database are carried out with the method and system of logic checking |
WO2006022739B1 (en) | Method and system for processing grammar-based legality expressions |
US11347719B2 (en) | Multi-table data validation tool |
US11550790B2 (en) | Object relational mapper for non-relational databases |
US20050004918A1 (en) | Populating a database using inferred dependencies |
US20200192897A1 (en) | Grouping datasets |
CN110659282B (en) | Data route construction method, device, computer equipment and storage medium |
CN110795524B (en) | Main data mapping processing method and device, computer equipment and storage medium |
US20110106775A1 (en) | Method and apparatus for managing multiple document versions in a large scale document repository |
US20180032603A1 (en) | Extracting graph topology from distributed databases |
WO2021036452A1 (en) | Real-time data deduplication counting method and device |
CN102402615A (en) | Method for tracking source information based on structured query language (SQL) sentences |
CN106844320B (en) | Financial statement integration method and equipment |
EP2251802B1 (en) | Method and program for generating a subset of data from a database |
US20180341709A1 (en) | Unstructured search query generation from a set of structured data terms |
CN110147396B (en) | Mapping relation generation method and device |
CN110008264B (en) | Data acquisition method and device of cost accounting system |
CN115292297B (en) | Method and system for constructing data quality monitoring rule of data warehouse |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | EEER | Examination request | |