WO2017116341A2 - A system for parallel processing and data modelling - Google Patents

A system for parallel processing and data modelling

Publication number
WO2017116341A2
Authority
WO
WIPO (PCT)
Prior art keywords
xml
unit
processing
data
parallel
Prior art date
Application number
PCT/TR2016/000209
Other languages
French (fr)
Other versions
WO2017116341A3 (en)
Inventor
Sezgin ONDER
Original Assignee
Turkcell Teknoloji Arastirma Ve Gelistirme Anonim Sirketi
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Turkcell Teknoloji Arastirma Ve Gelistirme Anonim Sirketi
Publication of WO2017116341A2
Publication of WO2017116341A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/84 Mapping; Conversion
    • G06F16/86 Mapping to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/221 Parsing markup language streams

Abstract

The present invention relates to a system (1) which ensures that XML (Extensible Markup Language) documents are processed in parallel and that the data contained in the XML documents is modelled and written to a database so that it is ready for data analysis, whereby the data analysis can be performed in parallel as well. The inventive system (1) comprises an XML receiving unit (2), an XML processing and modelling unit (3), a database (4) and an XML writing unit (5).

Description

DESCRIPTION
A SYSTEM FOR PARALLEL PROCESSING AND DATA MODELLING
Technical Field
The present invention relates to a system which ensures that XML (Extensible Markup Language) documents are processed in parallel and that the data contained in the XML documents is modelled and written to a database so that it is ready for data analysis, whereby the data analysis can be performed in parallel as well.
Background of the Invention
XML is a widely used technology for creating semi-structured data. It is extensible and universal, and it is generally used for configuration management of software solutions and for messaging between devices and software inside networks. The messages that devices and software exchange for configuration and communication contain information that is valuable to capture and interpret. Analysis performed over XML documents and messages therefore has great value, and the current technique needs solutions which process XML documents and messages and write the data inside them to a database such that this data is ready for analysis.
Being able to perform data analysis after processing and modelling XML documents helps the owners of platforms on which XML documents are created, as well as network providers, to turn the data used while carrying out system maintenance into value. It can also reduce operational costs by allowing operational changes that were made manually to be analysed and automation solutions to be included in the systems at such points. A solution enabling efficient and quick XML processing, data modelling and subsequent efficient data analysis for the frequently used XML technology can also make serious contributions to the data analysis methods used in the fields of OSS/BSS (Operations Support System/Business Support System), particularly in the telecom world.
The United States patent document no. US20140089332, an application in the state of the art, discloses a system for converting XML documents in parallel. The United States patent document no. US2009089658, another application in the state of the art, discloses a system for modelling XML documents and recording them to a database.
The United States patent document no. US20050055355, another application in the state of the art, discloses a system for recording XML documents to databases and querying them.
Summary of the Invention
An objective of the present invention is to realize a system which ensures that XML (Extensible Markup Language) documents are processed in parallel and that the data contained in the XML documents is modelled and written to a database so that it is ready for data analysis, whereby the data analysis can be performed in parallel as well.
Detailed Description of the Invention
"A System for Parallel Processing and Data Modelling" realized to fulfill the objective of the present invention is shown in the figure attached, in which:
Figure 1 is a schematic view of the inventive system.
The components illustrated in the figure are individually numbered, where the numbers refer to the following:
1. System
2. XML receiving unit
3. XML processing and modelling unit
4. Database
5. XML writing unit
K. XML source
M. Mediator unit
S. End system
The inventive system (1) for parallel processing and data modelling comprises:
at least one XML receiving unit (2) to which XML documents flow from an XML source (K) or which receives XML documents from an XML source (K);
at least one XML processing and modelling unit (3) to which the XML receiving unit (2) transmits the received XML documents, and which processes the XML documents in parallel and models the data obtained from the processed XML documents in an in-memory tree structure;
at least one database (4) to which the modelled data is written in parallel by the XML writing unit (5) and which is configured such that a mediator unit (M), which generates reports, events, analysis results or alarms to be transmitted to end systems (S), communicated to persons or displayed on an interface, can access it and perform analysis on it in parallel;
at least one XML writing unit (5) to which the XML processing and modelling unit (3) transmits the data obtained and modelled by processing the XML documents, and which writes this data to the database (4) in parallel.
The description of the inventive system (1) refers to XML documents fed by or received from the XML source (K), and the expression "XML document" is used throughout the rest of the description. However, as will be understood by a person skilled in the art, XML documents can also be considered XML messages in different embodiments of the invention, and every operation carried out in the inventive system (1) on XML documents can also be carried out on XML messages. In addition, as will be understood by a person skilled in the art, the operations which are stated to be carried out in parallel in the inventive system (1) are operations carried out by a plurality of threads at the same time.
The XML receiving unit (2) is a unit to which XML documents flow from an XML source (K) or which receives XML documents from an XML source (K). In one preferred embodiment of the invention, the XML receiving unit (2) receives XML documents from the XML source (K) in parallel by means of a plurality of threads, such that each thread is processed in the same way, and transfers them, again in parallel, to the XML processing and modelling unit (3), which processes the XML documents in parallel and models the data.
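The parallel reception described for the XML receiving unit (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `receive_all`, the use of plain Python lists as stand-ins for XML sources (K) and the `queue.Queue` hand-off to the processing stage are all hypothetical.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def receive_all(sources, handoff, workers=4):
    """Drain every source with identical worker threads and hand documents on."""

    def receive(source):
        # Every thread runs this same routine, one source per call.
        count = 0
        for document in source:      # a source is any iterable of XML strings
            handoff.put(document)    # queue.Queue is safe to share between threads
            count += 1
        return count

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(receive, sources))


# Hypothetical sources standing in for the XML source (K).
handoff = queue.Queue()
sources = [["<a>1</a>", "<a>2</a>"], ["<b/>"]]
received = receive_all(sources, handoff)
```

The queue decouples reception from processing, so the processing stage can consume documents with its own pool of threads.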
The XML processing and modelling unit (3) is a unit to which the XML receiving unit (2) transmits the received XML documents, and which processes the XML documents in parallel and models the data obtained from the processed XML documents in an in-memory tree structure.
The XML processing and modelling unit (3) is configured such that it can process an XML document having a correct format (i.e. a well-formed document) without knowing its schema, its data sequence and structure, or its data types. The XML processing and modelling unit (3) is also configured such that it does not require the XML document to be valid. Because no such requirements exist during processing and the XML document only has to have a correct format, the modelled document, which is created by modelling the data obtained by processing the XML document, can also be created by the XML processing and modelling unit (3) in a form on which any mediator unit (M) can perform analysis via a text-based search without having to know the XML structure.
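A short sketch may make the schema-free modelling step concrete. It assumes Python's standard `xml.etree.ElementTree` parser, which checks well-formedness only and performs no schema or DTD validation. The `Node` class, `model_document` and `text_search` are hypothetical names introduced for illustration and do not come from the application.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tag of the in-memory tree; no schema knowledge is needed to build it."""
    tag: str
    text: str = ""
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def build_tree(element):
    node = Node(tag=element.tag,
                text=(element.text or "").strip(),
                attributes=dict(element.attrib))
    node.children = [build_tree(child) for child in element]
    return node


def model_document(xml_text):
    # ET.fromstring raises ParseError if the text is not well-formed,
    # but it never checks the document against a schema.
    return build_tree(ET.fromstring(xml_text))


def text_search(node, needle):
    """Return the tags of all nodes whose tag, text or attribute values contain needle."""
    hits = []
    haystack = " ".join([node.tag, node.text, *node.attributes.values()])
    if needle in haystack:
        hits.append(node.tag)
    for child in node.children:
        hits.extend(text_search(child, needle))
    return hits


tree = model_document('<cfg><cell id="7"><power>43</power></cell></cfg>')
```

Here `text_search(tree, "43")` finds the matching node without the caller knowing where `<power>` sits in the hierarchy, which is the kind of structure-agnostic search the paragraph attributes to the mediator unit (M).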
In one embodiment of the invention, the XML processing and modelling unit (3) is configured such that it accepts an XML document of which only a part, rather than the whole, has a correct format, and can process the part having a correct format.
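One possible way to process only the correctly formatted part of a document is an incremental parser that reports completed elements until the first error. The sketch below uses `xml.etree.ElementTree.XMLPullParser` from the Python standard library. The application does not say how the embodiment recovers the usable part, so this approach and the function name `parse_well_formed_prefix` are assumptions.

```python
import xml.etree.ElementTree as ET


def parse_well_formed_prefix(xml_text):
    """Return (elements completed before the first error, the error or None)."""
    parser = ET.XMLPullParser(events=("end",))
    parser.feed(xml_text)
    completed, error = [], None
    try:
        # read_events yields the events gathered so far and re-raises a
        # parse error once it reaches the malformed position.
        for _event, element in parser.read_events():
            completed.append((element.tag, (element.text or "").strip()))
    except ET.ParseError as exc:
        error = exc
    return completed, error


# The third <e> element is closed by a mismatched tag.
completed, error = parse_well_formed_prefix(
    "<log><e>1</e><e>2</e><e>3</oops></log>")
```

The two elements closed before the mismatched tag are kept for modelling, and the error object records where the usable part ends.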
The XML processing and modelling unit (3) produces, as output, modelled data that can be written to relational or non-relational databases (4). When this data is stored in a relational database (4), the mediator unit (M) can query the database (4) directly with SQL (Structured Query Language) commands for only the related part of the modelled data. When the data is stored in a non-relational database (4), the mediator unit (M) can make queries by means of SQL and its derivative methods via an interface layer. In one preferred embodiment of the invention, the XML processing and modelling unit (3) models XML documents such that they can be inserted into a database (4) model having a single table.
The database (4) is a central unit to which the modelled data is written in parallel by the XML writing unit (5) and which is configured such that a mediator unit (M), which generates reports, events, analysis results or alarms to be transmitted to end systems (S), communicated to persons or displayed on an interface, can access it and perform analysis on it in parallel. In different embodiments of the invention, the database (4) can be a relational or non-relational database.
In an exemplary embodiment of the invention, the database (4) keeps data in a table which has the following columns: ID, which is a unique identifier; PARENTID, which is the unique identifier of the parent tag in the tag hierarchy of the XML document; TAGNAME, which is the string value of the tag; CONTENT TYPE, which indicates the content type of the tag, is expressed as an enumeration value and relates to the tag hierarchy; CONTENT, which holds the value inside the tag; and CONTENT SEQ, which allows a value to be parsed and sent to the table as multiple rows if the hierarchy value of the tag, or of an attribute under the tag, is too long.
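The single-table model can be illustrated with a small sketch that flattens a parsed document into rows and runs a query for one part of it. Several details are assumptions made for the example: the table name `XML_MODEL`, underscores in the column names, the two enumeration values for CONTENT_TYPE, the tiny chunk size that triggers CONTENT_SEQ splitting, and the use of SQLite as the relational database.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count

CHUNK = 8               # deliberately tiny so the example shows CONTENT_SEQ splitting
TAG, ATTRIBUTE = 0, 1   # assumed enumeration values for CONTENT_TYPE


def to_rows(root):
    """Flatten an element tree into (ID, PARENTID, TAGNAME, CONTENT_TYPE, CONTENT, CONTENT_SEQ)."""
    ids = count(1)
    rows = []

    def chunks(value):
        return [value[i:i + CHUNK] for i in range(0, len(value), CHUNK)] or [""]

    def visit(element, parent_id):
        node_id = next(ids)
        for seq, part in enumerate(chunks((element.text or "").strip())):
            rows.append((node_id, parent_id, element.tag, TAG, part, seq))
        for name, value in element.attrib.items():
            attr_id = next(ids)
            for seq, part in enumerate(chunks(value)):
                rows.append((attr_id, node_id, name, ATTRIBUTE, part, seq))
        for child in element:
            visit(child, node_id)

    visit(root, None)
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE XML_MODEL (
        ID INTEGER, PARENTID INTEGER, TAGNAME TEXT,
        CONTENT_TYPE INTEGER, CONTENT TEXT, CONTENT_SEQ INTEGER,
        PRIMARY KEY (ID, CONTENT_SEQ)
    )""")
root = ET.fromstring('<cfg><cell id="7"><name>north-sector-a</name></cell></cfg>')
conn.executemany("INSERT INTO XML_MODEL VALUES (?, ?, ?, ?, ?, ?)", to_rows(root))

# A mediator-style query that touches only the related part: <name> under <cell>.
name_parts = [row[0] for row in conn.execute("""
    SELECT c.CONTENT
    FROM XML_MODEL c JOIN XML_MODEL p ON c.PARENTID = p.ID AND p.CONTENT_SEQ = 0
    WHERE p.TAGNAME = 'cell' AND c.TAGNAME = 'name'
    ORDER BY c.CONTENT_SEQ""")]
```

Joining the parts in CONTENT_SEQ order reconstructs the original value. Because a long value occupies several rows with the same ID, the sketch uses (ID, CONTENT_SEQ) as the key, which is a design choice of this example rather than something the description states.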
In addition, CREATE DATE, which holds the date value of the moment at which each row is added to the table, and APP_NAME, which identifies the application for which the table is being filled, can be two further columns of the table in the database (4). As will be understood by a person skilled in the art, the said columns can be varied in many more ways. In one embodiment of the invention, the table located in the database (4) is a table in which logical relations are created by means of additional columns over fields (for example, CONTENT TYPE) that can be diversified by enumeration.
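As an illustration of the two optional columns, the following sketch adds a creation timestamp that the database fills in at insert time and an application name used to filter rows. The table name `XML_MODEL`, the underscore spelling `CREATE_DATE`, the sample application names and the choice of SQLite are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE XML_MODEL (
        ID INTEGER, PARENTID INTEGER, TAGNAME TEXT,
        CONTENT_TYPE INTEGER, CONTENT TEXT, CONTENT_SEQ INTEGER,
        CREATE_DATE TEXT DEFAULT CURRENT_TIMESTAMP,  -- set when the row is added
        APP_NAME TEXT NOT NULL                       -- application filling the table
    )""")

# Hypothetical rows written on behalf of two different applications.
rows = [(1, None, "cfg", 0, "", 0, "inventory"),
        (2, None, "alarm", 0, "link down", 0, "fault-mgmt")]
conn.executemany(
    "INSERT INTO XML_MODEL (ID, PARENTID, TAGNAME, CONTENT_TYPE, CONTENT,"
    " CONTENT_SEQ, APP_NAME) VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Each application can then restrict its analysis to its own rows.
alarms = conn.execute(
    "SELECT TAGNAME, CONTENT, CREATE_DATE FROM XML_MODEL WHERE APP_NAME = ?",
    ("fault-mgmt",)).fetchall()
```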
As will be understood by a person skilled in the art, the database (4) is a central storage space which stores XML data on which analysis serving many purposes can be performed, according to the content of the XML document received from the XML source (K) and modelled by the XML processing and modelling unit (3).
The XML writing unit (5) is a unit to which the XML processing and modelling unit (3) transmits the data obtained and modelled by processing the XML documents, and which writes this data to the database (4) in parallel. In one preferred embodiment of the invention, the XML writing unit (5) serializes the modelled XML data, which is received in parallel from the XML processing and modelling unit (3), using one serializer for each parallel branch.
With the inventive system (1), it is ensured that XML (Extensible Markup Language) documents are processed in parallel and that the data contained in the XML documents is modelled and written to a database so that it is ready for data analysis, whereby the data analysis can be performed in parallel as well; consequently, quick, automated analysis yielding plenty of results can be performed by mediator units (M).
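The idea of one serializer per parallel branch in the XML writing unit (5) can be sketched as below. This is an illustration under assumptions, not the disclosed implementation: the `BranchSerializer` class, the disjoint ID ranges per branch, the reduced three-column table and the lock around the shared SQLite connection are all hypothetical choices.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor


class BranchSerializer:
    """Turns modelled (tag, content) pairs into rows; one instance per branch."""

    def __init__(self, branch, id_block=1_000_000):
        # Each branch draws IDs from its own range, so no shared counter is needed.
        self._next_id = branch * id_block

    def serialize(self, pairs):
        rows = []
        for tag, content in pairs:
            self._next_id += 1
            rows.append((self._next_id, tag, content))
        return rows


conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute(
    "CREATE TABLE XML_MODEL (ID INTEGER PRIMARY KEY, TAGNAME TEXT, CONTENT TEXT)")
write_lock = threading.Lock()  # the shared connection is used by one thread at a time


def write_branch(branch, pairs):
    rows = BranchSerializer(branch).serialize(pairs)  # serialization is per branch
    with write_lock:
        conn.executemany("INSERT INTO XML_MODEL VALUES (?, ?, ?)", rows)
        conn.commit()
    return len(rows)


# Hypothetical modelled data arriving on three parallel branches.
branches = [[("a", "1"), ("b", "2")], [("c", "3")], [("d", "4"), ("e", "5")]]
with ThreadPoolExecutor(max_workers=3) as pool:
    written = sum(pool.map(write_branch, range(len(branches)), branches))
```

Giving every branch its own serializer state avoids contention during serialization; only the final insert is coordinated. A database server that accepts concurrent writers could drop the lock and give each branch its own connection.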
Within these basic concepts, it is possible to develop a wide range of embodiments of the inventive system (1); the invention cannot be limited to the examples disclosed herein and is essentially as defined in the claims.

Claims

1. A system (1) for parallel processing and data modelling, comprising:
at least one XML receiving unit (2) to which XML documents flow from an XML source (K) or which receives XML documents from an XML source (K);
at least one XML processing and modelling unit (3);
at least one database (4);
at least one XML writing unit (5) to which the XML processing and modelling unit (3) transmits the data obtained and modelled by processing the XML documents, and which writes this data to the database (4) in parallel;
and characterized by
at least one XML processing and modelling unit (3) to which the XML receiving unit (2) transmits the received XML documents, and which processes the XML documents in parallel and models the data obtained from the processed XML documents in an in-memory tree structure;
at least one database (4) to which the modelled data is written in parallel by the XML writing unit (5) and which is configured such that a mediator unit (M), which generates reports, events, analysis results or alarms to be transmitted to end systems (S), communicated to persons or displayed on an interface, can access it and perform analysis on it in parallel.
2. A system (1) according to Claim 1, characterized by the XML receiving unit (2) which receives XML documents from XML source (K) by means of a plurality of threads in parallel such that it will process each thread in the same way and transfers them to the XML processing and modelling unit (3) -that will process XML documents in parallel and model the data- again in parallel.
3. A system (1) according to Claim 1, characterized by the XML processing and modelling unit (3) which is configured such that it can process an XML document having a correct format without knowing its schema, its data sequence and structure, or its data types.
4. A system (1) according to Claim 1, characterized by the XML processing and modelling unit (3) which is configured such that it does not require the XML document to be valid.
5. A system (1) according to Claim 1, characterized by the XML processing and modelling unit (3) which creates a modelled document on which any mediator unit (M) can perform analysis via a text-based search without having to know the XML structure.
6. A system (1) according to Claim 1, characterized by the XML processing and modelling unit (3) which is configured such that it accepts an XML document of which only a part, rather than the whole, has a correct format, and can process the part having a correct format.
7. A system (1) according to Claim 1, characterized by the XML processing and modelling unit (3) which produces, as output, modelled data that can be written to relational or non-relational databases (4).
8. A system (1) according to Claim 1, characterized by the XML processing and modelling unit (3) which models XML documents such that they can be inserted into a database (4) model having a single table.
9. A system (1) according to Claim 1, characterized by the database (4) which is a central storage space that stores XML data on which analysis serving many purposes can be performed, according to the content of the XML document received from the XML source (K) and modelled by the XML processing and modelling unit (3).
10. A system (1) according to Claim 1, characterized by the XML writing unit (5) which serializes the modelled XML data, which is received in parallel from the XML processing and modelling unit (3), using one serializer for each parallel branch.
PCT/TR2016/000209 2015-12-31 2016-12-26 A system for parallel processing and data modelling WO2017116341A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
TR201517649 2015-12-31
TR2015/17649 2015-12-31

Publications (2)

Publication Number Publication Date
WO2017116341A2 (en) 2017-07-06
WO2017116341A3 (en) 2017-08-03

Family

ID=58213311

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/TR2016/000209 WO2017116341A2 (en) 2015-12-31 2016-12-26 A system for parallel processing and data modelling

Country Status (1)

Country Link
WO (1) WO2017116341A2 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050055355A1 (en) 2003-09-05 2005-03-10 Oracle International Corporation Method and mechanism for efficient storage and query of XML documents based on paths
US20090089658A1 (en) 2007-09-27 2009-04-02 The Research Foundation, State University Of New York Parallel approach to xml parsing
US20140089332A1 (en) 2012-09-27 2014-03-27 Siemens Product Lifecycle Management Software Inc. Efficient conversion of xml data into a model using persistent stores and parallelism

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6631379B2 (en) * 2001-01-31 2003-10-07 International Business Machines Corporation Parallel loading of markup language data files and documents into a computer database
US7899834B2 (en) * 2004-12-23 2011-03-01 Sap Ag Method and apparatus for storing and maintaining structured documents
US20110289118A1 (en) * 2010-05-20 2011-11-24 Microsoft Corporation Mapping documents to a relational database table with a document position column

Also Published As

Publication number Publication date
WO2017116341A3 (en) 2017-08-03

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16843211

Country of ref document: EP

Kind code of ref document: A2