WO2015138374A1 - Methods to represent and interact with complex knowledge - Google Patents

Info

Publication number
WO2015138374A1
Authority
WO
WIPO (PCT)
Prior art keywords
knowledge
represent
network
sublists
entities
Prior art date
Application number
PCT/US2015/019573
Other languages
French (fr)
Inventor
Toni R. FARLEY
Spyro Mousses
Christopher YOO
Original Assignee
Systems Imagination, Inc.
Priority date
Filing date
Publication date
Application filed by Systems Imagination, Inc.
Publication of WO2015138374A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioethics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Methods for representing and interacting with semantic knowledge.

Description

METHODS TO REPRESENT AND INTERACT WITH COMPLEX KNOWLEDGE
FIELD OF THE INVENTION
[0001] The present invention relates to databases and in particular to a database model to represent complex data relationships in knowledge, and a method to interact with the model.
BACKGROUND OF THE INVENTION
[0002] Data is not currently captured and processed by machines in a way that supports semantics, abstract knowledge concepts, and the evolution of knowledge. Existing database models either do not capture knowledge semantics, or do not do so in a flexible and scalable way. Knowledge may be defined by the relationships among a plurality of data elements. Relational and graph data models are suited to capturing binary relationships, but they are not flexible enough to represent complex relationships and do not readily scale with increasing volumes of relationships. Complex relationships comprise relationships that are similar in construct, but indicative of different semantics in different contexts. The commonly used relational data model requires a schema defined a priori, which presents difficulties in capturing changing relationship structures as knowledge evolves.
[0003] The present disclosure provides a data model for capturing knowledge semantics, and a method to interact with the model. The data model is flexible to evolving knowledge defined by complex relationships, and scalable to large volumes of data. A concise method to interact with knowledge stored in the data model is provided.
SUMMARY OF THE INVENTION
[0004] In order to overcome the challenges of representing semantics in a database in a flexible and scalable manner, we present a new data model, and method to interact with the model.
DETAILED DESCRIPTION OF THE INVENTION
[0005] The present invention discloses methods for representing semantic knowledge in a data model and interacting with knowledge in a persistent data store based on the model.
[0006] Semantic knowledge may be defined by a collection of related entities. In the present disclosure, an entity is defined by a unique identifier, and a pair of ordered lists comprising a number of sublists each. For example, the following four sublists represent four ways in which an entity relates to another entity, as described in Table 1.
Table 1 Four sublists of an entity (C4).
Figure imgf000003_0001
[0007] In Table 1 the pair of lists are denoted α and β, and the four sublists are collectively referred to as C4. These lists, along with a unique identifier (UID), define an entity as:
UID, α, β (1)
[0008] The lists α and β in Table 1 have a reciprocal relationship. For instance, given an entity, x, the sublists in x(α) contain the UIDs of other entities that relate to this entity by the semantic meaning given in Table 1, as:
1. composed of (has-a): x is an entity that is made up of the entities in this sublist
2. includes: x is a general classification of the specific entities in this sublist
3. derived from: x is a concept or entity that is derived from the combination of the entities in this sublist
4. caused by: x is an effect that is caused by the entities in this sublist
[0009] The sublists in x(β) contain the UIDs of other entities that relate to this entity by the semantic meaning given in Table 1, as:
1. part of: x is a part of the entities in this sublist
2. member of (is-a): x is a member of the classification entities in this sublist
3. contributes to: x is one of the pluralities of interacting entities that contribute to the derived entity or concept in this sublist
4. effects: x is a cause that results in the effects in this sublist
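The entity definition in (1) and the eight sublist meanings can be sketched in Python as follows. This is a minimal illustration only: the class name `Entity`, the field names, and the `describe` helper are assumptions made for this example and do not appear in the disclosure.

```python
from dataclasses import dataclass, field

# Sublist labels in the order given in the text (positions C1..C4).
ALPHA_SEMANTICS = ("composed of", "includes", "derived from", "caused by")
BETA_SEMANTICS = ("part of", "member of", "contributes to", "effects")


def empty_list():
    """Return one ordered list holding four empty sublists."""
    return [[], [], [], []]


@dataclass
class Entity:
    """Hypothetical container for (UID, alpha, beta) as in expression (1)."""
    uid: str
    alpha: list = field(default_factory=empty_list)
    beta: list = field(default_factory=empty_list)

    def describe(self):
        """Render every non-empty sublist as a readable statement."""
        lines = []
        for labels, sublists in ((ALPHA_SEMANTICS, self.alpha),
                                 (BETA_SEMANTICS, self.beta)):
            for label, members in zip(labels, sublists):
                if members:
                    lines.append(f"{self.uid} {label} {', '.join(members)}")
        return lines


car = Entity("car")
car.alpha[0].append("tire")   # car is composed of tire
tire = Entity("tire")
tire.beta[0].append("car")    # reciprocal entry: tire is part of car

print(car.describe())
print(tire.describe())
```

Keeping the two lists as parallel four-element structures means a sublist position alone identifies the semantic, which is what the reciprocal pairing between α and β relies on.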
[0010] By these definitions, the first sublist allows for abstractions, where an entity x can be viewed as a singular entity, or expanded and viewed by the sum of its parts using α(Composition).
[0011] As an example, if x is defined as:
x, [[a], [b], [c, d], [e]], [[], [], [], []] (2)
then, a, b, c, d and e are defined as:
a, [[], [], [], []], [[x], [], [], []] (3)
b, [[], [], [], []], [[], [x], [], []] (4)
c, [[], [], [], []], [[], [], [x], []] (5)
d, [[], [], [], []], [[], [], [x], []] (6)
e, [[], [], [], []], [[], [], [], [x]] (7)
[0012] and it follows that a is a part of x, b is an x, c and d, when combined, form x, and e, when present, results in x. To further elaborate on this example, if x is a car, then a may be a tire; if x is the classification, fruit, then b may be an apple; if x is mud, then c and d may be dirt and water respectively; or if x is a sister concept, then b may be a sibling concept, and c may be a female concept; and if x is smoke, then e may be fire.
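The reciprocal bookkeeping in this example can be sketched as a single helper that writes both sides of a relationship at once. The dictionary layout and the names `new_entity` and `relate` are assumptions for illustration; the disclosure does not prescribe how the reciprocal entries are maintained.

```python
def new_entity():
    """An entity body: alpha and beta, each with four sublists."""
    return {"alpha": [[], [], [], []], "beta": [[], [], [], []]}


def relate(network, source, target, sublist):
    """Put `target` into sublist `sublist` (0..3) of source's alpha list and
    write the reciprocal entry into the same sublist of target's beta list."""
    for uid in (source, target):
        network.setdefault(uid, new_entity())
    network[source]["alpha"][sublist].append(target)
    network[target]["beta"][sublist].append(source)


network = {}
relate(network, "x", "a", 0)  # x is composed of a
relate(network, "x", "b", 1)  # x includes b
relate(network, "x", "c", 2)  # x is derived from c ...
relate(network, "x", "d", 2)  # ... and d
relate(network, "x", "e", 3)  # x is caused by e

for uid, body in network.items():
    print(uid, body["alpha"], body["beta"])
```

Under these assumptions the printed rows correspond to the definitions of x, a, b, c, d and e in the example.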
[0013] Further, a sublist may begin with a binary digit specifying whether or not ordering is imposed on the items in the list, where 0 denotes unordered and 1 denotes ordered.
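One way to read the ordering digit is sketched below. It assumes the flag is stored as the integer 0 or 1 ahead of string UIDs, and that a sublist without a flag is treated as unordered; both are assumptions of this example, and the function names are hypothetical.

```python
def split_sublist(sublist):
    """Return (ordered, members) for a sublist that may begin with a 0/1 flag.

    Assumption: the flag is the integer 0 or 1 and UIDs are strings, so the
    two cannot be confused. A sublist with no flag is treated as unordered.
    """
    if sublist and isinstance(sublist[0], int) and sublist[0] in (0, 1):
        return bool(sublist[0]), list(sublist[1:])
    return False, list(sublist)


def same_members(left, right):
    """Compare two sublists, honouring order only when both are ordered."""
    left_ordered, left_items = split_sublist(left)
    right_ordered, right_items = split_sublist(right)
    if left_ordered and right_ordered:
        return left_items == right_items
    return sorted(left_items) == sorted(right_items)


print(split_sublist([1, "G", "A", "T"]))
print(same_members([1, "G", "A", "T"], [1, "T", "A", "G"]))
print(same_members([0, "c", "d"], [0, "d", "c"]))
```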
[0014] A method to traverse a network of related entities to recover data elements and semantic knowledge is presented. In this method, we refer to the sublists C4 of α and β as:
α(C1), α(C2), α(C3), α(C4) (8)
β(C1), β(C2), β(C3), β(C4) (9)
[0015] A query is defined by a string of tokens in a language using the following grammar rules:
query → path {operator path} (10)
path → idlist list [count] (sublist) [return] (11)
idlist → UID {, UID} (12)
return → 0 | 1 (13)
list → α | β (14)
sublist → C1 | C2 | C3 | C4 (15)
operator → ∧ | ∨ | ¬ (16)
count → digit {digit} (17)
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 (18)
[0016] with the following syntax rules:
→ delineates a non-terminal on the left hand side (LHS) and a production rule on the right hand side (RHS)
{...} denotes repetition of zero or more
[...] denotes optional (zero or one)
| denotes a choice (logical or)
[0017] and the terminal symbols:
Figure imgf000005_0001
[0018] where the operators in (16) are logical conjunction, disjunction, and negation; the binary value of (13) specifies whether to return the entities traversed at this step, where 1 denotes return, 0 otherwise, wherein this value is optional and defaults to zero; the numeric value of (17) specifies how many times to repeat the following step, wherein this value is optional and defaults to one; and UID is a unique identifier on an entity.
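A small parser for this grammar might look like the sketch below. Two points are assumptions of the example and not statements of the disclosure: UIDs are taken to be ASCII letters followed by letters or digits, and a path is allowed to carry several traversal steps after the idlist so that multi-step queries can be read. The function names are hypothetical.

```python
import re

# One traversal step: list [count] (sublist) [return], as in rule (11).
STEP = re.compile(r"([αβ])(\d*)\((C[1-4])\)([01]?)")
# Assumed UID shape: an ASCII letter followed by letters or digits.
UID = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def parse_path(text):
    """Parse one path into (uids, steps).

    Assumption: one or more steps may follow the idlist.
    """
    text = text.replace(" ", "")
    pos, uids = 0, []
    while True:
        match = UID.match(text, pos)
        if not match:
            raise ValueError(f"expected a UID at position {pos}")
        uids.append(match.group())
        pos = match.end()
        if pos < len(text) and text[pos] == ",":
            pos += 1          # another UID follows in the idlist
            continue
        break
    steps = []
    while pos < len(text):
        match = STEP.match(text, pos)
        if not match:
            raise ValueError(f"expected a step at position {pos}")
        list_name, count, sublist, ret = match.groups()
        steps.append({
            "list": list_name,
            "count": int(count) if count else 1,  # count defaults to one
            "sublist": int(sublist[1]) - 1,       # zero-based sublist index
            "return": ret == "1",                 # return defaults to zero
        })
        pos = match.end()
    if not steps:
        raise ValueError("a path needs at least one step")
    return uids, steps


def parse_query(text):
    """Split a query into its paths and the operators between them."""
    parts = re.split(r"\s*([∧∨¬])\s*", text.strip())
    return [parse_path(part) for part in parts[0::2]], parts[1::2]


print(parse_query("Aα(C1) ∧ Bα(C2)1"))
```

Because the return flag directly follows the closing parenthesis and the count directly follows the list symbol, a step can be matched with a single regular expression without lookahead.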
[0019] The equation in (25) is an example query to extract "all parts of A that are members of B".
Aα(C1) ∧ Bα(C2)1 (25)
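Query (25) can be evaluated by hand on a toy network, as sketched below. The entities ("tire", "engine", "coin") are invented data, reciprocal β entries are omitted for brevity, and reading the conjunction operator as set intersection is an assumption of this example.

```python
EMPTY = {"alpha": [[], [], [], []], "beta": [[], [], [], []]}


def traverse(network, uids, list_name, sublist, count=1):
    """Follow one sublist `count` times, starting from a set of UIDs."""
    frontier = set(uids)
    for _ in range(count):
        reached = set()
        for uid in frontier:
            reached.update(network.get(uid, EMPTY)[list_name][sublist])
        frontier = reached
    return frontier


# Hypothetical data: A is composed of a tire and an engine;
# B is a classification that includes a tire and a coin.
network = {
    "A": {"alpha": [["tire", "engine"], [], [], []], "beta": [[], [], [], []]},
    "B": {"alpha": [[], ["tire", "coin"], [], []], "beta": [[], [], [], []]},
}

parts_of_a = traverse(network, ["A"], "alpha", 0)     # Aα(C1)
members_of_b = traverse(network, ["B"], "alpha", 1)   # Bα(C2)
result = parts_of_a & members_of_b                    # conjunction as intersection

print(sorted(result))  # ['tire']
```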
[0020] In some embodiments the present invention may be used to persist and interact with knowledge in a network of systems to achieve computational intelligence that is analogous to the human mind. A system, S1, may be used for generating, capturing and representing models of reality, hypotheses, beliefs, predictions, contingencies and any other imagined relationship structures which may or may not exist in reality. These beliefs may take any form, and may be contributed by human imagination through an interface that allows users to pose models, hypotheses, predictions, contingencies etc. Depending on the source and origin of imagined beliefs, there may also be different kinds of systems that leverage human interfaces for manual inputs of testable models, or learning engine applications to automate the generation of new testable models.
[0021] In some embodiments a network of systems based on the present invention may include another system S2. This system may comprise an engine to automatically generate knowledge structures that represent novel relationships that a system imagines, and applies unsupervised combinatorial approaches to generate novel structures that represent unique models with measurable predictions. The system may also emulate imagination by comparison of prior knowledge analogies to infer new contingencies that have not yet been imagined. The present invention supports a platform for this automated generation of testable models that can greatly expand the limited ability of humans to generate testable models, and therefore eliminate a critical bottleneck in the generation of knowledge. By linking automated belief generation to learned knowledge, such a system may iteratively improve its ability to infer or imagine testable hypotheses based on learning and experience, accelerating the evolution of new knowledge. The present invention provides a flexible and scalable platform to generate, capture, and store testable models, which may then provide input to a network of systems-based model testing processes.
[0022] In some embodiments a network of systems based on the present invention may include another system S3 that represents relationships that are precisely captured from observations, measurements, perceptions, or any other means that represents data inputs from perceiving reality. Such a system may provide an input to another system S4 and by doing so, S3 provides the real world evidence repository to enable automated testing of models that are stored in and originated in S1 or S2.
[0023] In some embodiments a network of systems based on the present invention may include another system S4, which may function as a testing and learning system in the network by accepting model inputs from S1 or S2, and inputs from S3, and performing model fitting functions to test how well the predictions of the model fit data structures in S3. The output of model testing may be new structures used to annotate the models as contextually validated or not supported based on how well the predictions, contingencies, or models fit specific real world data in S3. Validation may comprise qualitative or quantitative evaluation to determine how well a prediction or contingency in S1 or S2 matches the actual structures or relationships in S3. S1, S2, and S3 may store information primarily in primary and secondary network structures, while S4 may operate at a secondary and/or tertiary network level.
[0024] In some embodiments a network of networks based on the present invention may comprise federated networks of networks, wherein networks containing similar types of primary networks with different content may be linked in higher order secondary networks to share content. For example, hundreds of distributed observation networks containing primary networks with different content could be combined into a secondary federated (overlay) network with content awareness, so relationships could be hierarchically structured even further across the secondary network in a way that is analogous to how white matter networks tie together neocortical columns of neurons in the human brain.
[0025] In some embodiments a network of networks based on the present invention may comprise tertiary learning networks, wherein an overlay network may link different kinds of federated networks, which may be designed to achieve higher order functions, such as learning, to create and recover knowledge using the methods of the present invention. Knowledge may be defined as an imagined belief that has been tested and determined to fit well with real world observations. For instance, in the human brain, the two hemispheres of the brain divide the functions of storing and generating imagined beliefs, from capturing and storing observations that precisely represent the world as it is perceived.
[0026] Illustrative Example
[0027] The present invention may be used to structure knowledge in molecular biology, capturing the reciprocal hierarchy of nested relationships in biological structures, and mechanistic processes, such as:
1. A human comprises trillions of cells.
2. Zooming in on one cell, a bone cell normally comprises 46 chromosomes, divided into 23 pairs.
3. Zooming in on one chromosome, the entire sequence of DNA for chromosome 17 comprises about 12,000 different gene loci.
4. Zooming in on one gene on chromosome 17, the DNA region of the locus from base pair 7,668,401 to base pair 7,687,549 encompasses the TP53 gene.
5. The normal TP53 gene sequence is transcribed and translated resulting in codons that encode a sequence of 393 amino acids, which form the p53 protein.
6. Zooming in on the TP53 gene locus, a segment of DNA comprising the nucleotides, G, A, T, C defines exons and introns of the TP53 genomic DNA sequence.
7. The TP53 genomic DNA sequence results in the transcription of multiple RNA molecules which comprise a set of variant messenger RNA transcripts defined by alternative splicing of primary mRNA transcripts.
8. Exposure to radiation can cause mutations in a normal p53 gene sequence, and/or such mutations can be inherited.
9. A single base pair change in DNA in the first base of three-base codon 72 will encode for a different amino acid at protein position 72.
10. An altered amino acid causes the p53 protein to function differently.
11. p53 is also related to multiple causes/functions, including binding to dozens of other proteins, binding to DNA elements to transcriptionally control specific set of genes, and those genes in turn regulate higher order functions such as double stranded DNA repair, which in turn regulates genomic stability. Structurally altering the sequence of the TP53 gene therefore causes loss of the function of the p53 protein, which in turn deregulates genes controlled by p53, and changes to those genes in turn causes loss of DNA repair functions, and that in turn causes genomic instability, which in turn causes cancer. Representing the complex mechanism that starts with first order TP53 gene sequence changes at the DNA level, and maps/links to intermediary causes and effects in subsequent steps in a multistep mechanism that ends with higher order semantic concepts such as cancer predisposition or clinical drug response, therefore requires a system like the invention described in this disclosure, that can support capturing the multiscalar hierarchical knowledge relationships.
12. Thus, alterations in p53 protein function may cause changes in other pathways and cellular functions, such as DNA repair.
13. A bone cell might lose both copies of the TP53 gene. For example, the first hit may be an inherited deletion in the p53 gene, and the second, a codon 72 mutation caused by radiation.
14. Without any normal TP53, no normal p53 protein product is formed, which means DNA damage cannot be repaired by p53 like it is supposed to, and that lack of repair causes genomic instability to accumulate in that bone cell. When a critical mass of genetic instability and mutations have occurred, that bone cell's progeny inherit malignant traits, such as bypassing natural cell growth control, that allow selective growth advantages for cells that inherit those genetic mutations, thereby allowing for cellular evolution that can produce bone cancer (e.g. osteosarcoma).
15. Patients that have inherited mutations in the p53 gene, are classified as having Li-Fraumeni Syndrome, which is associated with sensitivity to DNA damaging agents, and predisposition to multiple types of cancer.
16. Therefore, families where a mutated p53 allele (an alternative form of the gene or locus) is passed down from generation to generation have higher incidence of certain types of cancer, like sarcoma, leukemia, breast cancer, etc. That increase in risk (predisposition) is caused by inheriting the mutated p53 gene, in combination with exposure to environmental agents that cause additional DNA damage.
[0028] This predisposition example demonstrates that a system using the present invention may be applied in a real world example to represent the fundamental code of life as computable knowledge, allowing us to navigate up and down this hierarchy of relationships, compositions, combinations, classifications, causes, etc., and link new learned knowledge, or recover knowledge at any point of the hierarchy. For instance, a child with osteosarcoma may have p53 mutations in all of their normal and cancer cells, but the additional mutation in p53 in his cancer cells may have caused additional somatic (meaning they are present only in the cancer cells) mutations in the EGFR gene. These EGFR mutations may have been linked to favorable response to a drug called erlotinib, but only in Squamous Cell Carcinoma. The context of having germ line (non-somatic) p53 mutations makes the patient a poor candidate for DNA damaging chemotherapy because the normal cells will not be able to repair the DNA damage and the patient will likely die of toxicity or secondary cancer. EGFR inhibitors are targeted drugs that do not cause DNA damage, but erlotinib might not be indicated for Osteosarcoma (i.e. there is no clinical evidence). The knowledge above provides a causal mechanistic model, and observing that there is an EGFR mutation supports that hypothesis. Testing that n = 1 hypothesis, and observing a favorable response to erlotinib in this particular context may suggest that the hypothesis might be more generalizable, and that could be tested by suggesting a clinical trial of an adult lung cancer drug for a pediatric bone cancer. The new hypothesis is that other children with the same context, Li-Fraumeni Syndrome who have Osteosarcoma should be tested for EGFR mutations, and if they are present, they should enter into a trial to evaluate the effectiveness of erlotinib as an alternative to other treatments.
[0029] In this n = 1 analysis, imagine that we knew nothing about a child, other than their germline and cancer genomes. A system may use the present invention to:
1. comprehensively and fundamentally represent prior and patient-specific knowledge;
2. capture this knowledge in a way that allows all of the genetic and therapeutic mechanistic knowledge described in the example above to be effectively and efficiently applied to classify this osteosarcoma patient as having Li-Fraumeni Syndrome;
3. support systems level imagination of mechanistic hypotheses about plausible treatments given the context;
4. pose genetic analysis to test that imagined hypothesis; and
5. recover treatments beyond the standard of care (which would not be suitable for this context) that may be more relevant for that context.
[0030] To illustrate aspects of the present example, entity types may be represented by the entities in Table 2, wherein the unique identifiers (UIDs) comprise one or two alpha characters, which denote the UID for a "classification" entity, followed by x = [1..n], where n is the number of items in each classification respectively. For instance, CT is an entity and CTα(C2) = {CTx | x ∈ [1..n]}, and so on.
[0031] Relationships for the items in Table 2 may be defined using the semantics of Table 1 as shown in Table 3, wherein "comprises" denotes a composition (composed of) relationship, "arise from" and "posits" denote combination (derived from) relationships, "encoded by" denotes a causality (caused by) relationship, and "codes for" denotes a reciprocal causality (effects) relationship. Data sets and sequences in Table 3 are captured using set notation, and x, y, m, and n are variables.
[0032] The relationships in Table 3 may be represented by sublists for each entity defined in Table 4, where α lists that begin with a 1 are sequences (ordered), and α lists that begin with a 0 are unordered sets.
Table 2 Items in an illustrative example.
UIDs | Item | Specific Item in Example
CTx | Clinical Trials | CT1 represents an erlotinib trial
Px | Patients (Clinical Trial Subjects) | P1 represents a clinical trial subject
CLx | Cells | CL1 and CL2 represent normal (germ line) and cancer (somatic) bone cells respectively
CHx | Chromosomes | CH17 represents Chromosome 17
Lx | Loci | L1713 represents the Locus 17p13.1
Cx | Codons | C54 and C22 represent the codons CGC and CCC respectively
Nx | Nucleotides | N1, N2 represent the nucleotides guanine (G) and cytosine (C) respectively
AAx | Amino Acids | AA2 and AA15 represent the amino acids arginine and proline respectively
Gx | Genes | G1 and G2 represent the genes TP53 and EGFR respectively
PRx | Proteins | PR1 represents the protein p53
Mx | Gene Mutations | M1 represents a specific inherited mutation in TP53, and M2 represents a mutation in EGFR
Ex | Environmental factors | E1 represents exposure to radiation
COx | Concepts | CO1 represents sensitivity to DNA damaging agents, and CO2 represents a favorable treatment outcome
GSx | Gene States | GS1 and GS2 represent classes of amplified and mutated gene states respectively
Dx | Diseases/Disorders | D1 and D2 represent Squamous Cell Carcinoma and Osteosarcoma respectively
DRx | Drugs (Pharmaceuticals) | DR1 represents the drug erlotinib (an EGFR inhibitor)
Ax | Molecular Actions | A1 represents inhibits
Tx | Targets | T1 represents a specific molecular target
Hx | Hypotheses | H1 represents a hypothesis to be tested, wherein the hypotheses may be automatically generated
Table 3 Relationships in the illustrative example.
Figure imgf000012_0001
[0033] Given the data structure of the present example, we can compose queries to recover knowledge. For instance, a query to recover biologically relevant treatment options (drugs) for P1 may search for molecular contexts that contribute to P1, which in this case recovers sets of amplified and expressed genes, and recover any treatment options these genes contribute to. Using the grammar rules of the present disclosure, this query may translate to:
P1α(C3)α(C2)α(C3)α(C3)1 (26)
[0034] The query of (26) returns a null set as there are no drug targets present in this knowledge that are associated with the patient's amplified and mutated genes.
[0035] A subsequent query (27) may expand the search to include additional molecular contexts that are related to the patient's amplified and mutated genes, as:
P1α(C3)α2(C2)1β(C4)1β(C2)β(C3)α(C3)1 (27)
Table 4 Sublists generated in the illustrative example.
Figure imgf000013_0001
D1 [[],[],[],[]] [[],[],[CO2],[]]
D2 [[],[],[],[]] [[],[],[H1],[]]
DR1 [[],[],[],[]] [[],[],[T1],[]]
A1 [[],[],[],[]] [[],[],[T1],[]]
T1 [[],[],[1,DR1,A1,G2],[]] [[],[],[CO2,H1],[]]
H1 [[],[],[0,CO2,D2,T1],[]] [[],[],[],[]]
[0036] The query of (27) returns a subset of knowledge relating TP53 mutations (M1) to EGFR mutations (M2) (the 1's in the paths (11) specify to return these intermediate results), and the subset further comprises the target T1, and its related drug erlotinib (DR1), mechanism of action, "inhibits" (A1), and the gene EGFR (G2).
[0037] A subsequent query on the recovered target may recover more knowledge of contexts related to this target:
T1β(C3)1α(C3)1 (28)
[0038] The query of (28) returns a subset of knowledge comprising the concept of favorable treatment outcome (CO2) for T1 in the context of the disease Squamous Cell Carcinoma (D1), and the hypothesis (H1) that the same treatment outcome (CO2) may arise for T1 in the context of the disease Osteosarcoma (D2).
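The traversal described for this query can be traced on the rows of Table 4 that survive in text form. The sketch below makes several assumptions: garbled sublists in those rows are read as empty, the row for CO2 is not available in text, and its derived-from sublist is therefore inferred by reciprocity from the β entries of the other rows. The names `TABLE4`, `members`, `alpha_c3` and `query_28` are hypothetical.

```python
# uid -> (alpha, beta), as read from the text rows of Table 4.
TABLE4 = {
    "D1":  ([[], [], [], []], [[], [], ["CO2"], []]),
    "D2":  ([[], [], [], []], [[], [], ["H1"], []]),
    "DR1": ([[], [], [], []], [[], [], ["T1"], []]),
    "A1":  ([[], [], [], []], [[], [], ["T1"], []]),
    "T1":  ([[], [], [1, "DR1", "A1", "G2"], []], [[], [], ["CO2", "H1"], []]),
    "H1":  ([[], [], [0, "CO2", "D2", "T1"], []], [[], [], [], []]),
}


def members(sublist):
    """Drop the optional leading 0/1 ordering flag and keep the UIDs."""
    return [item for item in sublist if isinstance(item, str)]


def alpha_c3(uid):
    """Derived-from sublist of `uid`: the listed row when it is non-empty,
    otherwise inferred from every row whose beta(C3) names `uid`."""
    if uid in TABLE4 and members(TABLE4[uid][0][2]):
        return members(TABLE4[uid][0][2])
    return [source for source, (_, beta) in TABLE4.items() if uid in beta[2]]


def query_28(start="T1"):
    """Trace the two steps: beta(C3) with return, then alpha(C3) with return."""
    step1 = members(TABLE4[start][1][2])
    step2 = {uid: alpha_c3(uid) for uid in step1}
    return step1, step2


step1, step2 = query_28()
print(step1)
print(step2)
```

Under these assumptions the first step reaches CO2 and H1, and the second step ties CO2 to D1 and T1 and ties H1 to CO2, D2 and T1, which matches the description of the result.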
[0039] Illustrative Example 2
[0040] To further illustrate an exhaustive capture of prior knowledge, relations among nucleotides, codons, and amino acids may be represented in the present invention as shown in Table 5, where Cod refers to Codons in the first four sets.
Table 5 Capturing the Relationships Among
Nucleotides, Codons, and Amino Acids.
Figure imgf000014_0001
CTC [[C,T,C],[],[],[]] [[],[],[],[Leu]]
CTA [[C,T,A],[],[],[]] [[],[],[],[Leu]]
CTG [[C,T,G],[],[],[]] [[],[],[],[Leu]]
ATT [[A,T,T],[],[],[]] [[],[],[],[Ile]]
ATC [[A,T,C],[],[],[]] [[],[],[],[Ile]]
ATA [[A,T,A],[],[],[]] [[],[],[],[Ile]]
ATG [[A,T,G],[],[],[]] [[],[],[],[Met]]
GTT [[G,T,T],[],[],[]] [[],[],[],[Val]]
GTC [[G,T,C],[],[],[]] [[],[],[],[Val]]
GTA [[G,T,A],[],[],[]] [[],[],[],[Val]]
GTG [[G,T,G],[],[],[]] [[],[],[],[Val]]
TCT [[T,C,T],[],[],[]] [[],[],[],[Ser]]
TCC [[T,C,C],[],[],[]] [[],[],[],[Ser]]
TCA [[T,C,A],[],[],[]] [[],[],[],[Ser]]
TCG [[T,C,G],[],[],[]] [[],[],[],[Ser]]
CCT [[C,C,T],[],[],[]] [[],[],[],[Pro]]
CCC [[C,C,C],[],[],[]] [[],[],[],[Pro]]
CCA [[C,C,A],[],[],[]] [[],[],[],[Pro]]
CCG [[C,C,G],[],[],[]] [[],[],[],[Pro]]
ACT [[A,C,T],[],[],[]] [[],[],[],[Thr]]
ACC [[A,C,C],[],[],[]] [[],[],[],[Thr]]
ACA [[A,C,A],[],[],[]] [[],[],[],[Thr]]
ACG [[A,C,G],[],[],[]] [[],[],[],[Thr]]
GCT [[G,C,T],[],[],[]] [[],[],[],[Ala]]
GCC [[G,C,C],[],[],[]] [[],[],[],[Ala]]
GCA [[G,C,A],[],[],[]] [[],[],[],[Ala]]
GCG [[G,C,G],[],[],[]] [[],[],[],[Ala]]
TAT [[T,A,T],[],[],[]] [[],[],[],[Tyr]]
TAC [[T,A,C],[],[],[]] [[],[],[],[Tyr]]
TAA [[T,A,A],[],[],[]] [[],[],[],[STOP]]
TAG [[T,A,G],[],[],[]] [[],[],[],[STOP]]
CAT [[C,A,T],[],[],[]] [[],[],[],[His]]
CAC [[C,A,C],[],[],[]] [[],[],[],[His]]
CAA [[C,A,A],[],[],[]] [[],[],[],[Gln]]
CAG [[C,A,G],[],[],[]] [[],[],[],[Gln]]
AAT [[A,A,T],[],[],[]] [[],[],[],[Asn]]
AAC [[A,A,C],[],[],[]] [[],[],[],[Asn]]
AAA [[A,A,A],[],[],[]] [[],[],[],[Lys]]
AAG [[A,A,G],[],[],[]] [[],[],[],[Lys]]
GAT [[G,A,T],[],[],[]] [[],[],[],[Asp]]
GAC [[G,A,C],[],[],[]] [[],[],[],[Asp]]
GAA [[G,A,A],[],[],[]] [[],[],[],[Glu]]
GAG [[G,A,G],[],[],[]] [[],[],[],[Glu]]
TGT [[T,G,T],[],[],[]] [[],[],[],[Cys]]
TGC [[T,G,C],[],[],[]] [[],[],[],[Cys]]
Figure imgf000016_0001
[0041] Table 5 comprises:
1. 4 nucleotide bases T, C, A, and G denoted respectively as the UID;
2. 4^3 = 64 codons, each comprising a sequence of 3 bases, denoted by sequence as the UID;
3. 20 amino acids coded by the codons, and denoted by the abbreviations in Table 6 as the UID;
4. START and STOP sequences coded by codons, and denoted respectively as the UID; and
5. the codon ATG represents both the START sequence, and the amino acid Met.
Table 6 Amino Acid Abbreviations.
Figure imgf000017_0001
[0042] Table 5 further captures the relationships among these entities, representing the knowledge that:
1. a single amino acid can be coded by anywhere from one to six codons,
2. a codon has an exclusive 1 to 1 causal relationship with an amino acid,
3. 1 codon represents the start of an amino acid sequence, and
4. 3 codons represent a stop in the amino acid sequences.
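The counts in this list can be checked by generating the codon entities programmatically. The sketch below builds each codon in the α/β layout of Table 5 from the standard genetic code; it uses one-letter amino acid codes and "*" for STOP instead of the three-letter abbreviations of Table 6, which is a simplification made for this example.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks a STOP codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# codon UID -> (alpha, beta), mirroring the row layout of Table 5
codons = {}
for (b1, b2, b3), amino in zip(product(BASES, repeat=3), AMINO):
    alpha = [[b1, b2, b3], [], [], []]   # composed of three bases
    beta = [[], [], [], [amino]]         # effects: the coded amino acid or STOP
    codons[b1 + b2 + b3] = (alpha, beta)

per_amino = Counter(beta[3][0] for _, beta in codons.values())
stops = per_amino.pop("*")

print(len(codons), "codons")
print(len(per_amino), "amino acids")
print(min(per_amino.values()), "to", max(per_amino.values()), "codons each")
print(stops, "stop codons")
```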
[0043] Polypeptides (or proteins) are derived from sequences of amino acids. A nucleotide sequence (codon) in a gene transcript can be replaced by its related amino acid, using the methods of the present invention, to derive a polypeptide from a gene transcript.
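The substitution described here can be sketched as a lookup over successive codons. The mapping below copies a handful of rows from Table 5; the function name `derive_polypeptide`, the choice to stop at the first STOP codon, and the example transcript are assumptions made for illustration.

```python
# A few codon -> amino acid rows taken from Table 5.
CODON_TO_AMINO = {
    "ATG": "Met", "CTC": "Leu", "GTT": "Val", "CCC": "Pro",
    "GCA": "Ala", "TAC": "Tyr", "CAT": "His", "AAG": "Lys",
    "GAT": "Asp", "TGC": "Cys", "TAA": "STOP", "TAG": "STOP",
}


def derive_polypeptide(transcript):
    """Replace each codon with its amino acid until a STOP codon is reached."""
    if len(transcript) % 3:
        raise ValueError("transcript length must be a multiple of three")
    chain = []
    for start in range(0, len(transcript), 3):
        amino = CODON_TO_AMINO[transcript[start:start + 3]]
        if amino == "STOP":
            break
        chain.append(amino)
    return chain


print(derive_polypeptide("ATGCCCGTTAAGTAA"))  # ['Met', 'Pro', 'Val', 'Lys']
```

In the entity model the same lookup corresponds to following the fourth β sublist (effects) of each codon entity.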

Claims

What is claimed:
1. A method to represent semantic knowledge as a collection of related entities, wherein the entities are represented by a data model comprising:
a) a unique identifier;
b) a pair of ordered lists, wherein items in the second list have a reciprocal relationship to items in the first list;
c) the lists further comprise a plurality of sublists; and
d) the sublists represent distinct semantics.
2. The method of claim 1 wherein the lists comprise 4 sublists, representing the semantics:
composition, classification, combination, and causality.
3. A method to interact with the data model comprising:
a) a method to traverse a network of related entities;
b) a method to extract data elements and semantic knowledge from the network; and
c) a method to define traversal and extraction procedures, comprising a query language, wherein the query language is defined by a grammar, and legal (allowed/valid) strings in the language represent queries on the network.
PCT/US2015/019573 2014-03-10 2015-03-10 Methods to represent and interact with complex knowledge WO2015138374A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201461950390P 2014-03-10 2014-03-10
US61/950,390 2014-03-10

Publications (1)

Publication Number Publication Date
WO2015138374A1 true WO2015138374A1 (en) 2015-09-17

Family

ID=54072305

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/019573 WO2015138374A1 (en) 2014-03-10 2015-03-10 Methods to represent and interact with complex knowledge

Country Status (1)

Country Link
WO (1) WO2015138374A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5481665A (en) * 1991-07-15 1996-01-02 Institute For Personalized Information Environment User interface device for creating an environment of moving parts with selected functions
US20070179776A1 (en) * 2006-01-27 2007-08-02 Xerox Corporation Linguistic user interface
US20100198841A1 (en) * 2009-01-12 2010-08-05 Parker Charles T Systems and methods for automatically identifying and linking names in digital resources
US20100235307A1 (en) * 2008-05-01 2010-09-16 Peter Sweeney Method, system, and computer program for user-driven dynamic generation of semantic networks and media synthesis

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15760649

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23.01.17)

122 Ep: pct application non-entry in european phase

Ref document number: 15760649

Country of ref document: EP

Kind code of ref document: A1