WO2010011691A2 - Methods and systems to fingerprint textual information using word runs - Google Patents
- Publication number
- WO2010011691A2 (application PCT/US2009/051313)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- word
- text
- recited
- secure information
- implemented method
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/554—Detecting local intrusion or implementing counter-measures involving event detection and direct action
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/606—Protecting data by securing the transmission between two devices or processes
- G06F21/608—Secure printing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/02—Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
- H04L63/0227—Filtering policies
- H04L63/0245—Filtering by information in the payload
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/08—Network architectures or network communication protocols for network security for authentication of entities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/12—Applying verification of the received information
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
Definitions
- the present invention relates to information security and more specifically relates to systems and methods for detecting and preventing unauthorized disclosure of secure information. Furthermore, the present invention pertains to fingerprinting textual information using word runs for the purpose of detecting and preventing unauthorized disclosure of secure information.
- One method to detect similar data is to examine the database at the file level. This can be done by comparing file names, by comparing file sizes, or by computing a checksum of each file's contents. However, even minor differences between two files will cause them to evade such a detection method.
- Other prior art solutions teach partial text matching methods using various k-gram approaches. In such approaches, sequences of text characters of a fixed length, called k-grams, are selected from the secure text. These k-grams are hashed into a number called a fingerprint. In order to increase storage and resource efficiency, the various prior art approaches propose different means by which the k-grams can be sampled, so as to store only a representative subset of the k-grams.
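The weakness of file-level checksums can be illustrated with a minimal sketch. The choice of SHA-256 and the sample strings are assumptions made for illustration only; the text does not name a particular checksum.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a hex digest of the file contents (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


original = b"The quarterly results are confidential."
edited = b"The quarterly results are confidential!"  # a single character changed

# An exact copy is detected, but a near copy is not.
exact_copy_detected = checksum(original) == checksum(original)
near_copy_detected = checksum(original) == checksum(edited)
```

Here `exact_copy_detected` is true while `near_copy_detected` is false, which is the limitation the passage describes.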
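A rough sketch of the character-based k-gram approach described above follows. The use of CRC-32 as the hash and the "keep hashes divisible by p" sampling rule are assumptions chosen for illustration; the text only says that the prior art samples a representative subset.

```python
import zlib
from typing import Set


def kgram_fingerprints(text: str, k: int = 5) -> Set[int]:
    """Hash every run of k consecutive characters into an integer fingerprint."""
    grams = (text[i:i + k] for i in range(len(text) - k + 1))
    return {zlib.crc32(g.encode("utf-8")) for g in grams}


def sample(fingerprints: Set[int], p: int = 4) -> Set[int]:
    """Keep a subset of the fingerprints (assumed rule: hash divisible by p)."""
    return {h for h in fingerprints if h % p == 0}


all_fps = kgram_fingerprints("secure information", k=5)
stored = sample(all_fps)
```

Because the k-grams are cut at fixed character offsets, the sketch has no notion of words, which is why common words cannot be excluded in this scheme.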
- the present invention provides methods and systems to efficiently fingerprint vast amounts of textual information using word runs and allows these fingerprints to be recorded in a repository.
- This embodiment comprises a receiving module to receive textual information from a plurality of input sources. It further includes a normalization module to convert the textual information to a standardized canonical format. It then includes a word boundary detection module that detects the boundaries of words in a language independent manner. It additionally includes a word hash list generator, where each word of the textual information is converted to a representative hash value.
- This embodiment also includes a fingerprint generator, which generates fingerprints by applying hash functions over the elements of the word hash list.
- the fingerprint generator uses algorithms to generate only a representative subset of the entire word hash list, thus further enhancing the memory and resource efficiencies of the system.
- a repository which can include any database or storage medium, is then used to record the fingerprints generated for the vast amounts of textual information received at the receiver module.
- the present invention provides methods and systems to receive any textual information entered in by a user and to match such information against a fingerprint database.
- This embodiment includes a receiving module to receive the user-entered information, a normalization module to convert the textual information to a standardized canonical format, a language-independent word boundary detector to detect the start and end of each word, a word hash list generator to generate a representative hash value for every word, and a fingerprint generator that uses a sliding window to efficiently generate a representative subset of fingerprints for the received user information.
- This embodiment finally matches the generated fingerprints against a previously developed fingerprint database, and provides alerts to the user in the event that any secure or protected information is indeed being disclosed.
- Another embodiment of the present invention allows the fingerprints to be generated without any dependence on human languages, and without any linguistic understanding of the underlying text, thereby allowing the invention to be applied to most languages.
- the present invention also provides embodiments where the fingerprints are made independent of the presence of punctuation, the ordering of words within sentences or paragraphs, and/or the presence of upper and lower case characters in the words. By doing this, the present invention allows word runs to be matched and detected at both the sentence and the paragraph level.
- this invention allows even derivative works of the original text (e.g., changes to the sentence structure or word ordering at the sentence/paragraph level, use of comparable words in the form of synonyms/hypernyms, varied usage of punctuation, removal or addition of certain stop words, etc.) to be matched and detected.
- FIG. 1 illustrates an overall embodiment of a method for fingerprinting textual information using word runs
- FIG. 2 is a flowchart depicting an embodiment of a method for generating a word hash list
- FIG. 3 is a block diagram providing the various methods by which post processing can be performed on the word hash list to improve efficiency;
- FIG. 4 is a flowchart depicting a preferred embodiment of a method to generate a first fingerprint for the received textual information
- FIG. 5 is a block diagram providing examples of methods by which the fingerprints can be made word-order independent
- FIG. 6 is a flowchart depicting a preferred embodiment of a method to generate a set of fingerprints for the entire textual information
- FIG. 7 illustrates an embodiment for generating the fingerprints for secure and protected information of an organization and then recording the fingerprints in a repository
- FIG. 8 illustrates an embodiment for generating the fingerprints for user-entered information and then matching that fingerprint against fingerprints stored in a repository
- FIG. 9 provides an overall embodiment of a system for fingerprinting textual information using word runs; and
- FIG. 10 is a block diagram depicting various embodiments of systems by which fingerprints can be either recorded or used for matching and detecting an unauthorized disclosure.
- Fig. 1 shows one embodiment of an overall method to fingerprint textual information using word runs.
- the information that needs to be fingerprinted is received from a plurality of sources 110.
- This information is then normalized 120 to a standardized or canonical text format.
- the boundaries of each word are then detected 125 in a language independent manner.
- the words from the normalized text are then used to generate a word run based hash list, called the word hash list 130.
- This word hash list is then used to generate the final fingerprints 140.
- Information may be received from several sources.
- the source could include confidential, important, or secure information maintained by an organization, where such information needs to be recorded or registered into a database.
- the source could include any information entered by a user having access to an organization's secure information, where such information would need to be matched and inspected against an existing database of secure information.
- the textual information received from either of these sources includes a plurality of words. Such words may be present as a plurality of text-characters, with one word distinguished from another by the presence of at least one space-character. The words may also be present as a plurality of text-characters, with one word separated from another by the use of punctuation marks.
- the received information is first normalized to a canonical text representation 120. This can be done by converting the computer files containing the textual information into one of several raw text formats.
- One example of such normalization is to convert a PDF (Portable Document Format) file into a Unicode transformation format file.
- An example of a Unicode transformation format is UTF-16.
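As a minimal sketch of the normalization step, the snippet below re-encodes text from a legacy encoding into UTF-16. Extracting text from a PDF would need a third-party library and is not shown; the function name `to_utf16` and the Latin-1 sample input are hypothetical.

```python
def to_utf16(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known source encoding and re-encode them as UTF-16."""
    text = raw.decode(source_encoding)
    # "utf-16-le" is used so that no byte-order mark is prepended.
    return text.encode("utf-16-le")


latin1_bytes = "café".encode("latin-1")
utf16_bytes = to_utf16(latin1_bytes, "latin-1")
```

After this step every input, whatever its original encoding, is represented in the same canonical form before word boundaries are detected.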
- the present invention uses a word boundary detector 125 to detect the separation of one word from a preceding or following word.
- the word boundary detector 125 uses a state machine and employs character-classes that dictate boundary analysis across languages. In this embodiment, the state machine utilizes mapping tables to determine what character-class a particular character belongs to.
- By mapping the current character and comparing that against the mapping of the previous character, the detector determines whether a word has just started or ended. Because the character-classes include generic word separators or delimiters common to most languages, this word boundary detector can be used in a language independent manner. Additionally, the characters within the words may be case-folded, such that the word-value hash assigned to a particular word does not depend upon whether the word has any upper or lower case characters. Note that the case folding can be done at any time prior to the generation of a word hash list.
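The character-class state machine described above might be sketched as follows. Using Unicode general categories as the mapping table is an assumption, as are the names `char_class` and `split_words`; the patent does not specify the contents of its tables.

```python
import unicodedata
from typing import List


def char_class(ch: str) -> str:
    """Map a character to a coarse class using its Unicode general category."""
    major = unicodedata.category(ch)[0]  # 'L' letter, 'N' number, 'M' mark, ...
    return "word" if major in ("L", "N", "M") else "sep"


def split_words(text: str) -> List[str]:
    """Detect word boundaries by comparing each character's class with the previous one."""
    words: List[str] = []
    current: List[str] = []
    prev = "sep"
    for ch in text:
        cls = char_class(ch)
        if cls == "word":
            current.append(ch)           # inside a word (or a word just started)
        elif prev == "word":
            words.append("".join(current).casefold())  # a word just ended
            current = []
        prev = cls
    if current:                          # flush a word that runs to end of input
        words.append("".join(current).casefold())
    return words


words = split_words("Secure DATA; do not share.")
```

Here `words` is `["secure", "data", "do", "not", "share"]`. This simplified table treats any run of letters as one word, so scripts written without separators would need richer character-classes than the sketch provides.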
- Fig. 2 depicts one method of generating a word hash list 200.
- the normalized textual information is read in as input 210.
- Each of the words present in this normalized input is then converted to a word-value hash 220.
- One example of generating a word-value hash is to compute a hash function over every character of a word, generating an integer value corresponding to that word.
- Such word-value hashes are generated for every word of the received normalized information. In this embodiment, only words are processed, and punctuation marks are not assigned any word-value hashes. This allows the method to remain impervious to changes in punctuation.
- the resulting word-value hashes from all the words are compiled together to obtain a word hash list 230.
- This word hash list may then be subject to post-processing steps 240 (explained below in detail in Fig. 3) to generate fingerprints that are robust and remain impervious to edits in derivative works of the original text.
- the word hash list received after such post-processing steps is designated as the final word hash list 250.
- the word-value hashes are computed as 32-bit unsigned integers. This is advantageous because the computation of the word-value hashes could then use 32-bit arithmetic, which would be much faster than performing 64-bit arithmetic on 32-bit architectures.
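A word-value hash computed over every character of a word and held as a 32-bit unsigned integer could look like the sketch below. The FNV-1a function is an assumed stand-in; the text does not name a specific hash function.

```python
from typing import Iterable, List

FNV_OFFSET_BASIS = 0x811C9DC5  # standard FNV-1a 32-bit offset basis
FNV_PRIME = 0x01000193         # standard FNV-1a 32-bit prime


def word_value_hash(word: str) -> int:
    """Hash all bytes of a word into a 32-bit unsigned integer (FNV-1a, assumed)."""
    h = FNV_OFFSET_BASIS
    for byte in word.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME) & 0xFFFFFFFF  # keep the result within 32 bits
    return h


def word_hash_list(words: Iterable[str]) -> List[int]:
    """Compile the word-value hashes of all words, in order, into a list."""
    return [word_value_hash(w) for w in words]


hashes = word_hash_list(["secure", "data", "secure"])
```

Identical words always map to the same integer, so later steps can work on integers alone.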
- Fig. 3 is a block diagram 240 providing information on various methods to achieve post processing of the word hash lists.
- word-value hashes corresponding to certain stop-words are excluded 320 from the final word hash list.
- Stop-words include those words of any language that occur frequently in the usage of the language, but do not add any substantive content to meaningful understanding of the language. Examples of stop-words include prepositions (e.g., beside, to, until), gender denoting terms (e.g., she, he, her), etc.
- certain predefined sets of words are mapped to a distinct word-value hash 330.
- Examples include mapping all stems of a frequently used word to the same root, mapping nouns to common synonyms or hypernyms, etc.
- the word-value hashes 220 are generated as integers such that words of the textual information are represented by unique integer values. Operating the post processing steps with integer values results in increased computational efficiencies as compared to operating on character or string values.
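The two post-processing steps, stop-word exclusion and mapping a predefined set of words to one word-value hash, can be sketched on integer values as described. CRC-32 stands in for the word-value hash, and the stop-word set and synonym table are hypothetical examples.

```python
import zlib
from typing import Iterable, List


def word_value_hash(word: str) -> int:
    """Stand-in word-value hash (CRC-32 is an assumption for this sketch)."""
    return zlib.crc32(word.encode("utf-8"))


# Hypothetical tables; a real deployment would supply its own.
STOP_WORDS = {"beside", "to", "until", "she", "he", "her", "the"}
CANONICAL = {"cars": "car", "automobile": "car", "vehicle": "car"}

# Precompute integer tables so post-processing never touches strings.
STOP_HASHES = {word_value_hash(w) for w in STOP_WORDS}
HASH_MAP = {word_value_hash(s): word_value_hash(d) for s, d in CANONICAL.items()}


def post_process(hashes: Iterable[int]) -> List[int]:
    """Drop stop-word hashes and map grouped words to their common hash."""
    result = []
    for h in hashes:
        if h in STOP_HASHES:
            continue                       # exclusion of stop-words
        result.append(HASH_MAP.get(h, h))  # mapping to a common word-value hash
    return result


raw = [word_value_hash(w) for w in ["she", "drove", "the", "automobile"]]
final = post_process(raw)
```

With these tables, "she drove the automobile" and "drove car" produce the same final word hash list.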
- Fig. 4 is a flowchart 400 depicting a method of generating one fingerprint from the final word hash list 250.
- the method comprises receiving the final word hash list 410 and assigning a sliding window of fixed-size W (where W is an integer greater than or equal to 1) to read the first W word-value hashes from the word hash list 420.
- An anchor 430 is then determined for this first window, by selecting a distinct-valued word-value hash from the W number of word-value hashes currently read in by the sliding window. Examples of distinct-valued word-value hashes include those word-value hashes that have the highest integer value, or those word-value hashes with the lowest integer value.
- a new hash Hf 440 is computed by applying a hash function over all the word-value hashes, starting from the first word-value hash within the window, up until the word-value hash that is designated as the anchor.
- This new hash is effectively a hash of one or more word-value hashes, and this new hash is designated as the first fingerprint.
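The first-fingerprint step can be sketched as below. Choosing the highest value as the anchor is one of the options named in the text; breaking ties by first occurrence and using 32-bit addition as Hf are assumptions made for the sketch.

```python
from typing import List, Sequence


def hf(values: Sequence[int]) -> int:
    """Assumed Hf: sum of the word-value hashes, truncated to 32 bits."""
    total = 0
    for v in values:
        total = (total + v) & 0xFFFFFFFF
    return total


def first_fingerprint(hashes: List[int], w: int) -> int:
    """Fingerprint the run from the start of the first window through its anchor."""
    window = hashes[:w]                  # first W word-value hashes
    anchor = max(window)                 # anchor: highest-valued hash in the window
    end = window.index(anchor)           # assumed tie-break: first occurrence
    return hf(window[:end + 1])          # hash the word run up to and including the anchor


fp = first_fingerprint([5, 9, 3, 7], w=3)
```

With the window `[5, 9, 3]` the anchor is 9, so the fingerprint covers the run `[5, 9]` and `fp` is 14.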
- the present invention also discloses methods by which the hash function can optionally be made word-order independent.
- Fig. 5 is a block diagram illustrating several possible embodiments of the hash function Hf. These embodiments represent different ways by which Hf can be made word order independent 500. In one embodiment, Hf can be implemented as an addition hash function 520.
- Hf can be implemented as a multiplication hash function 530.
- Hf can be implemented as an exclusive-or hash function 540.
- These hash functions are examples of symmetric hash functions, and would therefore allow the fingerprints to be word order independent.
- another embodiment of Hf can be developed by combining the symmetric hash functions 540. One method of realizing such an embodiment would be by splitting a large word-value hash into two parts and performing a different symmetric operation on the two parts.
- Word-order independence of Hf allows for a much larger range of modifications to the original text to be detected at the inspection level than is possible with prior art approaches.
- the combination of this word-order independence 500 and the various post-processing methods 300 disclosed in Fig. 3 makes it possible to detect similar text at the inspection stage, even when such text is modified from the original text at the sentence or paragraph level.
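The symmetric hash functions named for Hf can be sketched as follows. The 32-bit width and the particular split used in `hf_combined` (addition on the upper 16 bits, exclusive-or on the lower 16 bits) are assumptions; the text only says that the hash is split into two parts with a different symmetric operation on each.

```python
from typing import Iterable

MASK32 = 0xFFFFFFFF


def hf_add(hashes: Iterable[int]) -> int:
    """Addition hash: the sum does not depend on word order."""
    total = 0
    for h in hashes:
        total = (total + h) & MASK32
    return total


def hf_mul(hashes: Iterable[int]) -> int:
    """Multiplication hash: the product does not depend on word order."""
    product = 1
    for h in hashes:
        product = (product * h) & MASK32
    return product


def hf_xor(hashes: Iterable[int]) -> int:
    """Exclusive-or hash: the XOR does not depend on word order."""
    acc = 0
    for h in hashes:
        acc ^= h
    return acc


def hf_combined(hashes: Iterable[int]) -> int:
    """Split each hash in two and apply a different symmetric operation to each half."""
    hi, lo = 0, 0
    for h in hashes:
        hi = (hi + (h >> 16)) & 0xFFFF   # addition on the upper halves
        lo ^= h & 0xFFFF                 # exclusive-or on the lower halves
    return (hi << 16) | lo


run = [0x00010002, 0x00030004, 0x00050006]
reordered = [run[2], run[0], run[1]]
```

Every function returns the same value for `run` and `reordered`, so a sentence with its words rearranged yields the same fingerprint. Note that a multiplication hash collapses to zero if any word-value hash is zero, which is one reason a combined variant may be preferable.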
- Fig. 6 is a flowchart illustrating one method for generating a complete set of fingerprints 600 for the entire word hash list 250.
- a first fingerprint 450 is generated using the method explained previously in Fig. 4.
- the sliding window of size W 420 is moved one position to the right 620, thereby reading W word-value hashes 220 starting from the second word-value hash in the word hash list 250.
- a new anchor 630 is designated for this new window by selecting a new distinct-valued word-value hash, similar to the anchor selection method 430 for the first fingerprint as explained in Fig. 4. This new anchor 630 is then compared against the anchor that was generated for the immediately preceding window.
- If the new anchor 630 is identical to the immediately preceding anchor, no new fingerprint is generated 640. However, if the new anchor 630 is not identical to the immediately preceding anchor, a new fingerprint is generated 650 using the hash function Hf 440 explained in Fig. 4. After the completion of this step, the sliding window is moved another position to the right, reading a new set of W word-value hashes. This process is repeated until all the word-value hashes in the word hash list are completely scanned by the sliding window.
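The full sliding-window procedure might be sketched as below. Three points are assumptions: the anchor is the highest value in the window, two anchors count as identical when they sit at the same position in the word hash list, and Hf is 32-bit addition.

```python
from typing import List, Optional, Sequence


def hf(values: Sequence[int]) -> int:
    """Assumed Hf: order-independent 32-bit sum of word-value hashes."""
    total = 0
    for v in values:
        total = (total + v) & 0xFFFFFFFF
    return total


def fingerprints(hashes: List[int], w: int) -> List[int]:
    """Slide a window of size w over the list, emitting a fingerprint when the anchor changes."""
    result: List[int] = []
    prev_anchor: Optional[int] = None          # list position of the previous anchor
    last_start = max(len(hashes) - w, 0)
    for start in range(last_start + 1):
        window = hashes[start:start + w]
        if not window:                         # empty input: nothing to fingerprint
            break
        offset = window.index(max(window))     # anchor offset inside the window
        anchor_pos = start + offset
        if anchor_pos != prev_anchor:          # new anchor: generate a fingerprint
            result.append(hf(window[:offset + 1]))
            prev_anchor = anchor_pos
    return result


fps = fingerprints([5, 9, 3, 7, 1], w=3)
```

For this input the windows are `[5, 9, 3]`, `[9, 3, 7]` and `[3, 7, 1]`. The first two share the anchor 9, so only the first emits a fingerprint (14); the third has anchor 7 and emits 10, giving `[14, 10]`. Skipping windows whose anchor is unchanged is what keeps the stored set a representative subset.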
- Fig. 7 presents one embodiment of registering the fingerprints.
- the fingerprints generated for each word hash list 250 using the methods explained in Figs. 2-6 are stored in a repository 700.
- This repository would then serve as a database 730, containing fingerprint data for all confidential, important, or secure information of an organization.
- Fig. 8 depicts another embodiment of generating fingerprints, where the embodiment can be used for the purpose of inspecting any user entered information. This can be done by matching the fingerprint generated for the user-entered information 820 against fingerprints stored in a central fingerprint database 830.
- This central fingerprint database contains a plurality of fingerprints of an organization's secure information, as explained in Fig. 7.
- a new set of fingerprints is then generated for text that a user desires to transmit outside of the organization 810. Examples of such transmitted text include text contained in an email that a user desires to send out from his computer, text contained in any files that a user attaches to an email, text contained in any files that a user transfers outside of his computer using any of the computer's output devices, etc.
- Examples of transfers through a computer's output devices include data transferred to a floppy disc in a floppy drive, data transferred to a flash memory device, data transferred to a disc in a CD/DVD drive, data transferred to another computer using the computer's network connectivity, data transferred over the internet using a file transfer protocol, etc.
- the new set of fingerprints is compared against the fingerprints stored in the central fingerprint database 830. In one embodiment, a security action is performed if any of the new set of fingerprints match against any of the fingerprints in the central database.
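Registration and inspection against a fingerprint repository can be sketched with an in-memory set. The set-based repository, the function names, and the "block"/"allow" outcomes are hypothetical; the text leaves the storage medium and the specific security action open.

```python
from typing import Iterable, List, Set


def register(repository: Set[int], fingerprints: Iterable[int]) -> None:
    """Record the fingerprints of secure information in the repository."""
    repository.update(fingerprints)


def find_matches(repository: Set[int], fingerprints: Iterable[int]) -> List[int]:
    """Return the fingerprints of outgoing text that also appear in the repository."""
    return sorted(set(fingerprints) & repository)


def security_action(matches: List[int]) -> str:
    """Assumed policy: block the transfer if any fingerprint matches."""
    return "block" if matches else "allow"


repository: Set[int] = set()
register(repository, [14, 10, 42])             # fingerprints of secure documents

outgoing_clean = security_action(find_matches(repository, [7, 8]))
outgoing_leak = security_action(find_matches(repository, [7, 10]))
```

A single shared fingerprint is enough to trigger the action in this sketch; a production policy might instead require a minimum number of matches.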
- Figs. 9-11 include an overview of computer hardware and other operating components suitable for implementing the systems of the invention described here.
- the invention can be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor- based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like.
- the invention can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
- Fig. 9 shows one embodiment of an overall system that can be used to generate fingerprints for word runs.
- the system has a first receiver module 910.
- the receiver module 910 is a computer, which can receive textual information from several sources.
- the textual information can be entered into the computer by a user, using any I/O device attached to the computer.
- I/O devices could include any device used for entering information into a computer, including a keyboard, pointing device (e.g., a mouse), microphone, joystick, game pad, scanner, digital camera, etc.
- the textual information could be in the form of data files, including an organization's secure or confidential information, stored in the memory of the computer.
- Such memory may include but is not limited to RAM, ROM, and/or any combination of volatile and non-volatile memory.
- the information could be available in the form of a database in a computer's memory.
- the information could be stored in a network server, or could be received from an external source via a network router.
- the received information is converted to a normalized text format within the text normalization module 920.
- this text normalization module is any computer implemented software application that can be used to convert the data file from a non-Unicode format to a Unicode text format. A person of skill in the art can immediately appreciate the wealth of third party software applications that are readily available to perform this normalization.
- the received normalized information is then transmitted to a word detector 930.
- the word detector could be computer-implemented software that runs an algorithm to detect the boundaries of each word.
- the word boundary detector uses a state machine and employs character-classes that dictate boundary analysis across languages.
- the state machine utilizes mapping tables to determine what character-class a particular character belongs to. By mapping the current character and comparing that against the mapping of the previous character, the detector determines whether a word has just started or ended. Because the character-classes include generic word separators or delimiters common to most languages, this word boundary detector can be used in a language independent manner. Thus, various embodiments of this system can be developed for different languages.
- a case-folding operation may be done on the words to remove any distinction between words containing upper case and lower case characters. This ensures that duplicate fingerprints are not generated for upper and lower case formats of the same word. Note that the case folding can be done at any time prior to the operation of the word hash list generation module.
- this word hash list generation module is computer-implemented software that operates on every word of the received normalized textual information.
- the module further comprises computer-implemented software to compute a hash function over all the characters of each word, resulting in a word-value hash for every word. These word-value hashes are compiled together in a list, and this list is designated as the word hash list.
- the word hash list can further be post-processed to exclude some word-value hashes in order to generate fingerprints that are robust and remain impervious to edits in derivative works of the original text. Examples of this include removing certain stop words that occur frequently in a language, and grouping certain categories of words and mapping them to one common word-value hash. These post-processing steps can also be achieved by means of computer-implemented software.
- the word hash list is finally used to generate a set of fingerprints by operation of the fingerprint generation module 950.
- the fingerprint generation module is computer-implemented software capable of performing arithmetic and logic operations.
- the software reads word-value hashes using a sliding window of size W, reading W number of word-value hashes at a given time.
- the software designates a distinct-valued word-value hash as an anchor, and generates a new fingerprint every time the anchor of the current window is not identical to the anchor from the immediately preceding window.
- the software computes the fingerprint by computing a new hash function over all word-value hashes starting from the first word-value hash of the current window up until the word-value hash corresponding to the anchor of the current window.
- Fig. 10 depicts an embodiment where the fingerprints generated using the system explained in Fig. 9 can be stored in a repository.
- a receiver 1010, identical to the receiver 910 explained in Fig. 9, can be used to receive textual information. Examples of such information include an organization's confidential, secure, or any other important information that needs to be protected from unauthorized disclosure.
- the fingerprints for this information are generated using the word run based fingerprint generation module 1020, which uses the steps described in the fingerprint generation system explained in Fig. 9.
- the resulting fingerprints are stored in a repository 1030 for later use. Examples of a repository include recording the fingerprints in a database, a network server, a local computer, or any other magnetic or optical storage media.
- Fig. 10 provides another embodiment where the fingerprints generated using the system of Fig. 9 can be used to be matched and inspected against a repository 1030 of fingerprints.
- the system receives textual information entered into a computer by a user 1050, wherein the information may be entered in using one of several input devices. Examples of such input devices include keyboards, microphones, scanners, pointing devices (e.g., mouse), etc.
- the fingerprint generating module 1060 generates fingerprints for this information using the fingerprint generation system explained in Fig. 9.
- the inspection module 1070 accepts the resulting fingerprints, which are then compared against the bank of fingerprints stored in the repository 1030.
- computer-implemented software can be used to build the inspection module, wherein the software code enables the module to match the current fingerprint with a fingerprint in the repository 1030 and report any successful matches.
- This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer.
- Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system.
Abstract
The present invention provides methods and systems to enable fast, efficient, and scalable means for fingerprinting textual information using word runs. The present system receives textual information and provides algorithms to convert the information into representative fingerprints. In one embodiment, the fingerprints are recorded in a repository to maintain a database of an organization's secure data. In another embodiment, textual information entered by a user is verified against the repository of fingerprints to prevent unauthorized disclosure of secure data. This invention provides approaches to allow derivative works (e.g., different ordering of words, substitution of words with synonyms, etc.) of the original information to be detected at the sentence level or even at the paragraph level. This invention also provides means for enhancing storage and resource efficiencies by providing approaches to optimize the number of fingerprints generated for the textual information.
Description
METHODS AND SYSTEMS TO FINGERPRINT TEXTUAL INFORMATION USING WORD RUNS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Patent Application No. 12/177,043 filed July 21, 2008, which is hereby incorporated by reference in its entirety.
FIELD OF INVENTION
[0002] The present invention relates to information security and more specifically relates to systems and methods for detecting and preventing unauthorized disclosure of secure information. Furthermore, the present invention pertains to fingerprinting textual information using word runs for the purpose of detecting and preventing unauthorized disclosure of secure information.
BACKGROUND OF THE INVENTION
[0003] With the rapid increase and advances in digital documentation capabilities and document management systems, organizations are increasingly storing important, confidential, and secure information in the form of digital documents. Unauthorized dissemination of this information, whether by accident or by deliberate means, presents serious security risks to these organizations. Therefore, it is imperative for organizations to protect such secure information and to detect and react to any disclosure of secure information (or derivatives thereof) beyond the perimeters of the organization.
[0004] Additionally, organizations face the challenge of categorizing and maintaining a large corpus of digital information across potentially thousands of data stores, content management systems, end-user desktops, etc. It is therefore valuable to the organization to be able to identify and disregard redundant information in this vast database. At the same time, it is critical to the organization's security to be able to identify derivative forms of the secure data (e.g., changes to the sentence structure or word ordering at the sentence/paragraph level, use of comparable words in the form of synonyms/hypernyms, varied usage of punctuation, etc.) and to identify any unauthorized disclosure of even such derivative forms. Therefore, any system or method built to accomplish the task of preventing unauthorized disclosure would have to address these two conflicting challenges.
[0005] One method to detect similar data is by examining the database at the file level. This can be done by comparing the file names, by comparing the file sizes, or by computing a checksum of the contents of each file. However, even minor differences between two files will evade such a detection method.
[0006] Other prior art solutions teach partial text matching methods using various k-gram approaches. In such approaches, sequences of text-characters of a fixed length, called k-grams, are selected from the secure text. Each k-gram is hashed into a number called a fingerprint. In order to increase storage and resource efficiency, the various prior art approaches propose different means by which the k-grams can be sampled, so as to store only a representative subset of the k-grams. However, these prior art approaches suffer a number of disadvantages. For example, these prior systems are not robust against derivative works of the secure text. Additionally, the k-gram approaches are not suitable for use in multi-language environments (e.g., a document containing a mixture of Mandarin and English words). Also, using a character-based approach as opposed to a word-based approach does not allow for the exclusion of common or repeated words, thus resulting in overall memory and resource inefficiencies.
SUMMARY OF THE INVENTION
[0007] Methods and systems to provide fast, efficient, and scalable means to fingerprint textual information using word runs are presented. In one embodiment, the present invention provides methods and systems to efficiently fingerprint vast amounts of textual information using word runs and allows these fingerprints to be recorded in a repository. This embodiment comprises a receiving module to receive textual information from a plurality of input sources. It further includes a normalization module to convert the textual information to a standardized canonical format. It then includes a word boundary detection module that detects the boundaries of words in a language independent manner. It additionally includes a word hash list generator, where each word of the textual information is converted to a representative hash value. Several means are provided by which the word hash list can be post-processed to significantly improve memory and resource efficiencies. Examples of such post-processing include eliminating certain stop words, grouping certain categories of words and mapping them to one hash value, etc. This embodiment also includes a fingerprint generator, which generates fingerprints by applying hash functions over the elements of the word hash list. The fingerprint generator uses algorithms to generate fingerprints for only a representative subset of the entire word hash list, thus further enhancing the memory and resource efficiencies of the system. A repository, which can include any database or storage medium, is then used to record the fingerprints generated for the vast amounts of textual information received at the receiving module.
[0008] In another embodiment, the present invention provides methods and systems to receive any textual information entered by a user and to match such information against a fingerprint database. This embodiment includes a receiving module to receive the user-entered information, a normalization module to convert the textual information to a standardized canonical format, a language independent word boundary detector to detect the start and end of each word, a word hash list generator to generate a representative hash value for every word, and a fingerprint generator that uses a sliding window to efficiently generate a representative subset of fingerprints for the received user information. This embodiment finally matches the generated fingerprints against a previously developed fingerprint database, and provides alerts to the user in the event that any secure or protected information is indeed being disclosed.
[0009] Other embodiments of the present invention allow the fingerprints to be generated without any dependence on human languages, and without any linguistic understanding of the underlying text, thereby allowing the invention to be applied to most languages. The present invention also provides embodiments where the fingerprints are made independent of the presence of punctuation, the ordering of words within sentences or paragraphs, and/or the presence of upper and lower case characters in the words. By doing this, the present invention allows word runs to be matched and detected at both the sentence and the paragraph level. Additionally, this invention allows even derivative works of the original text (e.g., changes to the sentence structure or word ordering at the sentence/paragraph level, use of comparable words in the form of synonyms/hypernyms, varied usage of punctuation, removal or addition of certain stop words, etc.) to be matched and detected.
BRIEF DESCRIPTION OF DRAWINGS
[0010] These and other objects, features and characteristics of the present invention will become more apparent to those skilled in the art from a study of the following detailed description in conjunction with the appended claims and drawings, all of which form a part of this specification. In the drawings:
FIG. 1 illustrates an overall embodiment of a method for fingerprinting textual information using word runs;
FIG. 2 is a flowchart depicting an embodiment of a method for generating a word hash list;
FIG. 3 is a block diagram providing the various methods by which post processing can be performed on the word hash list to improve efficiency;
FIG. 4 is a flowchart depicting a preferred embodiment of a method to generate a first fingerprint for the received textual information;
FIG. 5 is a block diagram providing examples of methods by which the fingerprints can be made word-order independent;
FIG. 6 is a flowchart depicting a preferred embodiment of a method to generate a set of fingerprints for the entire textual information;
FIG. 7 illustrates an embodiment for generating the fingerprints for secure and protected information of an organization and then recording the fingerprints in a repository;
FIG. 8 illustrates an embodiment for generating the fingerprints for user-entered information and then matching that fingerprint against fingerprints stored in a repository;
FIG. 9 provides an overall embodiment of a system for fingerprinting textual information using word runs; and
FIG. 10 is a block diagram depicting various embodiments of systems by which fingerprints can be either recorded or used for matching and detecting an unauthorized disclosure.
DETAILED DESCRIPTION OF THE INVENTION
[0011] The present invention may be embodied in several forms and manners. The description provided below and the drawings show exemplary embodiments of the invention. Those of skill in the art will appreciate that the invention may be embodied in other forms and manners not shown below. It is understood that the use of relational terms, if any, such as first, second, top and bottom, and the like are used solely for distinguishing one entity or action from another, without necessarily requiring or implying any such actual relationship or order between such entities or actions.
[0012] Fig. 1 shows one embodiment of an overall method to fingerprint textual information using word runs. In this embodiment, the information that needs to be fingerprinted is received from a plurality of sources 110. This information is then normalized 120 to a standardized or canonical text format. The boundaries of each word are then detected 125 in a language independent manner. The words from the normalized text are then used to generate a word run based hash list, called the word hash list 130. This word hash list is then used to generate the final fingerprints 140. Each of these steps is discussed in detail below.
[0013] Information may be received from several sources. In one embodiment, the source could include confidential, important, or secure information maintained by an organization, where such information needs to be recorded or registered into a database. In another embodiment, the source could include any information entered by a user having access to an organization's secure information, where such information would need to be matched and inspected against an existing database of secure information. The textual information received from either of these sources includes a plurality of words. Such words may be present as a plurality of text-characters, with one word distinguished from another by the presence of at least one space-character. The words may also be present as a plurality of text-characters, with one word separated from another by the use of punctuation marks.
[0014] The received information is first normalized to a canonical text representation 120. This can be done by converting the computer files containing the textual information into one of several raw text formats. One example of such normalization is to convert a PDF (Portable Document Format) file into a Unicode transformation format file. An example of a Unicode transformation format is UTF-16.
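The re-encoding half of this normalization step can be illustrated with a minimal Python sketch. It assumes the text has already been extracted from its container format (extracting text from a PDF is outside this sketch) and that the source encoding is known. The function name `normalize_to_utf16` and the choice of little-endian UTF-16 without a byte order mark are illustrative assumptions, not details taken from the specification.

```python
def normalize_to_utf16(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known legacy encoding and re-encode them as UTF-16.

    Hypothetical helper: little-endian UTF-16 without a BOM is an assumed
    canonical form chosen for this example.
    """
    text = raw.decode(source_encoding)
    return text.encode("utf-16-le")


# Example: a Latin-1 encoded string becomes two bytes per character in UTF-16.
legacy_bytes = "Café".encode("latin-1")
canonical = normalize_to_utf16(legacy_bytes, "latin-1")
```

Once every input is in one encoding, the later word detection and hashing steps can operate on a single representation regardless of the original file format.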
[0015] In one embodiment, the present invention uses a word boundary detector 125 to detect the separation of one word from a preceding or following word. The word boundary detector 125 uses a state machine and employs character-classes that dictate boundary analysis across languages. In this embodiment, the state machine utilizes mapping tables to determine what character-class a particular character belongs to. By mapping the current character and comparing that against the mapping of the previous character, the detector determines whether a word has just started or ended. Because the character-classes include generic word separators or delimiters common to most languages, this word boundary detector can be used in a language independent manner. Additionally, the characters within the words may be case-folded, such that the word-value hash assigned to a particular word does not depend upon whether the word has any upper or lower case characters. Note that the case folding can be done at any time prior to the generation of a word hash list.
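A two-state detector of this kind might be sketched as follows. The specification does not give its mapping tables, so this sketch substitutes the Unicode general categories exposed by Python's `unicodedata` module as an assumed character-class table: letters, marks, and numbers count as word characters, and everything else counts as a separator. The names `char_class` and `detect_words` are hypothetical.

```python
import unicodedata


def char_class(ch: str) -> str:
    """Map a character to a coarse class, 'word' or 'sep' (assumed table)."""
    category = unicodedata.category(ch)
    # Letters (L*), marks (M*) and numbers (N*) are word characters; spaces,
    # punctuation, symbols and control characters are separators.
    return "word" if category[0] in ("L", "M", "N") else "sep"


def detect_words(text: str) -> list:
    """Return the words of `text`, found by comparing the class of each
    character with the class of the previous character."""
    words = []
    start = None
    prev = "sep"  # the state before the first character
    for i, ch in enumerate(text):
        cur = char_class(ch)
        if prev == "sep" and cur == "word":
            start = i  # a word has just started
        elif prev == "word" and cur == "sep":
            words.append(text[start:i])  # a word has just ended
        prev = cur
    if prev == "word":
        words.append(text[start:])  # the text ended inside a word
    return words
```

For instance, `detect_words("Hello, world! Привет мир.")` yields the four words without any language-specific rules. This simple table splits on apostrophes and does not segment scripts that are written without separators between words; a fuller character-class table would be needed for those cases.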
[0016] Fig. 2 depicts one method of generating a word hash list 200. Here, the normalized textual information is read in as input 210. Each of the words present in this normalized input is then converted to a word-value hash 220. One example of generating a word-value hash is to compute a hash function over every character of a word, generating an integer value corresponding to that word. Such word-value hashes are generated for every word of the received normalized information. In this embodiment, only words are processed, and punctuation marks are not assigned any word-value hashes. This allows the method to remain impervious to changes in punctuation. The resulting word-value hashes from all the words are compiled together to obtain a word hash list 230. This word hash list may then be subject to post-processing steps 240 (explained below in detail in Fig. 3) to generate fingerprints that are robust and remain impervious to edits in derivative works of the original text. The word hash list received after such post-processing steps is designated as the final word hash list 250.
[0017] In one embodiment, the word-value hashes are computed as 32-bit unsigned integers. This is advantageous because the computation of the word-value hashes could then use 32-bit arithmetic, which would be much faster than performing 64-bit arithmetic on 32-bit architectures.
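The word-value hashing described above can be sketched in Python as follows. The specification does not name a particular hash function; 32-bit FNV-1a is used here purely as an assumed stand-in, and the names `word_value_hash` and `word_hash_list` are hypothetical. Case folding is applied inside the hash so that the result does not depend on letter case.

```python
FNV_OFFSET_BASIS = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME = 0x01000193         # standard 32-bit FNV prime
MASK32 = 0xFFFFFFFF            # keeps every intermediate value in 32 bits


def word_value_hash(word: str) -> int:
    """Hash one word to a 32-bit unsigned integer using FNV-1a (assumed)."""
    h = FNV_OFFSET_BASIS
    for byte in word.casefold().encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME) & MASK32
    return h


def word_hash_list(words) -> list:
    """Compile the word-value hashes of all words into one list."""
    return [word_value_hash(w) for w in words]
```

Because punctuation never reaches `word_hash_list` (only detected words are passed in), two inputs that differ only in punctuation or letter case produce the same list.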
[0018] Fig. 3 is a block diagram 240 providing information on various methods to achieve post processing of the word hash lists. In one method, word-value hashes corresponding to certain stop-words are excluded 320 from the final word hash list. Stop-words include those words of any language that occur frequently in the usage of the language, but do not add any substantive content to meaningful understanding of the language. Examples of stop-words include prepositions (e.g., beside, to, until), gender denoting terms (e.g., she, he, her), etc. In yet another method, certain predefined sets of words are mapped to a distinct word-value hash 330. Examples include mapping all stems of a frequently used word to the same root, mapping nouns to common synonyms or hypernyms, etc. In one embodiment, the word-value hashes 220 are generated as integers such that words of the textual information are represented by unique integer values. Operating the post processing steps with integer values results in increased computational efficiencies as compared to operating on character or string values.
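The two post-processing methods can be sketched on integer hashes as follows. The stop-word set and the synonym groups below are small illustrative assumptions, `zlib.crc32` is an assumed stand-in for the word-value hash function, and the names `STOP_HASHES`, `HASH_REMAP`, and `post_process` are hypothetical.

```python
import zlib


def word_value_hash(word: str) -> int:
    """Assumed stand-in hash: CRC-32 of the case-folded word (32-bit unsigned)."""
    return zlib.crc32(word.casefold().encode("utf-8"))


# Illustrative tables only; a real deployment would supply its own.
STOP_WORDS = {"to", "until", "beside", "she", "he", "her", "the", "a"}
SYNONYM_GROUPS = {"car": "vehicle", "truck": "vehicle", "automobile": "vehicle"}

# Precompute integer tables so post-processing never touches strings.
STOP_HASHES = {word_value_hash(w) for w in STOP_WORDS}
HASH_REMAP = {word_value_hash(k): word_value_hash(v)
              for k, v in SYNONYM_GROUPS.items()}


def post_process(hashes) -> list:
    """Drop stop-word hashes and map grouped words to one common hash."""
    final = []
    for h in hashes:
        if h in STOP_HASHES:
            continue  # excluded from the final word hash list
        final.append(HASH_REMAP.get(h, h))
    return final
```

With these tables, "she drove the car" and "he drove a truck" reduce to the same final word hash list, which is the kind of derivative-work tolerance the paragraph describes.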
[0019] The post-processing steps of Fig. 3 ensure that the final fingerprints remain robust and impervious to any changes or edits in derivative works of the original information. Specifically, these steps allow even derivative works of the original work to be matched and detected at a later inspection stage. Derivative works of the original information may include changes in word ordering, removal or addition of stop-words, changes in punctuation, and usage of different stems for a particular word. Additionally, the post-processing steps also improve the efficiency of the process by reducing the number of word-value hashes that will need further processing.
[0020] Fig. 4 is a flowchart 400 depicting a method of generating one fingerprint from the final word hash list 250. The method comprises receiving the final word hash list 410 and assigning a sliding window of fixed size W (where W is an integer greater than or equal to 1) to read the first W word-value hashes from the word hash list 420. An anchor 430 is then determined for this first window, by selecting a distinct-valued word-value hash from the W word-value hashes currently read in by the sliding window. Examples of distinct-valued word-value hashes include the word-value hash that has the highest integer value, or the word-value hash that has the lowest integer value. After selecting an anchor, a new hash Hf 440 is computed by applying a hash function over all the word-value hashes starting from the first word-value hash within the window, up until the word-value hash that is designated as the anchor. This new hash is effectively a hash of one or more word-value hashes, and this new hash is designated as the first fingerprint.
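The first-fingerprint step of Fig. 4 can be sketched as follows. Several details are assumptions made for illustration: the run is taken to include the anchor itself, ties are broken by taking the first occurrence of the anchor value in the window, and `hf` is a simple placeholder combining hash rather than the hash function of the specification. The names `hf` and `first_fingerprint` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def hf(values) -> int:
    """Placeholder combining hash (assumed): multiply-and-add over 32 bits."""
    h = 0
    for v in values:
        h = (h * 31 + v) & MASK32
    return h


def first_fingerprint(word_hashes, w, pick=max):
    """Return (anchor, fingerprint) for the first window of size `w`.

    `pick` chooses the distinct-valued hash: max for the highest value,
    min for the lowest.
    """
    if w < 1:
        raise ValueError("window size W must be at least 1")
    window = word_hashes[:w]
    if not window:
        raise ValueError("word hash list is empty")
    anchor = pick(window)
    anchor_pos = window.index(anchor)  # first occurrence (assumed tie-break)
    # Hash the run from the first hash in the window through the anchor.
    return anchor, hf(window[:anchor_pos + 1])
```

For the list `[5, 9, 3, 7, 2]` with W = 4, the window is `[5, 9, 3, 7]`, the highest-valued anchor is 9, and the fingerprint is computed over the run `[5, 9]`.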
[0021] The present invention also discloses methods by which the hash function can optionally be made word-order independent. Fig. 5 is a block diagram illustrating several possible embodiments of the hash function Hf. These embodiments represent different ways by which Hf can be made word order independent 500. In one embodiment, Hf can be implemented as an addition hash function 520. In another embodiment, Hf can be implemented as a multiplication hash function 530. In yet another embodiment, Hf can be implemented as an exclusive-or hash function 540. These hash functions are examples of symmetric hash functions, and would therefore allow the fingerprints to be word order independent. To make Hf more robust, another embodiment of Hf can be developed by combining the symmetric hash functions 540. One method of realizing such an embodiment would be by splitting a large word-value hash into two parts and performing a different symmetric operation on each of the two parts. Word-order independence of Hf allows for a much larger range of modifications to the original text to be detected at the inspection level than is possible with prior art approaches. The combination of this word-order independence 500 and the various post-processing methods 300 disclosed in Fig. 3 makes it possible to detect similar text at the inspection stage, even when such text is modified from the original text at the sentence or paragraph level.
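The symmetric embodiments of Hf can be sketched as follows, assuming 32-bit word-value hashes. All function names are hypothetical. Forcing each factor to be odd in the multiplication variant is an added assumption (it prevents the product from collapsing toward zero) and is not stated in the specification. The combined variant splits each hash into 16-bit halves and applies addition to one half and exclusive-or to the other, which is one possible reading of the splitting approach described above.

```python
MASK32 = 0xFFFFFFFF
MASK16 = 0xFFFF


def hf_add(values) -> int:
    """Addition hash: sum of all word-value hashes modulo 2**32."""
    total = 0
    for v in values:
        total = (total + v) & MASK32
    return total


def hf_mul(values) -> int:
    """Multiplication hash modulo 2**32 (factors forced odd: an assumption)."""
    product = 1
    for v in values:
        product = (product * (v | 1)) & MASK32
    return product


def hf_xor(values) -> int:
    """Exclusive-or hash over all word-value hashes."""
    acc = 0
    for v in values:
        acc ^= v
    return acc


def hf_combined(values) -> int:
    """Add the high 16-bit halves, XOR the low halves, then rejoin them."""
    high_sum = 0
    low_xor = 0
    for v in values:
        high_sum = (high_sum + (v >> 16)) & MASK16
        low_xor ^= v & MASK16
    return (high_sum << 16) | low_xor
```

Each function returns the same value for any permutation of its input, so a run of words hashes identically regardless of the order in which the words appear.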
[0022] Fig. 6 is a flowchart illustrating one method for generating a complete set of fingerprints 600 for the entire word hash list 250. In one embodiment, a first fingerprint 450 is generated using the method explained previously in Fig. 4. After this, the sliding window of size W 420 is moved one position to the right 620, thereby reading W word-value hashes 220 starting from the second word-value hash in the word hash list 250. From this new set of W word-value hashes, a new anchor 630 is designated for this new window by selecting a new distinct-valued word-value hash, similar to the anchor selection method 430 for the first fingerprint as explained in Fig. 4. This new anchor 630 is then compared against the anchor that was generated for the immediately preceding window. If the new anchor 630 is identical to the immediately preceding anchor, no new fingerprint is generated 640. However, if the new anchor 630 is not identical to the immediately preceding anchor, a new fingerprint is generated 650 using the hash function Hf 440 explained in Fig. 4. After the completion of this step, the sliding window is moved another position to the right, reading a new set of W word-value hashes. This process is repeated until all the word-value hashes in the word hash list are completely scanned by the sliding window.
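The sliding-window loop of Fig. 6 can be sketched as follows. The sketch makes several labeled assumptions: anchors are compared by value (comparing by position in the list is another plausible reading), the run includes the anchor, a list shorter than W is treated as a single shorter window, and an addition hash stands in for Hf. The names `hf_add` and `fingerprints` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def hf_add(values) -> int:
    """Assumed stand-in for Hf: sum of the hashes modulo 2**32."""
    total = 0
    for v in values:
        total = (total + v) & MASK32
    return total


def fingerprints(word_hashes, w, pick=max, hf=hf_add) -> list:
    """Slide a window of size `w` over the list, emitting a fingerprint
    whenever the anchor differs from the previous window's anchor."""
    if w < 1:
        raise ValueError("window size W must be at least 1")
    result = []
    prev_anchor = None
    last_start = max(len(word_hashes) - w, 0)
    for start in range(last_start + 1):
        window = word_hashes[start:start + w]
        if not window:
            break  # empty input: nothing to fingerprint
        anchor = pick(window)
        if anchor == prev_anchor:
            continue  # same anchor as the preceding window: no new fingerprint
        run = window[:window.index(anchor) + 1]
        result.append(hf(run))
        prev_anchor = anchor
    return result
```

For the list `[5, 9, 3, 7, 2, 8, 1]` with W = 3 there are five window positions, but the anchor changes only three times, so only three fingerprints are produced. This is the reduction in stored fingerprints that the method relies on.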
[0023] Fig. 7 presents one embodiment of registering the fingerprints. In this embodiment, the fingerprints generated for each word hash list 250 using the methods explained in Figs. 2-6 are stored in a repository 700. This repository would then serve as a database 730, containing fingerprint data for all confidential, important, or secure information of an organization.
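A repository of this kind, together with the lookup that an inspection stage would run against it, might be sketched as an in-memory index. The class name `FingerprintRepository`, the use of document identifiers, and the dictionary-of-sets layout are all illustrative assumptions; the specification only requires that fingerprints be recorded in some database or storage medium.

```python
class FingerprintRepository:
    """In-memory stand-in for a fingerprint database (hypothetical design)."""

    def __init__(self):
        # fingerprint -> set of identifiers of documents that produced it
        self._by_fingerprint = {}

    def register(self, doc_id, fingerprints):
        """Record every fingerprint generated for one secure document."""
        for fp in fingerprints:
            self._by_fingerprint.setdefault(fp, set()).add(doc_id)

    def match(self, fingerprints):
        """Return {doc_id: matched fingerprints} for the candidate set."""
        hits = {}
        for fp in fingerprints:
            for doc_id in self._by_fingerprint.get(fp, ()):
                hits.setdefault(doc_id, set()).add(fp)
        return hits


# Usage: register two documents, then check fingerprints of outgoing text.
repo = FingerprintRepository()
repo.register("policy.doc", [14, 10, 17])
repo.register("memo.txt", [10, 99])
hits = repo.match([10, 42])
```

A non-empty result from `match` is the condition under which a security action would be considered; which action to take is a policy decision outside this sketch.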
[0024] Fig. 8 depicts another embodiment of generating fingerprints, where the embodiment can be used for the purpose of inspecting any user-entered information. This can be done by matching the fingerprints generated for the user-entered information 820 against fingerprints stored in a central fingerprint database 830. This central fingerprint database contains a plurality of fingerprints of an organization's secure information, as explained in Fig. 7. A new set of fingerprints is then generated for text that a user desires to transmit outside of the organization 810. Examples of such transmitted text include text contained in an email that a user desires to send out from his computer, text contained in any files that a user attaches to an email, text contained in any files that a user transfers outside of his computer using any of the computer's output devices, etc. Examples of transfers through a computer's output devices include data transferred to a floppy disc in a floppy drive, data transferred to a flash memory device, data transferred to a disc in a CD/DVD drive, data transferred to another computer using the computer's network connectivity, data transferred over the internet using a file transfer protocol, etc. Here, the new set of fingerprints is compared against the fingerprints stored in the central fingerprint database 830. In one embodiment, a security action is performed if any of the new set of fingerprints matches any of the fingerprints in the central database. Examples of such security actions include sending out an email alert to a person responsible for the organization's information security, denying the user's access to the information, logging the event as a potential security violation, requiring the user to enter a password to allow such information to be transferred, preventing the secure information from being transferred out, etc.
[0025] The following description of Figs. 9-11 includes an overview of computer hardware and other operating components suitable for implementing the systems of the invention described here. The invention can be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. The invention can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
[0026] Fig. 9 shows one embodiment of an overall system that can be used to generate fingerprints for word runs. Here, the system has a first receiver module 910. In this embodiment, the receiver module 910 is a computer, which can receive textual information from several sources. In one embodiment, the textual information can be entered into the computer by a user, using any I/O device attached to the computer. Such I/O devices could include any device used for entering information into a computer, including a keyboard, pointing device (e.g., a mouse), microphone, joystick, game pad, scanner, digital camera, etc. In another embodiment, the textual information could be in the form of data files, including an organization's secure or confidential information, stored in the memory of the computer. Such memory may include but is not limited to RAM, ROM, and/or any combination of volatile and non-volatile memory. In yet another embodiment, the information could be available in the form of a database in a computer's memory. In other embodiments, the information could be stored in a network server, or could be received from an external source via a network router.
[0027] The received information is converted to a normalized text format within the text normalization module 920. In one embodiment, this text normalization module is any computer implemented software application that can be used to convert the data file from a non-Unicode format to a Unicode text format. A person of skill in the art can immediately appreciate the wealth of third party software applications that are readily available to perform this normalization.
[0028] The received normalized information is then transmitted to a word detector 930. In one embodiment, the word detector could be computer implemented software that runs an algorithm to detect the boundaries of each word. In this embodiment, the word boundary detector uses a state machine and employs character-classes that dictate boundary analysis across languages. Here, the state machine utilizes mapping tables to determine what character-class a particular character belongs to. By mapping the current character and comparing that against the mapping of the previous character, the detector determines whether a word has just started or ended. Because the character-classes include generic word separators or delimiters common to most languages, this word boundary detector can be used in a language independent manner. Thus, various embodiments of this system can be developed for different languages. Additionally, a case-folding operation may be done on the words to remove any distinction between words containing upper case and lower case characters. This ensures that duplicate fingerprints are not generated for upper and lower case formats of the same word. Note that the case folding can be done at any time prior to the operation of the word hash list generation module.
[0029] The received normalized information is then used to generate a word hash list using the word hash list generation module 940. In one embodiment, this word hash list generation module is computer implemented software that operates on every word of the received normalized textual information. In this embodiment, the module further comprises computer implemented software to compute a hash function over all the characters of each word, resulting in a word-value hash for every word. These word-value hashes are compiled together in a list, and this list is designated as the word hash list. The word hash list can further be post-processed to exclude some word-value hashes in order to generate fingerprints that are robust and remain impervious to edits in derivative works of the original text. Examples of this include removing certain stop words that occur frequently in a language and grouping certain categories of words and mapping them to one common word-value hash. These post-processing steps can also be achieved by means of computer implemented software.
[0030] The word hash list is finally used to generate a set of fingerprints by operation of the fingerprint generation module 950. In one embodiment, the fingerprint generation module is computer implemented software capable of performing arithmetic and logic operations. Here, the software reads word-value hashes using a sliding window of size W, reading W word-value hashes at a given time. At each window instant, the software designates a distinct-valued word-value hash as an anchor, and generates a new fingerprint every time the anchor of the current window is not identical to the anchor from the immediately preceding window. The software computes the fingerprint by computing a new hash function over all word-value hashes starting from the first word-value hash of the current window up until the word-value hash corresponding to the anchor of the current window. This method of fingerprinting using word runs is advantageous over other methods because it results in memory and resource efficiency, by reducing the total number of fingerprints that need to be stored in a fingerprint database.
[0031] Fig. 10 depicts an embodiment where the fingerprints generated using the system explained in Fig. 9 can be stored in a repository. Here, a receiver 1010, identical to the receiver 910 explained in Fig. 9, can be used to receive textual information. Examples of such information include an organization's confidential, secure, or any other important information that needs to be protected from unauthorized disclosure. The fingerprints for this information are generated using the word run based fingerprint generation module 1020, which uses the steps described in the fingerprint generation system explained in Fig. 9. The resulting fingerprints are stored in a repository 1030 for later use. Examples of a repository include a database, a network server, a local computer, or any other magnetic or optical storage media in which the fingerprints are recorded.
[0032] Fig. 10 provides another embodiment where the fingerprints generated using the system of Fig. 9 can be matched and inspected against a repository 1030 of fingerprints. In one embodiment, the system receives textual information entered into a computer by a user 1050, wherein the information may be entered using one of several input devices. Examples of such input devices include keyboards, microphones, scanners, pointing devices (e.g., mouse), etc. The fingerprint generating module 1060 generates fingerprints for this information using the fingerprint generation system explained in Fig. 9. The inspection module 1070 then accepts the resulting fingerprints, which are then compared against the bank of fingerprints stored in the repository 1030. In one embodiment, computer implemented software can be used to build the inspection module, wherein the software code enables the module to match the current fingerprint with a fingerprint in the repository 1030 and report any successful matches.
[0033] The systems explained in Figs. 9-11 and all their embodiments relate to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system.
[0034] The algorithms and software presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from other portions of this description. In addition, the present invention is not described with reference to any particular programming language, and various embodiments may thus be implemented using a variety of programming languages.
[0035] In addition to the above mentioned examples, various other modifications and alterations of the invention may be made without departing from the invention. Accordingly, the above disclosure is not to be considered as limiting and the appended claims are to be interpreted as encompassing the true spirit and the entire scope of the invention.
Claims
1. A computer implemented method for preventing unauthorized disclosure of secure information, the computer implemented method comprising: receiving information including a first text, said first text including a plurality of words; normalizing said first text into a first canonical text expression, said first canonical text expression including a plurality of normalized words; generating a first word hash list for said first canonical text expression, where said first word hash list is generated at a word level; and generating a first set of fingerprints for said first word hash list.
2. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 1, wherein each of said plurality of words are defined as a combination of one or more text characters not separated by a specific predefined character, and each of said plurality of words are separated from a previous word and a subsequent word by at least one of said specific predefined character.
3. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 2, wherein said specific predefined character is a space.
4. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 2, wherein said specific predefined character is a punctuation character.
5. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 2, wherein said specific predefined character includes one or more character types including a space, a period, a comma, a semi-colon, a colon, an exclamation point, a dash, a parenthesis, and a quote.
6. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 1, wherein said plurality of words includes detecting separation of a specific word from a following or preceding word using a word boundary detector.
7. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 6, wherein said word boundary detector is language independent.
8. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 1, wherein said receiving information includes receiving secure information from a local database.
9. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 1, wherein said normalizing said first text into a first canonical text expression includes converting said first text into Unicode.
10. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 9, wherein said Unicode is UTF-16.
11. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 1, wherein generating said first word hash list includes converting said plurality of normalized words into a plurality of word-value hashes, each specific one of said word-value hashes representing a specific normalized word.
12. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 11, wherein said first text is case-folded before generating said first word hash list, such that a specific word-value hash assigned to a specific normalized word does not depend upon whether the characters in said specific normalized word are upper case or lower case characters.
13. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 11, wherein said first word hash list includes representing said word-value hash as an integer.
14. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 13, wherein said integer includes a 32-bit unsigned integer.
15. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 11, wherein converting said plurality of normalized words into said plurality of word-value hashes includes applying a hash function over all characters of each normalized word.
16. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 11, where an ordering of said plurality of word-value hashes in said word hash list corresponds to an original ordering of said plurality of normalized words.
17. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 11, wherein said converting said plurality of normalized words includes excluding word-value hashes for each of said plurality of normalized words which are stop words.
18. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 17, wherein said stop words include a predefined set of words that occur frequently and do not add substantive content.
19. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 11, wherein said converting said plurality of normalized words includes mapping a predefined set of common words to a unique word-value hash.
20. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 11, wherein said converting said plurality of normalized words includes mapping a predefined set of synonyms to a unique word-value hash.
21. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 11, wherein said converting said plurality of normalized words includes mapping a predefined set of words in a particular category to a unique word-value hash.
22. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 11, wherein said generating said first set of fingerprints includes: assigning a sliding window of size W, wherein said sliding window is used for reading a W number of said word-value hashes from said first word hash list; using said sliding window to read said W number of said word-level hashes from said first word hash list; designating said word-value hash with a distinct value within said sliding window as an anchor; and generating a fingerprint using a fingerprint hash function, wherein said fingerprint hash function is applied over all said word-value hashes contained within a start of said sliding window to where said anchor resides in said sliding window.
23. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 22, wherein said generating said first set of fingerprints further comprises: moving said sliding window one said word-value hash to the right at a time within said first word hash list; reading said W number of said word-value hashes from said sliding window at said time; designating a second word-value hash with a new distinct value within said sliding window as a new anchor at said time; and generating a new fingerprint at said time using said fingerprint hash function, wherein said new fingerprint is generated only when said new anchor from said sliding window at said time is not identical to said anchor from said sliding window at the immediately previous said time.
24. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 22, wherein said fingerprint hash function includes any hash function that allows said fingerprint to be independent of the order of said words in said first word hash list.
25. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 24, wherein said fingerprint hash function includes a symmetric hash function.
26. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 25, wherein said symmetric hash function uses an addition hash function.
27. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 25, wherein said symmetric hash function uses a multiplication hash function.
28. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 25, wherein said symmetric hash function uses an exclusive-or hash function.
29. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 23, wherein said first set of fingerprints are recorded in a fingerprint database.
30. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 29, wherein said fingerprint database serves as a repository for said first set of fingerprints generated for all said secure information.
31. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 29, wherein said first set of fingerprints are matched against any repository that maintains a database of said first set of fingerprints that were previously generated for all said secure information.
32. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 1, wherein said first set of fingerprints are impervious to derivative works of said first text.
33. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 32, wherein said derivative work includes at least one of: change in the position at which a specific word of said plurality of words appears in said first text; change in the order in which said plurality of words appear in said first text; addition of one or more stop words to said first text; deletion of one or more stop words from said first text; addition of one or more punctuation marks to said first text; deletion of one or more punctuation marks from said first text; substitution of a specific word in said first text with a synonym of said specific word; substitution of a specific word in said first text with a hypernym of said specific word; or substitution of a specific word in said first text with a different word, wherein said different word has the same stem as said specific word.
34. A system to prevent unauthorized disclosure of secure information, the system comprising: means for receiving information including a first text, said first text including a plurality of words; means for normalizing said first text into a first canonical text expression, said first canonical text expression including a plurality of normalized words; means for generating a first word hash list for said first canonical text expression, where said first word hash list is generated at a word level; and means for generating one or more fingerprints for said first word hash list.
35. A system to prevent unauthorized disclosure of secure information as recited in claim 34, wherein each of said plurality of words are defined as a combination of one or more text characters not separated by a specific predefined character, and each of said plurality of words are separated from a previous word and a subsequent word by at least one of said specific predefined character.
36. A system to prevent unauthorized disclosure of secure information as recited in claim 35, wherein said specific predefined character includes one or more character types including a space, a period, a comma, a semi-colon, a colon, an exclamation point, a dash, a parenthesis, and a quote.
37. A system to prevent unauthorized disclosure of secure information as recited in claim 34, wherein a word boundary detector is used to detect separation of a specific said word from a following or preceding word of said specific word.
38. A system to prevent unauthorized disclosure of secure information as recited in claim 37, wherein said word boundary detector is language independent.
39. A system to prevent unauthorized disclosure of secure information as recited in claim 34, wherein said means for receiving said information includes means for receiving information from a local database.
40. A system to prevent unauthorized disclosure of secure information as recited in claim 34, wherein said means for receiving said information includes means for receiving information entered by a user.
41. A system to prevent unauthorized disclosure of secure information as recited in claim 34, wherein said means for generating said first word hash list includes means for converting said plurality of normalized words into a plurality of word-value hashes, wherein each specific one of said word-value hashes represents a specific normalized word.
42. A system to prevent unauthorized disclosure of secure information as recited in claim 41, wherein said means for converting said plurality of normalized words includes one or more of the following: means for performing case-folding; means for removing stop words; means for mapping a predefined set of common words to a unique word-value hash; means for mapping a predefined set of synonyms to a unique word-value hash; means for mapping a predefined set of common words to a unique word-value hash; and means for mapping a predefined set of words in a particular category to a unique word-value hash.
43. A system to prevent unauthorized disclosure of secure information as recited in claim 34, wherein said means for generating said fingerprints includes any hash function that allows said fingerprint to be independent of the order of said words in said first word hash list.
44. A system to prevent unauthorized disclosure of secure information as recited in claim 34, the system further comprising means for recording said fingerprints for secure information in a fingerprint database.
45. A system to prevent unauthorized disclosure of secure information as recited in claim 34, the system further comprising means for monitoring and detecting any unauthorized disclosure of said secure information by a user, by generating said fingerprint for information entered by a user.
46. A system to prevent unauthorized disclosure of secure information as recited in claim 44, the system further comprising means for monitoring and detecting any unauthorized disclosure of said secure information by a user, by generating said fingerprint for information entered by a user and matching that fingerprint against said fingerprints stored in said fingerprint database.
47. A computer implemented method for preventing unauthorized disclosure of secure information, the computer implemented method comprising: storing a plurality of secure text fingerprints for a given organization, wherein each of said plurality of secure text fingerprints is generated using a fixed window word run hashing; receiving a first text that a user desires to transmit outside of said given organization; generating a first set of fingerprints for said first text using said fixed window word run hashing; determining whether any of said first set of fingerprints is identical to any of said plurality of secure text fingerprints; and taking a security action when said first fingerprint is identical to any of said plurality of secure text fingerprints.
48. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 47, wherein said fixed window word run hashing comprises: receiving information including an original text, said original text including a plurality of words; normalizing said original text into an original canonical text expression, said original canonical text expression including a plurality of normalized words; generating an original word hash list for said original canonical text expression, where said original word hash list is generated at a word level, wherein said original word hash list includes a plurality of word-value hashes; and generating an original set of fingerprints for said original word hash list.
49. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 48, wherein said generating said original set of fingerprints comprises: assigning a sliding window of size W, wherein said sliding window is used for reading a W number of said word-value hashes from said original word hash list; using said sliding window to read said W number of said word-level hashes from said original word hash list; designating said word-value hash with a distinct value within said sliding window as an anchor; and generating a fingerprint using a fingerprint hash function, wherein said fingerprint hash function is applied over said word-value hashes contained within a start of said sliding window to where said anchor resides in said sliding window.
50. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 47, wherein said first text includes at least one of: text contained in an electronic mail; text contained in a file attached to an electronic mail; and text that is transferred using a computer's output device.
51. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 47, wherein said security action includes at least one of: preventing said first text from being transferred; logging the event as a security violation; requiring a password from said user to allow said first text to be transferred; blocking said user's access to said first text; and sending out a security alert.
52. A computer implemented method for preventing unauthorized disclosure of secure information, the computer implemented method comprising: generating a plurality of secure text fingerprints for a given organization, wherein each of said plurality of secure text fingerprints is generated using a fixed window word run hashing.
53. A computer implemented method for preventing unauthorized disclosure of secure information, the computer implemented method comprising: creating a fingerprint database for a given organization, wherein said fingerprint database comprises a plurality of secure text fingerprints for said given organization, and wherein each of said plurality of secure text fingerprints is generated using a fixed window word run hashing.
54. A computer implemented method for preventing unauthorized disclosure of secure information, the computer implemented method comprising: creating a plurality of text fingerprints, wherein said text fingerprints are created for a first text that a user desires to transmit outside of a given organization, and wherein each of said plurality of text fingerprints is generated using a fixed window word run hashing.
55. A computer implemented method for preventing unauthorized disclosure of secure information as recited in claim 54, wherein said first text includes one or more of: text contained in an electronic mail; text contained in a file attached to an electronic mail; and text that is transferred using a computer's output device.
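The sliding-window fingerprinting and matching steps recited in the claims above can be sketched in Python under a few loudly stated assumptions. The claims leave open which "distinct value" serves as the anchor and what it means for two anchors to be identical; this sketch assumes the anchor is the smallest word-value hash in the window (leftmost on ties) and compares anchors by their position in the word hash list. The symmetric fingerprint hash is taken to be 32-bit wrapping addition, one of the options named in the claims. The function names, the default window size of 4, and the `"block"`/`"allow"` return values are hypothetical.

```python
def fingerprints(word_hashes: list[int], window: int = 4) -> set[int]:
    """Fingerprint the run from each window's start through its anchor."""
    prints = set()
    previous_anchor = None
    # Slide the window one word-value hash to the right at a time.
    for start in range(max(len(word_hashes) - window + 1, 1)):
        chunk = word_hashes[start:start + window]
        if not chunk:
            break
        # Assumption: the anchor is the smallest hash in the window.
        offset = min(range(len(chunk)), key=chunk.__getitem__)
        anchor = start + offset
        if anchor == previous_anchor:
            continue  # same anchor as the previous window: no new fingerprint
        previous_anchor = anchor
        # Symmetric, order-independent hash: 32-bit wrapping addition.
        prints.add(sum(chunk[:offset + 1]) & 0xFFFFFFFF)
    return prints


def security_action(outgoing: list[int], secure_prints: set[int],
                    window: int = 4) -> str:
    """Return "block" if any outgoing fingerprint matches a stored one."""
    matched = fingerprints(outgoing, window) & secure_prints
    return "block" if matched else "allow"


# Word-value hashes are shown as small integers for readability.
secure = fingerprints([50, 10, 40, 30, 20, 60])
print(secure)                                     # {60, 90} (set order may vary)
print(security_action([30, 40, 20, 60], secure))  # block
print(security_action([1, 2, 3, 4], secure))      # allow
```

In the usage lines, the outgoing run `[30, 40, 20, 60]` reorders the secure run `40, 30, 20` yet still yields the fingerprint 90, because addition ignores word order within the run. Plain addition collides easily, so a practical system would likely choose a stronger symmetric combination; that choice is outside what this sketch tries to show.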
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/177,043 US8286171B2 (en) | 2008-07-21 | 2008-07-21 | Methods and systems to fingerprint textual information using word runs |
US12/177,043 | 2008-07-21 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2010011691A2 (en) | 2010-01-28 |
WO2010011691A3 (en) | 2010-04-22 |
Family
ID=41531435
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2009/051313 WO2010011691A2 (en) | 2008-07-21 | 2009-07-21 | Methods and systems to fingerprint textual information using word runs |
Country Status (2)
Country | Link |
---|---|
US (5) | US8286171B2 (en) |
WO (1) | WO2010011691A2 (en) |
Families Citing this family (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8286171B2 (en) | 2008-07-21 | 2012-10-09 | Workshare Technology, Inc. | Methods and systems to fingerprint textual information using word runs |
US8555080B2 (en) | 2008-09-11 | 2013-10-08 | Workshare Technology, Inc. | Methods and systems for protect agents using distributed lightweight fingerprints |
US9092636B2 (en) | 2008-11-18 | 2015-07-28 | Workshare Technology, Inc. | Methods and systems for exact data match filtering |
US8406456B2 (en) | 2008-11-20 | 2013-03-26 | Workshare Technology, Inc. | Methods and systems for image fingerprinting |
US8473847B2 (en) * | 2009-07-27 | 2013-06-25 | Workshare Technology, Inc. | Methods and systems for comparing presentation slide decks |
US20120084868A1 (en) * | 2010-09-30 | 2012-04-05 | International Business Machines Corporation | Locating documents for providing data leakage prevention within an information security management system |
US11030163B2 (en) | 2011-11-29 | 2021-06-08 | Workshare, Ltd. | System for tracking and displaying changes in a set of related electronic documents |
US10783326B2 (en) | 2013-03-14 | 2020-09-22 | Workshare, Ltd. | System for tracking changes in a collaborative document editing environment |
US10025759B2 (en) | 2010-11-29 | 2018-07-17 | Workshare Technology, Inc. | Methods and systems for monitoring documents exchanged over email applications |
US10574729B2 (en) | 2011-06-08 | 2020-02-25 | Workshare Ltd. | System and method for cross platform document sharing |
US10880359B2 (en) | 2011-12-21 | 2020-12-29 | Workshare, Ltd. | System and method for cross platform document sharing |
US9948676B2 (en) | 2013-07-25 | 2018-04-17 | Workshare, Ltd. | System and method for securing documents prior to transmission |
US9613340B2 (en) | 2011-06-14 | 2017-04-04 | Workshare Ltd. | Method and system for shared document approval |
US10963584B2 (en) | 2011-06-08 | 2021-03-30 | Workshare Ltd. | Method and system for collaborative editing of a remotely stored document |
US9170990B2 (en) | 2013-03-14 | 2015-10-27 | Workshare Limited | Method and system for document retrieval with selective document comparison |
US8612754B2 (en) * | 2011-06-14 | 2013-12-17 | At&T Intellectual Property I, L.P. | Digital fingerprinting via SQL filestream with common text exclusion |
US9047304B2 (en) * | 2011-11-28 | 2015-06-02 | International Business Machines Corporation | Optimization of fingerprint-based deduplication |
WO2013137864A1 (en) | 2012-03-13 | 2013-09-19 | Hewlett-Packard Development Company, L.P. | Submatch extraction |
US9558299B2 (en) | 2012-04-30 | 2017-01-31 | Hewlett Packard Enterprise Development Lp | Submatch extraction |
US8725749B2 (en) | 2012-07-24 | 2014-05-13 | Hewlett-Packard Development Company, L.P. | Matching regular expressions including word boundary symbols |
US9613641B2 (en) * | 2013-03-13 | 2017-04-04 | Nuance Communications, Inc. | Identifying corresponding positions in different representations of a textual work |
US11567907B2 (en) | 2013-03-14 | 2023-01-31 | Workshare, Ltd. | Method and system for comparing document versions encoded in a hierarchical representation |
US10911492B2 (en) | 2013-07-25 | 2021-02-02 | Workshare Ltd. | System and method for securing documents prior to transmission |
EP3913500A1 (en) | 2013-11-08 | 2021-11-24 | Friend For Media Limited | Identifying media components |
US10133723B2 (en) | 2014-12-29 | 2018-11-20 | Workshare Ltd. | System and method for determining document version geneology |
US11182551B2 (en) | 2014-12-29 | 2021-11-23 | Workshare Ltd. | System and method for determining document version geneology |
US9858349B2 (en) * | 2015-02-10 | 2018-01-02 | Researchgate Gmbh | Online publication system and method |
US10282424B2 (en) | 2015-05-19 | 2019-05-07 | Researchgate Gmbh | Linking documents using citations |
US11763013B2 (en) | 2015-08-07 | 2023-09-19 | Workshare, Ltd. | Transaction document management system and method |
CN105871749A (en) * | 2015-11-16 | 2016-08-17 | 乐视致新电子科技(天津)有限公司 | Network access control method and system based on router, and related device |
CN105589962B (en) * | 2015-12-22 | 2018-11-02 | 北京奇虎科技有限公司 | A kind of generation method and device of text fingerprints information |
US10148664B2 (en) | 2016-08-16 | 2018-12-04 | Paypal, Inc. | Utilizing transport layer security (TLS) fingerprints to determine agents and operating systems |
GB201708767D0 (en) * | 2017-06-01 | 2017-07-19 | Microsoft Technology Licensing Llc | Managing electronic documents |
US10698876B2 (en) * | 2017-08-11 | 2020-06-30 | Micro Focus Llc | Distinguish phrases in displayed content |
CN107562893A (en) * | 2017-09-06 | 2018-01-09 | 叶进蓉 | A kind of multi-dimensional data duplicate removal method and system being used in network log file |
US10957445B2 (en) | 2017-10-05 | 2021-03-23 | Hill-Rom Services, Inc. | Caregiver and staff information system |
US10839135B1 (en) * | 2018-01-03 | 2020-11-17 | Amazon Technologies, Inc. | Detection of access to text-based transmissions |
CN109495456B (en) * | 2018-10-30 | 2021-06-11 | 腾讯科技(深圳)有限公司 | Verification method and device of matching module and readable storage medium |
JP7445135B2 (en) * | 2020-08-27 | 2024-03-07 | 富士通株式会社 | Communication program, communication device, communication method, and communication system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060005247A1 (en) * | 2004-06-30 | 2006-01-05 | Microsoft Corporation | Method and system for detecting when an outgoing communication contains certain content |
US20060112120A1 (en) * | 2004-11-22 | 2006-05-25 | International Business Machines Corporation | Method, system, and computer program product for threading documents using body text analysis |
US20070005589A1 (en) * | 2005-07-01 | 2007-01-04 | Sreenivas Gollapudi | Method and apparatus for document clustering and document sketching |
US20080033913A1 (en) * | 2006-05-26 | 2008-02-07 | Winburn Michael L | Techniques for Preventing Insider Theft of Electronic Documents |
KR20080029602A (en) * | 2006-09-29 | 2008-04-03 | 한국전자통신연구원 | Method and apparatus for preventing confidential information leak |
Family Cites Families (330)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4479195A (en) | 1982-09-07 | 1984-10-23 | At&T Bell Laboratories | Data conference system |
USRE35861E (en) | 1986-03-12 | 1998-07-28 | Advanced Software, Inc. | Apparatus and method for comparing data groups |
US5072412A (en) | 1987-03-25 | 1991-12-10 | Xerox Corporation | User interface with multiple workspaces for sharing display system objects |
US5008853A (en) | 1987-12-02 | 1991-04-16 | Xerox Corporation | Representation of collaborative multi-user activities relative to shared structured data objects in a networked workstation environment |
US5220657A (en) | 1987-12-02 | 1993-06-15 | Xerox Corporation | Updating local copy of shared data in a collaborative system |
US4853961A (en) | 1987-12-18 | 1989-08-01 | Pitney Bowes Inc. | Reliable document authentication system |
US4949300A (en) | 1988-01-07 | 1990-08-14 | International Business Machines Corporation | Sharing word-processing functions among multiple processors |
DE68926446T2 (en) | 1989-03-14 | 1996-12-05 | Ibm | Electronic document approval system |
US5245553A (en) | 1989-12-14 | 1993-09-14 | Options Unlimited Research | Full-duplex video communication and document generation system |
JP2793308B2 (en) | 1989-12-21 | 1998-09-03 | 株式会社日立製作所 | Dialogue system |
JP3161725B2 (en) | 1990-11-21 | 2001-04-25 | 株式会社日立製作所 | Workstations and collaborative information processing systems |
EP0538464B1 (en) | 1991-05-08 | 1998-12-30 | Digital Equipment Corporation | License management system |
US5293619A (en) | 1991-05-30 | 1994-03-08 | Sandia Corporation | Method and apparatus for collaborative use of application program |
US5671428A (en) | 1991-08-28 | 1997-09-23 | Kabushiki Kaisha Toshiba | Collaborative document processing system with version and comment management |
US5446842A (en) | 1993-02-26 | 1995-08-29 | Taligent, Inc. | Object-oriented collaboration system |
US5608872A (en) | 1993-03-19 | 1997-03-04 | Ncr Corporation | System for allowing all remote computers to perform annotation on an image and replicating the annotated image on the respective displays of other comuters |
US5544352A (en) * | 1993-06-14 | 1996-08-06 | Libertech, Inc. | Method and apparatus for indexing, searching and displaying data |
US5689641A (en) | 1993-10-01 | 1997-11-18 | Vicor, Inc. | Multimedia collaboration system arrangement for routing compressed AV signal through a participant site without decompressing the AV signal |
JP2906949B2 (en) * | 1993-10-27 | 1999-06-21 | 富士ゼロックス株式会社 | Hypertext device |
US6122403A (en) | 1995-07-27 | 2000-09-19 | Digimarc Corporation | Computer system linked by using information in data objects |
US7113615B2 (en) | 1993-11-18 | 2006-09-26 | Digimarc Corporation | Watermark embedder and reader |
JP3287679B2 (en) | 1993-12-28 | 2002-06-04 | キヤノン株式会社 | Document processing apparatus and method |
US5806078A (en) * | 1994-06-09 | 1998-09-08 | Softool Corporation | Version management system |
US5801702A (en) * | 1995-03-09 | 1998-09-01 | Terrabyte Technology | System and method for adding network links in a displayed hierarchy |
US5757669A (en) * | 1995-05-31 | 1998-05-26 | Netscape Communications Corporation | Method and apparatus for workgroup information replication |
US5619649A (en) | 1995-06-12 | 1997-04-08 | Xerox Corporation | Network printing system for programming a print job by selecting a job ticket identifier associated with remotely stored predefined document processing control instructions |
US5699427A (en) | 1995-06-23 | 1997-12-16 | International Business Machines Corporation | Method to deter document and intellectual property piracy through individualization |
IL114361A (en) | 1995-06-27 | 1998-08-16 | Veritas Technology Solutions L | File encryption method |
JP3298379B2 (en) | 1995-09-20 | 2002-07-02 | 株式会社日立製作所 | Electronic approval method and system |
US5787175A (en) | 1995-10-23 | 1998-07-28 | Novell, Inc. | Method and apparatus for collaborative document control |
US6366933B1 (en) * | 1995-10-27 | 2002-04-02 | At&T Corp. | Method and apparatus for tracking and viewing changes on the web |
US5727197A (en) * | 1995-11-01 | 1998-03-10 | Filetek, Inc. | Method and apparatus for segmenting a database |
US5855020A (en) * | 1996-02-21 | 1998-12-29 | Infoseek Corporation | Web scan process |
US5673316A (en) | 1996-03-29 | 1997-09-30 | International Business Machines Corporation | Creation and distribution of cryptographic envelope |
US5890176A (en) | 1996-04-24 | 1999-03-30 | International Business Machines Corp. | Object-oriented document version tracking method and apparatus |
US5890177A (en) | 1996-04-24 | 1999-03-30 | International Business Machines Corporation | Method and apparatus for consolidating edits made by multiple editors working on multiple document copies |
US6189019B1 (en) * | 1996-08-14 | 2001-02-13 | Microsoft Corporation | Computer system and computer-implemented process for presenting document connectivity |
JPH10105550A (en) * | 1996-10-02 | 1998-04-24 | Matsushita Electric Ind Co Ltd | Hyper-text document preparing device |
US5832529A (en) * | 1996-10-11 | 1998-11-03 | Sun Microsystems, Inc. | Methods, apparatus, and product for distributed garbage collection |
US20060129627A1 (en) | 1996-11-22 | 2006-06-15 | Mangosoft Corp. | Internet-based shared file service with native PC client access and semantics and distributed version control |
JP2815045B2 (en) | 1996-12-16 | 1998-10-27 | 日本電気株式会社 | Image feature extraction device, image feature analysis device, and image matching system |
US6003060A (en) * | 1996-12-20 | 1999-12-14 | International Business Machines Corporation | Method and apparatus to share resources while processing multiple priority data flows |
US5874953A (en) | 1996-12-31 | 1999-02-23 | International Business Machines Corporation | Database graphical user interface with outline view |
US6285999B1 (en) * | 1997-01-10 | 2001-09-04 | The Board Of Trustees Of The Leland Stanford Junior University | Method for node ranking in a linked database |
US5898836A (en) * | 1997-01-14 | 1999-04-27 | Netmind Services, Inc. | Change-detection tool indicating degree and location of change of internet documents by comparison of cyclic-redundancy-check(CRC) signatures |
US6012087A (en) * | 1997-01-14 | 2000-01-04 | Netmind Technologies, Inc. | Unique-change detection of dynamic web pages using history tables of signatures |
US6009173A (en) | 1997-01-31 | 1999-12-28 | Motorola, Inc. | Encryption and decryption method and apparatus |
US5877766A (en) * | 1997-08-15 | 1999-03-02 | International Business Machines Corporation | Multi-node user interface component and method thereof for use in accessing a plurality of linked records |
US6327611B1 (en) | 1997-11-12 | 2001-12-04 | Netscape Communications Corporation | Electronic document routing system |
US6067551A (en) | 1997-11-14 | 2000-05-23 | Microsoft Corporation | Computer implemented method for simultaneous multi-user editing of a document |
US6243091B1 (en) * | 1997-11-21 | 2001-06-05 | International Business Machines Corporation | Global history view |
US6088702A (en) | 1998-02-25 | 2000-07-11 | Plantz; Scott H. | Group publishing system |
US6189146B1 (en) | 1998-03-18 | 2001-02-13 | Microsoft Corporation | System and method for software licensing |
US6216112B1 (en) | 1998-05-27 | 2001-04-10 | William H. Fuller | Method for software distribution and compensation with replenishable advertisements |
US6219652B1 (en) | 1998-06-01 | 2001-04-17 | Novell, Inc. | Network license authentication |
US6424966B1 (en) * | 1998-06-30 | 2002-07-23 | Microsoft Corporation | Synchronizing crawler with notification source |
US6594662B1 (en) * | 1998-07-01 | 2003-07-15 | Netshadow, Inc. | Method and system for gathering information resident on global computer networks |
US6169976B1 (en) | 1998-07-02 | 2001-01-02 | Encommerce, Inc. | Method and apparatus for regulating the use of licensed products |
US6275850B1 (en) | 1998-07-24 | 2001-08-14 | Siemens Information And Communication Networks, Inc. | Method and system for management of message attachments |
US6658626B1 (en) | 1998-07-31 | 2003-12-02 | The Regents Of The University Of California | User interface for displaying document comparison information |
GB2341249A (en) | 1998-08-17 | 2000-03-08 | Connected Place Limited | A method of generating a difference file defining differences between an updated file and a base file |
WO2000017775A2 (en) | 1998-09-22 | 2000-03-30 | Science Applications International Corporation | User-defined dynamic collaborative environments |
US6145084A (en) | 1998-10-08 | 2000-11-07 | Net I Trust | Adaptive communication system enabling dissimilar devices to exchange information over a network |
US6424996B1 (en) * | 1998-11-25 | 2002-07-23 | Nexsys Electronics, Inc. | Medical network system and method for transfer of information |
US6918082B1 (en) | 1998-12-17 | 2005-07-12 | Jeffrey M. Gross | Electronic document proofing system |
US6418433B1 (en) * | 1999-01-28 | 2002-07-09 | International Business Machines Corporation | System and method for focussed web crawling |
US6301368B1 (en) | 1999-01-29 | 2001-10-09 | International Business Machines Corporation | System and method for data hiding in compressed fingerprint images |
US6584466B1 (en) | 1999-04-07 | 2003-06-24 | Critical Path, Inc. | Internet document management system and methods |
US6317777B1 (en) | 1999-04-26 | 2001-11-13 | Intel Corporation | Method for web based storage and retrieval of documents |
US8099758B2 (en) | 1999-05-12 | 2012-01-17 | Microsoft Corporation | Policy based composite file system and method |
US6212534B1 (en) | 1999-05-13 | 2001-04-03 | X-Collaboration Software Corp. | System and method for facilitating collaboration in connection with generating documents among a plurality of operators using networked computer systems |
US7857201B2 (en) * | 1999-05-25 | 2010-12-28 | Silverbrook Research Pty Ltd | Method and system for selection |
US6405219B2 (en) | 1999-06-22 | 2002-06-11 | F5 Networks, Inc. | Method and system for automatically updating the version of a set of files stored on content servers |
US6547829B1 (en) * | 1999-06-30 | 2003-04-15 | Microsoft Corporation | Method and system for detecting duplicate documents in web crawls |
US6356937B1 (en) | 1999-07-06 | 2002-03-12 | David Montville | Interoperable full-featured web-based and client-side e-mail system |
US6591289B1 (en) | 1999-07-27 | 2003-07-08 | The Standard Register Company | Method of delivering formatted documents over a communications network |
US6560620B1 (en) | 1999-08-03 | 2003-05-06 | Aplix Research, Inc. | Hierarchical document comparison system and method |
US6662212B1 (en) | 1999-08-31 | 2003-12-09 | Qualcomm Incorporated | Synchronization of a virtual workspace using E-mail extensions |
US6449624B1 (en) | 1999-10-18 | 2002-09-10 | Fisher-Rosemount Systems, Inc. | Version control and audit trail in a process control system |
US6351755B1 (en) * | 1999-11-02 | 2002-02-26 | Alta Vista Company | System and method for associating an extensible set of data with documents downloaded by a web crawler |
US6377984B1 (en) * | 1999-11-02 | 2002-04-23 | Alta Vista Company | Web crawler system using parallel queues for queing data sets having common address and concurrently downloading data associated with data set in each queue |
US6263364B1 (en) * | 1999-11-02 | 2001-07-17 | Alta Vista Company | Web crawler system using plurality of parallel priority level queues having distinct associated download priority levels for prioritizing document downloading and maintaining document freshness |
US6321265B1 (en) * | 1999-11-02 | 2001-11-20 | Altavista Company | System and method for enforcing politeness while scheduling downloads in a web crawler |
US6418453B1 (en) * | 1999-11-03 | 2002-07-09 | International Business Machines Corporation | Network repository service for efficient web crawling |
US7321864B1 (en) | 1999-11-04 | 2008-01-22 | Jpmorgan Chase Bank, N.A. | System and method for providing funding approval associated with a project based on a document collection |
US6614789B1 (en) | 1999-12-29 | 2003-09-02 | Nasser Yazdani | Method of and apparatus for matching strings of different lengths |
AU2000243591A1 (en) | 2000-01-14 | 2001-07-24 | Critical Path Inc. | Secure management of electronic documents in a networked environment |
EP1254547B1 (en) | 2000-02-08 | 2005-11-23 | Swisscom Mobile AG | Single sign-on process |
US7111060B2 (en) | 2000-03-14 | 2006-09-19 | Aep Networks, Inc. | Apparatus and accompanying methods for providing, through a centralized server site, a secure, cost-effective, web-enabled, integrated virtual office environment remotely accessible through a network-connected web browser |
US6643661B2 (en) * | 2000-04-27 | 2003-11-04 | Brio Software, Inc. | Method and apparatus for implementing search and channel features in an enterprise-wide computer system |
US6556982B1 (en) | 2000-04-28 | 2003-04-29 | Bwxt Y-12, Llc | Method and system for analyzing and classifying electronic information |
WO2001088750A1 (en) | 2000-05-16 | 2001-11-22 | Carroll Garrett O | A document processing system and method |
US20020063154A1 (en) | 2000-05-26 | 2002-05-30 | Hector Hoyos | Security system database management |
WO2001093655A2 (en) | 2000-06-05 | 2001-12-13 | Shiman Associates, Inc. | Method and apparatus for managing documents in a centralized document repository system |
US6963975B1 (en) | 2000-08-11 | 2005-11-08 | Microsoft Corporation | System and method for audio fingerprinting |
AU2001275982A1 (en) * | 2000-07-20 | 2002-02-05 | Rodney D. Johnson | Information archival and retrieval system for internetworked computers |
US6618717B1 (en) * | 2000-07-31 | 2003-09-09 | Eliyon Technologies Corporation | Computer method and apparatus for determining content owner of a website |
CA2424713C (en) | 2000-08-21 | 2007-12-04 | Thoughtslinger Corporation | Simultaneous multi-user document editing system |
JP2002176671A (en) | 2000-09-28 | 2002-06-21 | Takashi Fujimoto | Mobile phone |
GB2368670A (en) * | 2000-11-03 | 2002-05-08 | Envisional Software Solutions | Data acquisition system |
US7903822B1 (en) | 2000-11-10 | 2011-03-08 | DMT Licensing, LLC. | Method and system for establishing a trusted and decentralized peer-to-peer network |
US7003551B2 (en) | 2000-11-30 | 2006-02-21 | Bellsouth Intellectual Property Corp. | Method and apparatus for minimizing storage of common attachment files in an e-mail communications server |
US20020099602A1 (en) * | 2000-12-04 | 2002-07-25 | Paul Moskowitz | Method and system to provide web site schedules |
US20020073188A1 (en) * | 2000-12-07 | 2002-06-13 | Rawson Freeman Leigh | Method and apparatus for partitioning system management information for a server farm among a plurality of leaseholds |
US7356704B2 (en) | 2000-12-07 | 2008-04-08 | International Business Machines Corporation | Aggregated authenticated identity apparatus for and method therefor |
US6825844B2 (en) | 2001-01-16 | 2004-11-30 | Microsoft Corp | System and method for optimizing a graphics intensive software program for the user's graphics hardware |
US20020129062A1 (en) * | 2001-03-08 | 2002-09-12 | Wood River Technologies, Inc. | Apparatus and method for cataloging data |
US6820081B1 (en) | 2001-03-19 | 2004-11-16 | Attenex Corporation | System and method for evaluating a structured message store for message redundancy |
US8660017B2 (en) | 2001-03-20 | 2014-02-25 | Verizon Business Global Llc | Systems and methods for updating IP communication service attributes using an LDAP |
US7047406B2 (en) | 2001-03-21 | 2006-05-16 | Qurlo Holdings, Inc. | Method and system for providing a secure peer-to-peer file delivery network |
US7181017B1 (en) | 2001-03-23 | 2007-02-20 | David Felsher | System and method for secure three-party communications |
US7107518B2 (en) | 2001-04-03 | 2006-09-12 | Microsoft Corporation | Automating a document review cycle |
EP1490767B1 (en) | 2001-04-05 | 2014-06-11 | Audible Magic Corporation | Copyright detection and protection system and method |
KR20010078840A (en) | 2001-04-17 | 2001-08-22 | 유성경 | Security System detecting the leak of information using computer storage device |
US7428636B1 (en) | 2001-04-26 | 2008-09-23 | Vmware, Inc. | Selective encryption system and method for I/O operations |
JP2002329183A (en) | 2001-04-27 | 2002-11-15 | Matsushita Electric Ind Co Ltd | Pc card |
US6778688B2 (en) | 2001-05-04 | 2004-08-17 | International Business Machines Corporation | Remote authentication of fingerprints over an insecure network |
US6961723B2 (en) * | 2001-05-04 | 2005-11-01 | Sun Microsystems, Inc. | System and method for determining relevancy of query responses in a distributed network search mechanism |
AU2002257262A1 (en) | 2001-05-09 | 2003-03-10 | Core Ipr Limited | Method and system for facilitating creation, presentation, exchange, and management of documents to facilitate business transactions |
WO2002101577A1 (en) | 2001-06-07 | 2002-12-19 | Contentguard Holdings, Inc. | Method and system for subscription digital rights management |
US7562112B2 (en) | 2001-07-06 | 2009-07-14 | Intel Corporation | Method and apparatus for peer-to-peer services for efficient transfer of information between networks |
US7194513B2 (en) | 2001-07-08 | 2007-03-20 | Imran Sharif | System and method for using an internet appliance to send/receive digital content files as E-mail attachments |
US7006673B2 (en) * | 2001-07-25 | 2006-02-28 | Activcard Ireland Limited | Method of hash string extraction |
US7212955B2 (en) | 2001-08-16 | 2007-05-01 | Hewlett-Packard Development Company, L.P. | Consumer product status monitoring |
US7266699B2 (en) | 2001-08-30 | 2007-09-04 | Application Security, Inc. | Cryptographic infrastructure for encrypting a database |
US7124362B2 (en) | 2001-08-31 | 2006-10-17 | Robert Tischer | Method and system for producing an ordered compilation of information with more than one author contributing information contemporaneously |
US20030061260A1 (en) * | 2001-09-25 | 2003-03-27 | Timesys Corporation | Resource reservation and priority management |
JP3879594B2 (en) | 2001-11-02 | 2007-02-14 | 日本電気株式会社 | Switch method, apparatus and program |
US6738762B1 (en) * | 2001-11-26 | 2004-05-18 | At&T Corp. | Multidimensional substring selectivity estimation using set hashing of cross-counts |
US6915333B2 (en) | 2001-12-14 | 2005-07-05 | International Business Machines Corporation | Method of managing attached document |
US20030112273A1 (en) | 2001-12-17 | 2003-06-19 | Workshare Technology, Ltd. | Document collaboration suite using a common database |
US7496841B2 (en) | 2001-12-17 | 2009-02-24 | Workshare Technology, Ltd. | Method and system for document collaboration |
WO2003058519A2 (en) | 2002-01-08 | 2003-07-17 | Sap Aktiengesellschaft | Enhanced email management system |
US20030131005A1 (en) * | 2002-01-10 | 2003-07-10 | International Business Machines Corporation | Method and apparatus for automatic pruning of search engine indices |
GB0202431D0 (en) | 2002-02-02 | 2002-03-20 | F Secure Oyj | Method and apparatus for encrypting data |
US7299504B1 (en) | 2002-03-08 | 2007-11-20 | Lucent Technologies Inc. | System and method for implementing security management using a database-modeled security policy |
US6971017B2 (en) | 2002-04-16 | 2005-11-29 | Xerox Corporation | Ad hoc secure access to documents and services |
US7353455B2 (en) | 2002-05-21 | 2008-04-01 | At&T Delaware Intellectual Property, Inc. | Caller initiated distinctive presence alerting and auto-response messaging |
US7274807B2 (en) | 2002-05-30 | 2007-09-25 | Activcard Ireland Limited | Method and apparatus for supporting a biometric registration performed on a card |
US7437664B2 (en) | 2002-06-18 | 2008-10-14 | Microsoft Corporation | Comparing hierarchically-structured documents |
US6946715B2 (en) * | 2003-02-19 | 2005-09-20 | Micron Technology, Inc. | CMOS image sensor and method of fabrication |
FR2841673B1 (en) * | 2002-06-26 | 2004-12-03 | Solystic | TIMING OF POSTAL OBJECTS BY IMAGE SIGNATURE AND ASSOCIATED SORTING MACHINE |
US7733366B2 (en) * | 2002-07-01 | 2010-06-08 | Microsoft Corporation | Computer network-based, interactive, multimedia learning system and process |
US20040031052A1 (en) | 2002-08-12 | 2004-02-12 | Liberate Technologies | Information platform |
WO2004031905A2 (en) | 2002-09-30 | 2004-04-15 | Interface Software, Inc. | Managing changes in a relationship management system |
WO2004032435A1 (en) | 2002-10-03 | 2004-04-15 | In4S Inc. | Bit string check method and device |
US7818678B2 (en) | 2002-10-31 | 2010-10-19 | Litera Technology Llc | Collaborative document development and review system |
EP1565019A1 (en) | 2002-11-14 | 2005-08-17 | Omron Corporation | Information distribution system, information acquisition device, information distribution server, information reproduction device, information reproduction method, information distribution control method, information distribution control program, and computer-readable recording medium |
KR100458543B1 (en) | 2002-11-30 | 2004-12-03 | 삼성에스디에스 주식회사 | Comparing method of 2d cad file using graphic type |
AU2002953325A0 (en) | 2002-12-13 | 2003-01-09 | Executive Computing Holdings Pty Ltd | Means for providing protection for digital assets |
US20040122659A1 (en) | 2002-12-23 | 2004-06-24 | Hourihane John Philip | Tool and method for managing web pages in different languages |
JP2004265267A (en) | 2003-03-04 | 2004-09-24 | Sharp Corp | Face authentication method and face authentication device |
US20060259524A1 (en) | 2003-03-17 | 2006-11-16 | Horton D T | Systems and methods for document project management, conversion, and filing |
US7113948B2 (en) | 2003-03-21 | 2006-09-26 | Acellion Pte Ltd. | Methods and systems for email attachment distribution and management |
KR100390172B1 (en) | 2003-03-22 | 2003-07-04 | Knowledge Info Net Service Inc | Method and system for controlling internet contents providing service using redirection method |
US7188316B2 (en) | 2003-03-24 | 2007-03-06 | Microsoft Corporation | System and method for viewing and editing multi-value properties |
US7991751B2 (en) | 2003-04-02 | 2011-08-02 | Portauthority Technologies Inc. | Method and a system for information identification |
US7716742B1 (en) | 2003-05-12 | 2010-05-11 | Sourcefire, Inc. | Systems and methods for determining characteristics of a network and analyzing vulnerabilities |
US20040261016A1 (en) | 2003-06-20 | 2004-12-23 | Miavia, Inc. | System and method for associating structured and manually selected annotations with electronic document contents |
EP1507402A3 (en) | 2003-06-23 | 2005-07-20 | Ricoh Company, Ltd. | Access control decision system, access control enforcing system, and security policy |
US8042112B1 (en) * | 2003-07-03 | 2011-10-18 | Google Inc. | Scheduler for search engine crawler |
US7627613B1 (en) * | 2003-07-03 | 2009-12-01 | Google Inc. | Duplicate document detection in a web crawler system |
US20050021637A1 (en) | 2003-07-22 | 2005-01-27 | Red Hat, Inc. | Electronic mail control system |
US7836010B2 (en) | 2003-07-30 | 2010-11-16 | Northwestern University | Method and system for assessing relevant properties of work contexts for use by information services |
US7171618B2 (en) | 2003-07-30 | 2007-01-30 | Xerox Corporation | Multi-versioned documents and method for creation and use thereof |
US20050033811A1 (en) | 2003-08-07 | 2005-02-10 | International Business Machines Corporation | Collaborative email |
US8458033B2 (en) | 2003-08-11 | 2013-06-04 | Dropbox, Inc. | Determining the relevance of offers |
US20050048648A1 (en) | 2003-08-29 | 2005-03-03 | Ye Fang | Compositions & methods for reformulating biological membranes for arrays |
US8145543B2 (en) | 2003-10-17 | 2012-03-27 | International Business Machines Corporation | Method, system and program product for approving item requests |
US20130212707A1 (en) | 2003-10-31 | 2013-08-15 | James Donahue | Document control system |
US20050138540A1 (en) * | 2003-12-22 | 2005-06-23 | Xerox Corporation | Systems and methods for user-specific document change highlighting |
WO2005103878A2 (en) | 2004-04-26 | 2005-11-03 | Storewiz, Inc. | Method and system for compression of files for storage and operation on compressed files |
US20050268327A1 (en) | 2004-05-14 | 2005-12-01 | Secure Communications Technology, Llc | Enhanced electronic mail security system and method |
GB0411560D0 (en) | 2004-05-24 | 2004-06-23 | Protx Group Ltd | A method of encrypting and transferring data between a sender and a receiver using a network |
US7606821B2 (en) | 2004-06-30 | 2009-10-20 | Ebay Inc. | Method and system for preventing fraudulent activities |
EP1619611A1 (en) | 2004-07-22 | 2006-01-25 | Sap Ag | Technique for processing electronic documents in a computer network |
US7373586B2 (en) | 2004-09-03 | 2008-05-13 | International Business Machines Corporation | Differencing and merging tree-structured documents |
JP2006086637A (en) | 2004-09-14 | 2006-03-30 | Sony Corp | Information processing system, method therefor, and program |
US20060069605A1 (en) | 2004-09-29 | 2006-03-30 | Microsoft Corporation | Workflow association in a collaborative application |
JP4639734B2 (en) * | 2004-09-30 | 2011-02-23 | 富士ゼロックス株式会社 | Slide content processing apparatus and program |
US7454778B2 (en) | 2004-09-30 | 2008-11-18 | Microsoft Corporation | Enforcing rights management through edge email servers |
US7152019B2 (en) | 2004-11-30 | 2006-12-19 | Oracle International Corporation | Systems and methods for sensor-based computing |
US7734670B2 (en) | 2004-12-15 | 2010-06-08 | Microsoft Corporation | Actionable email documents |
US7716162B2 (en) | 2004-12-30 | 2010-05-11 | Google Inc. | Classification of ambiguous geographic references |
US7664323B2 (en) * | 2005-01-28 | 2010-02-16 | Microsoft Corporation | Scalable hash-based character recognition |
US9734139B2 (en) | 2005-02-14 | 2017-08-15 | Cluster Seven Limited | Auditing and tracking changes of data and code in spreadsheets and other documents |
US8011003B2 (en) | 2005-02-14 | 2011-08-30 | Symantec Corporation | Method and apparatus for handling messages containing pre-selected data |
US20060218004A1 (en) | 2005-03-23 | 2006-09-28 | Dworkin Ross E | On-line slide kit creation and collaboration system |
US20060236246A1 (en) | 2005-03-23 | 2006-10-19 | Bono Charles A | On-line slide kit creation and collaboration system |
US7526812B2 (en) | 2005-03-24 | 2009-04-28 | Xerox Corporation | Systems and methods for manipulating rights management data |
US7680785B2 (en) * | 2005-03-25 | 2010-03-16 | Microsoft Corporation | Systems and methods for inferring uniform resource locator (URL) normalization rules |
US20060261112A1 (en) | 2005-04-20 | 2006-11-23 | Gates George D | ATV mounting bracket and associated methods |
US8140664B2 (en) | 2005-05-09 | 2012-03-20 | Trend Micro Incorporated | Graphical user interface based sensitive information and internal information vulnerability management system |
US20090222450A1 (en) | 2005-05-16 | 2009-09-03 | Ron Zigelman | System and a method for transferring email file attachments over a telecommunication network using a peer-to-peer connection |
US20060271947A1 (en) | 2005-05-23 | 2006-11-30 | Lienhart Rainer W | Creating fingerprints |
US20060277229A1 (en) | 2005-05-31 | 2006-12-07 | Michihiro Yoshida | Document management server, information terminal, document managing method, and program |
WO2006137057A2 (en) * | 2005-06-21 | 2006-12-28 | Onigma Ltd. | A method and a system for providing comprehensive protection against leakage of sensitive information assets using host based agents, content- meta-data and rules-based policies |
US7493561B2 (en) | 2005-06-24 | 2009-02-17 | Microsoft Corporation | Storage and utilization of slide presentation slides |
US7590939B2 (en) | 2005-06-24 | 2009-09-15 | Microsoft Corporation | Storage and utilization of slide presentation slides |
US7724717B2 (en) | 2005-07-22 | 2010-05-25 | Sri International | Method and apparatus for wireless network security |
US20070027830A1 (en) | 2005-07-29 | 2007-02-01 | Microsoft Corporation | Dynamic content development based on user feedback |
US8201254B1 (en) | 2005-08-30 | 2012-06-12 | Symantec Corporation | Detection of e-mail threat acceleration |
US7624447B1 (en) | 2005-09-08 | 2009-11-24 | Cisco Technology, Inc. | Using threshold lists for worm detection |
US7877790B2 (en) | 2005-10-31 | 2011-01-25 | At&T Intellectual Property I, L.P. | System and method of using personal data |
US7890752B2 (en) | 2005-10-31 | 2011-02-15 | Scenera Technologies, Llc | Methods, systems, and computer program products for associating an originator of a network packet with the network packet using biometric information |
US7970834B2 (en) | 2005-11-03 | 2011-06-28 | International Business Machines Corporation | Method and program product for tracking a file attachment in an e-mail |
KR100751691B1 (en) | 2005-11-08 | 2007-08-23 | 삼성에스디에스 주식회사 | Method for modifying a great number of powerpoint document |
US20070112854A1 (en) | 2005-11-12 | 2007-05-17 | Franca Paulo B | Apparatus and method for automatic generation and distribution of documents |
US7650387B2 (en) | 2005-11-15 | 2010-01-19 | Cisco Technology, Inc. | Method and system for managing storage on a shared storage space |
GB0523703D0 (en) | 2005-11-22 | 2005-12-28 | Ibm | Collaborative editing of a document |
US20070179967A1 (en) | 2005-11-22 | 2007-08-02 | Zhang Xiaoge G | Intuitive and Dynamic File Retrieval Method and User Interface System |
US20070156785A1 (en) | 2006-01-03 | 2007-07-05 | Hines Wallis G Iii | Method and system for revising manuals |
US7958101B1 (en) | 2006-01-03 | 2011-06-07 | Emc Corporation | Methods and apparatus for mounting a file system |
US20070192728A1 (en) * | 2006-01-26 | 2007-08-16 | Finley William D | Method for dynamic document navigation |
US7818660B2 (en) | 2006-01-29 | 2010-10-19 | Litera Technology Llc | Method of compound document comparison |
EP1984866B1 (en) | 2006-02-07 | 2011-11-02 | Nextenders (India) Private Limited | Document security management system |
US20070220068A1 (en) | 2006-02-15 | 2007-09-20 | Bruce Thompson | Electronic document and business process control |
US8005277B2 (en) * | 2006-03-03 | 2011-08-23 | Research Foundation-State University of NY | Secure fingerprint matching by hashing localized information |
FR2898523B1 (en) | 2006-03-14 | 2009-02-27 | Alstom Power Conversion Sa | METHOD FOR ROLLING A TAPE |
JP4348353B2 (en) | 2006-04-04 | 2009-10-21 | 日本電信電話株式会社 | Pattern recognition apparatus, pattern recognition method, and recording medium storing program realizing the method |
US7428306B2 (en) | 2006-04-18 | 2008-09-23 | International Business Machines Corporation | Encryption apparatus and method for providing an encrypted file system |
US20070261099A1 (en) | 2006-05-02 | 2007-11-08 | Broussard Scott J | Confidential content reporting system and method with electronic mail verification functionality |
US7890612B2 (en) | 2006-05-08 | 2011-02-15 | Electro Guard Corp. | Method and apparatus for regulating data flow between a communications device and a network |
CA2651644C (en) | 2006-05-10 | 2016-09-13 | Margaret Atwood | System, method and computer program, for enabling entry into transactions on a remote basis |
US20070294612A1 (en) | 2006-06-20 | 2007-12-20 | Microsoft Corporation | Comparing and Managing Multiple Presentations |
WO2007149942A2 (en) | 2006-06-20 | 2007-12-27 | Freightdesk Technologies (Assignee) | Auditing, tracking, or inspection of data, objects, or their modifications |
US7613770B2 (en) | 2006-06-30 | 2009-11-03 | Microsoft Corporation | On-demand file transfers for mass P2P file sharing |
CA2554991A1 (en) | 2006-07-28 | 2008-01-28 | Ibm Canada Limited - Ibm Canada Limitee | System and method for distributing email attachments |
US20080040388A1 (en) | 2006-08-04 | 2008-02-14 | Jonah Petri | Methods and systems for tracking document lineage |
US20080046518A1 (en) | 2006-08-16 | 2008-02-21 | James I Tonnison | Enhanced E-Mail System |
US8527751B2 (en) | 2006-08-24 | 2013-09-03 | Privacydatasystems, Llc | Systems and methods for secure and certified electronic messaging |
US10313505B2 (en) | 2006-09-06 | 2019-06-04 | Apple Inc. | Portable multifunction device, method, and graphical user interface for configuring and displaying widgets |
US8842074B2 (en) | 2006-09-06 | 2014-09-23 | Apple Inc. | Portable electronic device performing similar operations for different gestures |
US8181036B1 (en) * | 2006-09-29 | 2012-05-15 | Symantec Corporation | Extrusion detection of obfuscated content |
US7788235B1 (en) | 2006-09-29 | 2010-08-31 | Symantec Corporation | Extrusion detection using taint analysis |
US8121875B2 (en) * | 2006-09-29 | 2012-02-21 | Morgan Stanley | Comparing taxonomies |
FR2906668A1 (en) | 2006-10-02 | 2008-04-04 | Alcatel Sa | Communication system for exchanging signaling message i.e. compliant, with session initiation protocol, has incoming signaling message routed to server corresponding to marker, when marker is included in incoming signaling message |
US7796309B2 (en) | 2006-11-14 | 2010-09-14 | Microsoft Corporation | Integrating analog markups with electronic documents |
US20080177782A1 (en) | 2007-01-10 | 2008-07-24 | Pado Metaware Ab | Method and system for facilitating the production of documents |
US8201086B2 (en) | 2007-01-18 | 2012-06-12 | International Business Machines Corporation | Spellchecking electronic documents |
WO2008147577A2 (en) | 2007-01-22 | 2008-12-04 | Spyrus, Inc. | Portable data encryption device with configurable security functionality and method for file encryption |
US8839100B1 (en) | 2007-01-26 | 2014-09-16 | The Mathworks, Inc. | Updating information related to data set changes |
US7895276B2 (en) | 2007-01-29 | 2011-02-22 | Litera Technology Llc | Method of managing metadata in attachments to e-mails in a network environment |
US20100287246A1 (en) | 2007-02-14 | 2010-11-11 | Thomas Klos | System for processing electronic mail messages with specially encoded addresses |
US20080209001A1 (en) | 2007-02-28 | 2008-08-28 | Kenneth James Boyle | Media approval method and apparatus |
US20080219495A1 (en) | 2007-03-09 | 2008-09-11 | Microsoft Corporation | Image Comparison |
JP2008243066A (en) | 2007-03-28 | 2008-10-09 | Canon Inc | Information processor and control method thereof |
US20110283177A1 (en) | 2007-04-05 | 2011-11-17 | Troy Gates | On-line document approval management system |
US7917493B2 (en) | 2007-04-19 | 2011-03-29 | Retrevo Inc. | Indexing and searching product identifiers |
US7844116B2 (en) | 2007-04-30 | 2010-11-30 | Xerox Corporation | Method for identifying images after cropping |
US8265382B2 (en) | 2007-05-29 | 2012-09-11 | Livescribe, Inc. | Electronic annotation of documents with preexisting content |
US8037004B2 (en) | 2007-06-11 | 2011-10-11 | Oracle International Corporation | Computer-implemented methods and systems for identifying and reporting deviations from standards and policies for contracts, agreements and other business documents |
WO2009009738A1 (en) | 2007-07-11 | 2009-01-15 | Pharmaceutical Product Development, Lp | Ubiquitous document routing enforcement |
US20090037520A1 (en) | 2007-07-30 | 2009-02-05 | Caterpillar Inc. | System and method for secure file transfer |
KR100945489B1 (en) | 2007-08-02 | 2010-03-09 | 삼성전자주식회사 | Method for performing a secure job using a touch screen and an office machine comprising the touch screen |
US20090064326A1 (en) | 2007-09-05 | 2009-03-05 | Gtb Technologies | Method and a system for advanced content security in computer networks |
US20090070128A1 (en) | 2007-09-11 | 2009-03-12 | Author Solutions Inc. | Community-based community project content creation system and method |
US7890872B2 (en) | 2007-10-03 | 2011-02-15 | International Business Machines Corporation | Method and system for reviewing a component requirements document and for recording approvals thereof |
CN201122265Y (en) | 2007-11-20 | 2008-09-24 | 鸿富锦精密工业(深圳)有限公司 | Computer cabinet |
US8326814B2 (en) | 2007-12-05 | 2012-12-04 | Box, Inc. | Web-based file management system and service |
US8233723B2 (en) | 2007-12-06 | 2012-07-31 | Ebay Inc. | Image categorization based on comparisons between images |
US8312023B2 (en) | 2007-12-21 | 2012-11-13 | Georgetown University | Automated forensic document signatures |
US9584343B2 (en) | 2008-01-03 | 2017-02-28 | Yahoo! Inc. | Presentation of organized personal and public data using communication mediums |
US8316442B2 (en) | 2008-01-15 | 2012-11-20 | Microsoft Corporation | Preventing secure data from leaving the network perimeter |
US8117225B1 (en) | 2008-01-18 | 2012-02-14 | Boadin Technology, LLC | Drill-down system, method, and computer program product for focusing a search |
US8019769B2 (en) | 2008-01-18 | 2011-09-13 | Litera Corp. | System and method for determining valid citation patterns in electronic documents |
US20090216843A1 (en) | 2008-02-26 | 2009-08-27 | Willner Barry E | System and method for collaborative email review |
US20090234863A1 (en) | 2008-03-12 | 2009-09-17 | Jeremy Evans | Method and apparatus for predictive downloading of attachments |
US8407784B2 (en) | 2008-03-19 | 2013-03-26 | Websense, Inc. | Method and system for protection against information stealing software |
US8539229B2 (en) | 2008-04-28 | 2013-09-17 | Novell, Inc. | Techniques for secure data management in a distributed environment |
US8196030B1 (en) | 2008-06-02 | 2012-06-05 | Pricewaterhousecoopers Llp | System and method for comparing and reviewing documents |
EP2144406B1 (en) | 2008-07-09 | 2012-11-21 | Research In Motion Limited | Delivery of email messages with repetitive attachments |
US9104682B2 (en) | 2008-07-15 | 2015-08-11 | International Business Machines Corporation | Method and apparatus to elegantly and automatically track emails and its attachments for enhanced user convenience |
US9245238B2 (en) | 2008-07-16 | 2016-01-26 | International Business Machines Corporation | Dynamic grouping of email recipients |
US8286171B2 (en) * | 2008-07-21 | 2012-10-09 | Workshare Technology, Inc. | Methods and systems to fingerprint textual information using word runs |
US8843566B2 (en) | 2008-08-20 | 2014-09-23 | First Data Corporation | Securing outbound mail |
US20100064004A1 (en) | 2008-09-10 | 2010-03-11 | International Business Machines Corporation | Synchronizing documents by designating a local server |
US8555080B2 (en) | 2008-09-11 | 2013-10-08 | Workshare Technology, Inc. | Methods and systems for protect agents using distributed lightweight fingerprints |
WO2010030871A2 (en) | 2008-09-11 | 2010-03-18 | Workshare Technology, Inc. | Methods and systems to implement fingerprint lookups across remote agents |
US8301994B1 (en) | 2008-09-12 | 2012-10-30 | Adobe Systems Incorporated | Synchronizing multiple hierarchal data structures |
US8307010B2 (en) | 2008-09-26 | 2012-11-06 | Microsoft Corporation | Data feature tracking through hierarchical node sets |
US9928242B2 (en) * | 2008-11-05 | 2018-03-27 | Oracle International Corporation | Managing the content of shared slide presentations |
US9092636B2 (en) | 2008-11-18 | 2015-07-28 | Workshare Technology, Inc. | Methods and systems for exact data match filtering |
US8406456B2 (en) | 2008-11-20 | 2013-03-26 | Workshare Technology, Inc. | Methods and systems for image fingerprinting |
US8392513B2 (en) | 2009-01-05 | 2013-03-05 | International Business Machines Corporation | Reducing email size by using a local archive of email components |
US10685177B2 (en) | 2009-01-07 | 2020-06-16 | Litera Corporation | System and method for comparing digital data in spreadsheets or database tables |
US9384295B2 (en) | 2009-01-22 | 2016-07-05 | Adobe Systems Incorporated | Method and apparatus for viewing collaborative documents |
US8471781B2 (en) | 2009-03-17 | 2013-06-25 | Litera Technologies, LLC | System and method for the auto-detection and presentation of pre-set configurations for multiple monitor layout display |
US8136031B2 (en) | 2009-03-17 | 2012-03-13 | Litera Technologies, LLC | Comparing the content of tables containing merged or split cells |
US20100251104A1 (en) | 2009-03-27 | 2010-09-30 | Litera Technology Llc. | System and method for reflowing content in a structured portable document format (pdf) file |
US8255571B2 (en) | 2009-06-30 | 2012-08-28 | Apple Inc. | Updating multiple computing devices |
US8473847B2 (en) * | 2009-07-27 | 2013-06-25 | Workshare Technology, Inc. | Methods and systems for comparing presentation slide decks |
CN101989335A (en) | 2009-07-31 | 2011-03-23 | 国际商业机器公司 | Processing method and system of email attachment |
US20110035655A1 (en) | 2009-08-04 | 2011-02-10 | Sap Ag | Generating Forms Using One or More Transformation Rules |
US8286085B1 (en) | 2009-10-04 | 2012-10-09 | Jason Adam Denise | Attachment suggestion technology |
FR2951560B1 (en) | 2009-10-19 | 2011-11-18 | Alcatel Lucent | METHOD FOR MANAGING PARTS ATTACHED TO AN E-MAIL IN AN ELECTRONIC MAIL APPLICATION |
US8732848B2 (en) | 2009-11-05 | 2014-05-20 | Kyocera Document Solutions Inc. | File-distribution apparatus and recording medium having file-distribution authorization program recorded therein |
KR101279442B1 (en) | 2009-11-24 | 2013-06-26 | 삼성전자주식회사 | Method of managing file in WevDAV embeded image forming apparatus and image forming system for performing thereof |
US8711419B2 (en) | 2009-12-15 | 2014-04-29 | Xerox Corporation | Preserving user applied markings made to a hardcopy original document |
US20110154180A1 (en) | 2009-12-17 | 2011-06-23 | Xerox Corporation | User-specific digital document annotations for collaborative review process |
US8381104B2 (en) | 2010-05-06 | 2013-02-19 | Litera Technologies, LLC | Systems and methods for providing context recognition |
US9356991B2 (en) | 2010-05-10 | 2016-05-31 | Litera Technology Llc | Systems and methods for a bidirectional multi-function communication module |
US8745091B2 (en) | 2010-05-18 | 2014-06-03 | Integro, Inc. | Electronic document classification |
US8448246B2 (en) | 2010-07-08 | 2013-05-21 | Raytheon Company | Protecting sensitive email |
US8719239B2 (en) | 2010-07-16 | 2014-05-06 | International Business Machines Corporation | Displaying changes to versioned files |
US8838962B2 (en) | 2010-09-24 | 2014-09-16 | Bryant Christopher Lee | Securing locally stored Web-based database data |
US8626852B2 (en) | 2010-10-29 | 2014-01-07 | International Business Machines Corporation | Email thread monitoring and automatic forwarding of related email messages |
US8732181B2 (en) | 2010-11-04 | 2014-05-20 | Litera Technology Llc | Systems and methods for the comparison of annotations within files |
CA2759612C (en) | 2010-11-23 | 2018-10-23 | Afore Solutions Inc. | Method and system for securing data |
US10025759B2 (en) | 2010-11-29 | 2018-07-17 | Workshare Technology, Inc. | Methods and systems for monitoring documents exchanged over email applications |
US10783326B2 (en) | 2013-03-14 | 2020-09-22 | Workshare, Ltd. | System for tracking changes in a collaborative document editing environment |
US8776190B1 (en) | 2010-11-29 | 2014-07-08 | Amazon Technologies, Inc. | Multifactor authentication for programmatic interfaces |
US20120173881A1 (en) | 2011-01-03 | 2012-07-05 | Patient Always First | Method & Apparatus for Remote Information Capture, Storage, and Retrieval |
US8843734B2 (en) | 2011-04-04 | 2014-09-23 | Nextlabs, Inc. | Protecting information using policies and encryption |
US20120260188A1 (en) | 2011-04-06 | 2012-10-11 | Microsoft Corporation | Potential communication recipient prediction |
US9613340B2 (en) | 2011-06-14 | 2017-04-04 | Workshare Ltd. | Method and system for shared document approval |
US9948676B2 (en) | 2013-07-25 | 2018-04-17 | Workshare, Ltd. | System and method for securing documents prior to transmission |
US10963584B2 (en) | 2011-06-08 | 2021-03-30 | Workshare Ltd. | Method and system for collaborative editing of a remotely stored document |
US20120317479A1 (en) | 2011-06-08 | 2012-12-13 | Workshare Ltd. | Method and system for shared document editing on a mobile device |
US9507874B2 (en) | 2011-06-30 | 2016-11-29 | International Business Machines Corporation | Validation of schema and schema conformance verification |
US9047258B2 (en) | 2011-09-01 | 2015-06-02 | Litera Technologies, LLC | Systems and methods for the comparison of selected text |
US8661558B2 (en) | 2011-09-20 | 2014-02-25 | Daon Holdings Limited | Methods and systems for increasing the security of electronic messages |
US20130254536A1 (en) | 2012-03-22 | 2013-09-26 | Workshare, Ltd. | Secure server side encryption for online file sharing and collaboration |
US20130227397A1 (en) | 2012-02-24 | 2013-08-29 | Microsoft Corporation | Forming an instrumented text source document for generating a live web page |
US9348802B2 (en) | 2012-03-19 | 2016-05-24 | Litéra Corporation | System and method for synchronizing bi-directional document management |
US20130290867A1 (en) | 2012-04-27 | 2013-10-31 | Litera Technologies, LLC | Systems and Methods For Providing Dynamic and Interactive Viewing and Control of Applications |
US9118613B2 (en) | 2012-10-18 | 2015-08-25 | Litéra Technologies, LLC | Systems and methods for creating and displaying an electronic communication digest |
US20140115436A1 (en) | 2012-10-22 | 2014-04-24 | Apple Inc. | Annotation migration |
US20140136497A1 (en) | 2012-11-13 | 2014-05-15 | Perforce Software, Inc. | System And Method To Compare And Merge Documents |
US11567907B2 (en) | 2013-03-14 | 2023-01-31 | Workshare, Ltd. | Method and system for comparing document versions encoded in a hierarchical representation |
Date | Country | Application number | Publication | Status |
---|---|---|---|---|
2008-07-21 | US | US12/177,043 | US8286171B2 | Active |
2008-09-11 | US | US12/209,096 | US9473512B2 | Active |
2009-07-21 | WO | PCT/US2009/051313 | WO2010011691A2 | Application Filing |
2012-09-14 | US | US13/620,364 | US20130074198A1 | Abandoned |
2016-08-11 | US | US15/234,596 | US9614813B2 | Active |
2017-02-21 | US | US15/437,569 | US20170200019A1 | Abandoned |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060005247A1 (en) * | 2004-06-30 | 2006-01-05 | Microsoft Corporation | Method and system for detecting when an outgoing communication contains certain content |
US20060112120A1 (en) * | 2004-11-22 | 2006-05-25 | International Business Machines Corporation | Method, system, and computer program product for threading documents using body text analysis |
US20070005589A1 (en) * | 2005-07-01 | 2007-01-04 | Sreenivas Gollapudi | Method and apparatus for document clustering and document sketching |
US20080033913A1 (en) * | 2006-05-26 | 2008-02-07 | Winburn Michael L | Techniques for Preventing Insider Theft of Electronic Documents |
KR20080029602A (en) * | 2006-09-29 | 2008-04-03 | 한국전자통신연구원 | Method and apparatus for preventing confidential information leak |
Also Published As
Publication number | Publication date |
---|---|
WO2010011691A3 (en) | 2010-04-22 |
US20170200019A1 (en) | 2017-07-13 |
US20100064372A1 (en) | 2010-03-11 |
US8286171B2 (en) | 2012-10-09 |
US20160352688A1 (en) | 2016-12-01 |
US9614813B2 (en) | 2017-04-04 |
US9473512B2 (en) | 2016-10-18 |
US20130074198A1 (en) | 2013-03-21 |
US20100017850A1 (en) | 2010-01-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8286171B2 (en) | | Methods and systems to fingerprint textual information using word runs |
US11682226B2 (en) | | Method and system for assessing similarity of documents |
Neal et al. | | Surveying stylometry techniques and applications |
Chowdhury et al. | | Plagiarism: Taxonomy, tools and detection techniques |
US8484238B2 (en) | | Automatically generating regular expressions for relaxed matching of text patterns |
US20190236102A1 (en) | | System and method for differential document analysis and storage |
US20170308528A1 (en) | | System and method for indexing electronic discovery data |
US8938384B2 (en) | | Language identification for documents containing multiple languages |
Urvoy et al. | | Tracking web spam with html style similarities |
US9754076B2 (en) | | Identifying errors in medical data |
US9852122B2 (en) | | Method of automated analysis of text documents |
US11403465B2 (en) | | Systems and methods for report processing |
JP2010157178A (en) | | Computer system for creating term dictionary with named entities or terminologies included in text data, and method and computer program therefor |
US8750630B2 (en) | | Hierarchical and index based watermarks represented as trees |
Dai et al. | | A new statistical formula for Chinese text segmentation incorporating contextual information |
US20120096028A1 (en) | | Information retrieving apparatus, information retrieving method, information retrieving program, and recording medium on which information retrieving program is recorded |
US20170154029A1 (en) | | System, method, and apparatus to normalize grammar of textual data |
US20190042568A1 (en) | | Method, apparatus, and computer-readable medium for determining a data domain associated with data |
Malagi et al. | | Content Modelling Intelligence System Based on Automatic Text Summarization |
US20190332719A1 (en) | | System and method for generating summary of research document |
Nawab et al. | | Comparing Medline citations using modified N-grams |
Kulkarni et al. | | Novel Approach to Detect Plagiarism in the Document |
US10296990B2 (en) | | Verifying compliance of a land parcel to an approved usage |
Saeed et al. | | A proposed approach for plagiarism detection in Article documents |
JP2001034630A (en) | | System and method for document base retrieval |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 09800905; Country of ref document: EP; Kind code of ref document: A2 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 09800905; Country of ref document: EP; Kind code of ref document: A2 |