
United States Patent [19]

Wyard et al.

US006167398A

[11] Patent Number: 6,167,398

[45] Date of Patent: Dec. 26, 2000

[54] INFORMATION RETRIEVAL SYSTEM AND METHOD THAT GENERATES WEIGHTED COMPARISON RESULTS TO ANALYZE THE DEGREE OF DISSIMILARITY BETWEEN A REFERENCE CORPUS AND A CANDIDATE DOCUMENT

[75] Inventors: Peter J Wyard, Woodbridge; Tony G Rose, Guildford, both of United Kingdom

[73] Assignee: British Telecommunications public limited company, London, United Kingdom

[21] Appl. No.: 09/068,452

[22] PCT Filed: Jan. 30, 1998

[86] PCT No.: PCT/GB98/00294

§ 371 Date: May 13, 1998

§ 102(e) Date: May 13, 1998

[87] PCT Pub. No.: WO98/34180

PCT Pub. Date: Aug. 6, 1998

[30] Foreign Application Priority Data

Jan. 30, 1997 [GB] United Kingdom 9701866

[51] Int. Cl.7 G06F 17/30

[52] U.S. Cl. 707/5; 707/2; 707/3; 707/4

[58] Field of Search 707/5, 10, 2, 3, 4

[56] References Cited

U.S. PATENT DOCUMENTS

5,625,767 4/1997 Bartell et al 345/440

5,724,571 3/1998 Woods 707/5

5,873,076 2/1999 Barr et al 707/3

5,907,839 5/1999 Roth 707/5

5,937,422 8/1999 Nelson et al 707/531

FOREIGN PATENT DOCUMENTS

0687987 A1 12/1995 European Pat. Off.

WO 92/04681 3/1992 WIPO

WO 96/32686 10/1996 WIPO

OTHER PUBLICATIONS

W. Bruce Croft, Intelligent Internet Services Effective Text Retrieval Based on Combining Evidence from the Corpus and Users, vol. 10, issue 6, IEEE electronic library online, pp. 59-63, Dec. 1995.

Besancon et al., Textual Similarities Based on a Distributional Approach, IEEE electronic library online, pp. 180-184, Sep. 1999.

Chapter 4 of the book "Introduction to Modern Information Retrieval" by G. Salton, published by McGraw Hill, 1983.

Dunning, "Accurate Methods for the Statistics of Surprise and Coincidence", Computational Linguistics, vol. 19, No. 1, 1993.

Katz, "Estimation of Probabilities from Sparse Data", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-35, 1987.

(List continued on next page.)

Primary Examiner—John Breene

Assistant Examiner—Greta L. Robinson

Attorney, Agent, or Firm—Nixon & Vanderhye PC.

[57] ABSTRACT

An internet information agent accepts a reference document, performs an analysis upon it in accordance with metrics defined by its analysis algorithm and obtains respective lists (word, character-level n-gram, word-level n-gram), derives weights corresponding to the metrics, applies the metrics to a candidate document and obtains respective returned values, applies the weights to the returned values and sums the results to obtain a Document Dissimilarity (DD) value. This DD is compared with a Dissimilarity Threshold (DT) and the candidate document is stored if the DD is less than the DT. A user can apply relevance values to the search results and the agent modifies the weights accordingly. The agent can be used to improve a language model for use in speech recognition applications and the like.
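The weighted-sum step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the individual metrics, so the metric used here (the fraction of candidate items absent from the corresponding reference list) is an assumption chosen for illustration only, as are the n-gram sizes and all function names; none of them are taken from the patent.

```python
def words(text):
    """Word list: lower-cased, whitespace-separated tokens."""
    return text.lower().split()

def char_ngrams(text, n=3):
    """Character-level n-grams over the normalised text (n=3 is an assumption)."""
    s = " ".join(words(text))
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def word_ngrams(text, n=2):
    """Word-level n-grams (n=2 is an assumption)."""
    w = words(text)
    return [tuple(w[i:i + n]) for i in range(len(w) - n + 1)]

def unseen_fraction(reference_items, candidate_items):
    """Assumed metric: share of candidate items that never occur in the
    reference list. Returns 0.0 for full overlap and 1.0 for none."""
    if not candidate_items:
        return 1.0
    ref = set(reference_items)
    missing = sum(1 for item in candidate_items if item not in ref)
    return missing / len(candidate_items)

# One extractor per list named in the abstract.
EXTRACTORS = {
    "word": words,
    "char_ngram": char_ngrams,
    "word_ngram": word_ngrams,
}

def document_dissimilarity(reference, candidate, weights):
    """Apply each metric, weight the returned value, and sum to get DD."""
    return sum(
        weights[name] * unseen_fraction(extract(reference), extract(candidate))
        for name, extract in EXTRACTORS.items()
    )

def retain(reference, candidate, weights, threshold):
    """Store the candidate only if DD is less than DT."""
    return document_dissimilarity(reference, candidate, weights) < threshold
```

With weights such as `{"word": 0.5, "char_ngram": 0.3, "word_ngram": 0.2}`, a candidate identical to the reference yields a DD of zero and is retained for any positive threshold. How the agent adjusts the weights from user relevance values is not specified in the abstract and is left out of this sketch.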

18 Claims, 4 Drawing Sheets

[Drawing: flowchart; recoverable step labels follow]

USER CLICKS ON AGENT BUTTON

USER ENTERS URL OF REFERENCE DOCUMENT

AGENT RETRIEVES REFERENCE DOCUMENT

AGENT DERIVES DOCUMENT DISSIMILARITY (40)

AGENT COMPARES DOCUMENT DISSIMILARITY WITH DISSIMILARITY THRESHOLD

AGENT WRITES CANDIDATE DOCUMENT TO RETAINED TEXT STORE IF DD LESS THAN DT

OTHER PUBLICATIONS (continued)

Jelinek, "Self-Organised Language Modelling for Speech Recognition", Readings in Speech Recognition, edited by A. Waibel and K. Lee, published by Morgan Kaufmann, 1990.

Pearce et al, Generating a Dynamic Hypertext Environment with n-gram Analysis, Proceedings of the International Conference on Information and Knowledge Management CIKM, Nov. 1, 1993, pp. 148-153, XP000577412.

Wong et al, "Implementations of Partial Document Ranking Using Inverted Files", Information Processing & Management (Incorporating Information Technology), vol. 29, No. 5, Sep. 1993, pp. 647-669, XP002035616.
