WO1997033249A1 - Method and device for handwritten character recognition - Google Patents
Method and device for handwritten character recognition
- Publication number
- WO1997033249A1 (PCT/US1997/004349)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- characters
- character
- combined
- possible characters
- value
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Definitions
- This invention relates generally to handwriting recognition by a character recognizer, and more particularly to improving recognition of handwritten characters using a post-processing method and device.
- Conventional character recognizers have approximately a 70 to 85 percent accuracy rate when attempting to correctly recognize handwritten characters from a digitizing tablet or other input device, yielding a 15 to 30 percent error rate. This accuracy rate is not good enough for the average user to feel confident in the ability of the recognizer.
- Character recognizers can be useful and valuable. For instance, character recognizers can be useful in conferences or seminars where a user does not bring a keyboard but wishes to take notes electronically. A character recognizer would then be used. If the character recognizer does not have a fairly high rate of accuracy, the notes taken during the seminar may become misleading.
- Character recognizers may be valuable in hospitals if the character recognizer has a high rate of accuracy.
- Hand-held character recognizers would allow hospital personnel to check patients and enter by hand reports which may be life saving. Without a high recognition rate, lives may be endangered.
- One very useful application for character recognizers is inputting Chinese characters for electronic processing and storage. Chinese characters do not lend themselves well to keyboard entry, making word processing in the Chinese language difficult. Chinese characters are complex, and changing a small portion of a character may entirely change the meaning of the character or word. A high rate of accuracy is necessary for Chinese character recognition. Unfortunately, conventional character recognizers and recognition processes have not achieved the high accuracy necessary for these varying applications.
- A method comprising the steps of: choosing a number of template characters from a template character set which are likely to resemble a handwritten character, thereby providing a set of possible characters, each of the possible characters having a value representing a degree of similarity with the handwritten character; and processing the possible characters according to a language model to determine which of the possible characters most resembles the handwritten character.
- The step of processing the possible characters according to a language model preferably includes: combining each of the possible characters with a surrounding character to form combined characters; assigning a combined value to each of the combined characters, where the combined value represents a probability that the surrounding character would be in combination with a respective one of the possible characters; and resorting the possible characters.
- The processing step includes comparing each of the possible characters with a surrounding character to determine a probability that the surrounding character would be in combination with a respective one of the possible characters; and determining from the probability for each of the possible characters which of the possible characters most resembles the handwritten character.
- The value for each of the possible characters and the combined value of the combined characters may be weighted to determine a weighted value for each of the possible characters; and the weighted values may be ordered to determine a sequential order for resorting the possible characters.
- A recognizer comprising: a character recognizer coupled to a handwriting input device, to choose a number of template characters from a template character set which are likely to resemble a handwritten character (possible characters), each of the possible characters having a value representing a degree of similarity with the handwritten character; a post-processor coupled to the character recognizer to process the possible characters according to a language model to determine which of the possible characters most resembles the handwritten character; and a display device coupled to the post-processor to receive the one of the possible characters most resembling the handwritten character.
- FIG. 1 is a block diagram illustrating a preferred embodiment of the present invention.
- FIG. 2 is a flow chart illustrating a method of performing the present invention.
- FIG. 3 is a flow chart illustrating the method of performing the present invention according to the preferred embodiment.
- FIG. 4 shows an example of the operation of a language modeling post-processor according to the present invention.
- FIG. 1 illustrates, with reference also to FIG. 2, a device and method, according to the present invention, for improving the accuracy of handwritten character recognition.
- Handwritten character recognizing devices, such as character recognizing device 100, generally include some sort of handwriting input device or tablet 110 allowing a user to enter handwritten characters to character recognizing device 100. It will be noted at this point that character recognizing devices may also receive input through devices other than tablets. For instance, handwritten characters may be input to character recognizing device 100 via facsimile or any other media in addition to tablet 110.
- Handwritten characters are input from tablet 110 to a character recognizer 120 (step 200 of FIG. 2).
- The character recognizer 120 chooses characters from a predetermined template character set 125 (step 210) for comparison with the handwritten character.
- The predetermined template characters of template character set 125 are the characters used in the language for which character recognizing device 100 is designed. For instance, if English handwritten characters are being input to character recognizing device 100, template character set 125 will contain information representing English characters in some form, such as longhand, print, or a combination of styles. If, for instance, character recognizing device 100 is designed for Chinese character input, template character set 125 will contain information representing Chinese characters in such styles (cursive or printed) as character recognizing device 100 is designed for.
- Character recognizer 120 compares each input handwritten character to the characters stored in template character set 125 and chooses a number of the characters, or possible characters, which most closely resemble the input character.
- In a preferred embodiment, character recognizer 120 chooses 10 characters from template character set 125. To each of these possible characters (10 in the preferred embodiment), character recognizer 120 assigns a score (or value) that represents the degree of similarity between the respective possible character and the input character (step 220 of FIG. 2). Character recognizer 120 then prioritizes the possible characters according to their respective scores (step 240), ordering them sequentially with the possible character whose score indicates the nearest similarity at the top of the list.
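The candidate-selection step above can be sketched in Python. This is a minimal illustration, not the recognizer described in the patent: the source does not say how similarity is measured, so `similarity` is a hypothetical stand-in, and the templates are represented as small feature tuples purely for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]


def similarity(a: Features, b: Features) -> float:
    """Hypothetical similarity measure: higher means more alike (1.0 is identical)."""
    dist = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return 1.0 / (1.0 + dist)


def choose_candidates(
    handwritten: Features,
    templates: Dict[str, Features],
    n: int = 10,
    score: Callable[[Features, Features], float] = similarity,
) -> List[Tuple[str, float]]:
    """Score every template character and return the n best, most similar first."""
    scored = [(char, score(handwritten, feats)) for char, feats in templates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Toy template set (assumed data); a real set would cover the whole language.
templates = {"a": (0.0, 1.0), "o": (0.1, 0.9), "u": (0.5, 0.5), "l": (1.0, 0.0)}
candidates = choose_candidates((0.05, 0.9), templates, n=3)
print(candidates)
```

The default `n=10` mirrors the preferred embodiment; the toy call uses `n=3` only because the example template set is so small.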
- Handwritten characters which are processed simply by a character recognizer have approximately a 15 to 30 percent error rate when choosing the top prioritized possible character.
- The probability that the top prioritized possible character chosen by character recognizer 120 is actually the same as the handwritten input character is about 80 to 85 percent.
- There is a 92 to 96 percent probability that the actual handwritten input character is one of the number of possible characters chosen by character recognizer 120 when the total number of possible characters is 10 pursuant to the preferred embodiment. This accuracy is nearly the same as the degree of accuracy most people have when reading handwritten characters, which accuracy is about 95 to 97 percent.
- The accuracy of the 10 chosen possible characters is capitalized upon through the method described below to increase the probability that the character chosen as the top prioritized possible character is the same as the handwritten character.
- The present invention contemplates further analyzing and processing the number of possible characters generated by character recognizer 120 to improve recognition accuracy.
- The additional analysis and processing (post-processing) focuses on the 10 possible characters.
- Character recognizer 120 outputs the list of 10 possible characters to a post-processor 130 (step 250).
- Post-processor 130 processes the 10 possible characters according to a language model to select which of the 10 possible characters is a best-fit character (step 260).
- The language model post-processing chooses one of the 10 possible characters for output. This yields approximately a 90 to 92 percent probability that the character which is output, or best-fit character, is the same as the input handwritten character.
- Language modeling is a process in which each possible character is compared with one or more surrounding characters to determine the probability that the possible character could properly be used in combination with those surrounding characters in the language being used. This process will be described in detail later.
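One way to picture where such probabilities could come from is a character-pair (bigram) table. The sketch below is an assumption for illustration: the source only says the probabilities are predetermined, and does not say how they are derived. Here they are estimated from a tiny made-up corpus, and `build_bigram_model` and `combination_probability` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict


def build_bigram_model(corpus: str) -> Dict[str, Dict[str, float]]:
    """Estimate P(next character | previous character) from adjacent pairs in each word."""
    pair_counts: Dict[str, Counter] = defaultdict(Counter)
    for word in corpus.split():
        for prev, nxt in zip(word, word[1:]):
            pair_counts[prev][nxt] += 1
    model: Dict[str, Dict[str, float]] = {}
    for prev, counts in pair_counts.items():
        total = sum(counts.values())
        model[prev] = {nxt: count / total for nxt, count in counts.items()}
    return model


def combination_probability(
    model: Dict[str, Dict[str, float]], prev: str, candidate: str
) -> float:
    """Probability that candidate follows prev; 0.0 if the pair never occurred."""
    return model.get(prev, {}).get(candidate, 0.0)


# Made-up corpus for the example only.
model = build_bigram_model("hat hand have hot the that")
print(combination_probability(model, "h", "a"))
print(combination_probability(model, "h", "q"))
```

In this toy corpus "h" is followed by "a" four times out of six, so "a" is rated a far more plausible successor to "h" than "q", which never follows it.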
- After post-processor 130 has chosen a best-fit character from the 10 possible characters, post-processor 130 outputs the best-fit character (step 270). In the preferred embodiment shown in FIG. 1, the best-fit character is output to digitizing display 110 and displayed to the user.
- The flow chart of FIG. 3 shows a preferred embodiment of the post-processing method and is described in conjunction with the preferred embodiment of FIG. 1.
- The top prioritized possible character is chosen as the best-fit character (step 325) and output to the digitizing display 110 of the preferred embodiment (step 380). Choosing the top prioritized possible character as the best-fit character simply means that no further processing is chosen and character recognizer 120 operates in a conventional manner, with the output (top prioritized possible character) sent directly to the digitizing display 110.
- Language model processor 140 includes combiner 142, scoring device 144, and language model library 145.
- Language model processor 140 compares each of the possible characters from character recognizer 120 with surrounding characters to determine the probability that the possible character could be in combination with the surrounding characters.
- The surrounding characters are usually characters which have already been recognized and are stored by the computing device (surrounding character 141), but may also be numbers, indications of the beginning of a sentence or word, words from a different language (such as English company names used while writing Chinese characters), etc.
- The surrounding characters may also be characters which have not been recognized, such as a character subsequent in sequence to the handwritten character currently being recognized.
- FIG. 4 illustrates the language model post-processing method using an example of two letters.
- Two letters are assumed to have been input as handwritten characters.
- The first character, in slot b, will be assumed to have been processed previously and correctly by character recognizing device 100 and confirmed as the letter "h".
- The second character, in slot a, is the character to be recognized.
- Character recognizer 120 generates a number of possible characters, which for the preferred embodiment is 10 possible characters, listed in FIG.
- Scoring device 144 obtains from language model library 145, for each of the combinations, a predetermined probability (combined score) that the adjacent character in slot b, "h", will be combined with each of the possible characters a_1 through a_n.
- Each of the combinations is assigned its respective combined score (step 340 and column 420). For instance, if character recognizer 120 determined a_1 to be "a", and the letter in slot b was already determined to be "h", the probability that these two letters would be combined in sequence would be very high, since "h" and "a" are combined in sequence in many different words.
- The combined score representing this probability found in language model library 145 would be high, and an appropriate combined score would be assigned to "ha".
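The combiner and scoring device in this "h" example can be sketched as two small functions. This is an illustration under stated assumptions: the library values below are invented for the example (the source gives no numbers), and `combine`, `score_combinations`, and `LANGUAGE_MODEL_LIBRARY` are hypothetical names rather than anything defined in the patent.

```python
from typing import Dict, List, Tuple

# Hypothetical library entries: made-up probabilities, for illustration only.
LANGUAGE_MODEL_LIBRARY: Dict[str, float] = {
    "ha": 0.30,
    "ho": 0.15,
    "hu": 0.05,
}


def combine(surrounding: str, candidates: List[str]) -> List[str]:
    """Combiner: pair the character in slot b with each candidate for slot a."""
    return [surrounding + candidate for candidate in candidates]


def score_combinations(
    combos: List[str], library: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Scoring device: look up each combination; pairs missing from the library score 0.0."""
    return [(combo, library.get(combo, 0.0)) for combo in combos]


combos = combine("h", ["o", "a", "q", "u"])
combined_scores = score_combinations(combos, LANGUAGE_MODEL_LIBRARY)
print(combined_scores)
```

With these assumed values, "ha" receives the highest combined score and "hq", which is absent from the library, receives zero.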
- Resorter 150 obtains the combined scores from scoring device 144, generates an order from the combined scores, and resorts the possible characters based upon that order. Specifically, a weighting element 152 of resorter 150 weights each of the combined scores from scoring device 144 with the score of its corresponding possible character to determine a weighted score for each of the possible characters (step 350). The weighting is calculated for each of the possible characters by: (i) multiplying the score (see previous discussion with respect to step 220 of FIG.
- The weighting factors for the character recognizer score (CR) and the language model score (LM) combined equal 1. Further, at optimum values for the weighting factors, the CR weighting factor is greater than the LM weighting factor, and the LM weighting factor is equal to 0.33. A user may choose a value for the LM weighting factor which is greater than or less than the optimum value, depending upon the desired output, and the choice may be input manually into weighting element 152.
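A possible reading of the weighting step is a linear blend of the two scores. The exact formula is not recoverable from the text, so the sum of the two weighted terms below is an assumption, as is the premise that both scores lie on a comparable 0-to-1 scale; only the constraints that the two factors sum to 1 and that the LM factor is 0.33 at its optimum come from the source. The name `weighted_score` is hypothetical.

```python
def weighted_score(
    recognizer_score: float, combined_score: float, lm_weight: float = 0.33
) -> float:
    """Blend the recognizer score and the language model score (assumed linear form)."""
    if not 0.0 <= lm_weight <= 1.0:
        raise ValueError("lm_weight must lie between 0 and 1")
    cr_weight = 1.0 - lm_weight  # the two weighting factors sum to 1
    return cr_weight * recognizer_score + lm_weight * combined_score


# Assumed example scores: 0.80 from the recognizer, 0.30 from the language model.
print(weighted_score(0.80, 0.30))
```

Passing a different `lm_weight` corresponds to the user-selected value mentioned above; a larger value lets the language model override the recognizer more readily.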
- Reorderer 154 of resorter 150 receives the weighted scores from weighting element 152 and orders the weighted scores in sequential order. In the preferred embodiment, the weighted scores are ordered from highest to lowest. Reorderer 154 then resorts the possible characters according to this order and chooses the best-fit character from the reordered possible characters (steps 360 and 370). The best-fit character is then output (step 380).
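The reordering and best-fit selection can be sketched end to end as follows. This is a hedged illustration: all scores are invented, the linear weighting is an assumed form (the source's formula is incomplete), and `resort` is a hypothetical function name.

```python
from typing import Dict, List, Tuple


def resort(
    recognizer_scores: Dict[str, float],
    combined_scores: Dict[str, float],
    lm_weight: float = 0.33,
) -> List[Tuple[str, float]]:
    """Return (character, weighted score) pairs ordered from highest to lowest."""
    cr_weight = 1.0 - lm_weight  # weighting factors sum to 1
    weighted = [
        (char, cr_weight * score + lm_weight * combined_scores.get(char, 0.0))
        for char, score in recognizer_scores.items()
    ]
    return sorted(weighted, key=lambda pair: pair[1], reverse=True)


# Assumed scores: the recognizer slightly prefers "o", the language model prefers "a" after "h".
recognizer_scores = {"o": 0.80, "a": 0.78, "q": 0.60}
combined_scores = {"o": 0.15, "a": 0.30, "q": 0.0}

ranking = resort(recognizer_scores, combined_scores)
best_fit = ranking[0][0]
print(best_fit)  # prints "a"
```

In this example the recognizer alone would have output "o", but the language model evidence moves "a" to the top of the reordered list.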
- Post-processing of the output of character recognizers is necessary in order to improve the rate of accuracy of selecting a single possible character representing an input handwritten character. Without the additional accuracy of post-processing, character recognizers will probably not become commercially viable.
- The probability of selecting a single possible character which is the same as a handwritten character increases from roughly 84 percent to approximately 90 to 92 percent. This recognition accuracy brings handwriting recognition into an acceptable range for consumer use.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU22168/97A AU726852B2 (en) | 1996-03-08 | 1997-03-06 | Method and device for handwritten character recognition |
EP97915155A EP0896704A1 (en) | 1996-03-08 | 1997-03-06 | Method and device for handwritten character recognition |
IL12564897A IL125648A0 (en) | 1996-03-08 | 1997-03-06 | A method and device for handwritten character recognition |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US61484696A | 1996-03-08 | 1996-03-08 | |
US08/614,846 | 1996-03-08 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1997033249A1 (en) | 1997-09-12 |
Family
ID=24462951
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1997/004349 WO1997033249A1 (en) | 1996-03-08 | 1997-03-06 | Method and device for handwritten character recognition |
Country Status (6)
Country | Link |
---|---|
EP (1) | EP0896704A1 (en) |
CN (1) | CN1181827A (en) |
AU (1) | AU726852B2 (en) |
CA (1) | CA2247359A1 (en) |
IL (1) | IL125648A0 (en) |
WO (1) | WO1997033249A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003034326A1 (en) * | 2001-10-15 | 2003-04-24 | Silverbrook Research Pty Ltd | Character string identification |
WO2005106771A1 (en) * | 2004-05-04 | 2005-11-10 | Nokia Corporation | Apparatus and method for handwriting recognition |
US7873217B2 (en) | 2003-02-26 | 2011-01-18 | Silverbrook Research Pty Ltd | System for line extraction in digital ink |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100356392C (en) * | 2005-08-18 | 2007-12-19 | 北大方正集团有限公司 | Post-processing approach of character recognition |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5131053A (en) * | 1988-08-10 | 1992-07-14 | Caere Corporation | Optical character recognition method and apparatus |
US5151950A (en) * | 1990-10-31 | 1992-09-29 | Go Corporation | Method for recognizing handwritten characters using shape and context analysis |
US5343537A (en) * | 1991-10-31 | 1994-08-30 | International Business Machines Corporation | Statistical mixture approach to automatic handwriting recognition |
US5392363A (en) * | 1992-11-13 | 1995-02-21 | International Business Machines Corporation | On-line connected handwritten word recognition by a probabilistic method |
US5465309A (en) * | 1993-12-10 | 1995-11-07 | International Business Machines Corporation | Method of and apparatus for character recognition through related spelling heuristics |
US5467407A (en) * | 1991-06-07 | 1995-11-14 | Paragraph International | Method and apparatus for recognizing cursive writing from sequential input information |
US5621809A (en) * | 1992-06-09 | 1997-04-15 | International Business Machines Corporation | Computer program product for automatic recognition of a consistent message using multiple complimentary sources of information |
-
1997
- 1997-03-06 WO PCT/US1997/004349 patent/WO1997033249A1/en not_active Application Discontinuation
- 1997-03-06 AU AU22168/97A patent/AU726852B2/en not_active Ceased
- 1997-03-06 CA CA002247359A patent/CA2247359A1/en not_active Abandoned
- 1997-03-06 EP EP97915155A patent/EP0896704A1/en not_active Withdrawn
- 1997-03-06 IL IL12564897A patent/IL125648A0/en unknown
- 1997-03-06 CN CN97190161A patent/CN1181827A/en active Pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5131053A (en) * | 1988-08-10 | 1992-07-14 | Caere Corporation | Optical character recognition method and apparatus |
US5436983A (en) * | 1988-08-10 | 1995-07-25 | Caere Corporation | Optical character recognition method and apparatus |
US5151950A (en) * | 1990-10-31 | 1992-09-29 | Go Corporation | Method for recognizing handwritten characters using shape and context analysis |
US5467407A (en) * | 1991-06-07 | 1995-11-14 | Paragraph International | Method and apparatus for recognizing cursive writing from sequential input information |
US5343537A (en) * | 1991-10-31 | 1994-08-30 | International Business Machines Corporation | Statistical mixture approach to automatic handwriting recognition |
US5621809A (en) * | 1992-06-09 | 1997-04-15 | International Business Machines Corporation | Computer program product for automatic recognition of a consistent message using multiple complimentary sources of information |
US5392363A (en) * | 1992-11-13 | 1995-02-21 | International Business Machines Corporation | On-line connected handwritten word recognition by a probabilistic method |
US5465309A (en) * | 1993-12-10 | 1995-11-07 | International Business Machines Corporation | Method of and apparatus for character recognition through related spelling heuristics |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003034326A1 (en) * | 2001-10-15 | 2003-04-24 | Silverbrook Research Pty Ltd | Character string identification |
AU2002333063B2 (en) * | 2001-10-15 | 2007-09-06 | Silverbrook Research Pty Ltd | Character string identification |
US7444021B2 (en) | 2001-10-15 | 2008-10-28 | Silverbrook Research Pty Ltd | Character string identification |
US7532758B2 (en) | 2001-10-15 | 2009-05-12 | Silverbrook Research Pty Ltd | Method and apparatus for generating handwriting recognition template |
US7756336B2 (en) | 2001-10-15 | 2010-07-13 | Silverbrook Research Pty Ltd | Processing system for identifying a string formed from a number of hand-written characters |
US8000531B2 (en) | 2001-10-15 | 2011-08-16 | Silverbrook Research Pty Ltd | Classifying a string formed from a known number of hand-written characters |
US8285048B2 (en) | 2001-10-15 | 2012-10-09 | Silverbrook Research Pty Ltd | Classifying a string formed from hand-written characters |
US7873217B2 (en) | 2003-02-26 | 2011-01-18 | Silverbrook Research Pty Ltd | System for line extraction in digital ink |
WO2005106771A1 (en) * | 2004-05-04 | 2005-11-10 | Nokia Corporation | Apparatus and method for handwriting recognition |
KR100858545B1 (en) * | 2004-05-04 | 2008-09-12 | 노키아 코포레이션 | Apparatus and method for handwriting recognition |
US8411958B2 (en) | 2004-05-04 | 2013-04-02 | Nokia Corporation | Apparatus and method for handwriting recognition |
Also Published As
Publication number | Publication date |
---|---|
CN1181827A (en) | 1998-05-13 |
CA2247359A1 (en) | 1997-09-12 |
IL125648A0 (en) | 1999-04-11 |
EP0896704A1 (en) | 1999-02-17 |
AU726852B2 (en) | 2000-11-23 |
AU2216897A (en) | 1997-09-22 |
Legal Events
Code | Title | Description |
---|---|---|
WWE | Wipo information: entry into national phase | Ref document number: 97190161.9; Country of ref document: CN |
AK | Designated states | Kind code of ref document: A1; Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG UZ VN YU |
AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): GH KE LS MW SD SZ UG AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | Free format text: CN |
ENP | Entry into the national phase | Ref document number: 2247359; Country of ref document: CA; Kind code of ref document: A |
WWE | Wipo information: entry into national phase | Ref document number: 1997915155; Country of ref document: EP |
NENP | Non-entry into the national phase | Ref document number: 97532057; Country of ref document: JP |
REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
WWP | Wipo information: published in national office | Ref document number: 1997915155; Country of ref document: EP |
WWW | Wipo information: withdrawn in national office | Ref document number: 1997915155; Country of ref document: EP |