US7782339B1 - Method and apparatus for generating masks for a multi-layer image decomposition - Google Patents
Method and apparatus for generating masks for a multi-layer image decomposition
- Publication number: US7782339B1
- Application number: US11/173,303
- Authority
- US
- United States
- Legal status: Active, expires
Classifications
- H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/20—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
- H04N19/12—Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
- H04N19/187—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a scalable video layer
Definitions
- The present invention relates to the field of image analysis. More particularly, the invention relates to segmenting an image into layers that distinguish between text, objects, background and pictures. Object layers include graphics objects while picture layers include textural images such as photographs.
- The image includes artifacts such as boxes or borders that are common to computer display images and lend themselves to efficient compression.
- The images include other characteristics associated with computer display images but not associated with motion video or other natural images, such as areas of exactly matched color levels and accurately aligned artifacts, which also lend themselves to efficient compression.
- Text is a common image type. It is desirable to identify text so it can be compressed separately to allow lossless reproduction. Once text elements are identified and separated, they can be compressed efficiently. One compression technique would be to cache the shape and color of the text parts so they can be reused on different images or parts of the image.
- A second type of image that is desirable for lossless reproduction is the background artifact type.
- These artifacts include window backgrounds and other large geometry areas with few colors.
- Background image types may be coded as a set of graphic commands, which allows for highly efficient compression in addition to lossless reproduction.
- A background frequently remains constant in an otherwise continuously changing display.
- A remote display can use historic background information rather than requiring the retransmission of static information. This improves the frame-to-frame compression of the display.
- A third image type is the picture type.
- Pictures or natural images that have texture or a large number of colors may be compressed using lossy compression algorithms with little or no noticeable difference. By using a lossy algorithm, pictures can be compressed efficiently.
- A fourth image type is the object type, which includes areas of high contrast such as graphics, icons and text, or other low-contrast artifacts surrounded by picture areas.
- Object types may be encoded using lossless or high quality lossy compression methods. Object types may also be cached and reused.
- The identification of different types of objects within an image for the purposes of image or video compression is standard practice. Different existing algorithms define “an object” in different ways, depending on the method in which the object is handled. However, previous definitions for an “object” still fail to define a group of pixels in such a way as to more effectively enable compression.
- Accuracy of image type identification affects both the quality of the decompressed image and the compression ratio. While it is important to maximize the compression in this application, it is more important to ensure that the areas of text and graphics have been correctly identified so they are reproduced accurately.
- Layering an image into multiple planes of different image types is a technique in common use.
- An image format based on this is specified in “Mixed Raster Content (MRC),” Draft ITU-T Recommendation T.44, International Telecommunication Union, Study Group 8 (10/97).
- The recommended model defines the image as three planes: a text or graphics plane, a background plane containing continuous tone images and a mask plane. While the recommendation identifies the interchange format, it does not provide a method for generating the mask.
- Sato et al. describe a text mask that is generated by filtering the image.
- The image is filtered using four directional filters that highlight the shape contrast of a text image.
- The results of the four filtered images are summed and quantified to generate a text image or mask. While filtering an image in multiple directions and summing the results produces a reasonable mask, it is computationally intensive and does not take advantage of the characteristics of text in a computer display image. The resulting mask can lead to missed and false indications that reduce the compression and image quality.
- A method for decomposing an image is disclosed by Li et al., “Text and Picture Segmentation by the Distribution Analysis of Wavelet Coefficients,” IEEE/ICIP Chicago, Ill. Proceedings, October 1999. This method segments the display into blocks of text, pictures or backgrounds using histograms of wavelet coefficients. While this identifies the image layers and the mask layers, it does so at a block resolution. Blocks of multiple pixels cannot create the proper boundaries between these image types. As a result, this method does not provide sufficient compression or image quality.
- None of the existing methods decompose a computer display image for compression and accurate reproduction. None of the methods identify text, objects, background and picture images separately and at a pixel resolution. Existing methods that provide reasonable accuracy of text identification are too computationally intensive for practical real-time decomposition. None of the methods take advantage of the image characteristics and artifacts of a computer display to simplify and improve the image decomposition. None of the methods decompose the image by identifying background graphic commands that can compress well. None of the methods identify text on a background surface, which is highly repetitious and lends itself to efficient compression.
- The present invention relates to the preparation of a computer display image for efficient compression so that it may be transmitted across the network and accurately reproduced at the remote computer.
- Embodiments of the present invention decompose a computer display image into different layer types and associated masks based on the unique nature of the image. These types include text, objects, background and pictures.
- A set of image masks is used to uniquely identify different layer types within an image, where each layer type includes none, some or all of the pixels of the original image.
- Each layer of the image is processed prior to transmission (i.e. compressed) using a method appropriate for the characteristics of that layer. For example, a picture layer may be compressed using a different method from the one used to compress a text layer.
- Compression techniques such as Run-Length Encoding (RLE), Lempel-Ziv-Welch (LZW) encoding, Joint Photographic Experts Group (JPEG) compression, and Motion Picture Experts Group (MPEG) compression may be used.
- Data may be compressed on a per frame basis (e.g., LZW, JPEG), or across frame updates (e.g., MPEG).
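- The run-length encoding named above can be sketched in a few lines. This is a minimal illustrative encoder/decoder pair, not the patent's implementation, and the function names are invented for the example:

```python
def rle_encode(pixels):
    """Collapse a scanline into (value, count) runs of identical pixels."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            # Extend the current run.
            runs[-1] = (p, runs[-1][1] + 1)
        else:
            # Start a new run.
            runs.append((p, 1))
    return runs

def rle_decode(runs):
    """Expand (value, count) runs back into the original scanline."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out
```

RLE suits the flat, exactly repeated colors of computer display backgrounds, which is why it appears in the list of candidate techniques above.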
- Every layer is assigned a single-bit pixel mask of the same dimensions as the original image. If a pixel from the original image is represented on a layer, the corresponding bit in the pixel mask for that layer is set.
- Each mask is implemented as an array.
- The present invention decomposes the image into four mutually exclusive layers, so the entire mask set may be described using a two-dimensional array of the same dimension as the original image, with each array element defined as a two-bit value.
- Each two-bit value describes four different states and each state identifies the presence of a pixel on one of the four layers of the image.
- Each mask may be implemented as a two-dimensional array of single-bit elements, with the mask having the same dimensions as the computer display image.
- The four masks and the compressed 16×16 pixel portion of the computer display image form a discrete packet that is transmitted to a remote client in a data stream.
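- The two-bit mask-set representation described above can be sketched as follows. The layer numbering (0=background, 1=text, 2=object, 3=picture) is an assumption for illustration; the patent does not fix a particular encoding:

```python
# Four mutually exclusive layers stored as one 2-bit code per pixel.
# The specific code assignment below is illustrative, not from the patent.
LAYERS = {"background": 0, "text": 1, "object": 2, "picture": 3}

def make_mask_set(width, height):
    """Create a mask set with every pixel initially on the background layer."""
    return [[LAYERS["background"]] * width for _ in range(height)]

def layer_mask(mask_set, layer):
    """Recover the single-bit mask for one layer: 1 where a pixel belongs
    to that layer, 0 elsewhere."""
    code = LAYERS[layer]
    return [[1 if v == code else 0 for v in row] for row in mask_set]
```

Because the layers are mutually exclusive, the four single-bit masks never overlap, and two bits per pixel describe the whole set.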
- The objective of the present invention is to prepare a digital computer display image for efficient compression and subsequent reproduction by taking advantage of the attributes of a computer image and the digital nature of the image source.
- The present invention decomposes the image into multiple layer types and generates masks for each type.
- The image is decomposed into separate text, object, background and picture types, where each type has an associated image mask for that type.
- The present invention identifies background areas and generates a background mask, identifies text and object areas and generates text and object masks, expands and optimizes the identified background areas and masks, and expands the identified text and object areas and their masks.
- The process of background, text and object optimization is repeated until satisfactory decomposition is accomplished.
- A picture mask is generated and optimized to remove small isolated areas of one image type.
- The present invention applies a range of filters to an image to identify text within the image.
- Horizontal, vertical and diagonal filters are applied to the image to identify areas of high contrast.
- Each pixel that meets a defined contrast threshold criterion for possible identification is temporarily marked as a candidate for a text mask. These marks are accumulated for each pixel over a small area to identify text centers and positively identify pixels that exceed a predefined text density threshold as text.
- The present invention identifies object types.
- Two types of objects are identified.
- The first type of objects includes small, low-contrast regions of the image that are completely surrounded by background.
- The second type of objects includes high-contrast regions that are not completely surrounded by background.
- The present invention identifies background areas as related extensions of a basic shape within defined color limits that can be described using graphic commands.
- Horizontal, vertical and diagonal lines of a matched color are identified as background lines; consecutive lines of the same color are identified as background rectangles or shapes; consecutive lines of different colors are identified as gradient backgrounds; and large areas of the same color are identified as the default background, which may or may not be rectangular in shape.
- The present invention identifies background features of an image that can be described in terms of highly-compressible graphics descriptors.
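- A minimal sketch of the line-to-rectangle step described above, under simplifying assumptions: only single-color, full-width scanlines are considered, and gradients, diagonals and color tolerance are ignored. Consecutive matching lines extend a solid rectangle descriptor:

```python
def background_rectangles(scanlines):
    """Group consecutive full-width single-color scanlines into rectangle
    descriptors (color, start_row, height). Simplified sketch: mixed-color
    lines, gradients and color tolerance are not handled."""
    rects = []
    for row, line in enumerate(scanlines):
        if len(set(line)) != 1:
            continue  # not a single-color line; out of scope for this sketch
        color = line[0]
        if rects and rects[-1][0] == color and rects[-1][1] + rects[-1][2] == row:
            # Same color, directly below the previous run: grow the rectangle.
            c, start, h = rects[-1]
            rects[-1] = (c, start, h + 1)
        else:
            rects.append((color, row, 1))
    return rects
```

Each descriptor replaces many pixels with three numbers, which is the compression advantage the graphics-command representation aims for.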
- The present invention identifies pictures in an image as picture types.
- Areas of the image that are not text, background or objects are identified as picture types.
- The present invention improves the compressibility of an image.
- Small areas are reclassified to match the larger areas surrounding them.
- The present invention reclassifies background types surrounded by text types as text types, background types surrounded by picture types as picture types, and picture types surrounded by objects as object types.
- The present invention improves the compressibility of an image by classifying high-contrast areas of an image such as text, graphics, or icons as either text or object types.
- These same general text areas that are at least partially surrounded by pictures are reclassified as unbounded objects, while those surrounded wholly by background remain classified as text.
- The present invention provides real-time decomposition of text, objects, background and pictures at a pixel level that is critical for the perception-free compression and reproduction of a computer display image. Furthermore, the present invention decomposes an image based on expected digital computer display image artifacts, resulting in optimized decomposition of the image.
- FIG. 1 illustrates a representation of a grayscale image containing image types that might be found on a computer display.
- FIGS. 2A-2E illustrate the decomposition of an exemplary image into five different image layers and masks.
- FIG. 3 illustrates in flow chart form the image decomposition method.
- FIG. 4 illustrates a filter used to identify background images.
- FIG. 5 illustrates a subsection of the background image mask identified by the first pass of the background filter.
- FIG. 6 illustrates in flow chart form the text identification method.
- FIG. 7 illustrates a saturated pixel text filter.
- FIG. 8 illustrates a 3-pixel pattern filter.
- FIG. 9 illustrates a 4-pixel pattern filter.
- FIG. 10 illustrates a subsection of the text image mask identified by the saturated pixel filter and pixel pattern filters.
- FIG. 11 illustrates a subsection of the text image mask after it has been filtered for text mark density and the text mask has been expanded.
- FIG. 12 illustrates a background filter used to expand the background mask by identifying background pixels enclosed by text marked pixels.
- FIG. 13 illustrates the background image detected by the modified background filter.
- FIG. 14 illustrates the generation of the clean text mask by removing the expanded text markings that conflict with the background mask.
- FIG. 15 illustrates a non-text object surrounded by background pixels that is detected as a low-contrast object.
- FIG. 16 illustrates the generation of an object mask containing the non-text artifacts.
- FIG. 17 illustrates the generation of the picture mask that is generated by identifying all of the pixels that are neither text nor background.
- FIG. 18 illustrates the separation of the text mask from the high-contrast object mask based on the different background characteristics of text and high-contrast objects.
- FIG. 19 illustrates a variation on the object mask set that combines the different object types onto a single mask.
- FIG. 1 represents an example of a grayscale or color image 100 that might be seen in a section of a computer display.
- The section is composed of text and other artifacts on a variety of different backgrounds.
- The underlying background for the entire section is a picture background as represented by area 106 .
- Area 106 might be a photograph.
- Area 106 is overlaid with text 105 “Tomorrow” and two styles of boxes, each with additional text overlays.
- Box 104 represents a vertical gradient in color tone. Box 104 is overlaid with black text “Open” 103 .
- Box 101 has a single gray tone and is overlaid by anti-aliased text “Today” 102 and small square object 107 of a different gray tone.
- Each mask is a map of one-bit pixels of the image where the bit value of 1 positively identifies a pixel as an element of that mask.
- The present invention distinguishes between five types of images and generates a mask for each of the five image types.
- The five types of images include background, text, picture, type 1 object, and type 2 object.
- The object image type depends on the background characteristics of the objects.
- FIGS. 2A through 2E show the image decomposed into the five image types described, each type associated with its own mask.
- Text image type 112 shown in FIG. 2A is defined by any small high-contrast area that is surrounded by background image type 110 shown in FIG. 2B .
- Text image types require accurate or lossless reproduction. Given that text is often small in size and spatially repeated, text image elements compress well.
- The text layer of an image is identified by text mask 113 shown in FIG. 2A .
- Background image type 110 in FIG. 2B is defined as any area that may be described using a graphical primitive that can easily be regenerated.
- The basic graphical primitive is a line. Multiple lines of the same color represent solid color areas 101 . Multiple lines of different colors represent gradient areas 104 of background.
- Text image regions 112 overwrite the background image regions, thus allowing the background to be defined as continuous graphical objects through the text regions.
- Text regions identified by text mask 113 are marked as “don't-care” regions for the subsequent background decomposition analysis.
- Long lines of the same length and the same color are used to describe areas of background image.
- the present embodiment distinguishes between two types of objects as these may be handled by separate compression processes. Firstly, small, low-contrast regions that are surrounded by background or text, for example, small square 107 shown on type 1 object layer 118 with mask 119 in FIG. 2C , are classified as type 1 objects. Secondly, text, graphics, icons, or other high-contrast regions that are at least partially surrounded by picture image types are classified as type 2 objects. Text 105 is an example of a type 2 object shown on its own layer 116 with mask 117 in FIG. 2D . Type 1 objects are typically reconstructed using lossless techniques while type 2 objects may be compressed using either lossless or high quality lossy compression techniques.
- The remaining area of the image is picture image 114 , as identified by picture mask 115 in FIG. 2E .
- Picture images do not have the high-contrast detail of text or objects and are not flat graphic images as captured by the background image area.
- The picture area is made up of photographs or other textured images that can be reproduced using photographic compression techniques.
- FIG. 3 illustrates the top-level flow chart for the image decomposition process.
- The first operation is the identification of background areas. Background areas that can be identified before other image types are identified and marked at act 10 .
- High-contrast filters, including saturated pixel filters and other pixel pattern filters, are then used to identify and mark high-contrast areas including text, graphics or icons (act 11 ). Once these high-contrast filters have been applied, the text mask contains both text and type 2 object types.
- The background mask is updated to include additional background areas that are identified and marked (act 12 ) using the current text mask.
- The text mask is cleared of pixels that are assigned both as text and background using the updated background mask.
- At act 14 , small areas that are not identified in the text or background masks are reviewed based on the image type of neighboring pixels. Small areas adjacent to text, background, or type 1 objects are reclassified.
- The text mask is divided into two layers: type 2 object layer 116 and text layer 112 .
- The object layer consists of areas on the original text mask that are not fully surrounded by background. Pixels in the object layer are removed from the text mask and placed in the object mask.
- The text layer consists of areas on the original text mask that are fully surrounded by background. Pixels in the text layer remain on the text mask.
- The mask set is filtered to redefine small, isolated images that may hinder optimum compression and can be reclassified without degrading the image quality.
- The text mask is expanded through iterations of acts 12 and 13 until a desired level of quality is achieved for the text mask and the background mask.
- FIG. 4 illustrates one of the filters that may be used to identify background pixels suitable for coding as graphical objects.
- The filter seeks horizontal area 120 or vertical rectangular area 121 of dimension m by n pixels. Each pixel p(x,y) is tested for a background match. Pixels that are either exactly matched in color or within a defined limit of color “d” are identified as background.
- The filter seeks a line of adjacent pixels that is 16 pixels in length with all pixels matching in color.
- A variation of this filter may allow small variations in color. In cases where these variations are not factored into the graphics primitive for the background, the compression process would reduce the image quality. Pixels that meet the background filter criteria are marked as background pixels.
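- The 16-pixel line criterion described above can be sketched as a per-row scan. The run length of 16 follows the description; measuring the tolerance "d" against the first pixel of the run is an assumption of this sketch, and the helper name is illustrative:

```python
RUN = 16  # minimum run length for a background line, per the description above

def mark_background_lines(row, d=0):
    """Mark pixels of a grayscale scanline as background (1) when they lie
    in a run of at least RUN pixels whose values are within d of the run's
    first pixel. d=0 gives the exact-match case."""
    mask = [0] * len(row)
    start = 0
    for i in range(1, len(row) + 1):
        # A run ends at the end of the row or when the color drifts past d.
        if i == len(row) or abs(row[i] - row[start]) > d:
            if i - start >= RUN:
                for j in range(start, i):
                    mask[j] = 1
            start = i
    return mask
```

Runs shorter than 16 pixels (as in gradient area 126 interrupted by text) are left unmarked, matching the behavior described for FIG. 5.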
- Other filter embodiments that may be used to identify background pixels include rectangular area filters, diagonal lines, dotted or dashed lines, and color lines of even gradient.
- Embodiments of the present invention describe a graphic artifact using a simple formula. This ensures that the graphic descriptor is simple to generate and the background is readily compressed. As more pixels are identified by each graphic descriptor, the resulting compression ratio improves.
- A graphic descriptor can be used to indicate that the background is a default background color for an area or an entire display. This descriptor should be used cautiously for backgrounds with complex shapes because little or no advantage is gained if a complex description is needed to describe the outline of the image.
- FIG. 5 illustrates a subsection of example image 100 as the image is transformed using background identification and mask generation as shown in FIG. 3 .
- Gradient background area 125 is removed from the image where lines of 16 color-matched pixels are identified. However, due to the presence of text and other artifacts, some of gradient area 126 does not have lines of pixels that are 16 pixels long so they are not identified as background pixels.
- Region 129 on resulting background mask 128 shows the area that has been removed from the image and remaining region 127 indicates where the background has not been removed.
- This filter process has similar results for constant color background region 101 . All regions with lines of 16 constant-color pixels that do not intersect text 102 or object 107 are moved to the background mask.
- The next operation in the decomposition of the image is text identification 11 .
- Because text and high-contrast type 2 objects are identified using the same filters and are only classified separately in a later step based on different background characteristics, the following description uses the term “text” to refer to both text and type 2 object image types unless specifically noted otherwise.
- The preferred embodiment of the invention uses conservative analysis for text identification based on the underlying requirement for accurate image reproduction.
- The rationale is that it is useful for text areas to be correctly identified to ensure lossless compression.
- Accidental classification of non-text areas as text areas may impact the compression ratio but does not impact image quality.
- Graphical images incorporating lines with 16 pixels of a constant color match the background filter requirements and are decomposed onto the background layer rather than the text layer. This may decrease the overall compression ratio slightly, but both the background and high-contrast features will be reproduced accurately.
- FIG. 6 illustrates a flow chart of the method of the present invention that is used to identify high-contrast text areas.
- High-contrast areas include text, icons and other high-contrast graphic objects. These parts of the image should be viewed clearly and should therefore be capable of lossless reproduction.
- The present invention uses a series of contrast filters in conjunction with an accumulated pixel density integration filter to positively identify text pixels.
- Each contrast filter is applied to the image and marks are assigned to individual pixels identified as text prospects.
- The marks for each pixel are accumulated and the image is filtered by the integration filter to select only areas that have a high density of text markings.
- The first filter method used (act 20 ) for detecting text identifies and marks saturated text pixels.
- A saturated color in RGB space is defined as any color where R, G and B are each 0 or 255, where each RGB color is represented by an 8-bit value.
- When R, G and B are all 0 or all 255, these values correspond to black and white. Saturated colors tend to be vivid and are therefore often used in computer display text. Therefore, pixels of saturated color have a high probability of being text.
- The saturated color pixel needs to be adjacent to a pixel of contrasting color.
- The filter seeks saturated color pixels with the additional constraint that each be adjacent to a pixel of reasonably high contrast. Background pixels are almost always saturated, so an additional constraint is that the pixel should not be a background pixel as determined by the previous filters.
- Another filter method involves the identification of pixel regions of various sizes that match, either exactly or within some predefined difference, pre-determined pixel patterns. These pixel patterns are based on the expected color gradient and contour of text. In addition, these pixel patterns may include the expected location of background pixels (where a background pixel is a pixel that has been detected by the aforementioned background filter).
- This embodiment of the invention includes application of multiple pixel pattern filters (act 21 ) that compare groups of 1×3, 1×4 or 1×5 regions of pixels to determine if they are assigned text pixel markings.
- Prospective text pixels receive multiple markings from the multiple pixel pattern filters. Once all of the text filters have been applied, the marks are accumulated and integrated over a small area (act 22 ). The output of the integration filter is a value that is used to measure if the area has a sufficient density of text marks. If the area passes the threshold, then all text marks in that area of the text mask identify text pixels. If the area does not pass the threshold, then all text markings are considered to be noise and the text marks in that area are removed. Once the text pixel markings determined to indicate noise have been removed, the remaining text pixel markings are converted into a text mask (act 23 ). Indicia for pixels that are identified as both text and background are also removed from a text mask (act 24 ).
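- The accumulation-and-threshold step (act 22 ) might look like the following sketch. The window size and density threshold here are assumptions for illustration, not values from the patent:

```python
def filter_marks_by_density(marks, window=2, threshold=4):
    """Keep a text mark only if the count of marks in the surrounding
    (2*window+1)-square neighborhood reaches the threshold; otherwise
    discard it as noise. window=2 and threshold=4 are assumed values."""
    h, w = len(marks), len(marks[0])
    keep = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not marks[y][x]:
                continue
            density = sum(
                marks[j][i]
                for j in range(max(0, y - window), min(h, y + window + 1))
                for i in range(max(0, x - window), min(w, x + window + 1))
            )
            if density >= threshold:
                keep[y][x] = 1
    return keep
```

Dense clusters of marks (text centers) survive, while isolated marks from filter false positives are removed, which is the stated purpose of the integration filter.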
- the text mask contains both text and high-contrast objects. These high-contrast objects are removed from the text mask by a later filter. Text indication is not a perfect process and not every text pixel is positively identified by the aforementioned pixel patterns. A blocking operation is performed to mark the pixels surrounding text pixels (act 25 ). This ensures the mask is expanded to include all text pixels. The expanded area is also useful for background identification.
- FIG. 7 illustrates a saturated pixel filter of the present invention that is used to identify text more accurately than prior solutions.
- Saturated text identification is valid for computer-generated images that have maintained their original, digitally-specified pixel values, whereas images that are scanned or that pass through an analog transformation are less likely to support saturated color as a means of identifying text. While some text identification methods in the prior art use a threshold to identify pixels as text, these methods are less effective when the original pixel values are unavailable, since the bit-exact original color cannot be determined.
- saturated color is used to mark pixels as potential text areas.
- the marks are summed and combined with the marks from the pixel pattern filters described below to determine if the pixels should be positively identified as text.
- One embodiment of the filter operates on an 8-bit grayscale image format where saturated black pixels have a value of 0 while saturated white pixels have a value of 255. This allows the filter to work with both black and white text.
- the saturated pixel filter requires that a minimum color difference exists between the saturated pixel and an adjacent pixel. Specifically, referring to FIG. 7 , pixel A is marked as text according to the formula: |A−B|>=d (2) and A=0xFF or 0x00 (3), where A is not a background pixel and d is the minimum specified color difference.
- Pixel B may be to the right 130 , left 132 , above 131 or below 133 of the saturated color pixel A.
- diagonal filters 134 may also be used.
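The saturated-pixel test of formulas (2) and (3) can be sketched as follows; this is a minimal illustration, not the patented implementation, and the default threshold `d=64` is a hypothetical value:

```python
def is_saturated_text(img, bg_mask, x, y, d=64):
    """Sketch of the saturated-pixel test, formulas (2) and (3).

    img is an 8-bit grayscale image as a list of rows; bg_mask holds True
    for pixels already identified as background. d is the minimum
    specified color difference (the default here is a hypothetical value).
    """
    a = img[y][x]
    if a not in (0x00, 0xFF):          # formula (3): fully saturated only
        return False
    if bg_mask[y][x]:                  # saturated background is excluded
        return False
    h, w = len(img), len(img[0])
    # formula (2): contrast against any 4-connected neighbour B
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < w and 0 <= ny < h and abs(a - img[ny][nx]) >= d:
            return True
    return False
```

A saturated black pixel next to a mid-gray neighbour passes the test; a non-saturated pixel or an already-identified background pixel does not.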
- FIG. 7 shows an example of the “a” character as 5 ⁇ 7 pixel array 135 . If the character pixels are black (or of value 0) and the background contrast is greater than the minimum required color difference, then both of the pixels are marked as text pixels multiple times.
- FIG. 7 shows pixel pair 136 in which pixel A will be marked as text according to the formula.
- the filter requires that the saturated pixel is not also a background pixel in order for the text pixel to be identified. For example, pixel A is not identified as text in the application of filter 137 .
- aliased text is detected by measuring the contrast between the saturated pixel A and the pixel B, where B is two pixels away from A rather than adjacent to it as described above.
- the middle pixel (between pixel A and pixel B) is either not considered at all in the filter equation or the filter coefficient for that pixel has a reduced weighting. For example, a weighted average value may be calculated across the two non-saturated pixels where the weighting for the center pixel is lower than the weighting for the outer pixel. This averaged contrast level is then used to determine if a contrast threshold is exceeded.
- color pixels that are saturated in one or two of the R, G, or B levels are also considered for text identification.
- the probability of false detection increases as the number of saturated colors is reduced from three to two or one.
- the probability of errors further increases as the filter width increases.
- additional filtering is necessary to remove the unwanted detections.
- one option is to decrease the contrast threshold between the saturated color pixel and the adjacent pixel that positively identifies the color pixel as text.
- FIG. 8 is an illustration of a 3-pixel filter that takes advantage of two types of 1×3 pixel patterns that might appear in a noiseless computer display generated image.
- the illustration shows the character “a” 144 .
- the first pattern 145 takes into account the fact that the background in a digital image may be precisely constant without any noise and therefore A and C are exactly equal.
- the second pattern 146 takes into account that pixels contained in the same text character may be exactly equal and therefore A and C are once again equal.
- Non-digital pictures do not exhibit this characteristic, which may be used as a further identifier of text.
- this filter can also look for color gradients on either side of a text pixel.
- the 3-pixel filter may be applied in multiple directions. Since this filter is symmetric, only four filter direction variations are shown: horizontal 140 , vertical 141 , and diagonal directions 142 and 143 .
- the advantage of the 1×3 pixel pattern filter is that text written over a picture may be detected.
- the picture may not be flat enough to have matching pixels on either side of a text pixel.
- text written on top of pictures is usually written with opaque pixels.
- two pixels on the same character or adjacent characters are likely to have the exact same color and can be separated by a picture background that meets the minimum contrast difference requirement.
- the filter has two control values for determining if a pixel or group of pixels matches this pattern and thus should be marked as text.
- the first value is the minimum difference between the center pixel and the nearest outside pixel.
- the second control value is the maximum difference between the two outer pixels. While the minimum difference of the center pixel need not be large if the end pixels are identical, in cases where the maximum allowable difference between the end pixels is increased, the center pixel minimum difference should also be increased to prevent excessive false text markings.
- An optional parameter for the filter is to use the background information to determine if a pixel is text.
- Pixels A, B and C are marked as text according to the criteria in the expression below:
- |A−C|<=maximum difference between the two outside pixels (4) and
- |A−B|>=minimum difference between center pixel and nearest outside pixel and optionally A and/or B are background pixels (5)
- if the center pixel is not a background pixel, then there is a high probability that the center pixel is a text pixel. If only one end of the filter is an identified background pixel but there is minimal difference between the two ends, then there is a reasonable probability that the text is on a gradient background. In cases where a pixel identified as a background pixel is under filter examination, the other two parameters may be reduced without increased false text detection.
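A minimal sketch of the 1×3 pattern test, implementing formulas (4) and (5); the threshold defaults here are hypothetical values, and the optional background criterion is modeled by the `require_bg` flag:

```python
def match_1x3(a, b, c, max_outer_diff=8, min_center_diff=48,
              a_is_bg=False, b_is_bg=False, require_bg=False):
    """Sketch of the 1x3 pixel pattern filter, formulas (4) and (5).

    a and c are the two outside pixels, b is the center pixel. The
    threshold defaults are hypothetical. When require_bg is set, the
    optional background condition of formula (5) is enforced as well.
    """
    outer_ok = abs(a - c) <= max_outer_diff    # formula (4)
    center_ok = abs(a - b) >= min_center_diff  # formula (5)
    bg_ok = (a_is_bg or b_is_bg) if require_bg else True
    return outer_ok and center_ok and bg_ok
```

A black center pixel between two identical white pixels matches; if the outer pixels differ by more than the maximum allowed difference, the pattern is rejected.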
- FIG. 9 is an illustration of a 4-pixel filter.
- the illustration shows an anti-aliased character “a” 154 , with 4-pixel filter 155 .
- the different representations for different pixels in the illustration represent different grayscale levels that comprise the anti-aliased character.
- This filter is similar to the 3-pixel filter described above and may be applied in multiple orientations 150 , 151 and 152 .
- One example of a 1×4 pixel pattern that may be applied is described as follows. Pixels A, B, C and D are marked as text if the following conditions are met:
- |A−D|<=maximum difference (6) and
- (|A−B|>=minimum difference or |C−D|>=minimum difference) (7)
- 4-pixel filter 155 leverages the fact that the background in a digital image may be precisely constant without any noise, i.e. pixels A and D satisfy |A−D|<=maximum difference as the filter covers adjacent text pixels B and C on background pixels A and D 156 .
- Filter 155 also leverages other characteristics of text. For example, for readability purposes text pixels are surrounded by pixels of high contrast, e.g. |A−B|>=minimum difference or |C−D|>=minimum difference.
- Pixels A, B, C and D are marked as text using the middle pixels according to the expression:
- |B−C|<=maximum difference (8) and
- |A−B|>=minimum difference (9) and
- |C−D|>=minimum difference (10)
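Both variants of the 1×4 pattern test can be sketched directly from formulas (6)-(10); the threshold defaults are hypothetical values:

```python
def match_1x4_outer(a, b, c, d, max_diff=8, min_diff=48):
    """Formulas (6) and (7): outer pixels A and D match, and at least one
    outer/inner pair shows the minimum contrast. Thresholds are
    hypothetical defaults."""
    return (abs(a - d) <= max_diff and
            (abs(a - b) >= min_diff or abs(c - d) >= min_diff))

def match_1x4_middle(a, b, c, d, max_diff=8, min_diff=48):
    """Formulas (8), (9) and (10): middle pixels B and C match each other
    and both contrast against their outer neighbours."""
    return (abs(b - c) <= max_diff and
            abs(a - b) >= min_diff and
            abs(c - d) >= min_diff)
```

Two adjacent black text pixels on a white background satisfy both variants; a flat run of identical pixels satisfies neither.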
- the primary enhancement of the 1×4 pixel pattern filter over the 1×3 filter is that the 1×4 pixel patterns may be applied to detect larger fonts over a wider area of flat text.
- some pixel patterns associated with small fonts can only be properly expressed by a 1×4 pixel pattern.
- a variation on the 4-pixel filter embodiment uses background pixel information to improve the search in a similar mode to the 1×3 pattern filter.
- a 1×5 pixel pattern embodiment is also useful for detecting wider text. While the simple n×m pixel pattern recognition works well for small values of n and m, as the pixel pattern increases in size it loses its suitability for capturing generic text characteristics. In fact, the filter embodiment becomes better suited to more computationally intensive character recognition applications.
- the 3-, 4-, and 5-pixel filters described work well for computer displays and provide significant processing and identification improvement over alternative filtering methods.
- the simple pixel comparison method is suitable for the real-time decomposition of a computer display.
- FIG. 10 shows area 168 of original image 100 after the text filters have been applied. Pixels that are marked are illustrated as pixel marks on pixel map 165 while those that have not been detected by any of the text filters are shown as pixels 169 on pixel map 160 .
- the text filtering process results in numerous text markings in areas of high text density, a few markings in areas where text appears over a picture and infrequent markings in other regions of the image, for example regions where a picture has localized areas of high contrast.
- Pixel map 165 shows the filtered pixel data for pixels that have accumulated at least one positive marking.
- text pixels will typically have multiple markings because each text pixel may be detected by multiple filters (for example, a saturated color filter and one or more pixel pattern filters), whereas textured background pixels will have no markings or only a few markings as a result of occasional erroneous text detections.
- accumulated text markings provided by the text filters are filtered to evaluate the text mark density and remove erroneous text detections. If the number of text marks over a small area exceeds a defined threshold, the text pixels in that area remain marked as text pixels.
- the weighting of text marks and the text density threshold may be varied in different areas of the image. Nevertheless, depending on how the text markings are accumulated and the defined threshold value, some false text indications may result, especially in areas where text is drawn over textured image 105 .
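The mark accumulation and density thresholding of act 22 can be sketched as a block-wise sum; the window size and density threshold here are hypothetical values, since the patent allows both to vary across the image:

```python
def densify_marks(marks, window=4, threshold=6):
    """Sketch of text-mark accumulation and density filtering (act 22).

    marks is a 2-D list of per-pixel mark counts produced by the text
    filters. Window size and density threshold are hypothetical values.
    Returns a boolean map that keeps markings only in areas whose total
    mark count reaches the threshold; sparse marks are treated as noise.
    """
    h, w = len(marks), len(marks[0])
    keep = [[False] * w for _ in range(h)]
    for y0 in range(0, h, window):
        for x0 in range(0, w, window):
            total = sum(marks[y][x]
                        for y in range(y0, min(y0 + window, h))
                        for x in range(x0, min(x0 + window, w)))
            if total >= threshold:
                for y in range(y0, min(y0 + window, h)):
                    for x in range(x0, min(x0 + window, w)):
                        keep[y][x] = marks[y][x] > 0
    return keep
```

A dense cluster of marks survives the filter while an isolated single mark is discarded, matching the behavior described for act 22.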
- FIG. 11 shows the results of text surround mask generation process 25 in area 168 .
- FIG. 11 shows text mask 170 for area 168 .
- the process also includes the removal of text pixels located over background pixels (act 24 ).
- the text mask is then expanded around every text pixel (act 25 ) and results in expanded text pixel mask 172 , of which pixel subsection 171 for area 168 is shown.
- Although extra pixels may be captured in the process, this ensures that all of the text over a picture background area will be accurately reproduced when decompressed.
- expansion of the text over the background aids more precise identification of the background itself as shown in the next decomposition operation.
- the text mask contains text and high-contrast objects.
- FIG. 12 illustrates background expansion and mask update process 12 .
- a background modification filter evaluates line 181 of 16 pixels.
- the top part of two anti-aliased “a” characters is shown at reference numeral 180 .
- For every pixel that is marked as a background pixel, the filter analyzes the next 16 pixels. While the embodiment shows the next 16 pixels to the right of background pixel 182 under analysis, the filter may also be applied to the left, above, below or diagonally.
- the pixels are assigned various markings identified by the letters T, B and C, depending on specific attributes. Pixels that have been identified as text pixels in the text mask are marked with a T (reference numeral 186 ). Pixels that have been previously identified as background pixels are marked with a B. Pixels that exactly match the color of pixel 182 are marked with a C to indicate the color matching criterion. C pixels 183 and 184 are potentially background pixels and are therefore subject to further evaluation.
- Pixels that are marked only as text 186 are ignored as these represent the text over the background. If pixels are marked with both T and C, they have been incorrectly identified as text, probably as a result of expanding the text mask. These pixels are candidates for background pixels.
- the effect of the filter is that background inside of text is marked.
- the area inside of the letter “O” is marked as background as a result of this process.
- the gradient lines run orthogonal to the direction of constant color. If the filter described in this embodiment is applied both horizontally and vertically, it will also successfully detect and mark the gradient background around the text.
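The horizontal pass of the background-expansion filter can be sketched as below. This is a simplified illustration that handles only the C and T markings (B-marked pixels are left unchanged); the run length of 16 follows the embodiment, and the in-place update of the background mask is an implementation choice:

```python
def expand_background_line(colors, text_mask, bg_mask, x, y, run=16):
    """Simplified sketch of the horizontal background-expansion step.

    Starting from background pixel (x, y), examine the next `run` pixels
    to the right. Pixels matching the background colour exactly (the "C"
    marking) are promoted to background, including T+C pixels that were
    incorrectly captured by the expanded text mask; text-only pixels
    ("T") are skipped as text drawn over the background.
    """
    bg_color = colors[y][x]
    row = colors[y]
    for nx in range(x + 1, min(x + 1 + run, len(row))):
        is_c = (row[nx] == bg_color)
        is_t = text_mask[y][nx]
        if is_t and not is_c:
            continue                  # text over background: leave alone
        if is_c:
            bg_mask[y][nx] = True     # C (and T+C) pixels become background
    return bg_mask
```

Applying this both horizontally and vertically marks the background enclosed inside characters such as "O", as the surrounding text discussion notes.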
- FIG. 13 represents image 190 with areas of background 191 that have been detected and removed by the background modification filter.
- the resulting update lines 192 generated by the background modification filter are added to the initial background mask creating completed background mask 111 .
- FIG. 14 shows an example of background mask 111 that is applied to completely expanded text mask 172 to generate clean, text-only mask 197 in text expansion and mask update (act 13 ).
- Text mask 197 still contains both text and high-contrast type 2 objects.
- the text filters detect text image types but do not detect small areas that are smooth or have low contrast, for example a low contrast icon overlaid on a background.
- Enclosed object additions add these artifacts as type 1 objects to the text mask so that they can be accurately reproduced.
- FIG. 15 is an illustration of the enclosed artifact search method.
- An area of image 200 which has pixels that have not been identified as text or background 201 is surrounded by pixels that are identified as background pixels B 202 .
- the process searches for a series of connected background pixels that can create box 203 around the unmarked pixels. If the size of the box is within a defined area, the unmarked pixels are marked as object pixels A 204 . If the box exceeds the defined area, it is likely a picture on a page and the pixels are left unmarked. These and other unmarked pixels are classified as picture pixels at the end of the identification sequence.
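The enclosed-artifact search can be sketched as a flood fill over unmarked pixels; the area limit is a hypothetical value, and the letter labels follow FIG. 15 ('B' for background, '.' for unmarked, 'A' for object):

```python
def find_enclosed_objects(labels, max_area=64):
    """Sketch of the enclosed-artifact search (act 14).

    labels holds 'B' for background pixels and '.' for unmarked pixels.
    Unmarked 4-connected regions that never touch the image edge (and
    are therefore bounded by background) and whose area stays within
    max_area (a hypothetical limit) are relabelled 'A' as object pixels;
    larger or unbounded regions are left unmarked as probable pictures.
    """
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != '.' or seen[sy][sx]:
                continue
            stack, region, enclosed = [(sx, sy)], [], True
            seen[sy][sx] = True
            while stack:
                x, y = stack.pop()
                region.append((x, y))
                for nx, ny in ((x+1, y), (x-1, y), (x, y+1), (x, y-1)):
                    if not (0 <= nx < w and 0 <= ny < h):
                        enclosed = False      # region touches image edge
                    elif labels[ny][nx] == '.' and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            if enclosed and len(region) <= max_area:
                for x, y in region:
                    labels[y][x] = 'A'
    return labels
```

A small unmarked pocket surrounded by background becomes an object, while a region reaching the image edge stays unmarked and is later classified as picture.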
- the addition of the enclosed objects to the text mask may improve the background expansion and mask update act 12 . For example, consider an enclosed object that bisects a horizontal line of one color into two shorter lines. Once the enclosed object is removed, the two lines may be joined by the method to form a single line. The resulting single line is more efficiently compressed.
- FIG. 16 is an illustration of image 210 with undefined artifact 211 removed and added as object 213 on object mask 119 in accordance with act 15 .
- FIG. 17 illustrates background mask 111 combined with text mask 197 to identify the unmarked pixels.
- the unmarked pixels are used to generate picture mask 115 .
- because each act in the identification process makes use of pixel markings generated in previous steps, the identification process may be repeated to improve the decomposition, as illustrated by iterative, top-level act 18 .
- high-contrast type 2 objects are moved to the object layer. Text pixels that are at least partially surrounded by pictures are removed from text mask 197 and type 2 object mask 117 is generated. Text pixels that are completely surrounded by background pixels remain on text mask 113 .
- A variation on this embodiment is shown in FIG. 19 , where the type 2 objects on text mask 197 are added to object mask 119 (which already contains the low-contrast objects detected by the encircling filter) to form combined object mask 221 , rather than two separate object masks. Separating the objects into multiple masks provides further opportunities to optimize the compression techniques.
- small areas of image types may be filtered at act 17 once the masks have been created.
- This filter reclassifies small areas of one image type based on the type of adjacent pixels in order to improve the compression ratio of the image.
- a first filter method changes small areas of background pixels that are surrounded by text pixels to text pixels. The reason this is more efficient is that background image types compress well if they define a large area, but the text compression algorithms may be better at handling small groups of pixels.
- a second filter method changes small groups of background pixels that are surrounded by picture pixels to picture pixels because these areas are likely a flat area of the picture.
- a third filter method converts small groups of picture pixels surrounded by background or text pixels to text pixels using methods similar to the enclosed artifact detection of act 14 .
- color space translation may be used to improve or simplify the decomposition methods.
- the image should be compressed as an RGB format or using another lossless translation to ensure accurate reproduction of the image.
- the compressed image and masks are received by the remote client as a data stream described above. While the present invention can be used with conventional computers configured in networked environments, the present invention is particularly useful for thin clients that have limited bandwidth capabilities or limited processing resources, such as portable computers, wireless devices such as cellular telephones, palm-top computers, and the like.
- the data stream is a serial stream comprised of a header, a sequence of four mask information fields (for the text, background, object and picture masks) followed by the compressed image data for a specified geometry of image pixels.
- each sequence of four mask information fields is used to describe a compressed block area of 16 ⁇ 16 pixels.
- the blocks may be of other dimensions, including larger blocks, lines or entire frames.
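One block record of the serial stream described above can be sketched as follows. The exact wire format is not specified by the description, so the field widths here (16-bit big-endian lengths preceding each mask information field and the compressed payload) are hypothetical assumptions:

```python
import struct

def pack_block(text_m, bg_m, obj_m, pic_m, payload):
    """Sketch of one block record in the serial data stream.

    Each 16x16 block is described by four mask information fields (text,
    background, object, picture) followed by its compressed image data.
    Lengths are encoded as 16-bit big-endian values; these widths are
    hypothetical, as the patent does not specify an exact wire format.
    """
    rec = b""
    for mask in (text_m, bg_m, obj_m, pic_m):
        rec += struct.pack(">H", len(mask)) + mask    # mask info field
    rec += struct.pack(">H", len(payload)) + payload  # compressed pixels
    return rec
```

A receiver that knows this layout can walk the stream field by field, matching the extraction step the next paragraph attributes to the remote client.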
- the remote client is familiar with the organization of the data stream and has the ability to extract the mask information from the mask information fields and decode the image based on this information to reconstruct the original image frame.
- the remote client maintains the algorithms necessary to decompress the image data using the methods identified by the mask information.
- different decompression techniques may be used, such as Run-Length Encoding (RLE), Lempel-Ziv-Welch (LZW) encoding, Joint Photographic Experts Group (JPEG) decompression, Motion Picture Experts Group (MPEG) decompression, or other published or proprietary lossless or lossy compression methods.
- the compressed display stream may be decompressed on a per block basis, across multiple blocks (e.g. LZW, JPEG), or across frame updates (e.g. MPEG).
- the decompression apparatus located on the remote client uses the received mask set to enable the image decompression and reconstruction methods.
- background and picture layers are decompressed and reconstructed before the text and object layers.
- the mask provides the start and end co-ordinates for graphic descriptors, or for the predictive background decoder in an alternative embodiment.
- the descriptors themselves may define the background co-ordinates.
- the remote client uses the received picture mask to identify the co-ordinates and boundaries of the picture areas once they have been decompressed.
- the object mask identifies the exact location of object pixels in the original image although the mask does not specify the object texture. Objects are decompressed and the pixels are populated over the background of the reconstructed image using the co-ordinate positions provided by the mask.
- the text mask defines the boundaries of the text. Texture detail is derived through a lossless decoding method used for the text layer.
- the text mask provides an accurate specification of the form and texture of the text. For example, in the case of simple single color text, accurate text reconstruction is accomplished by populating the locations of the image specified by the text mask with the pixels matching the color specified by the text layer.
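The single-color reconstruction described above can be sketched in a few lines; this is an illustration of the mask-guided fill, with list-of-rows images as a simplifying assumption:

```python
def reconstruct(base, text_mask, text_color):
    """Sketch of single-colour text reconstruction.

    Pixels at locations identified by the text mask are filled with the
    colour specified by the text layer; all other pixels keep the
    previously decoded background/picture value in `base`.
    """
    return [[text_color if text_mask[y][x] else base[y][x]
             for x in range(len(base[0]))]
            for y in range(len(base))]
```

This mirrors the layer ordering noted earlier: background and picture layers are reconstructed first, then text pixels are populated over them.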
Abstract
Description
|p(x,y)−p(x+i,y+j)|<=d (1)
|A−B|>=d (2)
and
A=0xFF or 0x00 (3)
and
A is not a background pixel, where d is the minimum specified color difference.
|A−C|<=maximum difference between the two outside pixels (4)
and
|A−B|>=minimum difference between center pixel and nearest outside pixel and optionally A and/or B are background pixels (5)
|A−D|<=maximum difference (6)
and
(|A−B|>=minimum difference or |C−D|>=minimum difference) (7)
|B−C|<=maximum difference (8)
and
|A−B|>=minimum difference (9)
and
|C−D|>=minimum difference (10)
Claims (8)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/173,303 US7782339B1 (en) | 2004-06-30 | 2005-06-30 | Method and apparatus for generating masks for a multi-layer image decomposition |
US12/825,092 US8442311B1 (en) | 2005-06-30 | 2010-06-28 | Apparatus and method for encoding an image generated in part by graphical commands |
US13/863,025 US8855414B1 (en) | 2004-06-30 | 2013-04-15 | Apparatus and method for encoding an image generated in part by graphical commands |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US58486904P | 2004-06-30 | 2004-06-30 | |
US11/173,303 US7782339B1 (en) | 2004-06-30 | 2005-06-30 | Method and apparatus for generating masks for a multi-layer image decomposition |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/333,955 Continuation-In-Part US7747086B1 (en) | 2004-06-30 | 2006-01-17 | Methods and apparatus for encoding a shared drawing memory |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/825,092 Continuation-In-Part US8442311B1 (en) | 2004-06-30 | 2010-06-28 | Apparatus and method for encoding an image generated in part by graphical commands |
Publications (1)
Publication Number | Publication Date |
---|---|
US7782339B1 true US7782339B1 (en) | 2010-08-24 |
Family
ID=42583337
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/173,303 Active 2026-02-28 US7782339B1 (en) | 2004-06-30 | 2005-06-30 | Method and apparatus for generating masks for a multi-layer image decomposition |
Country Status (1)
Country | Link |
---|---|
US (1) | US7782339B1 (en) |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070195107A1 (en) * | 2006-02-23 | 2007-08-23 | Dubois David H | Combining multi-layered bitmap files using network specific hardware |
US20080075370A1 (en) * | 2006-09-15 | 2008-03-27 | Ricoh Company, Limited | Apparatus, method, system, and computer program product |
US20090132943A1 (en) * | 2007-02-13 | 2009-05-21 | Claudia Juliana Minsky | Method and System for Creating a Multifunctional Collage Useable for Client/Server Communication |
US20090316213A1 (en) * | 2008-06-23 | 2009-12-24 | Xerox Corporation | System and method of improving image quality in digital image scanning and printing by reducing noise in output image data |
US20090323089A1 (en) * | 2008-06-24 | 2009-12-31 | Makoto Hayasaki | Image processing apparatus, image forming apparatus, image processing method, and computer-readable storage medium storing image processing program |
US20110052062A1 (en) * | 2009-08-25 | 2011-03-03 | Patrick Chiu | System and method for identifying pictures in documents |
US20110292062A1 (en) * | 2010-05-28 | 2011-12-01 | Casio Computer Co., Ltd. | Image processing apparatus, method, and storage medium storing a program |
US20120027309A1 (en) * | 2009-04-14 | 2012-02-02 | Nec Corporation | Image signature extraction device |
US20120189179A1 (en) * | 2007-03-19 | 2012-07-26 | General Electric Company | Processing of content-based compressed images |
US20120212495A1 (en) * | 2008-10-23 | 2012-08-23 | Microsoft Corporation | User Interface with Parallax Animation |
US8326051B1 (en) * | 2008-02-22 | 2012-12-04 | Teradici Corporation | Method and apparatus for progressive encoding for text transmission |
WO2013010248A1 (en) * | 2011-07-21 | 2013-01-24 | Research In Motion | Adaptive filtering based on pattern information |
US8892170B2 (en) | 2009-03-30 | 2014-11-18 | Microsoft Corporation | Unlock screen |
US8914072B2 (en) | 2009-03-30 | 2014-12-16 | Microsoft Corporation | Chromeless user interface |
US9323424B2 (en) | 2008-10-23 | 2016-04-26 | Microsoft Corporation | Column organization of content |
US20160203645A1 (en) * | 2015-01-09 | 2016-07-14 | Marjorie Knepp | System and method for delivering augmented reality to printed books |
US9451271B2 (en) | 2011-07-21 | 2016-09-20 | Blackberry Limited | Adaptive filtering based on pattern information |
US20170344821A1 (en) * | 2016-05-25 | 2017-11-30 | Ebay Inc. | Document optical character recognition |
US20200053230A1 (en) * | 2018-08-10 | 2020-02-13 | Masamoto Nakazawa | Reading device, image forming apparatus, authenticity determination system, and reading method |
US10740616B2 (en) * | 2016-02-12 | 2020-08-11 | Viaccess | Method for identifying a show in a video filmed by a camera of a spectator |
US11457116B2 (en) * | 2020-06-17 | 2022-09-27 | Ricoh Company, Ltd. | Image processing apparatus and image reading method |
US20220407978A1 (en) * | 2020-01-23 | 2022-12-22 | Hewlett-Packard Development Company, L.P. | Determining minimum scanning resolution |
Citations (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5583573A (en) | 1992-04-28 | 1996-12-10 | Mitsubishi Denki Kabushiki Kaisha | Video encoder and encoding method using intercomparisons of pixel values in selection of appropriation quantization values to yield an amount of encoded data substantialy equal to nominal amount |
US5586200A (en) | 1994-01-07 | 1996-12-17 | Panasonic Technologies, Inc. | Segmentation based image compression system |
US5767978A (en) | 1997-01-21 | 1998-06-16 | Xerox Corporation | Image segmentation system |
US5915044A (en) * | 1995-09-29 | 1999-06-22 | Intel Corporation | Encoding video images using foreground/background segmentation |
US5949555A (en) | 1994-02-04 | 1999-09-07 | Canon Kabushiki Kaisha | Image processing apparatus and method |
US5990852A (en) | 1996-10-31 | 1999-11-23 | Fujitsu Limited | Display screen duplication system and method |
US20030072487A1 (en) | 2001-10-12 | 2003-04-17 | Xerox Corporation | Background-based image segmentation |
US20030133617A1 (en) | 2002-01-14 | 2003-07-17 | Debargha Mukherjee | Coder matched layer separation and interpolation for compression of compound documents |
US20030156760A1 (en) | 2002-02-20 | 2003-08-21 | International Business Machines Corporation | Layer based compression of digital images |
US20030185454A1 (en) | 2002-03-26 | 2003-10-02 | Simard Patrice Y. | System and method for image compression using wavelet coding of masked images |
US6633670B1 (en) * | 2000-03-31 | 2003-10-14 | Sharp Laboratories Of America, Inc. | Mask generation for multi-layer image decomposition |
US20030197715A1 (en) * | 1999-05-17 | 2003-10-23 | International Business Machines Corporation | Method and a computer system for displaying and selecting images |
US20030202697A1 (en) * | 2002-04-25 | 2003-10-30 | Simard Patrice Y. | Segmented layered image system |
US20030202699A1 (en) | 2002-04-25 | 2003-10-30 | Simard Patrice Y. | System and method facilitating document image compression utilizing a mask |
US6664969B1 (en) | 1999-11-12 | 2003-12-16 | Hewlett-Packard Development Company, L.P. | Operating system independent method and apparatus for graphical remote access |
US20040010622A1 (en) | 2002-07-11 | 2004-01-15 | O'neill Thomas G. | Method and system for buffering image updates in a remote application |
US6701012B1 (en) | 2000-07-24 | 2004-03-02 | Sharp Laboratories Of America, Inc. | Out-of-layer pixel generation for a decomposed-image layer |
US20050053278A1 (en) * | 2001-05-31 | 2005-03-10 | Baoxin Li | Image background replacement method |
US20050270307A1 (en) * | 1999-09-15 | 2005-12-08 | Brouaux Alexandre Marc Jacques | Dynamic graphic user interface |
US6995763B2 (en) * | 1999-01-22 | 2006-02-07 | Cedara Software Corp. | Interactive sculpting for volumetric exploration and feature extraction |
US20060031755A1 (en) * | 2004-06-24 | 2006-02-09 | Avaya Technology Corp. | Sharing inking during multi-modal communication |
US7016080B2 (en) | 2000-09-21 | 2006-03-21 | Eastman Kodak Company | Method and system for improving scanned image detail |
US7202872B2 (en) * | 2003-10-29 | 2007-04-10 | Via Technologies, Inc. | Apparatus for compressing data in a bit stream or bit pattern |
US7221810B2 (en) * | 2000-11-13 | 2007-05-22 | Anoto Group Ab | Method and device for recording of information |
US7246342B2 (en) * | 2002-07-26 | 2007-07-17 | Asml Masktools B.V. | Orientation dependent shielding for use with dipole illumination techniques |
US7333657B1 (en) * | 1999-12-02 | 2008-02-19 | Adobe Systems Incorporated | Recognizing text in a multicolor image |
- 2005-06-30 US US11/173,303 patent/US7782339B1/en active Active
Patent Citations (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5583573A (en) | 1992-04-28 | 1996-12-10 | Mitsubishi Denki Kabushiki Kaisha | Video encoder and encoding method using intercomparisons of pixel values in selection of appropriation quantization values to yield an amount of encoded data substantialy equal to nominal amount |
US5586200A (en) | 1994-01-07 | 1996-12-17 | Panasonic Technologies, Inc. | Segmentation based image compression system |
US5949555A (en) | 1994-02-04 | 1999-09-07 | Canon Kabushiki Kaisha | Image processing apparatus and method |
US5915044A (en) * | 1995-09-29 | 1999-06-22 | Intel Corporation | Encoding video images using foreground/background segmentation |
US5990852A (en) | 1996-10-31 | 1999-11-23 | Fujitsu Limited | Display screen duplication system and method |
US5767978A (en) | 1997-01-21 | 1998-06-16 | Xerox Corporation | Image segmentation system |
US6995763B2 (en) * | 1999-01-22 | 2006-02-07 | Cedara Software Corp. | Interactive sculpting for volumetric exploration and feature extraction |
US20030197715A1 (en) * | 1999-05-17 | 2003-10-23 | International Business Machines Corporation | Method and a computer system for displaying and selecting images |
US20050270307A1 (en) * | 1999-09-15 | 2005-12-08 | Brouaux Alexandre Marc Jacques | Dynamic graphic user interface |
US6664969B1 (en) | 1999-11-12 | 2003-12-16 | Hewlett-Packard Development Company, L.P. | Operating system independent method and apparatus for graphical remote access |
US7333657B1 (en) * | 1999-12-02 | 2008-02-19 | Adobe Systems Incorporated | Recognizing text in a multicolor image |
US6633670B1 (en) * | 2000-03-31 | 2003-10-14 | Sharp Laboratories Of America, Inc. | Mask generation for multi-layer image decomposition |
US6701012B1 (en) | 2000-07-24 | 2004-03-02 | Sharp Laboratories Of America, Inc. | Out-of-layer pixel generation for a decomposed-image layer |
US7016080B2 (en) | 2000-09-21 | 2006-03-21 | Eastman Kodak Company | Method and system for improving scanned image detail |
US7221810B2 (en) * | 2000-11-13 | 2007-05-22 | Anoto Group Ab | Method and device for recording of information |
US20050053278A1 (en) * | 2001-05-31 | 2005-03-10 | Baoxin Li | Image background replacement method |
US20030072487A1 (en) | 2001-10-12 | 2003-04-17 | Xerox Corporation | Background-based image segmentation |
US20030133617A1 (en) | 2002-01-14 | 2003-07-17 | Debargha Mukherjee | Coder matched layer separation and interpolation for compression of compound documents |
US20030156760A1 (en) | 2002-02-20 | 2003-08-21 | International Business Machines Corporation | Layer based compression of digital images |
US20030185454A1 (en) | 2002-03-26 | 2003-10-02 | Simard Patrice Y. | System and method for image compression using wavelet coding of masked images |
US20070025622A1 (en) * | 2002-04-25 | 2007-02-01 | Microsoft Corporation | Segmented layered image system |
US7120297B2 (en) * | 2002-04-25 | 2006-10-10 | Microsoft Corporation | Segmented layered image system |
US20030202699A1 (en) | 2002-04-25 | 2003-10-30 | Simard Patrice Y. | System and method facilitating document image compression utilizing a mask |
US20030202697A1 (en) * | 2002-04-25 | 2003-10-30 | Simard Patrice Y. | Segmented layered image system |
US20040010622A1 (en) | 2002-07-11 | 2004-01-15 | O'neill Thomas G. | Method and system for buffering image updates in a remote application |
US7246342B2 (en) * | 2002-07-26 | 2007-07-17 | Asml Masktools B.V. | Orientation dependent shielding for use with dipole illumination techniques |
US7202872B2 (en) * | 2003-10-29 | 2007-04-10 | Via Technologies, Inc. | Apparatus for compressing data in a bit stream or bit pattern |
US20060031755A1 (en) * | 2004-06-24 | 2006-02-09 | Avaya Technology Corp. | Sharing inking during multi-modal communication |
Non-Patent Citations (27)
Title |
---|
Broder, Andrei et al., "Pattern-based compression of text images," Digital Syst. Res. Center and Michael Mitzenmacher, Dept. of Computer Science, UC Berkeley, Data Compression Conference 1996, pp. 300-309, 1996.
Draft ITU-T Recommendation T.44, "Mixed Raster Content (MRC)," International Telecommunication Union, Study Group 8, Contribution, Oct. 1997.
Wu, Victor et al., "Finding Text in Images," DL '97: Proceedings of the Second ACM International Conference on Digital Libraries, ACM, Jul. 1997. *
Drori, Iddo et al., "Fragment-Based Image Completion," ACM Transactions on Graphics (TOG), vol. 22, issue 3 (ACM SIGGRAPH 2003 Papers), ACM Press, Jul. 2003. *
Gilbert, Jeffrey M. et al., "A Lossless 2-D Image Compression Technique for Synthetic Discrete-Tone Images," in Proceedings of the Data Compression Conference (DCC), 10 pages, Mar.-Apr. 1998. |
Nieh, Jason et al., "A Comparison of Thin-Client Computing Architectures," Technical Report CUCS-022-00, www.nomachine.com/documentation/pdf/cucs-022-00.pdf, 16 pages, Nov. 2000.
Gilbert, Jeffrey Michael, "Text/Graphics and Image Transmission over Bandlimited Lossy Links," a thesis submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Engineering, Electrical Engineering and Computer Sciences, in the Graduate Division of the University of California, Berkeley, Spring 2000, 276 pages, Berkeley, CA.
Jung, Keechul et al., "Text Information Extraction in Images and Video: A Survey," Pattern Recognition, vol. 37, No. 5, pp. 977-997, www.cse.msu.edu/prip/Files/TextDetectionSurvey.pdf, 2004.
Li et al., "Text and Picture Segmentation by the Distribution Analysis of Wavelet Coefficients," Proceedings of the 1998 IEEE International Conference on Image Processing (ICIP-98), Chicago, Illinois, vol. 3, pp. 790-794, Oct. 4-7, 1998. |
Lienhart, Rainer et al., "Automatic Text Segmentation and Text Recognition for Video Indexing," Multimedia Systems, vol. 8, No. 1, pp. 69-81, Jan. 2000.
Lienhart, Rainer et al., "Video OCR: A Survey and Practitioner's Guide," in Video Mining, Kluwer Academic Publisher, pp. 155-184, Oct. 2003. |
Lin, Tony, et al., "Hybrid Image Coding for Real-Time Computer Screen Video Transmission," Visual Communications and Image Processing (VCIP), part of the IS&T/SPIE Symposium on Electronic Imaging, 12 pages, San Jose, CA, USA, Jan. 18-22, 2004. |
Oliveira, Inês et al., "Image Processing Techniques for Video Content Extraction," ERCIM Workshop Proceedings No. 97-W004, www.ercim.org/publication/ws-proceedings/DELOS4/oliveira.pdf, San Miniato, Aug. 28-30, 1997.
Queiroz, Ricardo L. de et al., "Mixed Raster Content (MRC) Model for Compound Image Compression," Corporate Research & Technology, Xerox Corp., Proceedings SPIE, Visual Communications and Image Processing, vol. 3653, pp. 1106-1117, Jan. 1999.
Queiroz, Ricardo L. de et al., "Optimizing Block-Thresholding Segmentation for Multilayer Compression of Compound Images," IEEE Transactions on Image Processing, vol. 9, No. 9, pp. 1461-1471, Sep. 2000. |
Said, Amir, "Compression of Compound Images and Video for Enabling Rich Media in Embedded Systems," Imaging Systems Laboratory, HP Laboratories Palo Alto, HPL-2004-89, 14 pages, May 11, 2004. |
Sato, T. et al., "Video OCR for Digital News Archives," IEEE International Workshop on Content-Based Access of Image and Video Database (CAIVD '98), Los Alamitos, CA, pp. 52-60, Jan. 1998.
Starck, J.-L. et al., "Image Decomposition: Separation of Texture from Piecewise Smooth Content," www-sccm.stanford.edu/~elad/Conferences/19-Separation-SPIE-2003.pdf, 12 pages, 2003.
Lin, Tony et al., "Efficient Coding of Computer Generated Compound Images," IEEE International Conference on Image Processing (ICIP 2005), vol. 1, pp. 561-564, Genoa, Italy, Sep. 2005.
Horry, Youichi et al., "Tour into the Picture: Using a Spidery Mesh Interface to Make Animation from a Single Image," SIGGRAPH '97: Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, ACM Press/Addison-Wesley Publishing Co., Aug. 1997. *
Wu, V. et al., "Finding Text in Images," in Proceedings of the Second ACM International Conference on Digital Libraries, Philadelphia, PA, pp. 3-12, 1997.
Vasconcelos, Nuno et al., "Statistical Models of Video Structure for Content Analysis and Characterization," IEEE Transactions on Image Processing, vol. 9, No. 1, 15 pages, Jan. 2000. |
Wang, Wei et al., "Identification of Objects From Image Regions," http://www.cse.buffalo.edu/DBGROUP/psfiles/Wang/icme2003.pd, NSF Digital Government Grant EIA-9983430, 4 pages, 2003. |
Rucklidge, William, "DigiPaper: A Versatile Color Document Image Representation," Proceedings of the IEEE International Conference on Image Processing, Kobe, Japan, Oct. 24-25, 1999. *
Yuan, Xiaojing et al., "Multi-scale Feature Identification Using Evolution Strategies," (Mechanical Engineering Department, Tulane University, New Orleans), Preprint submitted to Elsevier Science, 16 pages, Aug. 11, 2003. |
Cited By (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070195107A1 (en) * | 2006-02-23 | 2007-08-23 | Dubois David H | Combining multi-layered bitmap files using network specific hardware |
US8125486B2 (en) * | 2006-02-23 | 2012-02-28 | Los Alamos National Security, Llc | Combining multi-layered bitmap files using network specific hardware |
US20080075370A1 (en) * | 2006-09-15 | 2008-03-27 | Ricoh Company, Limited | Apparatus, method, system, and computer program product |
US8170344B2 (en) * | 2006-09-15 | 2012-05-01 | Ricoh Company, Limited | Image storage device, image storage system, method of storing image data, and computer program product for image data storing |
US9530142B2 (en) * | 2007-02-13 | 2016-12-27 | Claudia Juliana Minsky | Method and system for creating a multifunctional collage useable for client/server communication |
US20090132943A1 (en) * | 2007-02-13 | 2009-05-21 | Claudia Juliana Minsky | Method and System for Creating a Multifunctional Collage Useable for Client/Server Communication |
US8406539B2 (en) * | 2007-03-19 | 2013-03-26 | General Electric Company | Processing of content-based compressed images |
US20120189179A1 (en) * | 2007-03-19 | 2012-07-26 | General Electric Company | Processing of content-based compressed images |
US8824799B1 (en) * | 2008-02-22 | 2014-09-02 | Teradici Corporation | Method and apparatus for progressive encoding for text transmission |
US8559709B1 (en) | 2008-02-22 | 2013-10-15 | Teradici Corporation | Method and apparatus for progressive encoding for text transmission |
US8326051B1 (en) * | 2008-02-22 | 2012-12-04 | Teradici Corporation | Method and apparatus for progressive encoding for text transmission |
US20090316213A1 (en) * | 2008-06-23 | 2009-12-24 | Xerox Corporation | System and method of improving image quality in digital image scanning and printing by reducing noise in output image data |
US8503036B2 (en) * | 2008-06-23 | 2013-08-06 | Xerox Corporation | System and method of improving image quality in digital image scanning and printing by reducing noise in output image data |
US20090323089A1 (en) * | 2008-06-24 | 2009-12-31 | Makoto Hayasaki | Image processing apparatus, image forming apparatus, image processing method, and computer-readable storage medium storing image processing program |
US8384952B2 (en) * | 2008-06-24 | 2013-02-26 | Sharp Kabushiki Kaisha | Image processing and forming apparatus, method and computer-readable medium for improving document image compression efficiency and quality |
US9606704B2 (en) | 2008-10-23 | 2017-03-28 | Microsoft Technology Licensing, Llc | Alternative inputs of a mobile communications device |
US8970499B2 (en) | 2008-10-23 | 2015-03-03 | Microsoft Technology Licensing, Llc | Alternative inputs of a mobile communications device |
US20120212495A1 (en) * | 2008-10-23 | 2012-08-23 | Microsoft Corporation | User Interface with Parallax Animation |
US10133453B2 (en) | 2008-10-23 | 2018-11-20 | Microsoft Technology Licensing, Llc | Alternative inputs of a mobile communications device |
US9323424B2 (en) | 2008-10-23 | 2016-04-26 | Microsoft Corporation | Column organization of content |
US9223411B2 (en) * | 2008-10-23 | 2015-12-29 | Microsoft Technology Licensing, Llc | User interface with parallax animation |
US9218067B2 (en) | 2008-10-23 | 2015-12-22 | Microsoft Technology Licensing, Llc | Mobile communications device user interface |
US9703452B2 (en) | 2008-10-23 | 2017-07-11 | Microsoft Technology Licensing, Llc | Mobile communications device user interface |
US8914072B2 (en) | 2009-03-30 | 2014-12-16 | Microsoft Corporation | Chromeless user interface |
US8892170B2 (en) | 2009-03-30 | 2014-11-18 | Microsoft Corporation | Unlock screen |
US9977575B2 (en) | 2009-03-30 | 2018-05-22 | Microsoft Technology Licensing, Llc | Chromeless user interface |
US8861871B2 (en) * | 2009-04-14 | 2014-10-14 | Nec Corporation | Image signature extraction device |
US20120027309A1 (en) * | 2009-04-14 | 2012-02-02 | Nec Corporation | Image signature extraction device |
US8634644B2 (en) * | 2009-08-25 | 2014-01-21 | Fuji Xerox Co., Ltd. | System and method for identifying pictures in documents |
US20110052062A1 (en) * | 2009-08-25 | 2011-03-03 | Patrick Chiu | System and method for identifying pictures in documents |
US20110292062A1 (en) * | 2010-05-28 | 2011-12-01 | Casio Computer Co., Ltd. | Image processing apparatus, method, and storage medium storing a program |
US9451271B2 (en) | 2011-07-21 | 2016-09-20 | Blackberry Limited | Adaptive filtering based on pattern information |
WO2013010248A1 (en) * | 2011-07-21 | 2013-01-24 | Research In Motion | Adaptive filtering based on pattern information |
US20160203645A1 (en) * | 2015-01-09 | 2016-07-14 | Marjorie Knepp | System and method for delivering augmented reality to printed books |
US10740616B2 (en) * | 2016-02-12 | 2020-08-11 | Viaccess | Method for identifying a show in a video filmed by a camera of a spectator |
US20170344821A1 (en) * | 2016-05-25 | 2017-11-30 | Ebay Inc. | Document optical character recognition |
US10068132B2 (en) * | 2016-05-25 | 2018-09-04 | Ebay Inc. | Document optical character recognition |
US11893611B2 (en) | 2016-05-25 | 2024-02-06 | Ebay Inc. | Document optical character recognition |
US20200053230A1 (en) * | 2018-08-10 | 2020-02-13 | Masamoto Nakazawa | Reading device, image forming apparatus, authenticity determination system, and reading method |
US10924621B2 (en) * | 2018-08-10 | 2021-02-16 | Ricoh Company, Ltd. | Reading device to read and output an invisible image included in a document |
US20220407978A1 (en) * | 2020-01-23 | 2022-12-22 | Hewlett-Packard Development Company, L.P. | Determining minimum scanning resolution |
US11800036B2 (en) * | 2020-01-23 | 2023-10-24 | Hewlett-Packard Development Company, L.P. | Determining minimum scanning resolution |
US11457116B2 (en) * | 2020-06-17 | 2022-09-27 | Ricoh Company, Ltd. | Image processing apparatus and image reading method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7782339B1 (en) | Method and apparatus for generating masks for a multi-layer image decomposition | |
US7634150B2 (en) | Removing ringing and blocking artifacts from JPEG compressed document images | |
US5848185A (en) | Image processing apparatus and method | |
US6633670B1 (en) | Mask generation for multi-layer image decomposition | |
Lin et al. | Compound image compression for real-time computer screen image transmission | |
JP4732660B2 (en) | Visual attention system | |
US8417029B2 (en) | Image processing apparatus and method, including fill-up processing | |
TWI426774B (en) | A method for classifying an uncompressed image respective to jpeg compression history, an apparatus for classifying an image respective to whether the image has undergone jpeg compression and an image classification method | |
JP2001223903A (en) | Method for compressing scanned document with color and gray scale | |
KR100937542B1 (en) | Segmented layered image system | |
KR100422709B1 (en) | Face detecting method depend on image | |
US20010000314A1 (en) | Iterative smoothing technique for pre-processing mixed raster content planes to improve the quality of a decompressed image and increase document compression ratios | |
JP2005228340A (en) | Image analysis device, image analysis method, and blob identification device | |
JP2000196895A (en) | Digital image data classifying method | |
JP2004537220A (en) | Equipment for processing digital images | |
US7065254B2 (en) | Multilayered image file | |
JP4441300B2 (en) | Image processing apparatus, image processing method, image processing program, and recording medium storing the program | |
US20040101204A1 (en) | Method of processing video into an encoded bitstream | |
JP2004199622A (en) | Apparatus and method for image processing, recording media, and program | |
KR20060007901A (en) | Apparatus and method for automatic extraction of salient object from an image | |
KR20170046136A (en) | Method for choosing a compression algorithm depending on the image type | |
JP3647071B2 (en) | Image processing apparatus and method | |
JP2004242075A (en) | Image processing apparatus and method therefor | |
JP4383187B2 (en) | Image processing apparatus, image processing program, and storage medium | |
JP2005184403A (en) | Image processing apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: TERADICI CORPORATION, CANADA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HOBBS, DAVID V.;TUCKER, KIMBERLY MARIE;REEL/FRAME:016758/0708 Effective date: 20050630 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: WF FUND IV LIMITED PARTNERSHIP, CANADA Free format text: SECURITY AGREEMENT;ASSIGNOR:TERADICI CORPORATION;REEL/FRAME:029800/0593 Effective date: 20130204 |
|
REMI | Maintenance fee reminder mailed | ||
FPAY | Fee payment |
Year of fee payment: 4 |
|
SULP | Surcharge for late payment | ||
AS | Assignment |
Owner name: BEEDIE CAPITAL PARTNERS FUND I LIMITED PARTNERSHIP Free format text: SECURITY INTEREST;ASSIGNOR:TERADICI CORPORATION;REEL/FRAME:037988/0364 Effective date: 20160219 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552) Year of fee payment: 8 |
|
AS | Assignment |
Owner name: COMERICA BANK, MICHIGAN Free format text: SECURITY INTEREST;ASSIGNOR:TERADICI CORPORATION;REEL/FRAME:048454/0895 Effective date: 20190222 |
|
AS | Assignment |
Owner name: TERADICI CORPORATION, CANADA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BEEDIE CAPITAL PARTNERS FUND I LIMITED PARTNERSHIP;REEL/FRAME:048473/0846 Effective date: 20190226 |
|
AS | Assignment |
Owner name: TERADICI CORPORATION, CANADA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WF FUND IV LIMITED PARTNERSHIP;REEL/FRAME:048499/0800 Effective date: 20190228 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 12 |