US9053392B2 - Generating a hierarchy of visual pattern classes - Google Patents
- Publication number: US9053392B2 (application number US14/012,770)
- Authority: US (United States)
- Prior art keywords: child, class, visual pattern, classes, visual
- Prior art date
- Legal status: Active, expires (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06K9/6267
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
- G06F18/10—Pre-processing; Data cleansing
- G06F18/231—Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
- G06F18/24—Classification techniques
- G06K9/6298
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/7625—Hierarchical techniques, i.e. dividing or merging patterns to obtain a tree-like representation; Dendrograms
- G06V10/95—Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures
- G06V30/245—Font recognition
Definitions
- the subject matter disclosed herein generally relates to the processing of data. Specifically, the present disclosure addresses systems and methods of generating a hierarchy of classes of visual patterns.
- a visual pattern may be depicted in an image.
- An example of a visual pattern is text, such as dark words against a white background or vice versa.
- text may be rendered in a particular typeface or font (e.g., Times New Roman or Helvetica) and in a particular style (e.g., regular, semi-bold, bold, black, italic, or any suitable combination thereof).
- Another example of a visual pattern that may be depicted in an image is an object, such as a car, a building, or a flower.
- a further example of a visual pattern is a face (e.g., a face of a human or animal). A face depicted in an image may be recognizable as a particular individual.
- the face within an image may have a particular facial expression, indicate a particular gender, indicate a particular age, or any suitable combination thereof.
- a visual pattern is a scene (e.g., a landscape or a sunset).
- a visual pattern may exhibit coarse-grained features (e.g., an overall shape of an alphabetic letter rendered in a font), fine-grained features (e.g., a detailed shape of an ending of the letter that is rendered in the font), or any suitable combination thereof.
- FIG. 1 is a network diagram illustrating a network environment suitable for visual pattern classification and recognition, according to some example embodiments.
- FIG. 2 is a block diagram illustrating components of a hierarchy machine suitable for generating a hierarchy of visual pattern classes, according to some example embodiments.
- FIG. 3-6 are conceptual diagrams illustrating a hierarchy of visual pattern classes, according to some example embodiments.
- FIG. 7-9 are flowcharts illustrating operations of the hierarchy machine in performing a method of generating the hierarchy of visual pattern classes, according to some example embodiments.
- FIG. 10 is a conceptual diagram that illustrates generation and encoding of local feature vectors from pixel blocks of an image, according to some example embodiments.
- FIG. 11 is a conceptual diagram that illustrates generation of a first array of ordered pairs for the image, according to some example embodiments.
- FIG. 12 is a conceptual diagram that illustrates generation of a second array of ordered pairs for the image, according to some example embodiments.
- FIG. 13-15 are flowcharts illustrating operations of the hierarchy machine in performing a method of processing the image, according to some example embodiments.
- FIG. 16 is a block diagram illustrating components of a machine, according to some example embodiments, able to read instructions from a machine-readable medium and perform any one or more of the methodologies discussed herein.
- Example methods and systems are directed to generating a hierarchy of classes that classify visual patterns (e.g., generating a tree of classifications, categories, or clusters of visual patterns, for subsequent visual pattern recognition in an image, such as, classification, categorization, or identification of a visual pattern within an image). Examples merely typify possible variations. Unless explicitly stated otherwise, components and functions are optional and may be combined or subdivided, and operations may vary in sequence or be combined or subdivided. In the following description, for purposes of explanation, numerous specific details are set forth to provide a thorough understanding of example embodiments. It will be evident to one skilled in the art, however, that the present subject matter may be practiced without these specific details.
- a class of visual patterns may include a class of fonts (e.g., a classification, category, or group of typefaces or fonts used for rendering text in images).
- an individual font may be treated as an individual visual pattern (e.g., encompassing multiple images of letters and numerals rendered in the single font), while groups (e.g., families or categories) of related fonts may be treated as larger classes of visual patterns (e.g., regular, bold, italic, and italic-bold versions of the same font).
- visual patterns may be supported, such as face types (e.g., classified by expression, gender, age, or any suitable combination thereof), objects (e.g., arranged into a hierarchy of object types or categories), and scenes (e.g., organized into a hierarchy of scene types or categories).
- a system may be or include a machine (e.g., an image processing machine) that analyzes images of visual patterns (e.g., analyzes visual patterns depicted in images). To do this, the machine may generate a representation of various features of an image. Such representations of images may be or include mathematical representations (e.g., feature vectors) that the system can analyze, compare, or otherwise process, to classify, categorize, or identify visual patterns depicted in the represented images. In some situations, the system may be or include a hierarchy machine configured to use one or more machine-learning techniques to train a classifier (e.g., classifier module) for visual patterns.
- the hierarchy machine may use the classifier to classify one or more reference images (e.g., test images) whose depicted visual patterns are known (e.g., predetermined), and then modify or update the classifier (e.g., by applying one or more weight vectors, which may be stored as templates of the classifier) to improve its performance (e.g., speed, accuracy, or both).
- the system may utilize an image feature representation called local feature embedding (LFE).
- LFE enables generation of a feature vector that captures salient visual properties of an image to address both the fine-grained aspects and the coarse-grained aspects of recognizing a visual pattern depicted in the image.
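- The patent details LFE later (with respect to FIG. 10-12); as a rough illustration only, the following sketch shows the common encode-then-pool pattern for turning per-block local descriptors into one image-level feature vector. The codebook, the dot-product similarity, and the max-pooling step here are assumptions for illustration, not the patented encoding:

```python
import numpy as np

def local_feature_embedding(local_features, codebook):
    """Pool per-block local descriptors into one image-level vector.

    local_features: (n, d) array, one descriptor per pixel block.
    codebook:       (k, d) array of learned codewords (an assumption here).
    Max-pooling keeps the strongest response per codeword, preserving
    fine-grained local evidence alongside coarse-grained structure.
    """
    responses = local_features @ codebook.T  # (n, k) similarity scores
    return responses.max(axis=0)             # (k,) image-level feature

rng = np.random.default_rng(0)
block_descriptors = rng.standard_normal((50, 8))   # 50 pixel blocks
codewords = rng.standard_normal((16, 8))           # 16 codewords
image_vector = local_feature_embedding(block_descriptors, codewords)
```

Two images can then be compared through their pooled vectors, regardless of how many pixel blocks each image contributed.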
- the system may implement a nearest class mean (NCM) classifier, as well as a scalable recognition algorithm with metric learning and max margin template selection. Accordingly, the system may be updated to accommodate new classes with very little added computational cost. This may have the effect of enabling the system to readily handle open-ended image classification problems.
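- As a minimal sketch of the NCM idea (using a plain Euclidean distance in place of the learned metric, and toy two-dimensional features):

```python
import numpy as np

def ncm_classify(x, class_means):
    """Nearest class mean: return the label whose mean vector is closest to x.

    Adding a new class only requires computing one more class mean,
    which is why NCM accommodates new classes at little added cost.
    """
    return min(class_means, key=lambda c: np.linalg.norm(x - class_means[c]))

# Toy 2-D "features" for two font classes (illustrative values).
means = {"serif": np.array([0.0, 0.0]), "sans": np.array([4.0, 4.0])}
label = ncm_classify(np.array([0.5, 1.0]), means)
```

With a learned metric, the Euclidean norm above would be replaced by a Mahalanobis-style distance, but the nearest-mean decision rule is unchanged.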
- the hierarchy machine may be configured as a clustering machine that utilizes LFE to organize (e.g., cluster) visual patterns into nodes (e.g., clusters) that each represent one or more visual patterns (e.g., by clustering visual patterns into groups that are similar to each other).
- nodes may be arranged as a hierarchy (e.g., a tree of nodes, or a tree of clusters) in which a node may have a parent-child relationship with another node.
- a root node may represent all classes of visual patterns supported by the system, and nodes that are children of the root node may represent subclasses of the visual patterns.
- a node that represents a subclass of visual patterns may have child nodes of its own, where these child nodes each represent a sub-subclass of visual patterns.
- a node that represents only a single visual pattern cannot be subdivided further and is therefore a leaf node in the hierarchy (e.g., tree).
- the hierarchy machine may implement a node-splitting and tree-learning algorithm that includes (1) hard-splitting of nodes and (2) soft-assignment of nodes to perform error-bounded splitting of nodes into clusters. This may enable the overall system to perform large-scale visual pattern recognition (e.g., font recognition) based on a learned error-bounded tree of visual patterns (e.g., fonts or font classes).
- fonts may share many features with each other.
- a group of fonts may belong to the same family of typefaces, in which each member of the family differs from the others by only small variations (e.g., aspect ratio of characters, stroke width, or ending slope).
- classifying or identifying these fonts is different from classifying fonts that share very few features (e.g., fonts from different or divergent families).
- the system (e.g., the clustering machine) may be configured to cluster the fonts, so that fonts within each cluster are similar to each other but vary dramatically from fonts in other clusters.
- Each cluster of fonts may then have a specific classifier (e.g., an image classifier module) trained for that cluster of fonts, and the system may be configured to train and use multiple classifiers for multiple clusters of fonts.
- the system may perform visual font recognition with increased speed compared to existing algorithms.
- the system may be readily scalable to large scale problems in visual font recognition.
- the system may be configured to perform a two-stage procedure that includes (1) hard-splitting of nodes (e.g., representing font classes or individual fonts) and (2) soft-assignment of nodes to obtain an error-bounded tree in which nodes are allocated into hierarchical clusters.
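- The two-stage procedure can be outlined as follows. This is an illustrative sketch: `cluster_fn` and `error_prob` stand in for the actual clustering and error-estimation steps, and the threshold value is an assumption:

```python
def split_node(fonts, cluster_fn, error_prob, threshold=0.3):
    """Two-stage node splitting (illustrative sketch).

    Stage 1, hard-splitting: cluster_fn partitions the fonts into
    mutually exclusive child clusters.
    Stage 2, soft-assignment: a font is additionally copied into any
    other child for which its estimated probability of membership
    meets `threshold`, removing mutual exclusivity and bounding the
    error introduced by the hard split.
    """
    children = cluster_fn(fonts)                      # stage 1
    for font in fonts:
        for child in children:                        # stage 2
            if font not in child and error_prob(font, child) >= threshold:
                child.append(font)
    return children

# Toy example: "Font 3" is hard-split into the first cluster, but has a
# 40% estimated chance of belonging to the second, so it lands in both.
fonts = ["Font 1", "Font 2", "Font 3", "Font 4", "Font 5"]
cluster = lambda fs: [["Font 1", "Font 2", "Font 3"], ["Font 4", "Font 5"]]
prob = lambda font, child: 0.4 if font == "Font 3" else 0.0
children = split_node(fonts, cluster, prob)
```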
- FIG. 1 is a network diagram illustrating a network environment 100 , according to some example embodiments.
- the network environment 100 includes a hierarchy machine 110 , a database 115 , and devices 130 and 150 , all communicatively coupled to each other via a network 190 .
- the hierarchy machine 110 , the database 115 , and the devices 130 and 150 may each be implemented in a computer system, in whole or in part, as described below with respect to FIG. 16 .
- the hierarchy machine 110 may be configured (e.g., by one or more software modules, as described below with respect to FIG. 2 ) to perform one or more of any of the methodologies discussed herein, in whole or in part. Such methodologies include hierarchy generation algorithms (e.g., as discussed below with respect to FIG. 3-9 ). Such methodologies may further include image processing algorithms (e.g., visual pattern recognition algorithms) that may be used by the hierarchy machine 110 to train an image classifier, use an image classifier to classify (e.g., recognize, categorize, or identify) an image, or both.
- the database 115 may store one or more images before, during, or after image processing by the hierarchy machine 110 .
- the database 115 may store a reference set (e.g., trainer set) of images (e.g., a training database of images for training an image classifier), a set of unclassified images (e.g., a test database of test images, or a production database of captured images) to be processed by the hierarchy machine 110 , or any suitable combination thereof.
- the hierarchy machine 110 , with or without the database 115 , may form all or part of a network-based system 105 .
- the network-based system 105 may be or include a cloud-based image processing system (e.g., visual pattern recognition system) that provides one or more network-based image processing services (e.g., a visual pattern recognition service).
- users 132 and 152 are also shown in FIG. 1 .
- One or both of the users 132 and 152 may be a human user (e.g., a human being), a machine user (e.g., a computer configured by a software program to interact with the device 130 ), or any suitable combination thereof (e.g., a human assisted by a machine or a machine supervised by a human).
- the user 132 is not part of the network environment 100 , but is associated with the device 130 and may be a user of the device 130 .
- the device 130 may be a desktop computer, a vehicle computer, a tablet computer, a navigational device, a portable media device, or a smart phone belonging to the user 132 .
- the user 152 is not part of the network environment 100 , but is associated with the device 150 .
- the device 150 may be a desktop computer, a vehicle computer, a tablet computer, a navigational device, a portable media device, or a smart phone belonging to the user 152 .
- any of the machines, databases, or devices shown in FIG. 1 may be implemented in a general-purpose computer modified (e.g., configured or programmed) by software to be a special-purpose computer to perform one or more of the functions described herein for that machine, database, or device.
- a computer system able to implement any one or more of the methodologies described herein is discussed below with respect to FIG. 16 .
- a “database” is a data storage resource and may store data structured as a text file, a table, a spreadsheet, a relational database (e.g., an object-relational database), a triple store, a hierarchical data store, or any suitable combination thereof.
- any two or more of the machines, databases, or devices illustrated in FIG. 1 may be combined into a single machine, and the functions described herein for any single machine, database, or device may be subdivided among multiple machines, databases, or devices.
- the network 190 may be any network that enables communication between or among machines, databases, and devices (e.g., the hierarchy machine 110 and the device 130 ). Accordingly, the network 190 may be a wired network, a wireless network (e.g., a mobile or cellular network), or any suitable combination thereof. The network 190 may include one or more portions that constitute a private network, a public network (e.g., the Internet), or any suitable combination thereof.
- FIG. 2 is a block diagram illustrating components of the hierarchy machine 110 , according to some example embodiments.
- the hierarchy machine 110 may be a cloud-based server machine (e.g., a hierarchy generation machine for classes of visual patterns, a visual recognition server machine, or any suitable combination thereof) and is shown as including an assignment module 260 (e.g., a node soft-assignment module) and a hierarchy module 270 (e.g., a tree generation module), which may be configured to communicate with each other (e.g., via a bus, shared memory, or a switch).
- the assignment module 260 may be configured to begin with mutually exclusive child classes that have been split from a parent class, and then remove mutual exclusivity from two or more child classes by adding a visual pattern (e.g., a font, font family, or a category of fonts) to one or more of the child classes, such that multiple child classes each include the visual pattern.
- the hierarchy module 270 may be configured to generate a hierarchy of classes of visual patterns (e.g., visual pattern classes), based on the output of the assignment module 260 . For example, the hierarchy module 270 may generate the hierarchy such that the hierarchy includes the parent class and the mutually nonexclusive child classes (e.g., the child classes from which mutual exclusivity was removed) that each includes the visual pattern or subclass of visual patterns. Further details of the assignment module 260 and the hierarchy module 270 are discussed below with respect to FIG. 7-9 .
- the hierarchy machine 110 may also include an image access module 210 , a feature vector module 220 , and a vector storage module 230 , which may all be configured to communicate with any one or more other modules of the hierarchy machine 110 (e.g., via a bus, shared memory, or a switch). As shown, the hierarchy machine 110 may further include an image classifier module 240 , a classifier trainer module 250 , or both.
- the image classifier module 240 may be or include a font classifier (e.g., typeface classifier), a font identifier (e.g., typeface identifier), a face classifier (e.g., facial expression classifier, facial gender classifier, or both), a face identifier (e.g., face recognizer), or any suitable combination thereof.
- the classifier trainer module 250 may be or include a font recognition trainer (e.g., typeface recognition trainer), a face recognition trainer, or any suitable combination thereof. As shown in FIG. 2 , the image classifier module 240 and the classifier trainer module 250 may be configured to communicate with each other, as well as with the image access module 210 , the feature vector module 220 , and the vector storage module 230 .
- the image classifier module 240 , the classifier trainer module 250 , or both, may form all or part of a node division module 255 (e.g., a module configured to perform hard-splitting of nodes).
- any one or more of the modules described herein may be implemented using hardware (e.g., a processor of a machine) or a combination of hardware and software.
- any module described herein may configure a processor to perform the operations described herein for that module.
- any two or more of these modules may be combined into a single module, and the functions described herein for a single module may be subdivided among multiple modules.
- modules described herein as being implemented within a single machine, database, or device may be distributed across multiple machines, databases, or devices.
- FIG. 3-6 are conceptual diagrams illustrating a hierarchy of visual pattern classes, according to some example embodiments.
- the hierarchy may be treated as a tree of nodes (e.g., a node tree).
- FIG. 3 illustrates examples of structural elements (e.g., nodes) of the hierarchy, while FIG. 4-6 illustrate an example of hard-splitting a parent node (e.g., parent class) into mutually exclusive child classes (e.g., child nodes) and the subsequent soft-assignment of a visual pattern (e.g., a font) into child classes from which the initial mutual exclusivity has been removed.
- the hierarchy includes a node 300 that may be a root node representing all classes of visual patterns that are supported by (e.g., represented within) the hierarchy. These classes may be subdivided into multiple subclasses and sub-subclasses, which may be represented by additional nodes of the hierarchy. As shown in FIG. 3 , the classes represented by node 300 may be subdivided among two nodes 310 and 320 , with the node 310 representing a portion of the classes represented by the node 300 , and the node 320 representing another portion of classes that are represented by the node 300 . In general, the nodes 310 and 320 may be mutually exclusive and have nothing in common (e.g., no classes or visual patterns in common). Alternatively, the nodes 310 and 320 may be mutually nonexclusive and include at least one class or visual pattern in common. The node 300 may be considered as a parent of the nodes 310 and 320 , which may be considered children of the node 300 .
- the classes represented by the node 310 may be subdivided among multiple nodes 311 , 315 , and 319 , with each of the nodes 311 , 315 , and 319 strictly or approximately representing a different portion of the classes that are represented by the node 310 .
- the nodes 311 , 315 , and 319 may be mutually exclusive and have nothing in common.
- two or more of the nodes 311 , 315 , and 319 may lack mutual exclusivity and include at least one class or visual pattern in common.
- the node 310 may be considered as a parent of the nodes 311 , 315 , and 319 , which may be considered children of the node 310 .
- the node 320 may also have child nodes.
- the classes represented by the node 311 may be subdivided among multiple nodes 312 and 313 , with each of the nodes 312 and 313 strictly or approximately representing a different portion of the classes that are represented by the node 311 .
- the nodes 312 and 313 may be mutually exclusive (e.g., having no classes or visual patterns in common) or may be mutually non-exclusive (e.g., both including at least one class or visual pattern shared in common).
- the node 311 may be considered as a parent of the nodes 312 and 313 , which may be considered as children of the node 311 .
- one or more of the nodes 315 and 319 may have their own child nodes.
- the nodes 312 and 313 may be considered as grandchild nodes of the node 310 .
- the node 312 may have its own child nodes (e.g., great-grandchild nodes of the node 310 ).
- the node 313 may represent only a single visual pattern (e.g., single font) or a single class of visual patterns (e.g., a single font family). Accordingly, the node 313 may be considered as a leaf node (e.g., in contrast with the root node 300 ) of the hierarchy (e.g., the node tree).
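- The node structure described above might be sketched minimally as follows (the class and field names are illustrative, not from the patent):

```python
class PatternNode:
    """One node of the hierarchy: a set of visual-pattern classes plus
    zero or more child nodes that subdivide that set."""

    def __init__(self, classes, children=None):
        self.classes = set(classes)
        self.children = list(children or [])

    def is_leaf(self):
        # A node representing only a single visual pattern (or a single
        # class of visual patterns) is not subdivided further.
        return not self.children

# A parent whose classes are split between two children, one a leaf.
node_313 = PatternNode({"Font X"})
node_312 = PatternNode({"Font Y", "Font Z"})
node_311 = PatternNode({"Font X", "Font Y", "Font Z"},
                       children=[node_312, node_313])
```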
- FIG. 4 illustrates an example of hard-splitting the node 310 into mutually exclusive nodes 311 and 315 .
- the node 300 (e.g., root node) may represent a top-level class of visual patterns, and this top-level class may encompass multiple visual patterns in the example form of fonts (e.g., Font 1, Font 2, Font 3, Font 4, Font 5, Font 6, Font 7, Font 8, and Font 9).
- Font 1, Font 2, Font 3, Font 4, Font 5, Font 6, Font 7, Font 8, and Font 9 may form all or part of a reference set of visual patterns (e.g., a test set of visual patterns with known classifications) that are usable to train one or more classifiers.
- Fonts 1-5 have been classified (e.g., by a classifier module, such as the image classifier module 240 ) into the node 310 .
- a classifier (e.g., a classifier that is specific to the node 310 ) may define a 55% chance of classifying Font 3 into the node 311 and a 45% chance of classifying Font 3 into the node 315 .
- Such probabilities may be stored in a weight vector for the node 310 , and this weight vector may be used by (e.g., incorporated into) the classifier for the node 310 . Accordingly, Font 3 is shown as being classified exclusively into the node 311 , with no representation whatsoever in the node 315 .
- Font 3 may be misclassified into the node 311 , instead of the node 315 .
- since the actual proper classification for Font 3 is known (e.g., predetermined) to be the node 315 , Font 3 has been misclassified by this initial hard-splitting operation, and the classifier may be improved (e.g., modified or adjusted) such that it is more likely to classify Font 3 into the node 315 .
- the classifier may define a 39% chance of putting Font 3 in the node 311 and a 61% chance of putting Font 3 in the node 315 .
- Such probabilities may be stored in a modified weight vector for the node 310 . In this manner, the classifier can be iteratively improved to produce more accurate subdivisions for visual patterns of known classification (e.g., the reference set of visual patterns).
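- The patent does not specify the update rule for the weight vector; one simple possibility consistent with the 55%/45% to 39%/61% example above is a linear step toward the known-correct child (the learning rate is an assumption):

```python
def update_weights(probs, correct, lr=0.35):
    """Shift a node's child-class probabilities toward the child that a
    misclassified reference pattern is known to belong to.

    probs:   dict mapping child class -> probability (sums to 1).
    correct: the known-correct child class.
    Moves a fraction `lr` of the way toward putting all mass on
    `correct`; linear interpolation keeps the probabilities summing to 1.
    """
    target = {c: (1.0 if c == correct else 0.0) for c in probs}
    return {c: probs[c] + lr * (target[c] - probs[c]) for c in probs}

# Font 3 starts at 55%/45% in favor of node 311, but is known to
# belong to node 315; one update shifts the balance toward node 315.
before = {"node 311": 0.55, "node 315": 0.45}
after = update_weights(before, "node 315")
```

Repeating this update over the reference set is one way the iterative improvement described above could proceed.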
- this combination of hard-splitting and soft-assignment may produce an error-bounded hierarchy (e.g., tree) of nodes.
- This error-bounded hierarchy may be used to facilitate visual pattern recognition, for example, by omitting unrelated classifiers and executing only those classifiers with at least a threshold probability of actually classifying a candidate visual pattern (e.g., a font of unknown classification or identity).
- This benefit can be seen by reference to FIG. 6 .
- recognition of Font 1 would involve three or four executions of classifiers (e.g., one to subdivide the node 300 , one to subdivide the node 310 , and one or two to isolate Font 1 from Fonts 2 and 3 in the node 311 ).
- FIG. 7-9 are flowcharts illustrating operations of the hierarchy machine 110 in performing a method 700 of generating the hierarchy of visual pattern classes, according to some example embodiments. Operations in the method 700 may be performed using modules described above with respect to FIG. 2 . As shown in FIG. 7 , the method 700 may include one or more of operations 710 , 720 , 730 , 740 , and 750 .
- the image classifier module 240 classifies a reference set of visual patterns (e.g., a test set of fonts, such as Fonts 1-9 illustrated in FIG. 4-6 , which fonts may be stored in the database 115 ) that belong to a parent class (e.g., node 310 ).
- the image classifier module 240 may classify this reference set into mutually exclusive child classes (e.g., nodes 311 and 315 , as shown in FIG. 4 ).
- mutually exclusive child classes may include a first child class (e.g., node 311 ) and a second child class (e.g., node 315 ).
- the mutually exclusive child classes include a third child class (e.g., node 319 ).
- a visual pattern from the reference set (e.g., Font 3) may be classified into the first child class (e.g., node 311 ) rather than the second child class (e.g., node 315 ).
- This may have the effect of hard-splitting the parent class (e.g., node 310 ).
- the classifier trainer module 250 modifies a weight vector that corresponds to the parent class (e.g., node 310 ).
- the modification of this weight vector may be in response to testing the accuracy of the hard-splitting performed in operation 710 and detection of any errors in classification.
- operation 720 may be performed in response to the visual pattern being misclassified into the first child class (e.g., node 311 ) instead of the second child class (e.g., node 315 ).
- the modified weight vector may alter a first probability that the visual pattern belongs to the first child class (e.g., from 55% to 39%), and alter a second probability that the visual pattern belongs to the second child class (e.g., from 45% to 61%).
- the assignment module 260 , based on the altered probabilities, removes mutual exclusivity from the first and second child classes (e.g., nodes 311 and 315 ). For example, mutual exclusivity may be removed by adding the visual pattern to the second child class (e.g., node 315 ), so that both the first and second child classes include the visual pattern (e.g., a test font) and share it in common.
- operations similar to operations 710 - 730 may be performed for any one or more additional classes to be included in the hierarchy.
- the first child class (e.g., node 311 ) may be subdivided into multiple grandchild classes (e.g., nodes 312 and 313 ) in a manner similar to the hard-splitting and soft-assignment described above for the parent class (e.g., node 310 ).
- a similar operation may classify this portion of the reference set into such grandchild classes (e.g., nodes 312 and 313 ).
- the hierarchy module 270 generates a hierarchy of classes of visual patterns (e.g., an error-bounded tree of nodes that each represent the classes of visual patterns).
- the generated hierarchy may include the parent class (e.g., node 310 ) and the mutually nonexclusive first and second child classes (e.g., nodes 311 and 315 ) that now each include the visual pattern.
- the image classifier module 240 uses the generated hierarchy of classes to classify a candidate visual pattern (e.g., a font of unknown class or identity) by processing one or more images of the candidate visual pattern (e.g., an image of text rendered in the font). For example, the image classifier module 240 may traverse the hierarchy of classes, which may have the effect of omitting unrelated classifiers and executing only those classifiers with at least a minimum threshold probability of properly classifying a candidate visual pattern.
- the method 700 may include one or more of operations 810 , 815 , 819 , and 820 .
- One or more of operations 810 , 815 , and 819 may be performed as part (e.g., a precursor task, a subroutine, or a portion) of operation 710 , in which the image classifier module 240 classifies the reference set of visual patterns.
- the image classifier module 240 computes an affinity matrix that quantifies similarity between or among the visual patterns. For example, the computed affinity matrix may quantify degrees to which the visual patterns are similar to each other.
- the affinity matrix may be specific to the parent class (e.g., node 310 ) undergoing the hard-splitting discussed above with respect to operation 710 . That is, each class (e.g., parent class) undergoing hard-splitting may have its own affinity matrix.
- the image classifier module 240 may be or include a classifier that is assigned specifically to the parent class (e.g., node 310 ), and this classifier may include the weight vector (e.g., modified or unmodified) discussed above with respect to operation 720 .
- the image classifier module 240 increases sparseness of the affinity matrix calculated in operation 810 (e.g., makes the affinity matrix more sparse than initially calculated). In some example embodiments, this may be done by zeroing values of the affinity matrix that are below a minimum threshold value. In certain example embodiments, this may be done by zeroing values that fall outside the largest N values of the affinity matrix (e.g., values that lie outside the top 10 values or top 20 values). In some example embodiments, the values in the affinity matrix are representations of the vector distances between visual patterns. Hence, in some example embodiments, operation 815 may be performed by setting one or more of such representations to zero based on those representations falling below a minimum threshold value. Similarly, in certain example embodiments, operation 815 may be performed by setting one or more of such representations to zero based on those representations falling outside the top N largest representations.
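The two sparsification strategies described above can be sketched as follows; the matrix values and parameters here are illustrative assumptions, not values prescribed by this description.

```python
import numpy as np

def sparsify_by_threshold(affinity, min_value):
    # Zero out affinity entries that fall below a minimum threshold value.
    sparse = affinity.copy()
    sparse[sparse < min_value] = 0.0
    return sparse

def sparsify_by_top_n(affinity, n):
    # Keep only the N largest entries of the whole matrix; zero the rest.
    sparse = np.zeros_like(affinity)
    flat_idx = np.argsort(affinity, axis=None)[-n:]
    rows, cols = np.unravel_index(flat_idx, affinity.shape)
    sparse[rows, cols] = affinity[rows, cols]
    return sparse

a = np.array([[0.0, 0.9, 0.2],
              [0.9, 0.0, 0.6],
              [0.2, 0.6, 0.0]])
thresholded = sparsify_by_threshold(a, 0.5)  # zeros the 0.2 entries
top_two = sparsify_by_top_n(a, 2)            # keeps only the two 0.9 entries
```

Either variant leaves fewer nonzero entries for the subsequent spectral clustering to process.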
- the image classifier module 240 groups the visual patterns into the mutually exclusive child classes (e.g., nodes 311 and 315 ) discussed above with respect to operation 710 .
- this grouping may be performed by applying spectral clustering to the affinity matrix computed in operation 810 .
- the increased sparseness from operation 815 may have the effect of reducing the number of computations involved, thus facilitating efficient performance of operation 819 .
- one or more of operations 811 , 812 , 813 , and 814 may be performed as part of operation 810 , in which the affinity matrix is computed.
- the image classifier module 240 calculates feature vectors of images of the visual patterns in the reference set. These images may be accessed from the database 115 . For example, the image classifier module 240 may access an image that depicts a particular visual pattern (e.g., Font 3, as discussed above with respect to FIG. 4-6 ), and the image classifier module 240 may calculate a feature vector of this image. An example of a feature vector being calculated is discussed below with respect to FIG. 10-15 . As discussed below with respect to FIG. 10-15 , the calculating of a feature vector may be performed using LFE, such that the resulting feature vector has one or more locally embedded features.
- performance of operation 811 may further calculate mean feature vectors that each represent groups of images depicting the visual patterns in the reference set. For example, there may be nine fonts (e.g., Fonts 1-9, as discussed above with respect to FIG. 4-6 ), and each font may be depicted in 100 images of various numerals, letters, words or phrases rendered in that font. In such a case, performance of operation 811 may include calculating nine mean feature vectors, where each mean feature vector represents one of the nine fonts.
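The per-font averaging just described can be sketched as follows; the array sizes are illustrative assumptions (3 fonts, 4 images per font, 5-dimensional feature vectors, rather than the nine fonts and 100 images mentioned above).

```python
import numpy as np

rng = np.random.default_rng(0)
# Feature vectors indexed as (font, image, feature dimension).
features = rng.normal(size=(3, 4, 5))

# One mean feature vector per font, averaged over that font's images.
mean_vectors = features.mean(axis=1)
```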
- the image classifier module 240 calculates vector distances (e.g., Mahalanobis distances) between or among two or more of the feature vectors calculated in operation 811 .
- such vector distances may be calculated among the nine mean feature vectors that respectively represent the nine fonts (e.g., Fonts 1-9, as discussed above with respect to FIG. 4-6 ). This may have the effect of calculating vector distances between the visual patterns in the reference set (e.g., with the visual patterns being represented by their respective mean feature vectors).
- the image classifier module 240 calculates representations of the vector distances for inclusion in the affinity matrix.
- the vector distances may be normalized to values between zero and one (e.g., to obtain relative indicators of similarity between the visual patterns).
- the vector distances may be normalized by calculating a ratio of each vector distance to the median value of the vector distances.
- an exponential transform may be taken of the negative of these normalized values (e.g., such that the normalized values are negative exponentially transformed).
- such representations of the vector distances may be prepared for inclusion in the affinity matrix and subsequent spectral clustering.
- the image classifier module 240 includes the representations of the vector distances into the affinity matrix. As noted above, these representations may be normalized, negative exponentially transformed, or both.
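The pipeline from distances to affinity values can be sketched end to end as follows. Plain Euclidean distance stands in here for the Mahalanobis distance, and the example vectors are illustrative assumptions.

```python
import numpy as np

def affinity_from_features(mean_vectors):
    # Pairwise distances between the per-class mean feature vectors.
    diffs = mean_vectors[:, None, :] - mean_vectors[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)
    # Normalize each distance by the median distance (excluding the
    # zero self-distances), then negative exponentially transform so
    # that a higher value means a more similar pair.
    off_diag = dist[~np.eye(len(dist), dtype=bool)]
    normalized = dist / np.median(off_diag)
    affinity = np.exp(-normalized)
    np.fill_diagonal(affinity, 0.0)  # zero diagonal, as described above
    return affinity

vecs = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
A = affinity_from_features(vecs)
```

The resulting matrix is symmetric with a zero diagonal, and the closer pair of vectors receives the larger affinity value.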
- the image classifier module 240 checks its accuracy against the known (e.g., predetermined) classifications of the reference set of visual patterns. This may involve detecting one or more misclassifications and calculating a percentage of misclassifications (e.g., as an error rate from classifying the reference set in operation 710 ). Continuing the above example, if Font 3 is the only misclassified font among the nine fonts (e.g., Fonts 1-9), the detected misclassification percentage would be 11%. Based on this calculated percentage, the method 700 may flow on to operation 720 , as described above with respect to FIG. 7 . That is, operation 720 may be performed in response to the percentage calculated in operation 820 .
- the method 700 may iterate back to operation 710 , in which the image classifier module 240 performs the classification of the reference set of visual patterns, this time with the modified weight vector.
- the method 700 may include iterating operations 710 , 820 , and 720 until the misclassification percentage falls below a threshold value (e.g., a maximum allowable error percentage for misclassifications). Accordingly, the initial performance of operation 710 may be described as being performed with the unmodified weight vector, while subsequent performances of operation 710 are performed with the modified weight vector (e.g., modified at least once by performance of operation 720 ).
- one or more of operations 932 , 934 , 936 , and 938 may be performed as part of operation 730 , in which the assignment module 260 removes mutual exclusivity from the first and second child classes (e.g., nodes 311 and 315 ) and performs the soft-assignment functions discussed above with respect to operation 730 .
- the assignment module 260 compares probabilities that the visual pattern (e.g., the test font) belongs to one or more of the child classes (e.g., node 311 , 315 , or 319 ) subdivided from the parent class (e.g., node 310 ) and ranks the probabilities (e.g., orders the probabilities by their values).
- the assignment module 260 includes the visual pattern (e.g., the test font) in multiple child classes based on the probabilities ranked in operation 932 (e.g., allocates the visual pattern into the multiple child classes based on at least one of the probabilities). For example, supposing that there is a 39% first probability of the visual pattern belonging to the first child class (e.g., node 311 ), a 61% second probability of the visual pattern belonging to the second child class (e.g., node 315 ), and a 3% third probability that the visual pattern belongs to a third child class (e.g., node 319 ), the assignment module 260 may apply a rule that only the top two probabilities will be considered.
- the visual pattern may be included into the nodes 311 and 315 , but not the node 319 , based on the first and second probabilities being the top two probabilities and the third probability falling outside this subset.
- operation 730 may be performed based on the first and second probabilities being among a predetermined subset of largest probabilities, based on the third probability falling outside of the predetermined subset of largest probabilities, or based on any suitable combination thereof.
- operations 936 and 938 are used instead of operations 932 and 934 .
- the assignment module 260 compares the probabilities discussed above with respect to operations 932 and 934 to a threshold minimum value (e.g., 10%).
- the assignment module 260 includes the visual pattern (e.g., the test font) in multiple child classes based on these probabilities in comparison to the minimum threshold value (e.g., allocates the visual pattern into the multiple child classes based on a comparison of at least one of the probabilities to the minimum threshold value).
- the assignment module 260 may apply a rule that only the probabilities above the minimum threshold value (e.g., 10%) will be considered. Accordingly, the visual pattern may be included into the nodes 311 and 315 , but not the node 319 , based on the first and second probabilities exceeding the minimum threshold value and the third probability failing to exceed this minimum threshold value.
- operation 730 may be performed based on the first and second probabilities exceeding the minimum threshold value, based on the third probability falling below the predetermined minimum threshold value, or based on any suitable combination thereof.
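Both selection rules described in operations 932-938 can be sketched with ordinary dictionaries; the node numbers and probabilities reuse the example above, and the function names are invented for illustration.

```python
def children_by_top_n(probs, n=2):
    # Keep the child classes with the N largest membership probabilities.
    ranked = sorted(probs, key=probs.get, reverse=True)
    return set(ranked[:n])

def children_by_threshold(probs, minimum=0.10):
    # Keep every child class whose membership probability exceeds a minimum.
    return {child for child, p in probs.items() if p > minimum}

# Probabilities from the example above: nodes 311, 315, and 319.
probs = {311: 0.39, 315: 0.61, 319: 0.03}
top = children_by_top_n(probs)          # {311, 315}
above = children_by_threshold(probs)    # {311, 315}
```

Under either rule, node 319 is excluded, so the visual pattern is softly assigned to nodes 311 and 315 only.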
- the two-stage procedure performed by the hierarchy machine 110 may include (1) hard-splitting of nodes (e.g., representing font classes or individual fonts) and (2) soft-assignment of nodes to obtain an error-bounded tree in which nodes are allocated into hierarchical clusters.
- supposing there are N font classes total in a current node i, the task is to assign these N fonts into C child nodes.
- each font class is assigned into exactly one child node. That is, the child nodes contain no duplicate font classes.
- to compare font classes, a class-level feature may be computed for each font class c; its k-th component may be given by: ψ_k^c = (1/Z_c) Σ_{i ∈ I_c} z_k^i x_{e_k^i}, (1) where I_c denotes the set of images depicting class c, Z_c is a normalization factor, z_k^i is the k-th pooled value for image i, and x_{e_k^i} is the local feature vector from which that value was pooled.
- the distance between each pair of fonts c_1 and c_2 may then be defined as: d(c_1, c_2) = Σ_k w_k ∥ψ_k^{c_1} - ψ_k^{c_2}∥², (2) where w_k is an importance weight for the k-th feature dimension.
- a sparse affinity matrix (e.g., an affinity matrix having increased sparseness) may be obtained next.
- the affinity matrix A may be symmetric, and its diagonal elements may all be zero. According to various example embodiments, the meaning of the matrix A is that the higher the value of A_ij, the more similar are the corresponding two fonts c_i and c_j.
- the hierarchy machine 110 could use one or more classic clustering algorithms to cluster these fonts.
- the hierarchy machine 110 is configured to use spectral clustering to cluster the fonts. Supposing that these N fonts are to be clustered into K clusters, spectral clustering may proceed by forming a graph Laplacian from the affinity matrix A, computing its K leading eigenvectors, and clustering the rows of the resulting eigenvector matrix (e.g., with k-means).
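This spectral clustering step can be sketched with a standard recipe (symmetrically normalized graph Laplacian, k smallest eigenvectors, k-means on the eigenvector rows). The recipe and the toy affinity matrix below are common-formulation assumptions, not details quoted from this description.

```python
import numpy as np

def spectral_cluster(affinity, k, iters=20):
    # Symmetrically normalized graph Laplacian: L = I - D^(-1/2) A D^(-1/2).
    degree = affinity.sum(axis=1)
    d = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    laplacian = np.eye(len(affinity)) - d[:, None] * affinity * d[None, :]
    # Rows of the k smallest eigenvectors embed the fonts in k dimensions.
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    emb = vecs[:, :k]
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # Plain k-means on the embedded rows, with farthest-point initialization.
    centers = [emb[0]]
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(emb - c, axis=1) for c in centers], axis=0)
        centers.append(emb[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(emb[:, None] - centers[None], axis=-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = emb[labels == c].mean(axis=0)
    return labels

# Toy affinity matrix: fonts {0, 1} are similar, fonts {2, 3} are similar.
A = np.array([[0.0, 0.9, 0.1, 0.1],
              [0.9, 0.0, 0.1, 0.1],
              [0.1, 0.1, 0.0, 0.9],
              [0.1, 0.1, 0.9, 0.0]])
labels = spectral_cluster(A, 2)
```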
- clustering on a full affinity matrix A may be unstable and thus perform poorly. Moreover, clustering may be quite sensitive to the parameter σ. Without a carefully tuned σ, the clustering may be unsuccessful. Consequently, a bad clustering operation may cause a font classification algorithm (e.g., an LFE-based algorithm) to fail.
- the hierarchy machine 110 may be configured to perform operations that return stable and appropriate clustering results. For example, such operations may include the following:
- the affinity matrix A is a sparse matrix.
- the sparse affinity matrix works well compared to a self-tuning spectral clustering algorithm (e.g., producing much better and more stable clustering results). Moreover, there are no sensitive parameters, and parameter tuning may thus be avoided. This feature may be important for tree construction. Note that the normalization described above uses the median, not the mean, since from a statistical viewpoint the median may be more stable than the mean.
- Discriminative classification clustering may be implemented by the hierarchy machine 110 .
- the hierarchy machine 110 may factor in the importance weight w k when computing the font distance d(c 1 ,c 2 ) in Equation 2.
- the hierarchy machine 110 may treat each cluster as a new class and train the LFE-based classifier to classify these classes and get the weights w k . Having obtained w k , the hierarchy machine 110 may re-compute the distances between the font classes. Then the hierarchy machine 110 may obtain a new sparse affinity matrix and perform clustering again. This procedure may be repeated to get better clustering results.
- the algorithm may be expressed as iterating the following operations: cluster the font classes, train a classifier on the resulting clusters to obtain the weights w_k, re-compute the weighted distances between font classes, and re-cluster.
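The iteration can be sketched as follows. A Fisher-score-style weight stands in for the weights w_k that an LFE-based classifier would produce, and a one-dimensional median split stands in for the full spectral clustering step; both substitutions, and the toy data, are simplifying assumptions.

```python
import numpy as np

def fisher_weights(features, labels):
    # Stand-in for classifier-derived importance weights w_k:
    # between-cluster variance over within-cluster variance, per dimension.
    overall = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in np.unique(labels):
        grp = features[labels == c]
        between += len(grp) * (grp.mean(axis=0) - overall) ** 2
        within += ((grp - grp.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-9)

def cluster_two(features, weights):
    # Toy two-way clustering: median split on the weighted feature score.
    scores = features @ weights
    return (scores > np.median(scores)).astype(int)

# Dimension 0 separates the two font groups; dimension 1 is noise.
features = np.array([[0.0, 3.0], [0.2, -2.0], [0.1, 1.0], [-0.1, -1.0], [0.0, 2.0],
                     [5.0, -3.0], [5.2, 2.0], [4.9, 1.0], [5.1, -2.0], [5.0, 0.0]])

weights = np.ones(2)
for _ in range(4):                 # cluster, re-learn weights, repeat
    labels = cluster_two(features, weights)
    weights = fisher_weights(features, labels)
```

After a few iterations the learned weights emphasize the discriminative dimension, and the clustering settles on the two underlying groups.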
- this discriminative classification clustering works well and iteratively improves classification performance (e.g., of an LFE-based classifier). Convergence may occur within 4 or 5 iterations.
- the hierarchy machine 110 may perform soft-assignment of nodes to obtain an error-bounded tree in which nodes are allocated into hierarchical clusters. After hard-splitting, each font is assigned to one class (e.g., each font or font class in the node i only belongs to one child node). However, errors may propagate during tree growth.
- the hierarchy machine 110 has assigned the fonts in a parent node into child nodes, and thus the hierarchy machine 110 may train an LFE-based classifier f i to classify a test font (e.g., a font of known classification or identity) by determining to which child node it belongs.
- the hierarchy machine 110 may implement a method to perform soft-assignment of nodes, which may also be called error-bounded node splitting. After performing the hard-splitting method introduced above to get an initial splitting, and after training a classifier (e.g., an LFE-based classifier module) for a given node i, the hierarchy machine 110 may assign one or more visual patterns into multiple child nodes, based on the classification accuracy of each font class. To illustrate, imagine that a font class j is supposed to belong to a child node c_l. However, tests may indicate that a test font that represents font class j could fall into several child nodes {c_l, c_l+1, c_l+2, . . . }.
- the hierarchy machine 110 may ensure that the classification accuracy of each font in this node i is at least δ_i.
- in other words, the hierarchy machine 110 may bound the error rate of each node to less than 1 - δ_i.
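One way to realize this bound can be sketched as follows, under the assumption that per-child membership probabilities for a font class are available: greedily include child nodes, highest probability first, until the included probability mass reaches the accuracy bound for node i. The names and numbers are illustrative.

```python
def soft_assign(child_probs, accuracy_bound):
    # Include child nodes, highest probability first, until the cumulative
    # probability of a correct routing reaches the accuracy bound.
    assigned, mass = [], 0.0
    for child, p in sorted(child_probs.items(), key=lambda kv: -kv[1]):
        assigned.append(child)
        mass += p
        if mass >= accuracy_bound:
            break
    return assigned

# Hypothetical distribution of a test font over three child nodes.
probs = {"c1": 0.70, "c2": 0.25, "c3": 0.05}
nodes = soft_assign(probs, 0.90)   # ["c1", "c2"]
```

A tighter bound pulls in more child nodes, which is why the average assignment ratio discussed below directly trades accuracy against computation.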
- the time used by the hierarchy machine 110 for font class soft-assignment may depend on the average number of child nodes into which each font class is softly assigned. In general, if a font class is assigned into too many child nodes, the computation complexity is increased, potentially to impractical levels.
- the hierarchy machine 110 may be configured to perform soft-assignment of font classes into an average assignment ratio of 2.2 to 3.5 nodes, which may only slightly burden the computation.
- the hard-splitting of nodes and the soft-assignment of nodes may result in error-bounded splitting of nodes into clusters, which may also be called error-bounded tree construction.
- supposing there are N font classes total, the root node of the tree has C child nodes.
- the above-described hard-splitting technique may be used by the system to assign the N fonts into C child nodes.
- the hierarchy machine 110 may use the above-described soft-assignment technique to reassign the N fonts into C child nodes with certain error bounds, denoting the average assignment ratio for each font as R.
- the hierarchy machine 110 may continue to split it by dividing its N i font classes into C i children. Following the same procedure, the hierarchy machine 110 may build up a hierarchical error-bounded tree of nodes. In some example embodiments, the hierarchy machine 110 builds a 2-layer tree in which the first layer contains the C child nodes of the root node, and in which each child node has a certain number of fonts. In such example embodiments, the second layer may contain leaf nodes such that each node in the second layer only contains one font class.
- FIG. 10 is a conceptual diagram that illustrates generation and encoding of local feature vectors (e.g., local feature vectors 1021 and 1023 ) from pixel blocks (e.g., pixel blocks 1011 and 1013 ) of an image 1010 , according to some example embodiments.
- the image 1010 (e.g., a digital picture or photo) may be stored in the database 115 and accessed by the image access module 210 of the hierarchy machine 110 .
- the image 1010 may be divided (e.g., by the feature vector module 220 of the hierarchy machine 110 ) into blocks of pixels (e.g., pixel blocks 1011 , 1012 , 1013 , 1014 , and 1015 ).
- the pixel blocks overlap each other. That is, neighboring (e.g., adjacent) pixel blocks may overlap by one or more pixels (e.g., 10 pixels).
- the pixel block 1011 may be a first pixel block (e.g., having an index of 1 or indexed as 1) of the image 1010 , and the pixel block 1013 may be an i-th pixel block (e.g., having an index of i) of the image 1010 .
- FIG. 10 illustrates the pixel block 1013 (e.g., the i-th pixel block) undergoing a mathematical transformation to generate a corresponding local feature vector 1023 (e.g., an i-th local feature vector, labeled “x i ”).
- This mathematical transformation may be performed by the feature vector module 220 .
- similarly, the pixel block 1011 (e.g., the first pixel block) may be mathematically transformed to generate its corresponding local feature vector 1021 (e.g., a first local feature vector, labeled “x 1 ”).
- This process may be repeated for all pixel blocks in the image 1010 (e.g., pixel blocks 1012 , 1014 , and 1015 , as well as other pixel blocks in the image 1010 ).
- these generated local feature vectors (e.g., local feature vectors 1021 and 1023 ) may be grouped into a first set 1020 of vectors for the image 1010 . The vectors in the first set 1020 may each have a same number of dimensions, which may be called a first number of dimensions.
- the first set 1020 of vectors may each have 10 dimensions as a result of the mathematical transformation being applied to each of the pixel blocks (e.g., pixel blocks 1011 - 1015 ) of the image 1010 .
- FIG. 10 further illustrates the first set 1020 of vectors being encoded (e.g., by the feature vector module 220 ) to generate a second set 1030 of vectors (e.g., encoded local feature vectors) for the image 1010 .
- the second set 1030 of vectors includes an encoded local feature vector 1033 (e.g., an i-th encoded local feature vector, labeled “y i ”) that corresponds to the local feature vector 1023 (e.g., the i-th local feature vector). That is, the encoded local feature vector 1033 may be an encoded representation of the local feature vector 1023 .
- the second set 1030 of vectors includes encoded local feature vector 1031 (e.g., a first encoded local feature vector, labeled “y 1 ”) that corresponds to the local feature vector 1021 (e.g., the first local feature vector), and the encoded local feature vector 1031 may be an encoded representation of the local feature vector 1021 .
- the second set 1030 of vectors may each have a same number of dimensions, which may be distinct from the first number of dimensions for the first set 1020 of vectors, and which may be called a second number of dimensions.
- the second set 1030 of vectors may each have six dimensions as a result of the encoding process being applied to each local feature vector of the first set 1020 of vectors for the image 1010 .
- FIG. 11 is a conceptual diagram that illustrates generation of a first array 1150 of ordered pairs (e.g., ordered pair 1179 ) for the image 1010 , according to some example embodiments.
- This generation of the first array 1150 may be performed by the feature vector module 220 of the hierarchy machine 110 .
- the second set 1030 of encoded local feature vectors may be arranged as an array 1130 of encoded local feature vectors (e.g., encoded local feature vectors 1031 and 1033 ) for the image 1010 .
- each of the encoded local feature vectors (e.g., encoded local feature vectors 1031 and 1033 ) in the second set 1030 of vectors may have the same number (e.g., second number) of dimensions.
- the feature vector module 220 may compare values (e.g., values 1131 and 1133 ) of a particular dimension 1140 for each of these encoded local feature vectors.
- This concept is illustrated in FIG. 11 by depicting the encoded feature vectors side-by-side in the array 1130 , so that the dimension 1140 is represented by a row within the array 1130 , while each encoded local feature vector is represented by a column within the array 1130 .
- the value 1131 of the dimension 1140 in the encoded local feature vector 1031 may be compared to the value 1133 of the same dimension 1140 in the encoded local feature vector 1033 .
- the feature vector module 220 may identify a value for the dimension 1140 that significantly characterizes the image 1010 . For example, the feature vector module 220 may compare all values for the dimension 1140 and determine that the value 1133 has a maximum absolute value (e.g., is a maximum value or a minimum value) among all other values (e.g., value 1131 ) for the dimension 1140 within the array 1130 of encoded local feature vectors. This process may be performed for each dimension (e.g., dimension 1140 ) represented in the second set 1030 of encoded local feature vectors (e.g., encoded local feature vectors 1031 and 1033 ). Accordingly, the feature vector module 220 may identify, determine, or otherwise obtain a characteristic value (e.g., a maximum absolute value or a maximum value) for each dimension of the encoded local feature vectors.
- these characteristic values may be paired (e.g., by the feature vector module 220 ) with indices that indicate which encoded local feature vector corresponds to which characteristic value.
- each characteristic value may be paired with the index of its corresponding encoded local feature vector, which is also the index of its corresponding pixel block within the image 1010 .
- the characteristic values and their indices may be stored as a pooling vector 1160 (e.g., a pooling vector of maximum values, labeled “z”) and an index vector 1170 (e.g., an index vector of indices for maximum values, labeled “e”). In example embodiments where the characteristic values are maximum values, the pooling vector 1160 is a vector of maximum values for the dimensions (e.g., dimension 1140 ) that are represented in the second set 1030 of encoded local feature vectors, and the index vector 1170 is a vector of indices for these maximum values.
- the value 1133 may be the maximum value for the dimension 1140 , and the value 1133 corresponds to (e.g., comes from) the encoded local feature vector 1033 , which may be the i-th encoded local feature vector in the second set 1030 of encoded local feature vectors (e.g., corresponding to the i-th pixel block 1013 of the image 1010 ). Therefore, the feature vector module 220 may pair the value 1133 with the index 1173 (e.g., labeled “i”) to generate the ordered pair 1179 .
- the feature vector module 220 may generate the first array 1150 of ordered pairs.
- the feature vector module 220 generates the first array 1150 of ordered pairs by mapping the pooling vector 1160 to the index vector 1170 , and storing the result as the first array 1150 of ordered pairs.
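The max-pooling that produces the pooling vector z and the index vector e can be sketched as follows; the 6x4 array of encoded values is an illustrative assumption (six dimensions, four pixel blocks), with the maximum absolute value used as each dimension's characteristic value.

```python
import numpy as np

# Hypothetical array 1130: six-dimensional encoded local feature vectors
# for four pixel blocks, arranged as columns (a dimension is a row).
encoded = np.array([[ 0.1, -0.7,  0.3,  0.2],
                    [ 0.5,  0.2, -0.9,  0.1],
                    [-0.2,  0.8,  0.4, -0.3],
                    [ 0.6, -0.1,  0.2,  0.9],
                    [ 0.3,  0.4, -0.5,  0.2],
                    [-0.8,  0.1,  0.2,  0.3]])

# Index vector e: which pixel block holds each dimension's extreme value.
e = np.abs(encoded).argmax(axis=1)
# Pooling vector z: the value with maximum absolute value per dimension.
z = encoded[np.arange(encoded.shape[0]), e]

# First array of ordered pairs: (characteristic value, block index).
first_array = list(zip(z.tolist(), e.tolist()))
```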
- FIG. 12 is a conceptual diagram that illustrates generation of a second array 1250 of ordered pairs (e.g., ordered pair 1279 ) for the image 1010 , according to some example embodiments.
- This generation of the second array 1250 may be performed by the feature vector module 220 of the hierarchy machine 110 .
- the values (e.g., value 1133 ) of the pooling vector 1160 (e.g., a vector of maximum values, as discussed above with respect to FIG. 11 ) may be paired with their corresponding local feature vectors (e.g., local feature vector 1023 from the first set 1020 of local feature vectors, as discussed above with respect to FIG. 10 ).
- for example, the feature vector module 220 may pair the pooling vector 1160 with a matrix 1270 (e.g., labeled “x e ”) of local feature vectors that correspond to the values (e.g., value 1133 ) of the pooling vector 1160 .
- as noted above, the value 1133 may be the characteristic (e.g., maximum) value for the dimension 1140 , and the value 1133 corresponds to (e.g., comes from) the local feature vector 1023 , which may be the i-th local feature vector in the first set 1020 of local feature vectors (e.g., corresponding to the i-th pixel block 1013 of the image 1010 ).
- the feature vector module 220 may pair the value 1133 with the local feature vector 1023 (e.g., labeled “x i ”) to generate the ordered pair 1279 .
- the local feature vector 1023 is identified based on an index (e.g., “i”) of its corresponding encoded local feature vector 1033 .
- the feature vector module 220 may generate the second array 1250 of ordered pairs.
- the feature vector module 220 generates the second array 1250 of ordered pairs by starting with the first array 1150 of ordered pairs and replacing the index vector 1170 with the matrix 1270 of local feature vectors.
- the resulting second array 1250 of ordered pairs may be stored as a feature vector 1280 (e.g., labeled “f”) that corresponds to the image 1010 in its entirety.
- the vector storage module 230 may store the second array 1250 in the database 115 as the feature vector 1280 for the image 1010 .
- the feature vector 1280 maps the values (e.g., value 1133 ) from the pooling vector 1160 to their corresponding local feature vectors (e.g., local feature vector 1023 ).
- the feature vector 1280 for the image 1010 may provide a subset of its original local feature vectors (e.g., a subset of the first set 1020 of local feature vectors) along with corresponding characteristic values (e.g., maximum values) from their encoded counterparts (e.g., in the second set 1030 of encoded local feature vectors).
- the feature vector 1280 may be described as including (e.g., embedding) the most significant local feature vectors of the image 1010 (e.g., most significant for the purpose of recognizing of coarse-grained and fine-grained visual patterns).
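The pairing and storage described above for the second array 1250 can be sketched as follows. The 3-dimensional local feature vectors and 2-dimensional pooled results are illustrative assumptions, and flattening the pairs into one stored vector f is one plausible storage layout, not necessarily the patented one.

```python
import numpy as np

# Hypothetical first set of original local feature vectors (one per block).
local = np.array([[1.0, 0.0, 2.0],    # block 1
                  [0.5, 1.5, 0.0],    # block 2
                  [2.0, 1.0, 1.0]])   # block 3
z = np.array([0.9, -0.7])             # pooling vector (one value per encoded dim)
e = np.array([2, 0])                  # index vector (source block per value)

# Second array: pair each pooled value with its ORIGINAL local feature
# vector (looked up via the index), rather than with the index itself.
second_array = [(z_k, local[e_k]) for z_k, e_k in zip(z, e)]

# Store the pairs flattened as the image's overall feature vector f.
f = np.concatenate([np.concatenate(([z_k], x)) for z_k, x in second_array])
```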
- FIG. 13-15 are flowcharts illustrating operations of the hierarchy machine 110 in performing a method 1300 of processing the image 1010 , according to some example embodiments. Operations in the method 1300 may be performed using modules described above with respect to FIG. 2 . As shown in FIG. 13 , the method 1300 includes operations 1310 , 1320 , 1330 , 1340 , and 1350 .
- the image access module 210 of the hierarchy machine 110 may access the image 1010 from the database 115 .
- the feature vector module 220 may determine the pixel blocks (e.g., pixel blocks 1011 - 1015 ), for example, by dividing the image 1010 into overlapping or non-overlapping pixel blocks. In some example embodiments, this preprocessing operation is included as part (e.g., a precursor task, a subroutine, or a portion) of operation 1310 .
- the feature vector module 220 of the hierarchy machine 110 generates the first set 1020 of local feature vectors (e.g., local feature vectors 1021 and 1023 ).
- this may be performed by executing a mathematical transformation on each of the pixel blocks (e.g., pixel blocks 1011 - 1015 ) of the image 1010 .
- for example, the mathematical transformation may generate the local feature vector 1021 from pixel values of the pixel block 1011 , and the mathematical transformation may generate the local feature vector 1023 from pixel values of the pixel block 1013 .
- the feature vector module 220 encodes the first set 1020 of local feature vectors into a second set 1030 of encoded local feature vectors.
- this encoding operation reduces the number of dimensions represented from a first number of dimensions to a second number of dimensions that is less than the first number of dimensions. Accordingly, the ordered pairs in the first array 1150 of ordered pairs may be equal in number to the second number of dimensions, and the ordered pairs in the second array 1250 of ordered pairs may likewise be equal in number to the second number of dimensions.
- the feature vector module 220 generates the first array 1150 of ordered pairs (e.g., ordered pair 1179 ).
- ordered pair 1179 may pair the value 1133 from the encoded local feature vector 1033 with an index of that encoded local feature vector 1033 , and this index may indicate the pixel block 1013 that corresponds to that same encoded local feature vector 1033 .
- the feature vector module 220 generates the second array 1250 of ordered pairs (e.g., ordered pair 1279 ).
- the ordered pair 1279 may pair the value 1133 from the encoded local feature vector 1033 with the local feature vector 1023 itself (e.g., the corresponding local feature vector for the value 1133 ).
- operation 1340 may include identifying the local feature vector 1023 (e.g., a first vector) based on an index (e.g., “i”) of its corresponding encoded local feature vector 1033 (e.g., a second vector).
- the vector storage module 230 of the hierarchy machine 110 stores the second array 1250 of ordered pairs as the feature vector 1280 of the image 1010 .
- the feature vector 1280 may be used to represent the significant features depicted in the image 1010 in any algorithm for visual pattern recognition.
- the feature vector 1280 may be stored in the database 115 .
- the feature vector 1280 is later accessed (e.g., by the image access module 210 ) for use by the classifier trainer module 250 as a basis for training the image classifier module 240 .
- the method 1300 may include one or more of operations 1430 and 1460 .
- operation 1430 may be performed as part (e.g., a precursor task, a subroutine, or a portion) of operation 1330 , in which the feature vector module 220 generates the first array 1150 of ordered pairs.
- the feature vector module 220 determines the characteristic value for the dimension 1140 by selecting the maximum absolute value for the dimension 1140 (e.g., a dimension in common) among the encoded local feature vectors (e.g., encoded local feature vector 1033 ) in the array 1130 of encoded local feature vectors.
- the value 1133 may be selected as the characteristic value for the dimension 1140 .
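The pooling described in operations 1330 through 1350 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `pool_feature_vector`, the array shapes, and the toy values are hypothetical, with each column of `encoded` standing for an encoded local feature vector and each row of `local_features` for the corresponding raw local feature vector.

```python
import numpy as np

def pool_feature_vector(encoded, local_features):
    """Build the two arrays of ordered pairs via max-absolute-value pooling.

    encoded: (K, n) array; column j is the encoded local feature vector
             for pixel block j.
    local_features: (n, d) array; row j is the raw local feature vector
             for pixel block j.
    """
    first_array = []   # (characteristic value, index) pairs, cf. array 1150
    second_array = []  # (characteristic value, local feature) pairs, cf. array 1250
    for k in range(encoded.shape[0]):
        # Characteristic value for dimension k: the entry with the
        # maximum absolute value among all encoded local feature vectors.
        j = int(np.argmax(np.abs(encoded[k])))
        value = encoded[k, j]
        first_array.append((value, j))
        second_array.append((value, local_features[j]))
    return first_array, second_array

encoded = np.array([[0.2, -0.9, 0.1],
                    [0.5,  0.3, -0.4]])
local_features = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
first, second = pool_feature_vector(encoded, local_features)
```

The second array, pairing each characteristic value with its raw local feature vector, corresponds to the feature vector that operation 1350 stores for the image.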
- the image 1010 may be a test image or a training image whose classification, categorization, or identity is already known (e.g., predetermined).
- the feature vector 1280 of the image 1010 may be used to train an image classifier (e.g., image classifier module 240 ). This training may be performed by the classifier trainer module 250 of the hierarchy machine 110 .
- Operation 1460 may be performed after operation 1350 , in which the vector storage module 230 stores the feature vector 1280 of the image 1010 .
- the classifier trainer module 250 of the hierarchy machine 110 trains the image classifier module 240 (e.g., an image classifier, image categorization module, visual pattern recognizer, or any suitable combination thereof).
- the image classifier module 240 may be trained to classify, categorize, or identify fonts, objects, faces of persons, scenes, or any suitable combination thereof, depicted within the image 1010 .
- the image classifier module 240 may be trained to classify the image 1010 based on the second array 1250 of ordered pairs (e.g., stored in the database 115 as the feature vector 1280 of the image 1010 ).
- the image 1010 may depict some text rendered in a font (e.g., Times New Roman, bold and italic).
- performance of operation 1460 may train the image classifier module 240 to classify the image 1010 by classifying the font in which the text depicted in the image 1010 is rendered.
- the classifying of this font may be based on the second array 1250 of ordered pairs (e.g., stored in the database 115 as the feature vector 1280 of the image 1010 ), which may be used to characterize the visual pattern of the font.
- the image 1010 may depict a face of a person (e.g., a famous celebrity or a wanted criminal).
- performance of operation 1460 may train the image classifier module 240 to classify the image 1010 by classifying the face depicted in the image 1010 (e.g., by classifying a facial expression exhibited by the face, classifying a gender of the face, classifying an age of the face, or any suitable combination thereof).
- the classifying of this face may be based on the second array 1250 of ordered pairs (e.g., stored in the database 115 as the feature vector 1280 of the image 1010 ), which may be used to characterize the face as a visual pattern or characterize a visual pattern within the face (e.g., a visual pattern that includes a scar, a tattoo, makeup, or any suitable combination thereof).
- one or more of operations 1462 , 1464 , and 1466 may be performed as part (e.g., a precursor task, a subroutine, or a portion) of operation 1460 .
- the classifier trainer module 250 calculates classification probability vectors for the second array 1250 of ordered pairs. For example, for the ordered pair 1279 (e.g., the second ordered pair), a classification probability vector may be calculated, and this classification probability vector may define a distribution of probabilities that the local feature vector 1023 (e.g., as a member of the ordered pair 1279 ) represents certain features that characterize various classes (e.g., categories) of images.
- the distribution of probabilities includes a probability of the local feature vector 1023 (e.g., the first vector) representing a feature that characterizes a particular class of images (e.g., a particular style of font, such as italic or bold, or a particular gender of face).
- For purposes of training the image classifier module 240 , it may be helpful to modify the classification probability vectors calculated in operation 1462 (e.g., so that the modified classification probability vectors result in the known classification, categorization, or identity of the image 1010 ). This may be accomplished by determining a weight vector whose values (e.g., scalar values) may be applied as weights to the distribution of probabilities defined by each classification probability vector. Accordingly, in operation 1464 , the classifier trainer module 250 determines such a weight vector (e.g., with the constraint that the weighted classification probability vectors produce the known result for the image 1010 when the weight vector is applied to each of the classification probability vectors).
- the modified (e.g., weighted) classification probability vectors define a modified distribution of probabilities, and the modified distribution of probabilities include a modified probability of the local feature vector 1023 (e.g., the first vector) representing a feature that characterizes the particular image class known for the image 1010 .
- the modified distribution of probabilities indicates that the local feature vector 1023 indeed represents the feature that characterizes the known class of images for the image 1010 .
- the weight vector may be determined based on a constraint that the feature represented by the local feature vector 1023 characterizes this class of images to which the image 1010 belongs.
- the weight vector may be stored as a template (e.g., in a template or as the template itself).
- the template may be stored in the database 115 , and the template may be subsequently applicable to multiple classes of images (e.g., applied to classification probability vectors calculated for images inside or outside the known classification for the image 1010 ).
- the template may be applicable to images (e.g., candidate images) of unknown classification (e.g., unknown category) or unknown identity.
- the classifier trainer module 250 may store the weight vector as such a template in the database 115 .
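As a toy sketch of operations 1462 through 1466: the numbers below are made up, and the simple agreement-based weighting rule is an assumption for illustration only (the patent determines the weight vector via a constrained optimization, not this heuristic).

```python
import numpy as np

# Hypothetical classification probability vectors for one training image
# whose class is known: row k is the probability distribution contributed
# by the k-th pooled local feature; columns are classes.
probs = np.array([[0.2, 0.8],
                  [0.6, 0.4],
                  [0.1, 0.9]])
known_class = 1

# Heuristic weight vector: emphasize features whose probability
# distributions agree with the known class.
w = probs[:, known_class] / probs[:, known_class].sum()

scores = w @ probs   # weighted (modified) distribution over classes
template = w         # would be stored, e.g., in the database, for reuse
```

With this weighting, the combined score is largest for the known class, which is the constraint the training step enforces.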
- the method 1300 may include one or more of operations 1430 , 1460 , and 1560 .
- Operations 1430 and 1460 are described above with respect to FIG. 14 .
- operation 1560 may be performed at a point in time after performance of operation 1460 (e.g., seconds, minutes, days, months, or years).
- the image 1010 may be a reference image (e.g., a test image or a training image whose classification, categorization, or identity is already known). Supposing that the image classifier module 240 of the hierarchy machine 110 has been trained (e.g., by the classifier trainer module 250 ) based on the image 1010 (e.g., along with other reference images), the image classifier module 240 may be used to classify one or more candidate images of unknown classification, categorization, or identity.
- the user 132 may use his device 130 to submit a candidate image (e.g., that depicts a visual pattern similar to that found in the image 1010 ) to the hierarchy machine 110 for visual pattern recognition (e.g., image classification, image categorization, or image identification).
- the training of the image classifier module 240 may be performed by the classifier trainer module 250 in operation 1460 .
- image classifier module 240 classifies a candidate image (e.g., a further image, perhaps similar to the image 1010 ). For example, the image classifier module 240 may classify, categorize, or identify fonts, objects, faces of persons, scenes, or any suitable combination thereof, depicted within the candidate image. As noted above, the image classifier module 240 may be trained with the second array 1250 of ordered pairs (e.g., stored in the database 115 as the feature vector 1280 of the image 1010 ).
- the image classifier module 240 may classify the candidate image based on a feature vector of the candidate image (e.g., a counterpart to the feature vector 1280 of the image 1010 , generated in a manner similar to second array 1250 of ordered pairs).
- the candidate image may depict some text rendered in a font (e.g., Times New Roman, bold and italic).
- performance of operation 1560 may classify the candidate image by classifying the font in which the text depicted in the candidate image is rendered.
- the classifying of this font may be based on the feature vector of the candidate image (e.g., the candidate image's version of the feature vector 1280 for the image 1010 , generated in a manner similar to second array 1250 of ordered pairs), which may be used to characterize the visual pattern of the font.
- the candidate image may depict a face of a person (e.g., a famous celebrity or a wanted criminal).
- performance of operation 1560 may classify the candidate image by classifying the face depicted in the candidate image (e.g., by classifying a facial expression exhibited by the face, classifying a gender of the face, classifying an age of the face, or any suitable combination thereof).
- the classifying of this face may be based on the feature vector of the candidate image (e.g., the candidate image's counterpart to the feature vector 1280 of the image 1010 , generated in a manner similar to second array 1250 of ordered pairs), which may be used to characterize the face as a visual pattern or characterize a visual pattern within the face (e.g., a visual pattern that includes a scar, a tattoo, makeup, or any suitable combination thereof).
- one or more of operations 1562 , 1564 , and 1566 may be performed as part (e.g., a precursor task, a subroutine, or a portion) of operation 1560 .
- the image classifier module 240 initiates performance of operations 1310 - 1350 for the candidate image (e.g., instead of the image 1010 ).
- the hierarchy machine 110 may generate a feature vector for the candidate image and store this feature vector in the database 115 .
- the image classifier module 240 calculates classification probability vectors for the feature vector of the candidate image. This may be performed in a manner similar to that described above with respect to FIG. 7 for operation 1462 . For example, for each ordered pair in the feature vector of the candidate image, a classification probability vector may be calculated to define a distribution of probabilities that the corresponding local feature vector (e.g., as a member of the ordered pair) represents features that characterize various classes (e.g., categories) of images.
- the distribution of probabilities includes a probability of the local feature vector 1023 (e.g., the first vector) representing a feature that characterizes a particular class of images (e.g., a particular style of font, such as italic or bold, or a particular gender of face).
- the weight vector (e.g., templates) determined in operation 1464 (e.g., as discussed above with respect to FIG. 14 ) is applied by the image classifier module 240 to the classification probability vectors that were calculated in operation 1564 for the feature vector of the candidate image.
- the image classifier module 240 may access the weight vector from the database 115 and multiply the classification probability vectors by the weight vector.
- the modified (e.g., weighted) classification probability vectors for the candidate image define a modified distribution of probabilities that include a modified probability of a local feature vector of the candidate image representing a feature that characterizes a particular image class.
- the hierarchy machine 110 may cause (e.g., utilize, initiate, or execute) the trained image classifier module 240 to probabilistically determine a classification, categorization, or identity of the candidate image.
- an image classification machine may classify a generic image by implementing a pipeline of first encoding local image descriptors (e.g., scale-invariant feature transform (SIFT) descriptors, local binary pattern (LBP) descriptors, kernel descriptors, or any suitable combination thereof) into sparse codes, and then pooling the sparse codes into a fixed-length image feature representation.
- T = [t_1 ; t_2 , …, t_K ] denotes a template model or codebook of size K, and x_i ∈ ℝ^d is a local image descriptor
- f is the encoding function (e.g., vector quantization, soft assignment, locality-constrained linear coding (LLC), or sparse coding)
- y_i ∈ ℝ^K is the code for x_i .
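A compact sketch of Equations (1) through (3): soft-assignment coding is used here as one of the listed choices for the encoding function f, and the codebook and descriptors are random placeholders rather than real image data.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, n = 8, 16, 50
T = rng.normal(size=(K, d))    # template model / codebook of size K
X = rng.normal(size=(n, d))    # n local descriptors, x_i in R^d

def encode(x, T, beta=1.0):
    # y_i = f(x_i, T): soft assignment of x against the K templates.
    y = np.exp(-beta * np.sum((T - x) ** 2, axis=1))
    return y / y.sum()

Y = np.stack([encode(x, T) for x in X])  # codes y_i in R^K, Eq. (1)
z = Y.max(axis=0)                        # pooling vector z, Eq. (3)
e = Y.argmax(axis=0)                     # indices e of the pooled descriptors
```

The pair (z, e) from the max-pooling step is exactly what the local feature embedding below carries forward: each pooling coefficient remembers which local descriptor produced it.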
- the above feature extraction pipeline may be effective at distinguishing different categories of objects, it may be insufficient to capture the subtle differences within an object category for fine-grained recognition (e.g., letter endings or other fine details that characterize various typefaces and fonts for text).
- the above feature extraction pipeline may be extended by embedding local features into the pooling vector to preserve the fine-grained details (e.g., details of local letter parts in text).
- e_k denotes e(k) and z_k denotes z(k) (the k-th entries of the index vector e and the pooling vector z)
- the max pooling procedure may introduce a competing process for all the local descriptors to match templates.
- Each pooling coefficient z_k measures the response significance of x_{e_k} with respect to template t_k , which is effective at categorizing coarse object shapes, while the pooled local descriptor x_{e_k} preserves the local part details that are discriminative for classifying subtle fine-grained differences when the pooling coefficients are similar. Therefore, the feature representation in Equation (4) can capture both coarse-level object appearance changes and subtle object part changes. This feature representation may be called "local feature embedding" or "LFE."
- Local feature embedding may embed the local descriptors from max pooling into a much higher-dimensional space of dimension Kd. For instance, with 59-dimensional LBP descriptors and a codebook size of 2048, the dimension of f without using spatial pyramid matching (SPM) is already 120,832. Although embedding the image into a higher-dimensional space may be amenable to linear classifiers, training classifiers for very large-scale applications can be very time-consuming. Moreover, a potential drawback of training classifiers for large-scale classification is that, when images of new categories become available or when new images are added to existing categories, retraining the classifiers may involve a very high computational cost.
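The quoted dimensionality follows directly from the form of Equation (4): each of the K pooling coefficients carries a d-dimensional pooled local descriptor.

```python
# Dimensionality of the embedded feature f in Eq. (4), using the LBP
# descriptor size and codebook size stated in the text.
d_lbp = 59      # LBP descriptor dimension
K = 2048        # codebook / template model size
dim_f = K * d_lbp
```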
- the hierarchy machine 110 may utilize a new large-scale classification algorithm based on local feature metric learning and template selection, which can be readily generalized to new classes and new data at very little computational cost.
- the dataset may be open-ended. For example, new font categories may appear over time and new data samples could be added to the existing categories. It may be important for a practical classification algorithm to be able to generalize to new classes and new data at very little cost.
- Nearest class mean (NCM) may be used for certain large-scale classification tasks in which each class is represented by a mean feature vector that is efficient to compute.
- the hierarchy machine 110 may use NCM based on pooled local features to form a set of weak classifiers.
- a max-margin template selection scheme may be implemented to combine these weak classifiers for the final classification, categorization, or identification of a visual pattern within an image.
- a recognition system may generate (e.g., determine or calculate) a Mahalanobis distance metric for each pooled local feature space, under which an NCM classifier may be formulated using multi-class logistic regression, where the probability for a class c given a pooled local feature x_{e_k} is defined by
- μ_k^c is the class mean vector for the k-th pooled local features in class c
- ∥μ_k^c − x_{e_k} ∥²_{W_k} = (μ_k^c − x_{e_k} )^T W_k^T W_k (μ_k^c − x_{e_k} )
- Σ_k^{−1} = W_k^T W_k .
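A minimal numeric sketch of the NCM classifier in Equation (6). The class means, metric, and test point are made up, and `ncm_probs` is a hypothetical name; only the softmax-over-Mahalanobis-distances structure comes from the text.

```python
import numpy as np

def ncm_probs(x, mus, W):
    """p(c | x_{e_k}) via multi-class logistic regression over class
    means under the Mahalanobis metric Sigma^{-1} = W^T W (Eq. (6))."""
    proj = (mus - x) @ W.T              # W (mu_c - x) for every class c
    d2 = np.sum(proj ** 2, axis=1)      # squared Mahalanobis distances
    logits = -0.5 * d2
    p = np.exp(logits - logits.max())   # numerically stabilized softmax
    return p / p.sum()

mus = np.array([[0.0, 0.0], [4.0, 4.0]])  # class mean vectors mu_k^c
W = np.eye(2)                             # identity metric for the sketch
p = ncm_probs(np.array([0.1, -0.2]), mus, W)
```

The test point lies near the first class mean, so its probability mass concentrates on that class.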
- a metric learning method called within-class covariance normalization (WCCN) may be used to learn the metric W_k for the k-th pooled feature space.
- μ_k^c = (1 / Z_c ) Σ_{i ∈ I_c} z_k^i x_{e_k}^i , (8)
- I c denotes the sample index set for class c
- Z_c = Σ_{i ∈ I_c} z_k^i is a normalization factor
- Σ̂_k represents a smoothed version of the empirical expected within-class covariance matrix
- I is the identity matrix
- σ² can take the value of trace(Σ_k )
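The class-mean computation of Equation (8), plus one common form of the smoothing described above. The shrinkage formula with the mixing weight `alpha` is an assumption: the text only states that the smoothed matrix blends the empirical within-class covariance with σ²I, with σ² = trace(Σ_k).

```python
import numpy as np

def weighted_class_mean(z, X):
    # Eq. (8): mu_k^c = (1 / Z_c) * sum_{i in I_c} z_k^i x_{e_k}^i,
    # with Z_c = sum_{i in I_c} z_k^i as the normalization factor.
    return (z[:, None] * X).sum(axis=0) / z.sum()

def smoothed_within_class_cov(Sigma, alpha=0.1):
    # Hypothetical shrinkage: (1 - alpha) * Sigma + alpha * sigma^2 * I,
    # taking sigma^2 = trace(Sigma) as suggested in the text.
    sigma2 = np.trace(Sigma)
    return (1 - alpha) * Sigma + alpha * sigma2 * np.eye(Sigma.shape[0])

z = np.array([1.0, 3.0])                   # pooling coefficients z_k^i
X = np.array([[0.0, 0.0], [4.0, 0.0]])     # pooled descriptors x_{e_k}^i
mu = weighted_class_mean(z, X)             # -> [3.0, 0.0]
```

The smoothing keeps the covariance estimate positive definite, so the inverse square root needed for the WCCN metric below is always well defined.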
- NCM may be used as the classifier, which may lay the foundation for the multi-class logistic regression in Equation (6).
- the projection components with high within-class variability may be suppressed, for example, by discarding the first few largest eigenvalues in D_k , which correspond to the subspace where the feature similarity and label similarity are most out of sync (e.g., with large eigenvalues corresponding to large within-class variance).
- the solution of WCCN may be interpreted as the result of discriminative subspace learning.
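Equation (13) as code. This is a sketch: the `drop` parameter is a hypothetical way to implement the optional discarding of the largest-eigenvalue components mentioned above.

```python
import numpy as np

def wccn_metric(Sigma_hat, drop=0):
    """W_k = D_k^{-1/2} U_k^T from the eigen-decomposition of the
    smoothed within-class covariance. Discarding the `drop` largest
    eigenvalues suppresses directions of high within-class variability."""
    evals, U = np.linalg.eigh(Sigma_hat)   # eigh returns ascending order
    evals, U = evals[::-1], U[:, ::-1]     # reorder to descending
    W = np.diag(evals[drop:] ** -0.5) @ U[:, drop:].T
    return W

Sigma_hat = np.array([[4.0, 0.0], [0.0, 1.0]])
W = wccn_metric(Sigma_hat)
# With no components dropped, W^T W recovers Sigma_hat^{-1},
# matching Sigma_k^{-1} = W_k^T W_k above.
```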
- the hierarchy machine 110 may evaluate the posterior of a class c for the input image feature representation f by combining the outputs of Equation (6) using a log-linear model:
- w_k weights the contribution of each pooled local feature to the final classification
- a is a small constant offset.
- the hierarchy machine 110 may be configured to treat the multi-class logistic regression for each pooled local feature as a weak classifier, and then linearly combine them to obtain a strong classifier:
- the hierarchy machine 110 may avoid the numerical instability and data-scale problems of the logarithm in Equation (14).
- the score s(c | f) need not have a probabilistic interpretation anymore, but the classification task may again be to find the class with the largest score output.
- this formulation may work slightly better than a log-linear model, and this linear model may be implemented in the hierarchy machine 110 .
- Σ_{k=1}^K w_k ( p(c_i | x_{e_k}^i ) − p(c′ | x_{e_k}^i ) ) > 0 , ∀ i, c′ ≠ c_i . (18)
- w may be obtained by solving the following optimization:
- Equation (21) is a classical one-class support vector machine (SVM) formulation.
- Equation (19) may translate to
- the optimization in Equation (21) is the classical SVM formulation with only a positive class and thus can be solved by an SVM package.
- the regularization term may also take the form of ‖w‖_1 , where the ℓ_1 -norm promotes sparsity for template selection, which may have better generalization behavior when the size K of the template model T is very large.
- c* = argmax_{c′} s(c′ | f) gives the final classification.
- When new data or font classes are added to the database, it is sufficient to calculate the new class mean vectors and estimate the within-class covariances to update the WCCN metric incrementally.
- Because the template model is universally shared by all classes, the template weights do not need to be retrained. Therefore, the above-described algorithm (e.g., as implemented in the hierarchy machine 110 ) can readily adapt to new data or new classes at little added computational cost.
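This incremental property can be illustrated with plain nearest-class-mean scoring on toy data; the real scoring would use the learned WCCN metric and the shared template weights, both of which stay fixed when a class is added.

```python
import numpy as np

def ncm_scores(x, mus):
    # Nearest-class-mean scores: negative squared distance to each mean.
    return -np.sum((mus - x) ** 2, axis=1)

mus = np.array([[0.0, 0.0], [4.0, 0.0]])       # existing class means
# A new class arrives: compute its mean and append it. No retraining of
# the shared template weights is needed.
new_class_samples = np.array([[0.0, 3.8], [0.0, 4.2]])
mus = np.vstack([mus, new_class_samples.mean(axis=0)])

predicted = int(np.argmax(ncm_scores(np.array([0.1, 3.9]), mus)))
```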
- one or more of the methodologies described herein may facilitate generation of a hierarchy of visual pattern clusters, as well as facilitate visual pattern recognition in an image.
- generation and use of such a hierarchy of visual pattern clusters may enable a system to omit unrelated classifiers and execute only those classifiers with at least a threshold probability of actually classifying a candidate visual pattern.
- one or more of the methodologies described herein may enable efficient and scalable automated visual pattern recognition.
- one or more of the methodologies described herein may facilitate classification, categorization, or identification of a visual pattern depicted within an image, such as a font used for rendering text or a face that appears in the image.
- one or more of the methodologies described herein may facilitate font recognition, facial recognition, facial analysis, or any suitable combination thereof.
- one or more of the methodologies described herein may obviate a need for certain efforts or resources that otherwise would be involved in recognition of visual patterns in images. Efforts expended by a user in recognizing a visual pattern that appears within an image may be reduced by one or more of the methodologies described herein. Computing resources used by one or more machines, databases, or devices (e.g., within the network environment 100 ) may similarly be reduced. Examples of such computing resources include processor cycles, network traffic, memory usage, data storage capacity, power consumption, and cooling capacity.
- FIG. 16 is a block diagram illustrating components of a machine 1600 , according to some example embodiments, able to read instructions from a machine-readable medium (e.g., a machine-readable storage medium, a computer-readable storage medium, or any suitable combination thereof) and perform any one or more of the methodologies discussed herein, in whole or in part.
- FIG. 16 shows a diagrammatic representation of the machine 1600 in the example form of a computer system and within which instructions 1624 (e.g., software, a program, an application, an applet, an app, or other executable code) for causing the machine 1600 to perform any one or more of the methodologies discussed herein may be executed, in whole or in part.
- the machine 1600 operates as a standalone device or may be connected (e.g., networked) to other machines.
- the machine 1600 may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a distributed (e.g., peer-to-peer) network environment.
- the machine 1600 may be a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a smartphone, a web appliance, a network router, a network switch, a network bridge, or any machine capable of executing the instructions 1624 , sequentially or otherwise, that specify actions to be taken by that machine.
- the machine 1600 includes a processor 1602 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC), a radio-frequency integrated circuit (RFIC), or any suitable combination thereof), a main memory 1604 , and a static memory 1606 , which are configured to communicate with each other via a bus 1608 .
- the processor 1602 may contain microcircuits that are configurable, temporarily or permanently, by some or all of the instructions 1624 such that the processor 1602 is configurable to perform any one or more of the methodologies described herein, in whole or in part.
- a set of one or more microcircuits of the processor 1602 may be configurable to execute one or more modules (e.g., software modules) described herein.
- the machine 1600 may further include a graphics display 1610 (e.g., a plasma display panel (PDP), a light emitting diode (LED) display, a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT)).
- the machine 1600 may also include an alphanumeric input device 1612 (e.g., a keyboard), a cursor control device 1614 (e.g., a mouse, a touchpad, a trackball, a joystick, a motion sensor, or other pointing instrument), a storage unit 1616 , a signal generation device 1618 (e.g., a speaker), and a network interface device 1620 .
- the storage unit 1616 includes a machine-readable medium 1622 (e.g., a tangible and non-transitory machine-readable storage medium) on which are stored the instructions 1624 embodying any one or more of the methodologies or functions described herein.
- the instructions 1624 may also reside, completely or at least partially, within the main memory 1604 , within the processor 1602 (e.g., within the processor's cache memory), or both, during execution thereof by the machine 1600 . Accordingly, the main memory 1604 and the processor 1602 may be considered as machine-readable media (e.g., tangible and non-transitory machine-readable media).
- the instructions 1624 may be transmitted or received over a network 1626 (e.g., network 190 ) via the network interface device 1620 .
- the term “memory” refers to a machine-readable medium able to store data temporarily or permanently and may be taken to include, but not be limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, and cache memory. While the machine-readable medium 1622 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions.
- machine-readable medium shall also be taken to include any medium, or combination of multiple media, that is capable of storing instructions for execution by a machine (e.g., machine 1600 ), such that the instructions, when executed by one or more processors of the machine (e.g., processor 1602 ), cause the machine to perform any one or more of the methodologies described herein. Accordingly, a “machine-readable medium” refers to a single storage apparatus or device, as well as “cloud-based” storage systems or storage networks that include multiple storage apparatus or devices.
- the term “machine-readable medium” shall accordingly be taken to include, but not be limited to, one or more tangible data repositories in the form of a solid-state memory, an optical medium, a magnetic medium, or any suitable combination thereof.
- Modules may constitute either software modules (e.g., code embodied on a machine-readable medium or in a transmission signal) or hardware modules.
- a “hardware module” is a tangible unit capable of performing certain operations and may be configured or arranged in a certain physical manner.
- In some example embodiments, one or more computer systems (e.g., a standalone computer system, a client computer system, or a server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
- a hardware module may be implemented mechanically, electronically, or any suitable combination thereof.
- a hardware module may include dedicated circuitry or logic that is permanently configured to perform certain operations.
- a hardware module may be a special-purpose processor, such as a field programmable gate array (FPGA) or an ASIC.
- a hardware module may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations.
- a hardware module may include software encompassed within a general-purpose processor or other programmable processor. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
- hardware module should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein.
- “hardware-implemented module” refers to a hardware module. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where a hardware module comprises a general-purpose processor configured by software to become a special-purpose processor, the general-purpose processor may be configured as respectively different special-purpose processors (e.g., comprising different hardware modules) at different times. Software may accordingly configure a processor, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
- Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) between or among two or more of the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
- processors may be temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions described herein.
- processor-implemented module refers to a hardware module implemented using one or more processors.
- the methods described herein may be at least partially processor-implemented, a processor being an example of hardware.
- the operations of a method may be performed by one or more processors or processor-implemented modules.
- the one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS).
- at least some of the operations may be performed by a group of computers (as examples of machines including processors), with these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., an application program interface (API)).
- API application program interface
- the performance of certain operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines.
- the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.
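The storage-and-retrieval communication pattern described above (one module writes its output to a shared memory structure; another module later reads and processes it) can be sketched in a few lines. This is an illustrative sketch only; the `SharedMemory`, `producer_module`, and `consumer_module` names are hypothetical, not from the patent.

```python
class SharedMemory:
    """Stands in for a memory device that two hardware modules can both access."""

    def __init__(self):
        self._store = {}

    def write(self, key, value):
        self._store[key] = value

    def read(self, key):
        return self._store.get(key)


def producer_module(memory, data):
    # One module performs an operation and stores its output in the
    # memory device to which it is communicatively coupled.
    memory.write("intermediate", [x * 2 for x in data])


def consumer_module(memory):
    # A further module, at a later time, retrieves and processes the stored output.
    intermediate = memory.read("intermediate")
    return sum(intermediate)


memory = SharedMemory()
producer_module(memory, [1, 2, 3])
result = consumer_module(memory)  # 2 + 4 + 6 = 12
```

The same pattern covers the case where the two modules are one processor configured at different times: the memory structure persists between configurations, so no direct signal path between the modules is required.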
Abstract
Description
y_i = f(x_i, T),   (1)
z = g({y_i}_{i=1}^n),   (2)
{z, e} = max({y_i}_{i=1}^n),   (3)
f = └z_1 x_e
f = {(z_k, x_e
∥μ_k^c − x_e
W_k = D_k^{−1/2} U_k^T,   (13)
s(c_i | f_i) > s(c′ | f_i),  ∀ i, c′ ≠ c_i,   (17)
p_i(c) = └p(c | x_e
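As a rough illustration of equations (1)–(3) and (13): equation (1) assigns each class a score y_i = f(x_i, T) for a test sample T, equations (2)–(3) aggregate the scores and keep the maximum score z together with the index e of the winning class, and equation (13) builds a whitening transform W_k from an eigendecomposition (U_k holding eigenvectors, D_k eigenvalues). The scoring function, prototypes, and covariance matrix below are hypothetical stand-ins, not the patent's actual definitions.

```python
import numpy as np

def f(x_i, T):
    # Hypothetical per-class scoring function (in the spirit of equation (1)):
    # negative distance of test sample T to class prototype x_i, so closer = higher.
    return -float(np.linalg.norm(x_i - T))

prototypes = [np.array([0.0, 0.0]),   # class 0
              np.array([1.0, 1.0]),   # class 1
              np.array([4.0, 4.0])]   # class 2
T = np.array([3.5, 4.2])

# Equations (2)-(3): aggregate the per-class scores {y_i} and take the
# maximum score z together with the index e of the winning class.
y = [f(x_i, T) for x_i in prototypes]
z, e = max(y), int(np.argmax(y))

# Equation (13): whitening transform W_k = D_k^{-1/2} U_k^T, where the
# covariance factors as U_k D_k U_k^T (eigenvectors in U_k, eigenvalues in D_k).
cov = np.array([[2.0, 0.5],
                [0.5, 1.0]])
eigvals, U = np.linalg.eigh(cov)
W = np.diag(eigvals ** -0.5) @ U.T
whitened_cov = W @ cov @ W.T  # approximately the identity matrix
```

With these stand-in prototypes, T is nearest to the class-2 prototype, so e comes out as 2, and W @ cov @ W.T is (up to rounding) the 2×2 identity, which is exactly what a whitening transform is meant to achieve.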
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/012,770 US9053392B2 (en) | 2013-08-28 | 2013-08-28 | Generating a hierarchy of visual pattern classes |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/012,770 US9053392B2 (en) | 2013-08-28 | 2013-08-28 | Generating a hierarchy of visual pattern classes |
Publications (2)
Publication Number | Publication Date |
---|---|
US20150063713A1 US20150063713A1 (en) | 2015-03-05 |
US9053392B2 true US9053392B2 (en) | 2015-06-09 |
Family
ID=52583375
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/012,770 Active 2033-10-30 US9053392B2 (en) | 2013-08-28 | 2013-08-28 | Generating a hierarchy of visual pattern classes |
Country Status (1)
Country | Link |
---|---|
US (1) | US9053392B2 (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9141885B2 (en) * | 2013-07-29 | 2015-09-22 | Adobe Systems Incorporated | Visual pattern recognition in an image |
EP3120300A4 (en) * | 2014-03-19 | 2017-11-22 | Neurala Inc. | Methods and apparatus for autonomous robotic control |
KR102024867B1 (en) * | 2014-09-16 | 2019-09-24 | 삼성전자주식회사 | Feature extracting method of input image based on example pyramid and apparatus of face recognition |
US9747636B2 (en) | 2014-12-08 | 2017-08-29 | Bank Of America Corporation | Enhancing information security using an information passport dashboard |
US10037712B2 (en) | 2015-01-30 | 2018-07-31 | Toyota Motor Engineering & Manufacturing North America, Inc. | Vision-assist devices and methods of detecting a classification of an object |
US10217379B2 (en) * | 2015-01-30 | 2019-02-26 | Toyota Motor Engineering & Manufacturing North America, Inc. | Modifying vision-assist device parameters based on an environment classification |
US10635924B2 (en) * | 2015-05-11 | 2020-04-28 | Siemens Aktiengesellschaft | System and method for surgical guidance and intra-operative pathology through endo-microscopic tissue differentiation |
US9280745B1 (en) | 2015-07-08 | 2016-03-08 | Applied Underwriters, Inc. | Artificial intelligence expert system for screening |
US10074042B2 (en) | 2015-10-06 | 2018-09-11 | Adobe Systems Incorporated | Font recognition using text localization |
US9875429B2 (en) | 2015-10-06 | 2018-01-23 | Adobe Systems Incorporated | Font attributes for font recognition and similarity |
US10007868B2 (en) * | 2016-09-19 | 2018-06-26 | Adobe Systems Incorporated | Font replacement based on visual similarity |
US9928448B1 (en) | 2016-09-23 | 2018-03-27 | International Business Machines Corporation | Image classification utilizing semantic relationships in a classification hierarchy |
CN109784398B (en) * | 2019-01-11 | 2023-12-05 | 广东奥普特科技股份有限公司 | Classifier based on feature scale and subclass splitting |
US10916006B2 (en) * | 2019-04-16 | 2021-02-09 | Winbond Electronics Corp. | Recognition method of pattern feature |
US10950017B2 (en) | 2019-07-08 | 2021-03-16 | Adobe Inc. | Glyph weight modification |
US11295181B2 (en) | 2019-10-17 | 2022-04-05 | Adobe Inc. | Preserving document design using font synthesis |
- 2013-08-28: US application US14/012,770 filed; issued as US9053392B2 (status: Active)
Patent Citations (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5315668A (en) * | 1991-11-27 | 1994-05-24 | The United States Of America As Represented By The Secretary Of The Air Force | Offline text recognition without intraword character segmentation based on two-dimensional low frequency discrete Fourier transforms |
US6181829B1 (en) * | 1998-01-21 | 2001-01-30 | Xerox Corporation | Method and system for classifying and processing of pixels of image data |
US6229923B1 (en) * | 1998-01-21 | 2001-05-08 | Xerox Corporation | Method and system for classifying and processing of pixels of image data |
US6347153B1 (en) * | 1998-01-21 | 2002-02-12 | Xerox Corporation | Method and system for classifying and processing of pixels of image data |
US6795589B1 (en) * | 1998-09-03 | 2004-09-21 | Canon Kabushiki Kaisha | Optimizing image compositing |
US6493463B1 (en) * | 1999-09-09 | 2002-12-10 | Xerox Corporation | Segmentation tag cleanup using neighborhood tags |
US6516091B1 (en) * | 1999-09-09 | 2003-02-04 | Xerox Corporation | Block level analysis of segmentation tags |
US6636331B1 (en) * | 1999-09-09 | 2003-10-21 | Xerox Corporation | Segmentation tag cleanup based on connected components |
US6621930B1 (en) * | 2000-08-09 | 2003-09-16 | Elron Software, Inc. | Automatic categorization of documents based on textual content |
US7221775B2 (en) * | 2002-11-12 | 2007-05-22 | Intellivid Corporation | Method and apparatus for computerized image background analysis |
US20050096950A1 (en) * | 2003-10-29 | 2005-05-05 | Caplan Scott M. | Method and apparatus for creating and evaluating strategies |
US20060088207A1 (en) * | 2004-10-22 | 2006-04-27 | Henry Schneiderman | Object recognizer and detector for two-dimensional images using bayesian network based classifier |
US20070058836A1 (en) * | 2005-09-15 | 2007-03-15 | Honeywell International Inc. | Object classification in video data |
US20070253625A1 (en) * | 2006-04-28 | 2007-11-01 | Bbnt Solutions Llc | Method for building robust algorithms that classify objects using high-resolution radar signals |
US20080092109A1 (en) * | 2006-10-17 | 2008-04-17 | The Mathworks, Inc. | User-defined hierarchies of user-defined classes of graphical objects in a graphical modeling environment |
US20080092111A1 (en) * | 2006-10-17 | 2008-04-17 | The Mathworks, Inc. | User-defined hierarchies of user-defined classes of graphical objects in a graphical modeling environment |
US20090324107A1 (en) * | 2008-06-25 | 2009-12-31 | Gannon Technologies Group, Llc | Systems and methods for image recognition using graph-based pattern matching |
US20100119128A1 (en) * | 2008-08-14 | 2010-05-13 | Bond University Ltd. | Cancer diagnostic method and system |
US20120090834A1 (en) * | 2009-07-06 | 2012-04-19 | Matthias Imhof | Method For Seismic Interpretation Using Seismic Texture Attributes |
US20120039539A1 (en) * | 2010-03-08 | 2012-02-16 | Oren Boiman | Method and system for classifying one or more images |
US20110249891A1 (en) * | 2010-04-07 | 2011-10-13 | Jia Li | Ethnicity Classification Using Multiple Features |
US20140283040A1 (en) * | 2013-03-14 | 2014-09-18 | Daniel Shawcross Wilkerson | Hard Object: Lightweight Hardware Enforcement of Encapsulation, Unforgeability, and Transactionality |
US20140282586A1 (en) * | 2013-03-15 | 2014-09-18 | Advanced Elemental Technologies | Purposeful computing |
Non-Patent Citations (3)
Title |
---|
Bengio, S., et al., "Label Embedding Trees for Large Multi-Class Tasks", Advances in Neural Information Processing Systems (NIPS), (2010), 1-10. |
Deng, Jia, et al., "Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition", Advances in Neural Information Processing Systems (NIPS), (2011), 1-9. |
Liu, B., et al., "Probabilistic Label Trees for Efficient Large Scale Image Classification", CVPR 2013, 1-8. |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170034305A1 (en) * | 2015-06-30 | 2017-02-02 | Linkedin Corporation | Managing overlapping taxonomies |
US11449789B2 (en) | 2016-02-16 | 2022-09-20 | Micro Focus Llc | System and method for hierarchical classification |
CN106991426A (en) * | 2016-09-23 | 2017-07-28 | 天津大学 | Remote sensing image sparse coding dictionary learning method based on embedded DSP
CN106991426B (en) * | 2016-09-23 | 2020-06-12 | 天津大学 | Remote sensing image sparse coding dictionary learning method based on embedded DSP |
US20180089832A1 (en) * | 2016-09-29 | 2018-03-29 | Intel Corporation | Place recognition algorithm |
US10217221B2 (en) * | 2016-09-29 | 2019-02-26 | Intel Corporation | Place recognition algorithm |
US10007864B1 (en) * | 2016-10-14 | 2018-06-26 | Cloudera, Inc. | Image processing system and method |
US10657712B2 (en) | 2018-05-25 | 2020-05-19 | Lowe's Companies, Inc. | System and techniques for automated mesh retopology |
US11270101B2 (en) | 2019-11-01 | 2022-03-08 | Industrial Technology Research Institute | Imaginary face generation method and system, and face recognition method and system using the same |
Also Published As
Publication number | Publication date |
---|---|
US20150063713A1 (en) | 2015-03-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9053392B2 (en) | Generating a hierarchy of visual pattern classes | |
US9524449B2 (en) | Generation of visual pattern classes for visual pattern recognition | |
US9141885B2 (en) | Visual pattern recognition in an image | |
US20170061257A1 (en) | Generation of visual pattern classes for visual pattern recognition | |
Liu et al. | Partially shared latent factor learning with multiview data | |
US10963685B2 (en) | Generating variations of a known shred | |
Zhou et al. | Double shrinking sparse dimension reduction | |
US8428397B1 (en) | Systems and methods for large scale, high-dimensional searches | |
US10803231B1 (en) | Performing tag-based font retrieval using combined font tag recognition and tag-based font retrieval neural networks | |
Tao et al. | Robust spectral ensemble clustering via rank minimization | |
US20170076152A1 (en) | Determining a text string based on visual features of a shred | |
Serra et al. | Gold: Gaussians of local descriptors for image representation | |
Guan et al. | A unified probabilistic model for global and local unsupervised feature selection | |
Tan et al. | Robust object recognition via weakly supervised metric and template learning | |
Jiang et al. | Variational deep embedding: A generative approach to clustering | |
Zhang et al. | Flexible auto-weighted local-coordinate concept factorization: A robust framework for unsupervised clustering | |
Zhao et al. | Bisecting k-means clustering based face recognition using block-based bag of words model | |
CN112163114B (en) | Image retrieval method based on feature fusion | |
Li et al. | Fuzzy bag of words for social image description | |
Chen et al. | Collaborative multiview hashing | |
Guo et al. | Deep embedded k-means clustering | |
Lin et al. | A deep clustering algorithm based on gaussian mixture model | |
KR20210035017A (en) | Neural network training method, method and apparatus of processing data based on neural network | |
Guo et al. | Data induced masking representation learning for face data analysis | |
Yang et al. | Subspace learning by ℓ0-induced sparsity |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: ADOBE SYSTEMS INCORPORATED, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:YANG, JIANCHAO;CHEN, GUANG;JIN, HAILIN;AND OTHERS;SIGNING DATES FROM 20130826 TO 20130828;REEL/FRAME:031104/0064 |
|
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |
|
AS | Assignment |
Owner name: ADOBE INC., CALIFORNIA Free format text: CHANGE OF NAME;ASSIGNOR:ADOBE SYSTEMS INCORPORATED;REEL/FRAME:048867/0882 Effective date: 20181008 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |