US20130044912A1 - Use of association of an object detected in an image to obtain information to display to a user - Google Patents
- Publication number
- US20130044912A1 (application US 13/549,339)
- Authority
- US
- United States
- Prior art keywords
- user
- identifier
- image
- information
- scene
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/142—Image acquisition using hand-held instruments; Constructional details of the instruments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
- G06F3/0304—Detection arrangements using opto-electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/20—Scenes; Scene-specific elements in augmented reality scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/107—Static hand or arm
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N9/00—Details of colour television systems
- H04N9/12—Picture reproducers
- H04N9/31—Projection devices for colour picture display, e.g. using electronic spatial light modulators [ESLM]
- H04N9/3191—Testing thereof
- H04N9/3194—Testing thereof including sensor feedback
Definitions
- one or more cameras capture a scene that includes an object in real world that is sufficiently small to be carried by a human hand (“portable”), such as a stapler or a book. Thereafter, an image of the scene from the one or more cameras is processed to detect therein a portion corresponding to the object, which is recognized from among a set of pre-selected real world objects. An identifier of the object is then used, with a set of associations that associate object identifiers and identifiers of users, to obtain a user identifier that identifies a user at least partially among a set of users.
- the user identifier identifies the user generically, as belonging to a particular group of users (also called “weak identification”) among several such groups.
- the user identifier identifies a single user uniquely (also called “strong identification”), among all such users in the set.
- the user identifier (obtained, as noted above, by use of an association and the object identifier) is thereafter used in several such embodiments, either alone or in combination with user-supplied information, to obtain and store in memory, information to be output to the user. At least a portion of the obtained information is thereafter output, for example by projection into the scene.
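The flow just described (capture an image, detect a pre-selected object, look up the object-to-user association, obtain information, output it) can be sketched in Python. All names and data here (`detect_object`, `ASSOCIATIONS`, `INFO_BY_USER`, the substring-based "recognition") are illustrative assumptions, not structures disclosed by the patent:

```python
# Associations of object identifiers with user (group) identifiers
# -- a stand-in for the set of associations 111.
ASSOCIATIONS = {"stapler": "group1", "tape_dispenser": "group2"}

# Information keyed by user identifier -- a stand-in for information 119.
INFO_BY_USER = {"group1": "Score: 73", "group2": "Score: 0"}

def detect_object(image: str) -> str:
    """Stand-in for recognizing one object among a set of
    pre-selected real world objects in a captured image."""
    for obj in ASSOCIATIONS:
        if obj in image:
            return obj
    raise ValueError("no known object in scene")

def info_to_display(image: str) -> str:
    obj_id = detect_object(image)   # act 102: recognize the object
    user_id = ASSOCIATIONS[obj_id]  # act 103: look up the association
    return INFO_BY_USER[user_id]    # act 104: obtain information to output

print(info_to_display("scene with a stapler on a table"))  # Score: 73
```

The same lookup chain applies whether the result is projected into the scene or shown on a screen; only the output step (act 105) differs.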
- FIG. 1A illustrates a portable real world object 132 (e.g. a stapler) being imaged by a camera 121 , and its use by processor 100 in an electronic device 120 to cause a projection of information into scene 130 by performing acts illustrated in FIG. 1C , in several embodiments described herein.
- FIG. 1B illustrates another portable real world object 142 (e.g. a tape dispenser) being imaged by another electronic device 150 in a manner similar to electronic device 120 of FIG. 1A , to display information 119 T on a screen 152 of device 150 , also by performing the acts of FIG. 1C in several embodiments described herein.
- FIG. 1C illustrates in a high-level flow chart, acts performed by one or more processors 100 (e.g. in one of electronic devices 120 and 150 of FIGS. 1A and 1B ), to use an association of an object detected in an image to obtain information to display, in some embodiments described herein.
- FIG. 1D illustrates in a high-level block diagram, a set of associations 111 in a memory 110 used by processor 100 of FIG. 1C , in some of the described embodiments.
- FIG. 1E illustrates association of portable real world object 132 with group 2 by user 135 making a hand gesture with two outstretched fingers adjacent to portable real world object 132 , in certain embodiments.
- FIG. 1F illustrates use of the association of FIG. 1E to obtain and display information 188 for group 2 , in certain embodiments.
- FIG. 1G illustrates changing the association of FIG. 1E by replacing group 1 with group 2 by user 135 making another hand gesture with index finger 136 pointing at portable real world object 132 , in such embodiments.
- FIG. 1H illustrates use of the association of FIG. 1G to obtain and display information 189 A and 189 B for group 1 , in such embodiments.
- FIGS. 1I and 1J illustrate colored beams 163 and 164 of blue color and green color respectively projected on to object 132 by a projector 122 in certain embodiments.
- FIG. 2A illustrates in a high-level flow chart, acts performed by one or more processors 100 to use an association of an object detected in an image in combination with user input, to obtain information to display, in some embodiments described herein.
- FIG. 2B illustrates in a high-level block diagram, a set of associations 211 in a memory 110 used by processor 100 of FIG. 2A , in some of the described embodiments.
- FIG. 2C illustrates real world object 132 in scene 230 imaged by camera 121 , for use in projection of information in certain embodiments.
- FIG. 2D illustrates processor 100 coupled to memory 110 that contains image 109 as well as an address 291 (in a table 220 ) used to obtain information for display in some embodiments.
- FIG. 2E illustrates display of information in the form of a video 295 projected adjacent to object 132 in some embodiments.
- FIGS. 3A and 3F illustrate in high-level flow charts, acts performed by one or more processors 100 to use an association of an object detected in an image with a single person, to obtain information to display specific to that person, in some embodiments described herein.
- FIG. 3B illustrates in a high-level block diagram, a set of associations 311 in a memory 110 used by processor 100 of FIGS. 3A and 3F , in some of the described embodiments.
- FIG. 3C illustrates a user interface included in information projected adjacent to object 132 , in some embodiments.
- FIGS. 3D and 3E illustrate a user's personalized information included in information projected adjacent to object 132 (in FIG. 3D ) and onto object 132 (in FIG. 3E ) in some embodiments.
- FIG. 4A illustrates, in a high-level block diagram, a processor 100 coupled to a memory 110 in a mobile device 120 of some embodiments.
- FIG. 4B illustrates in a high-level flow chart, acts performed by processor 100 of FIG. 4A in projecting information into scene 130 in several embodiments.
- FIGS. 4C and 4D illustrate in intermediate-level flow charts, acts performed by processor 100 in projecting information in certain embodiments.
- FIG. 5 illustrates, in a high-level block diagram, a mobile device 120 of several embodiments.
- one or more device(s) 120 use one or more cameras 121 (and/or sensors such as a microphone) to capture input e.g. one or more images 109 ( FIG. 1B ) from a scene 130 ( FIG. 1A ) that contains an object 132 ( FIG. 1A ).
- object 132 can be any object in the real world (in scene 130 ) that is portable by a human, e.g. small enough to be carried in (and/or moved by) a human hand, such as any handheld object. Examples of object 132 are a stapler, a mug, a bottle, a glass, a book, a cup, etc.
- device(s) 120 can be any electronic device that includes a camera 121 , a memory 110 and a processor 100 , such as a smartphone or a tablet (e.g. iPhone or iPad available from APPLE, Inc.).
- the following description refers to a single device 120 performing the method of FIG. 1C , although multiple such devices can be used to individually perform any one or more of steps 101 - 108 , depending on the embodiment.
- one or more captured images 109 are initially received from a camera 121 (as per act 101 in FIG. 1C ) e.g. via bus 1113 ( FIG. 5 ) and stored in a memory 110 .
- Processor 100 then processes (as per act 102 in FIG. 1C ) the one or more images 109 to detect the presence of an object 132 ( FIG. 1A ) e.g. on a surface of a table 131 , in a scene 130 of real world outside camera 121 (see act 102 in FIG. 1C ).
- processor 100 may be programmed to recognize a portion of image 109 corresponding to portable real world object 132 , to obtain an identifier (“object identifier”) 1120 that uniquely identifies object 132 among a set of predetermined objects.
- Processor 100 then uses the object identifier (e.g. stapler identifier 1120 in FIG. 1D ) that is obtained in act 102 to look up a set of associations 111 in an act 103 ( FIG. 1C ).
- the result of act 103 is an identifier of a user (e.g. user identifier 112 U) who has been previously associated with portable real world object 132 (e.g. stapler).
- memory 110 that is coupled to processor 100 holds a set of associations 111 including, for example, an association 112 ( FIG. 1D ).
- association 111 may be created in different ways depending on the embodiment, and in some illustrative embodiments an association 112 in set 111 is initialized or changed in response to a hand gesture by a user, as described below.
- user identifiers 112 U and 114 U which are used in associations 112 and 114 do not identify a single user uniquely, among the users of a system of such devices 120 . Instead, in these embodiments, user identifiers 112 U and 114 U identify corresponding groups of users, such as a first group 1 of users A, B and C, and a second group 2 of users X, Y and Z (users A-C and X-Z are not shown in FIG. 1D ). In such embodiments (also called “weak identification” embodiments), a user identifier 112 U that is obtained in act 103 (described above) is generic to several users A, B and C within the first group 1 . In other embodiments (also called “strong identification” embodiments), each user identifier obtained in act 103 identifies a single user uniquely, as described below in reference to FIGS. 3A-3F .
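The weak/strong distinction can be illustrated with a minimal sketch; the group contents (users A-C and X-Z, per the description of FIG. 1D) and the `resolve` helper are assumptions for illustration only:

```python
# Weak identification: an identifier (like 112U or 114U) names a whole
# group of users. Strong identification: an identifier names one user.
GROUPS = {"group1": {"A", "B", "C"}, "group2": {"X", "Y", "Z"}}

def resolve(identifier: str) -> set:
    """Return the set of users an identifier may refer to."""
    if identifier in GROUPS:       # weak: generic to all users in the group
        return GROUPS[identifier]
    return {identifier}            # strong: unique to a single user

print(sorted(resolve("group1")))   # ['A', 'B', 'C']  (weak identification)
print(sorted(resolve("X")))        # ['X']            (strong identification)
```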
- processor 100 of several embodiments uses a user identifier 112 U looked up in act 103 , to generate an address of information 119 ( FIG. 5 ) to be output to the user and then obtains and stores in memory 110 (as per act 104 in FIG. 1C ) the information 119 .
- information 119 to be output obtained in act 104 is common to all users within that group 1 (e.g. common to users A, B and C).
- information 119 includes the text “Score: 73” which represents a score of this group 1 , in a game being played between two groups of users, namely users in group 1 and users in group 2 .
- Information 119 is optionally transformed and displayed or otherwise output to user 135 , in an act 105 (see FIG. 1C ) as information 119 T ( FIG. 1A ). Specifically, in some embodiments, information 119 is displayed by projection of at least a portion of the information (e.g. the string of characters “Score: 73”) into scene 130 by use of a projector 122 as illustrated in FIG. 1A . In other embodiments, information 119 (or a portion thereof) may be output by act 105 ( FIG. 1C ) in other ways, e.g. device 150 ( FIG. 1B ) displaying information 119 to user 135 directly on a screen 151 ( FIG. 1B ) that also displays a live video of scene 130 (e.g. by displaying image 109 ), thereby to provide an augmented reality (AR) display.
- information 119 may be played through a speaker 1111 ( FIG. 5 ) in device 120 , or even through a headset worn by user 135 .
- device 150 is a smartphone that includes a front-facing camera 152 ( FIG. 1B ) in addition to a rear-facing camera 121 that captures image 109 .
- Front-facing camera 152 ( FIG. 1B ) is used in some embodiments to obtain an image of a face of user 135 , for use in face recognition in certain strong identification embodiments described below in reference to FIGS. 3A-3F .
- Device 150 ( FIG. 1B ) may be used in a manner similar or identical to use of device 120 as described herein, depending on the embodiment.
- Processor 100 ( FIG. 5 ) of certain embodiments performs act 103 after act 102 when associations 111 ( FIG. 1D ) have been previously formed and are readily available in memory 110 .
- Associations 111 may be set up in memory 110 based on information input by user 135 (“user input”) in any manner, as will be readily apparent in view of this detailed description.
- user input in the form of text including words spoken by user 135 is extracted by some embodiments of processor 100 operating as a user input extractor 141 E, from an audio signal that is generated by a microphone 1112 ( FIG. 5 ) in the normal manner.
- processor 100 is programmed to respond to user input sensed by one or more sensors in device 120 (e.g. camera 121 ) that detect one or more actions (e.g. gestures) by a user 135 as follows: processor 100 associates an object 132 that is selectively placed within a field of view 121 F of camera 121 with an identifier (e.g. a group identifier) that depends on user input (e.g. hand gesture).
- any person e.g. user 135 can use their hand 138 to form a specific gesture (e.g. tapping on object 132 ), to provide user input via camera 121 to processor 100 that in turn uses such user input in any manner described herein.
- processor 100 may use input from user 135 in the form of hand gestures (or hand shapes) captured in a video (or still image) by camera 121 , to initialize or change a user identifier that is generic and has been associated with object 132 , as illustrated in FIGS. 1E-1H , and described below.
- some embodiments accept user input in other forms; e.g. audio input such as a whistling sound, a drumming sound, a tapping sound, or the sound of the words “Group Two” spoken by user 135 may be used to associate an object 132 that is imaged within image 109 with a user identifier that is generic (e.g. commonly used to identify multiple users belonging to a particular group).
- user input extractor 141 E is designed to be responsive to images from camera 121 of a user 135 forming a predetermined shape in a gesture 118 , namely the shape of letter “V” of the English alphabet with hand 138 , by stretching out index finger 136 and stretching out middle finger 137 .
- camera 121 images hand 138 , with the just-described predetermined shape “V” at a location in real world that is adjacent to (or overlapping) portable real world object 132 ( FIG. 1E ).
- processor 100 is programmed to perform act 106 ( FIG. 1C ) to extract user input, such as the just-described hand gesture, from image 109 .
- processor 100 responds to detection of such a hand gesture by forming an association (in the set of associations 111 ) that is thereafter used to identify person 135 (as belonging to group 2 ) every time this same hand gesture is recognized by processor 100 .
- for processor 100 to be responsive to a hand gesture, user 135 is required to position fingers 136 and 137 sufficiently close to object 132 so that fingers 136 , 137 and object 132 are all imaged together within a single image 109 (which may be a still image, or alternatively a frame of video, depending on the embodiment) from camera 121 .
- when user 135 makes the same hand gesture but outside the field of view 121 F of camera 121 ( FIG. 5 ), processor 100 does not detect the hand gesture and therefore does not make or change any association, even when object 132 is detected in image 109 .
- an act 106 is performed by processor 100 after act 102 , to identify the above-described hand gesture (or any other user input depending on the embodiment), from among a library of such gestures (or other such user input) that are predetermined and available in a database 199 on a disk (see FIG. 2D ) or other non-transitory storage media.
- processor 100 performs a look up of a predetermined mapping 116 ( FIG. 1D ) based on the hand gesture (or other such user input) detected in act 106 to obtain a user identifier from the set of associations 111 ( FIG. 5 ).
- a two-finger gesture 118 or other user input, e.g. whistling or drumming, is related by a mapping 116 to an identifier 114 U of group 2 , and therefore in an act 108 this identifier (looked up from mapping 116 ) is used to form association 114 in the set 111 .
- processor 100 of several embodiments proceeds to obtain information 119 to be output (as per act 104 , described above), followed by optional transformation and output (e.g. projection as per act 105 ), as illustrated in FIG. 1F .
- object 132 was associated with group 1 , and therefore information 119 T which is output into scene 130 includes the text string “Score: 73” which represents a score of group 1 , in the game being played with users in group 2 .
- the same user 135 can change a previously formed association, by making, adjacent to the same portable real world object 132 , a second hand gesture 117 (e.g. index finger 136 outstretched) that is different from a first hand gesture 118 (e.g. index finger 136 and middle finger 137 both stretched out).
- the hand gesture is made by user 135 sufficiently close to object 132 to ensure that the gesture and object 132 are both captured in a common image by camera 121 .
- Such a second hand gesture 117 ( FIG. 1D ) is detected by processor 100 in act 106 , followed by lookup of mapping 116 , followed by over-writing of a first user identifier 114 U that is currently included in association 114 with a second user identifier 112 U, thereby to change a previously formed association.
- information including the text string “Score: 73” previously displayed (see FIG. 1F ) is now replaced with new information including the text string “Score:0” which is the score of Group 2 as shown in FIG. 1G .
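Acts 106-108 (gesture detection, lookup of mapping 116, and formation or overwriting of an association in set 111) can be sketched as follows; the gesture names and identifiers are hypothetical stand-ins for the patent's gestures 117 and 118:

```python
# Stand-in for mapping 116: recognized gestures -> user (group) identifiers.
MAPPING_116 = {
    "two_fingers_V": "group2",   # gesture 118: index and middle finger out
    "index_finger":  "group1",   # gesture 117: index finger outstretched
}

# Stand-in for the set of associations 111, with an initial association.
associations = {"stapler": "group1"}

def apply_gesture(gesture: str, object_id: str) -> None:
    """Acts 107-108: look up the gesture in mapping 116 and form (or
    overwrite) the association for the object imaged with the gesture."""
    user_id = MAPPING_116[gesture]
    associations[object_id] = user_id  # overwrites any previous identifier

apply_gesture("two_fingers_V", "stapler")
print(associations["stapler"])  # group2
```

Making gesture 117 afterwards would overwrite the identifier back to the other group, mirroring the change of association described above.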
- one or more additional text strings may be displayed to identify the user(s).
- the text string “Group 2 ” is displayed as a portion of information 188 T in FIG. 1G
- the text string “Group 1 ” is displayed as a portion of information 188 T in FIG. 1F
- information 188 T is optionally transformed for display relative to information 188 that is obtained for output, resulting in multiple text strings of information 188 being displayed on different surfaces, e.g. information 189 A displayed on object 132 and information 189 B displayed on table 131 as shown in FIG. 1H .
- a mapping 116 maps hand gestures (or other user input) to user identifiers of groups
- each hand gesture (or other user input) may be mapped to a single user 135 , thereby to uniquely identify each user (“strong identification embodiments”), e.g. as described below in reference to FIGS. 3A-3F .
- yet another data structure (such as an array or a linked list) identifies a group to which each user belongs, and processor 100 may be programmed to use that data structure to identify a user's group when needed for use in associations 111 .
- processor 100 of some embodiments simply uses recognition of a user's hand gesture (or other user input) to select Group 2 from among two groups, namely Group 1 and Group 2 .
- the identity of object 132 is not used in some embodiments to obtain the to-be-displayed information.
- other embodiments of processor 100 do use two identifiers, based on corresponding detections in image 109 as described above, namely the user identifier and the object identifier, to obtain and store in memory 110 the to-be-displayed information 119 in act 104 ( FIG. 1C ).
- the to-be-displayed information 119 may be obtained by processor 100 based on recognition of (1) only a hand gesture or (2) only the real world object, or (3) a combination of (1) and (2), depending on the embodiment.
- a hand gesture is not required in certain embodiments of processor 100 that accept other user input, such as an audio signal (generated by a microphone) that carries sounds made by a user 135 (with their mouth and/or with their hands) and recognized by processor 100 on execution of appropriate software designed to recognize such user-made sounds as user input, e.g. in signals from microphone 1112 ( FIG. 5 ).
- a user identifier with which portable real world object 132 is associated is displayed as text, as illustrated by text string 189 A in FIG. 1H .
- a user identifier may be displayed as color, as illustrated in FIGS. 1I and 1J .
- object 132 in the form of a cap 132 of a bottle is selected by a user 135 to be included in an image of a scene 130 of real world being captured by a camera 121 ( FIG. 5 ).
- identity 214 of bottle cap 132 is associated in a set of associations 211 (see FIG. 2B ) by default with three users that are identified as a group of friends by identity 215 , and this association is shown by device 120 , e.g. by projecting a beam 163 of blue color light on object 132 and in a peripheral region outside of and surrounding object 132 (denoted by the word of text ‘blue’ in FIG. 1I , as colors are not shown in a black-and-white figure). In this example, color blue has been previously associated with the group of friends of identity 215 , as the group's color.
- a book's identity 212 is associated in the set of associations 211 (see FIG. 2B ) by default with four users that are identified as a group (of four students) by identity 213 (e.g. John Doe, Jane Wang, Peter Hall and Tom McCue).
- a person 135 identified as a user of the group of students associates his group's identity 213 (see FIG. 2B ) with object 132 ( FIG. 1J ) by tapping table surface 131 with index finger 136 repeatedly several times in rapid succession (i.e. performs a hand gesture that processor 100 is programmed to recognize and respond to), until person 135 sees a projection of beam 164 of green light on and around object 132 (denoted by the word ‘green’ in FIG. 1J , as this figure is also a black-and-white figure).
- color green was previously associated with the group of students of identity 213 , as its group color.
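The group-to-color association used for beams 163 and 164 might be sketched as a simple lookup; the identifier strings and the association table below are illustrative assumptions:

```python
# Colors previously associated with each group identity, per the examples
# above (blue for the group of friends, green for the group of students).
GROUP_COLOR = {
    "identity_215_friends":  "blue",
    "identity_213_students": "green",
}

def beam_color_for(object_id: str, associations: dict) -> str:
    """Return the color to project on and around an object, based on the
    group currently associated with that object."""
    return GROUP_COLOR[associations[object_id]]

associations = {"bottle_cap_132": "identity_215_friends"}
print(beam_color_for("bottle_cap_132", associations))  # blue
```

After the tapping gesture re-associates the object with the students' identity 213, the same lookup would yield green, matching FIG. 1J.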
- tapping is another form of hand gesture recognized by processor 100 , e.g. by processing a camera-generated image containing the gesture, user input in the form of sound, or both, in some illustrative embodiments.
- processor 100 performs acts 201 - 203 that are similar or identical to acts 101 - 103 described above in reference to FIG. 1C .
- in act 201 , one or more rear-facing cameras 121 are used to capture scene 130 ( FIG. 2C ) that includes real world object 132 (in the form of a book) and store image 109 ( FIG. 2D ) in memory 110 in a manner similar to that described above, although in FIGS. 2C and 2D the object 132 being imaged is a book.
- in performing acts 202 - 206 in FIG. 2A , processor 100 not only recognizes object 132 as a book in image 109 ( FIG. 2D ) but additionally recognizes a text string 231 A therein ( FIG. 2C ), which is identified by a hand gesture.
- processor 100 operates as a user-input extractor 141 E ( FIG. 5 ) that obtains input from user 135 for identifying information to be obtained for display in act 205 (which is similar to act 105 described above).
- user input is received in processor 100 by detection of a gesture in image 109 ( FIG. 2D ) in which object 132 has also been imaged by camera 121 ( FIG. 2C ).
- user 135 makes a predetermined hand gesture, namely an index finger hand gesture 117 by stretching finger 136 of hand 138 to point to text string 231 A in portable real world object 132 .
- This index finger hand gesture 117 is captured in one or more image(s) 109 ( FIG. 2D ) in which object 132 is also imaged (e.g. finger 136 overlaps object 132 in the same image 109 ).
- the imaged gesture is identified by use of a library of gestures, and a procedure triggered by the index finger hand gesture 117 is performed in act 204 , including OCR of an image portion identified by the gesture, to obtain, as user input, the string of characters “Linear Algebra.”
- processor 100 operates as an object-user-input mapper 141 M ( FIG. 5 ) that uses both: (1) the user group identified in act 203 from the presence of object 132 and (2) the user input identified from a text string 231 A (e.g. “Linear Algebra”) detected by use of a gesture identified in act 204 ( FIG. 2A ), to generate an address 291 (in a table 220 in FIG. 2B ).
- a user group identified by act 203 may be first used by object-user-input mapper 141 M ( FIG. 5 ) to identify table 220 ( FIG. 2B ) from among multiple such tables, and then the identified table 220 is looked up with the user-supplied information, to identify an address 291 , which may be accessible on the Internet.
- Such an address 291 is subsequently used (e.g. by information retriever 141 R in FIG. 5 ) to prepare a request for fetching from Internet, a video that is associated with the string 231 A.
- in act 205 ( FIG. 2A ), processor 100 generates address 291 as http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/ which is then used to retrieve information 119 ( FIG. 5 ).
- Use of table 220 as just described enables a query that is based on a single common text string 231 A to be mapped to different addresses, for information to be displayed to different groups of users.
- processor 100 for one user A retrieves an address of the website of the Stanford Distance Learning course (namely http://scpd.stanford.edu/coursesSeminars/seminarsAndWebinars.jsp) from one table (e.g. customized for user A), while processor 100 for another user B retrieves another address 291 for MIT's OpenCourseWare website (namely http://videolectures.net/mit_ocw/) from another table 220 (e.g. customized for user B).
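The per-user tables 220 described above, in which the same query string resolves to different addresses 291, could be sketched as nested lookups. The two URLs come from the text; the user names and table layout are hypothetical:

```python
# One table 220 per user; the same query string maps to different
# addresses depending on whose table is consulted.
TABLES_220 = {
    "user_A": {"Linear Algebra":
        "http://scpd.stanford.edu/coursesSeminars/seminarsAndWebinars.jsp"},
    "user_B": {"Linear Algebra":
        "http://videolectures.net/mit_ocw/"},
}

def address_291(user: str, query: str) -> str:
    """Select the user's table, then look up the OCR'd query string."""
    return TABLES_220[user][query]

print(address_291("user_B", "Linear Algebra"))  # http://videolectures.net/mit_ocw/
```

The retrieved address would then be used (as by information retriever 141 R) to request the associated video over HTTP.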
- Such an address 291 that is retrieved by processor 100 using a table 220 , in combination with one or more words in text string 231 A, may be used in some embodiments with an Internet-based search service, such as the website www.google.com, to identify content for display to user 135 .
- processor 100 issues a request to address 291 in accordance with http protocol and obtains as information to be output, a video stream from the Internet, followed by optional transformation and projection of the information, as described below.
- a result of performing the just-described method of FIG. 2A is illustrated in FIG. 2E by a video 295 shown projected (after any transformation, as appropriate) on a surface of table 131 adjacent to object 132 .
- video 295 has been automatically selected by processor 100 and is being displayed, based at least partially on optical character recognition (OCR) to obtain from one or more images (e.g. in video 295 ) of object 132 , a text string 231 A that has been identified by an index finger hand gesture 117 .
- no additional input is needed by processor 100 from user 135 after the user makes a predetermined hand gesture to point to text string 231 A and before the video is output; e.g. no further user command is needed to invoke a video player in device 120 , as the video player is automatically invoked by processor 100 to play the video stream retrieved from the Internet.
- Other such embodiments may require user input to approve (e.g. confirm) that the video stream is to be displayed.
- text string 231 A is recognized from among many such strings that are included in image 109 , based on the string being located immediately above a tip of index finger 136 which is recognized by processor 100 in act 204 as a portion of a predetermined hand gesture.
- human finger 136 is part of a hand 138 of a human user 135 and in this example finger 136 is used in gesture 117 to identify as user input a string of text in scene 130 , which is to trigger retrieval of information to be output.
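The selection rule of act 204, choosing the OCR'd string located immediately above the detected fingertip, might be approximated with simple bounding-box geometry. The coordinates, strings, and `max_gap` threshold below are assumptions for the sketch (image convention: y grows downward):

```python
def string_above_fingertip(strings, fingertip_xy, max_gap=40):
    """Pick the OCR'd string whose bounding box sits immediately above
    the fingertip.  strings: list of (text, (x_min, y_min, x_max, y_max))."""
    fx, fy = fingertip_xy
    best, best_gap = None, None
    for text, (x0, y0, x1, y1) in strings:
        gap = fy - y1  # vertical distance from box bottom down to fingertip
        if x0 <= fx <= x1 and 0 <= gap <= max_gap:
            if best_gap is None or gap < best_gap:
                best, best_gap = text, gap
    return best

# Illustrative OCR output for a book page, plus a detected fingertip.
strings = [("Linear Algebra", (100, 200, 300, 230)),
           ("Chapter 1",      (100, 400, 250, 430))]
print(string_above_fingertip(strings, (150, 250)))  # Linear Algebra
```

A circling-motion gesture (described below) would need a different geometric test, e.g. selecting the string whose box is enclosed by the traced path.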
- user 135 makes a circling motion with finger 136 around text string 231 A as a different predetermined gesture that is similarly processed.
- processor 100 completes recognition of real world object 132 (in the form of a book in FIG. 2C ), in this example by recognizing string 231 A. Thereafter, processor 100 generates a request to a source on the Internet to obtain information to be projected in the scene for use by person 135 (e.g. based on a generic user identifier of person 135 as belonging to a group of students).
- user interfaces in certain embodiments of the type described above in reference to FIGS. 2A-2E automatically project information 119 on real world surfaces using a projector 122 embedded in a mobile device 120 .
- user interfaces of the type shown in FIGS. 2A-2E reverse the flow of information of prior art user interfaces which require a user 135 to explicitly look for information, e.g. prior art requires manually using a web browser to navigate to a web site at which a video stream is hosted, and then manually searching for and requesting download of the video stream.
- user interfaces of the type shown in FIGS. 2A-2E automatically output information that is likely to be of interest to the user, e.g. by projection on to surfaces of objects in real world, using an embedded mobile projector.
- mobile device 120 may be any type of electronic device with a form factor sufficiently small to be held in a human hand 138 (similar in size to object 132 ) which provides a new way of interacting with user 135 as described herein.
- user 135 may use such a mobile device 120 in collaboration with other users, with contextual user interfaces based on everyday objects 132 , wherein the user interfaces overlap for multiple users of a specific group, so as to provide common information to all users in that specific group (as illustrated for group 1 in FIG. 1F and group 2 in FIG. 1G ).
- processor 100 may be programmed in some embodiments to automatically contextualize a user interface, by using one or more predetermined techniques to identify and obtain for display information that a user 135 needs to view, when interacting with one or more of objects 132 .
- in response to user 135 opening a book (shown in FIG. 2C as object 132 ) to a specific page (e.g. page 20), processor 100 automatically processes images of the real world to identify therein a text string 231 A based on its large font size relative to the rest of the text on page 20. Then, processor 100 of several embodiments automatically identifies a video on “Linear Algebra” available on the Internet e.g. by use of a predetermined website (as described above in reference to act 205 ), and then seeks confirmation from user 135 that the identified video should be played. The user's confirmation may be received by processor 100 in a video stream that contains a predetermined gesture, e.g. user 135 waving an index finger in a motion to make a check mark (another predetermined gesture that identifies user input).
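- the font-size heuristic described above (identifying string 231 A because it is large relative to the rest of the text on the page) can be sketched as follows; the function name, the ratio threshold of 1.5, and the use of glyph height in pixels as a proxy for font size are illustrative assumptions:

```python
from statistics import median

def find_heading_string(ocr_strings, ratio=1.5):
    """Return the text whose glyph height is large relative to the rest of
    the page (a proxy for font size), or None if no string stands out.

    ocr_strings: list of (text, height_in_pixels) tuples.
    """
    if not ocr_strings:
        return None
    typical = median(h for _, h in ocr_strings)
    # Keep only strings noticeably taller than the typical line of text.
    candidates = [(h, t) for t, h in ocr_strings if h >= ratio * typical]
    return max(candidates)[1] if candidates else None
```

Comparing against the median rather than the mean keeps the heuristic stable when the page contains one very large heading among many lines of body text.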
- Processor 100 is programmed in some embodiments to implement strong identification embodiments, by performing one or more acts 301 - 309 illustrated in FIG. 3A , by use of user identifiers that uniquely identify a single user X from among all users A-Z.
- as shown in FIGS. 3D and 3E , information 319 that is obtained for display may be specific to that single user X, for example email messages that are specifically addressed to user X.
- processor 100 performs acts 301 and 302 that are similar or identical to acts 101 and 102 described above in reference to FIG. 1C .
- processor 100 uses an identifier of the portable real world object with a set of associations 311 ( FIG. 3B ), to obtain an identifier that uniquely identifies a user of the portable real world object.
- act 303 is similar to act 103 except for the set of associations 311 being used in act 303 .
- an association 314 maps an object identifier 114 O (such as a bottle cap ID) to a single user 314 U (such as Jane Doe), as illustrated in FIG. 3B .
- user 314 U is uniquely associated with object 132 in the form of a bottle cap ( FIG. 1E ), i.e. no other user is associated with a bottle cap.
- other users may be associated with other such portable real world objects (e.g. book in FIG. 2C or stapler in FIG. 1A ), but not with a bottle cap as it has already been uniquely associated with user 314 U ( FIG. 3B ).
- a user identifier which is obtained in act 303 is used in act 304 ( FIG. 3A ) to obtain and store in memory 110 , information 319 that is specific to user 314 U (of the portable real world object 132 ).
- This information 319 is then displayed, in act 305 , similar to act 105 described above, except that the information being displayed is specific to user 314 U as shown in FIGS. 3D and 3E .
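- the "strong" identification scheme described above — a set of associations 311 in which each portable real world object is uniquely bound to a single user — can be sketched as a small mapping with a uniqueness check. The class and method names are illustrative assumptions; the patent only requires that the object-to-user association be unique:

```python
class AssociationSet:
    """Minimal sketch of a set of associations 311: each object identifier
    maps to exactly one user identifier ("strong" identification)."""

    def __init__(self):
        self._by_object = {}

    def associate(self, object_id, user_id):
        # Refuse to re-bind an object that is already uniquely
        # associated with a different user.
        existing = self._by_object.get(object_id)
        if existing is not None and existing != user_id:
            raise ValueError(f"{object_id} already belongs to {existing}")
        self._by_object[object_id] = user_id

    def user_for(self, object_id):
        """Act 303: look up the unique user of a recognized object."""
        return self._by_object.get(object_id)
```

This mirrors the example in FIG. 3B: once the bottle cap is associated with user 314 U, no other user can be associated with that bottle cap, although other users remain free to be associated with other objects.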
- the to-be-displayed information 319 may be obtained in act 304 from a website www.twitter.com, specific to the user's identity.
- the to-be-displayed information 319 received by processor 100 is personalized for user 135 , based on user name and password authentication by the website www.twitter.com.
- although the personalized information 319 illustrated in one example is from www.twitter.com, other websites can be used, e.g. an email website such as http://mail.yahoo.com can be used to obtain other such information personalized for user 135 .
- user-supplied text (for use in preparing associations 311 ) is received by processor 100 via an authentication (also called login) screen.
- an act 306 is performed by processor 100 after act 302 , to display an authentication screen.
- authentication screen 321 is projected on to table 131 adjacent to object 132 as part of information 322 as shown in FIG. 3C .
- processor 100 obtains the authentication screen to be displayed in act 306 from a computer (not shown) accessible on the Internet, such as a web server at the website www.twitter.com, and this screen is of a user interface such as a dialog box that prompts the user to enter their user name and password.
- processor 100 automatically includes adjacent to such a dialog box, a layout image 333 of a keyboard in information 322 that is projected into scene 130 .
- processor 100 recognizes additional images (similar to image 109 described above) and generates user input by performing Optical Character Recognition (OCR), and such user input in the form of text string(s) is stored in memory 110 and then sent to the website www.twitter.com.
- the same user input is also used in act 308 by some embodiments of processor 100 to identify the user, i.e. a user name and password received as the user-supplied text may be locally checked against table 391 by processor 100 .
- processor 100 performs act 309 .
- processor 100 prepares an association 314 in set 311 , to associate an identifier 114 U of object 132 with an identifier 314 U (e.g. name) of the user identified in the authentication screen 321 (projected adjacent to object 132 ), in response to user input being authenticated.
- acts 304 and 305 are performed as described above. Note that in other embodiments, a different user name and password may be used locally by processor 100 .
- the user is authenticated two times, once by processor 100 locally (when the user enters their user name and password to login, into device 120 ), and another time by a remote computer (or web server) that supplies information 119 ( FIG. 5 ) to be output (e.g. at the website www.twitter.com).
- a single authentication is sufficient, e.g. the user name and password that were used to log into device 120 are automatically used (directly or indirectly via a table lookup) to communicate with the remote computer (or web server), to obtain information 119 ( FIG. 5 ).
- a user name that was used to log into device 120 is also used to identify a table 220 (among multiple such tables) used in identifying address 291 , for obtaining information 119 ( FIG. 5 )
- User-specific information 319 that is obtained in act 304 is typically displayed as per act 305 at a location adjacent to object 132 ( FIG. 3D ) or alternatively on object 132 itself ( FIG. 3E ) in order to reduce the likelihood of snooping by users other than user 135 with whom object 132 is uniquely associated.
- Prior to display (e.g. by projection) in act 305 such information may be optionally transformed.
- a specific technique for transformation that is selected for projection of user-specific personalized information can depend on a number of factors, such as the smoothness and/or shape and size and/or a dynamically computed surface normal (or gradient) of object 132 , and/or resolution and legibility of information 119 that is to be projected, etc.
- a transformation technique that is selected may also edit information 119 to be output, e.g. truncate or abbreviate the information, omit images, or down-scale images etc. depending on the embodiment.
- Although certain embodiments to implement strong identification described above use an authentication screen, other embodiments use two cameras to perform a method of the type illustrated in FIG. 3F , wherein one camera is embedded with a projector 122 in device 120 and the other camera is a normal camera included in a mobile device such as a smart phone (or alternatively external thereto, in other form factors). Hence, in the method of FIG. 3F , two cameras are operated to synchronize (or use) hand gestures with a person's face, but otherwise many of the acts are similar or identical to the acts described above in reference to FIG. 3A .
- a back-facing camera 121 in mobile device 150 captures an image 109 of scene 130 . Detection in such an image 109 of a portion that corresponds to the specific hand gesture (as per act 396 in FIG. 3F ) triggers processor 100 to perform act 397 to operate a front-facing camera 152 ( FIG. 1B ). Front-facing camera 152 ( FIG. 1B ) then captures an image including a face of the user 135 , and the image is then segmented (e.g. by module 141 S in FIG. 5 ) to obtain a portion corresponding to the face which is used (in module 141 S) by processor 100 performing act 398 to determine a user identifier (e.g. perform authentication).
- processor 100 of some embodiments compares the image portion corresponding to the face to a database 199 ( FIG. 2D ) of faces, and on finding a match obtains from the database an identifier of the user (selected from among user identifiers of faces in the database).
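- the face-matching step just described — comparing a segmented face portion to database 199 of faces and returning the user identifier of the closest match — can be sketched as a nearest-neighbor search over precomputed feature vectors. The function name, the use of Euclidean distance, and the 0.6 acceptance threshold are illustrative assumptions; real face recognizers use learned embeddings:

```python
import math

def match_face(face_vec, database, max_distance=0.6):
    """Compare a face feature vector to a database of enrolled faces and
    return the user identifier of the closest match, or None if no
    enrolled face is close enough.

    database: dict mapping user_id -> feature vector (list of floats).
    """
    best_id, best_dist = None, float("inf")
    for user_id, vec in database.items():
        dist = math.dist(face_vec, vec)  # Euclidean distance
        if dist < best_dist:
            best_id, best_dist = user_id, dist
    return best_id if best_dist <= max_distance else None
```

Returning None when even the closest face is beyond the threshold matches the behavior described in act 398: a user identifier is produced only "on finding a match" in the database.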
- an image of the user's face is received by processor 100 from front-facing camera 152 and the face is detected as such in act 398 , thereby resulting in a unique identifier for the user that may be supplied to user input extractor 141 E for use in preparing an association, to associate the user identifier with an object identifier, in response to detecting a predetermined gesture adjacent to the object.
- Illustrative embodiments of a segmentation module 141 S ( FIG. 5 ) are described herein.
- act 309 is performed as described above in response to detection of the gesture, to prepare an association so that object 132 (e.g. bottle cap) is identified as belonging to person 135 .
- an object identifier of portable real world object 132 in image 109 is automatically identified by processor 100 using data 445 (also called “object data”, see FIG. 4A ) on multiple real world objects in a database 199 that is coupled to processor 100 in the normal manner.
- processor 100 recognizes object 132 to be a bottle cap (which is identified by an identifier ID 1 ), based on attributes in data 441 D ( FIG. 4A ) in database 199 matching attributes of a portion 132 I of image 109 ( FIG. 4A ) received as per act 431 ( FIG. 4B ).
- processor 100 is programmed with software to operate as an object extractor 141 O (see FIG. 5 ) which determines feature vectors from an image 109 , and compares these feature vectors to corresponding feature vectors of objects that are previously computed and stored in a database 199 , to identify an object. Comparison between feature vectors can be done differently depending on the embodiment (e.g. using Euclidean distance).
- object extractor 141 O identifies from database 199 an object 132 that most closely matches the feature vectors from image 109 , resulting in one or more object identifiers 112 O, 112 U ( FIG. 5 ).
- processor 100 is programmed with software to identify clusters of features that vote for a common pose of an object (e.g. using the Hough transform).
- bins that accumulate a preset minimum number of votes are identified by object extractor 141 O, as object 132 .
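- the Hough-style voting described above — clusters of matched features voting for a common object pose, with bins accumulating a preset minimum number of votes identified as object 132 — can be sketched as follows. The pose parameterization (x, y, scale), the bin size, and the vote threshold are illustrative assumptions:

```python
from collections import Counter

def detect_object_poses(feature_votes, bin_size=10.0, min_votes=3):
    """Cluster feature votes on a common object pose, Hough-style.

    feature_votes: list of (x, y, scale) pose hypotheses, one per matched
    feature. Each vote is quantized into a coarse bin; bins that
    accumulate at least min_votes are reported as detections.
    """
    bins = Counter()
    for x, y, scale in feature_votes:
        key = (round(x / bin_size), round(y / bin_size), round(scale))
        bins[key] += 1
    return [key for key, count in bins.items() if count >= min_votes]
```

Coarse quantization lets slightly inconsistent votes from the same object fall into the same bin, while isolated mismatched features fail to reach the vote threshold and are discarded.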
- some embodiments of object extractor 141 O extract SIFT features as described in the preceding paragraph, while other embodiments use a method described by Viola and Jones in a 25-page article entitled “Robust Real-time Object Detection,” in the Second International Workshop On Statistical And Computational Theories Of Vision—Modeling, Learning, Computing, And Sampling, Vancouver, Canada, Jul. 13, 2001 that is incorporated by reference herein in its entirety.
- features that are determined from an image 109 as described above are used in some embodiments by object extractor 141 O ( FIG. 5 ) to generate a geometric description of items in image 109 that is received from a camera, such as object 132 .
- Similar or identical software may be used by processor 100 to extract from image 109 , a blob 136 I ( FIG. 2D ) of a finger in a hand gesture (and/or to recognize a user's face as described herein).
- processor 100 uses Haar features, which consist of vector windows that are used to calculate edges, line features and center-surround features in an image 109 .
- such vector windows are run by processor 100 across portions of image 109 .
- processor 100 uses such features (also called “feature vectors”) differently depending upon the item to be recognized (object, or face, or hand gesture). Hence, depending on the embodiment, processor 100 uses vector windows that are different for objects, hand gestures, face recognition etc.
- Use of Haar features by processor 100 in certain embodiments has limitations, such as robustness and low fps (frames per second) due to dependency on scaling and rotation of Haar vector windows.
- processor 100 may be programmed with software to operate as object extractor 141 O ( FIG. 5 ) that uses other methods, such as a method described in an 8-page article entitled “Object Recognition from Local Scale-Invariant Features” by David G. Lowe, in Proceedings of the International Conference on Computer Vision, Corfu (September 1999), which is incorporated by reference herein in its entirety.
- in some embodiments that use object extractor 141 O of the type described above, three dimensional (3D) surfaces of the object(s) are segmented by processor 100 into local regions with a curvature (or other such property) within a predetermined range, so that the regions are similar, relative to one another.
- processor 100 may be optionally programmed to operate as information transformer 141 T ( FIG. 5 ) to truncate or otherwise manipulate information 119 , so that the information fits within local regions of object 132 identified by segmentation. Truncation or manipulation of content in information 119 by processor 100 reduces or eliminates the likelihood that projection of information 119 on to object 132 will irregularly wrap between local regions which may have surface properties different from one another.
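- the truncation behavior of information transformer 141 T described above — shortening information 119 so it fits within a local region and does not wrap irregularly across regions with different surface properties — can be sketched as follows. Using a character count as a proxy for the region's legible area is an illustrative assumption:

```python
def fit_text_to_region(text, region_capacity):
    """Truncate information so it fits within a local region of the
    object's surface, avoiding irregular wrapping across regions.

    region_capacity: maximum number of characters the region can hold
    legibly (an illustrative proxy for the region's projected area).
    """
    if len(text) <= region_capacity:
        return text                       # fits as-is, no transformation
    if region_capacity <= 1:
        return text[:region_capacity]     # degenerate region: hard cut
    return text[:region_capacity - 1] + "\u2026"  # truncate + ellipsis
```

A fuller implementation would also apply the other manipulations mentioned above (abbreviating words, omitting or down-scaling images), chosen per region.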
- Processor 100 of some embodiments segments a 3D surface of object 132 to identify local regions therein as described in a 14-page article entitled “Partitioning 3D Surface Meshes Using Watershed Segmentation” by Alan P. Mangan and Ross T. Whitaker, in IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 5, NO. 4, OCTOBER-DECEMBER 1999 which is incorporated by reference herein in its entirety.
- processor 100 operating as information transformer 141 T calibrates camera 121 ( FIG. 5 ) using any calibration method, such as the method described in a 4-page article entitled “Camera Calibration Toolbox for Matlab” at “http://www.vision.caltech.edu/bouguetj/calib_doc/” as available on Apr. 10, 2012, which is incorporated by reference herein in its entirety.
- information transformer 141 T ( FIG. 5 ) of some embodiments is programmed to determine shapes and/or surface normals of surfaces of object 132 in image 109 , using one of the two methods described next.
- a first method uses a projection of light, e.g. as described in an 8-page article entitled “Dynamic scene shape reconstruction using a single structured light pattern” by Hiroshi Kawasaki et al, IEEE Conference on Computer Vision and Pattern Recognition 2008, which is incorporated by reference herein in its entirety.
- a second method also uses a projection of light, e.g. as described in a 13-page article entitled “Rapid Shape Acquisition Using Color Structured Light and Multi-pass Dynamic Programming” by Li Zhang et al, in Proc. Int. Symposium on 3D Data Processing Visualization and Transmission (3DPVT), 2002, which is incorporated by reference herein in its entirety.
- certain other data 442 D and 443 D in object data 445 may include attributes to be used by object extractor 141 O ( FIG. 5 ) in identifying various other objects such as an object 132 ( FIG. 2C ) having identifier ID 2 ( FIG. 4A ) or a cup (not shown) having identifier ID 3 , in an image 109 analyzed by processor 100 in act 432 ( FIG. 4B ).
- one or more of object identifiers ID 1 , ID 2 and ID 3 uniquely identify within processor 100 , corresponding portable real world objects, namely a bottle cap, a book and a cup when these objects are imaged in an image 109 of a scene 130 by a camera of device 120 .
- image 109 illustrated in FIG. 4A includes a portion 132 I that corresponds to the entirety of object 132
- another image that captures only a portion of object 132 may be sufficient for processor 100 to recognize object 132 (in act 432 of FIG. 4B ).
- processor 100 may be programmed to operate as object extractor 141 O to perform act 432 to recognize additional information captured from object 132 , as described above.
- each of associations 448 , 449 contains a single user's name as the user identifier associated with a corresponding object identifier.
- certain embodiments may use both forms of identification in processor 100 operating as an information retriever 141 R ( FIG. 5 ) with different types of portable real world objects and/or different information display software 141 ( FIG. 4A ), depending on the programming of processor 100 ( FIG. 4B ).
- associations 447 may be set up in different ways in database 199 , prior to their use by processor 100 , depending on the embodiment.
- processor 100 is programmed with software to operate as information identifier 141 I ( FIG. 5 ) that extracts user input (in user input extractor 141 E) and uses associations to generate an address 291 of information to be output (in object-user-input mapper 141 M).
- processor 100 is programmed with software to operate as user input extractor 141 E to process an image 109 received in act 431 , or to process additional image(s) received in other such acts, or to process other information captured from scene 130 .
- processor 100 may return to act 432 (described above).
- image 109 may include image portions corresponding to one or more human fingers, e.g. index finger 136 ( FIG. 1E ) and middle finger 137 are parts of a human hand 138 , of a person 135 .
- Processor 100 is programmed to operate as user input extractor 141 E ( FIG. 5 ) in act 439 A to use predetermined information (not shown in FIG. 4A ) in database 199 to recognize in image 109 ( FIG. 4A ) in memory 110 certain user gestures (e.g. index and middle finger images 136 I and 137 I outstretched in human hand image 138 I), and then use a recognized gesture to identify person 135 (e.g. as belonging to a specific group), followed by identification of information to be output.
- processor 100 may perform different acts after act 439 A, to identify user 135 as per act 433 , e.g. in user input extractor 141 E.
- processor 100 is programmed to perform an act 439 D to recognize a face of the user 135 in another image from another camera.
- mobile device 120 includes two cameras, namely a rear camera 121 that images object 132 and a front camera 152 that images a user's face.
- processor 100 is programmed (by instructions in one or more non-transitory computer readable storage media, such as a disk or ROM) to compare a portion of an image segmented therefrom and corresponding to a face of user 138 , with a database of faces. A closest match resulting from the comparison identifies to processor 100 a user identifier, from among user identifiers of multiple faces in the database. Processor 100 then associates (e.g. in act 439 C in FIG. 4B ) this user identifier with an object identifier e.g. by user input extractor 141 E ( FIG. 5 ) preparing an association, in response to detection of a predetermined hand gesture adjacent to object 132 .
- An act 439 D may be followed by processor 100 performing another additional act, such as 439 E to synchronize (or otherwise use) recognition of the user's face with the user gesture.
- a user gesture recognized in act 439 A may be used to identify person 135 by user input extractor 141 E looking up a table 451 of FIG. 4A (similar to mapping 116 in FIG. 1D ) as per act 439 B ( FIG. 4B ).
- processor 100 performs act 439 C (see FIG. 4B ) to associate a user identifier of person 135 with portable real world object 132 .
- a user identifier that is used in act 439 C depends on whether strong or weak identification is implemented by processor 100 , for real world object 132 .
- processor 100 is optionally programmed to operate as information transformer 141 T to perform act 435 .
- processor 100 of mobile device 120 identifies from among a group of predetermined techniques 461 , 462 , a specific technique 461 to transform the obtained information 188 for projection into scene 130 .
- transformation technique 461 ( FIG. 4A ) is to project information on to object 132 itself, whereas transformation technique 462 is to project information adjacent to object 132 .
- one of these techniques is identified prior to projection of information 188 .
- the specific technique 461 is selected (and therefore identified) automatically based on one or more surface properties of real world object 132 as determined by processor 100 , such as surface roughness (or smoothness), orientation of surface normal, color, opacity, etc, whereas in other embodiments processor 100 uses user input (e.g. in the form of spoken words) that explicitly identifies a specific technique to be used.
- information transformer 141 T uses the specific technique that was identified (e.g. on-object technique 461 ) to transform information 188 (or 189 ) in memory 110 , and then supply transformed information 188 T (or 189 T) resulting from use of the specific technique to a projector 122 ( FIG. 1E ).
- projector 122 of mobile device 120 projects on object 132 in scene 130 the transformed information 188 T (or 189 T), which is thereby displayed on object 132 as illustrated in FIG. 1G .
- information transformer 141 T may perform any steps using one or more transformation techniques 461 and 462 described above.
- any other transformation technique can be used to prepare information 188 for output to a user.
- information 189 may be transformed by another technique into a first component 189 A of transformed information 189 T that is projected onto object 132 (namely the text string “Group 1 ”), and a second component 189 B of the transformed information 189 T that is projected adjacent to object 132 (namely the text string “Score 0”).
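- splitting transformed information into a component projected on the object (such as component 189 A, the text string “Group 1”) and a component projected adjacent to it (such as component 189 B, the text string “Score 0”) can be sketched as a simple partition. The dict-of-fields representation and key names are illustrative assumptions:

```python
def split_for_projection(fields, on_object_keys):
    """Split to-be-projected information into a component rendered on the
    object itself and a component rendered adjacent to it, mirroring
    components 189A and 189B described above.

    fields: dict mapping field name -> display text.
    on_object_keys: set of field names to render on the object.
    """
    on_object = {k: v for k, v in fields.items() if k in on_object_keys}
    adjacent = {k: v for k, v in fields.items() if k not in on_object_keys}
    return on_object, adjacent
```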
- processor 100 operates projector 122 ( FIG. 5 ) to project the transformed information 188 T (or 189 T) on to object 132 that is identified by the object identifier that was obtained in act 432 (described above).
- processor 100 operating as user input extractor 141 E receives and processes additional images in a manner similar to that described above, e.g. to receive user input 482 and store it in memory 110 ( FIG. 4A ), by recognizing one or more parts of an additional image that includes transformed information 188 T, 189 T projected into scene 130 .
- the parts of the additional image that are recognized may be, for example, another hand gesture 117 in which only one finger namely index finger 136 is outstretched as illustrated in FIG. 1H .
- processor 100 operating as user input extractor 141 E may determine that the user is now part of Group 1 , and therefore now obtains information 189 of Group 1 (see FIG. 4A ) in act 434 ( FIG. 4B ), followed by output to the user (e.g. by transformation and projection) as per one or more of acts 435 - 437 described above.
- processor 100 uses recognition of a bottle cap as the portable real world object 132 to invoke execution of information transformer 141 T (e.g. instructions to perform acts 431 - 438 and 439 A- 439 E) from among multiple such softwares 141 O, 141 I, 141 R and 141 T.
- software 141 is generic to multiple objects, although in other embodiments software 141 (also called information display software) is customized for and individually associated with corresponding objects, such as, for example, a book software 442 S ( FIG. 4A ) associated with a book identified by ID 2 and a cup software 443 S ( FIG. 4A ) associated with a cup identified by ID 3 as described above in reference to FIG. 4A .
- although table 451 is described above for use with user input in the form of a hand gesture to identify a user, such a table 451 can alternatively be looked up by processor 100 using an identifier of object 132 , depending on how table 451 is set up, in various embodiments described herein.
- Use of table 451 with object identifiers enables “strong” identification in some embodiments of information identifier 141 I, wherein a person 135 identifies to processor 100 (ahead of time), an object 132 that is to be uniquely associated with his/her identity.
- Other embodiments of processor 100 use both an object identifier as well as user input to look up another table 220 , which enables “weak” identification as described herein.
- Some embodiments implement one or more acts of FIG. 4B by performing one or more acts of the type described below in reference to FIGS. 4C and 4D . Note that the acts of FIGS. 4C and 4D described below can alternatively be performed in other embodiments that do not perform any act of FIG. 4B .
- Some embodiments of processor 100 are programmed to operate as object extractor 141 O to track portable real world objects that may be temporarily occluded from view of rear-facing camera 121 that captures image 109 , by performing an act 411 ( FIG. 4C ) to check the image 109 for presence of each object in a set of objects (in database 199 of FIG. 4A ) and to identify a subset of these objects as being initially present in image 109 .
- processor 100 of some embodiments adds an identifier of an object 132 in the subset to a list 498 ( FIG. 4A ), and starts a timer for that identifier, and the timer starts incrementing automatically from 0 at a preset frequency (e.g. every millisecond). Therefore, if there are multiple identifiers in list 498 for multiple objects, then correspondingly multiple timers are started, by repeated performance of act 412 for each object in the subset.
- processor 100 checks if list 498 is empty. As list 498 was just populated, it is not empty at this stage and therefore processor 100 performs act 414 ( FIG. 4C ) and then returns to act 413 . Whenever list 498 becomes empty, processor 100 goes from act 413 via the yes branch to act 411 (described above).
- additional images are captured into memory 110 , and processed by processor 100 in the manner described herein.
- processor 100 scans through list 498 to check if each object identified in the list 498 is found in the additional image.
- processor 100 resets the timer (started in act 412 as noted above) which starts incrementing automatically again from 0.
- processor 100 removes the identifier from the list and stops the corresponding timer.
- when an object 132 is absent from view of camera 121 for more than the preset limit, the object 132 is no longer used in the manner described above, to retrieve and display information.
- use of a timer as illustrated in FIG. 4C and described above reduces the likelihood that a display of information (e.g. the user's emails) is interrupted when object 132 that triggered the information display is accidentally occluded from view of camera 121 , e.g. for the period of time identified in the preset limit.
- the preset limit of time in some embodiments is set by processor 100 , based on one or more input(s) from user 135 , and hence a different value can be set by user 135 depending on location e.g. a first limit used at a user's home (or office) and a second limit (lower than the first limit) in a public location.
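- the list-498-with-timers scheme of acts 411 - 414 can be sketched as follows: an object's identifier stays on the list while it is briefly occluded, its timer is reset each time the object reappears in an image, and the identifier is dropped only after the object has been absent longer than the preset limit. The class shape and the 2-second default are illustrative assumptions; the injectable clock is for testability only:

```python
import time

class OcclusionTolerantTracker:
    """Sketch of tracking objects through temporary occlusion, per the
    timer-per-identifier scheme described above (acts 411-414)."""

    def __init__(self, limit_seconds=2.0, clock=time.monotonic):
        self.limit = limit_seconds
        self.clock = clock            # injectable for testing
        self._last_seen = {}          # object_id -> timestamp of last sighting

    def observe(self, visible_ids):
        """Call once per captured image with the identifiers found in it."""
        now = self.clock()
        for obj_id in visible_ids:
            self._last_seen[obj_id] = now    # reset that object's timer
        for obj_id in list(self._last_seen):
            if now - self._last_seen[obj_id] > self.limit:
                del self._last_seen[obj_id]  # absent too long: drop it

    def tracked(self):
        """The current contents of list 498."""
        return set(self._last_seen)
```

As described above, this prevents a display of information (e.g. the user's emails) from being interrupted when object 132 is accidentally occluded from view of camera 121 for less than the preset limit.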
- some embodiments of processor 100 are optionally programmed to operate as information transformer 141 T to perform act 421 ( FIG. 4D ) to compute a value of a property of object 132 identified in image 109 , such as size of a surface, shape of the surface, orientation of surface normal, surface smoothness, etc. Then in act 422 , processor 100 checks if the property's value satisfies a predetermined test on feasibility for projection on to object 132 .
- processor 100 may check if the surface area of a surface of object 132 is large enough to accommodate the to-be-projected information, and/or if object 132 has a color that is sufficiently neutral for use as a display, and/or a normal at the surface of object 132 is oriented within a preset range relative to a projector 122 .
- Such feasibility tests are designed ahead of time, and programmed into processor 100 to ensure that the to-be-projected information is displayed in a manner suitable for user 135 , e.g. font size is legible. Numerous other feasibility tests, for information projection on to an object 132 in the real world, will be readily apparent, in view of this detailed description.
- processor 100 goes to act 423 and uses a first technique 461 ( FIG. 4A ) to generate transformed information 119 T ( FIG. 5 ) for projection on to object 132 , followed by act 437 ( FIG. 4B ).
- first technique 461 may transform information 119 based on an orientation (in the three angles, pitch, yaw and roll) of a surface of object 132 relative to orientation of projector 122 to ensure legibility when information 119 T is rendered on object 132 .
- a specific manner in which information 119 is transformed can be different in different embodiments, and in some embodiments there is no transformation, e.g. when the information 119 is to be displayed on a screen of mobile device 120 .
- processor 100 goes to act 424 and uses a second technique 462 ( FIG. 4A ) to generate transformed information 119 T for projection adjacent to (but not on to) object 132 . After performing one of acts 423 and 424 , processor 100 then goes to act 437 (described above in reference to FIG. 4B ).
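- the decision of acts 421 - 424 — apply a feasibility test to the object's surface properties, then choose between on-object technique 461 and adjacent technique 462 — can be sketched as follows. The specific area and surface-normal-angle thresholds are illustrative assumptions; the patent only requires that some predetermined feasibility test be applied:

```python
def choose_projection_technique(surface_area, required_area,
                                normal_angle_deg, max_angle_deg=45.0):
    """Decide between on-object projection (technique 461) and projection
    adjacent to the object (technique 462), per acts 421-424.

    surface_area:     area of the candidate surface of object 132.
    required_area:    area needed to render the information legibly.
    normal_angle_deg: angle between the surface normal and the
                      projector's optical axis, in degrees.
    """
    feasible = (surface_area >= required_area
                and normal_angle_deg <= max_angle_deg)
    return "on_object_461" if feasible else "adjacent_462"
```

The angle check models the requirement that a normal at the surface of object 132 be oriented within a preset range relative to projector 122; a surface viewed too obliquely fails the test, so the information is projected adjacent to the object instead.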
- one or more of acts 421 - 424 described above are performed as described in U.S. application Ser. No. ______, Attorney Docket No. Q111570U2os, filed concurrently herewith, and entitled “Dynamic Selection of Surfaces In Real World For Projection of Information Thereon” which has been incorporated by reference above.
- processor 100 may be partially or wholly included in one or more other processor(s) and/or other computer(s) that interoperate(s) with such a mobile device 120 , e.g. by exchanging information therewith via a cellular link or a WiFi link.
- Although one camera 121 is shown in FIG. 1E , depending on the embodiment, one or more cameras (see FIG. 5 ) may be used.
- While certain acts illustrated in FIG. 4B are described for some embodiments as being performed by mobile device 120 , some or all of the acts in FIG. 4B may be performed by use of one or more computers and/or one or more processors and/or one or more cameras. Therefore, it is to be understood that several such embodiments use one or more devices to perform such act(s), either alone or in combination with one another.
- Mobile device 120 may be any device that includes a projector 122 and/or a camera 121 , and device 120 may include additional parts that are normally used in any hand held device, e.g. sensors, such as accelerometers, gyroscopes or the like, which may be used in one or more acts described above, e.g. in determining the pose of mobile device 120 relative to object 132 in the real world.
- User input 482 that is generated from images captured by camera 121 ( FIG. 1E ) allows a user to reach into scene 130 and manipulate real world object 132 directly, as opposed to on-screen based interaction, where users interact by directly touching a screen 151 of mobile device 150 ( FIG. 1B ).
- When image-based user-supplied information is obtained as input, methods of the type described above in reference to FIG. 4B enable a user to use his or her hands in scene 130 with information projected into the real world, because the user supplies input that changes the information being projected into scene 130 .
- Projected user interfaces, included in information projected into a scene as described herein, can have a broad range of applications.
- projected user interfaces can be used to generate user input 482 ( FIG. 4A ) by projecting information 322 including screen 321 and keyboard image 333 ( FIG. 3C ) similar to real world typing using a real keyboard.
- a projected user interface allows a user to supply input to select between different software for execution and display of projected information and/or select between different groups of users to play games, and in general to specify various parameters to software being executed by a processor 100 that generates the information which is projected into scene 130 (e.g. see FIG. 1G ).
- Several embodiments of mobile device 120 as described herein reverse a flow of information between (A) user interfaces and (B) user input (relative to a conventional flow).
- Instead of users explicitly looking for information to be displayed, several embodiments of device 120 automatically obtain and display interactive information, e.g. by projection onto real world surfaces using an embedded mobile projector.
- Other embodiments may display information as described herein on a screen that is supported on an eye-glass frame worn by a user, for example.
- Still other embodiments may display information as described herein on a screen that forms an integral portion of a smart phone (such as a touch screen), in the normal manner.
- mobile device 120 may be used to receive input from a user, e.g. an IR camera may be used to receive user input in the form of hand gestures.
- hand gesture recognition systems may be implemented in several embodiments of a mobile device 120 as described herein.
- an embedded projector in mobile device 120 projects a cell phone's normal display on everyday surfaces such as a surface of a wall or a surface of a desk, with which a user interacts using hand gestures.
- mobile device 120 may be any electronic device that is portable by hand, such as a cellular or other wireless communication device, personal communication system (PCS) device, personal navigation device (PND), Personal Information Manager (PIM), Personal Digital Assistant (PDA), laptop, tablet, or an eye glass frame that supports a display to be worn on a person's face, a headset, a camera, or other suitable mobile device that is capable of imaging scene 130 and/or projecting information into scene 130 .
- a single device 120 includes both camera 121 and projector 122 whereas in other embodiments one such device includes camera 121 and another such device includes projector 122 and both devices communicate with one another either directly or via a computer (not shown).
- a prototype of mobile device 120 is built with a custom hardware (PCB) board taped onto the back of a smartphone (e.g. GALAXY NEXUS available from Samsung Electronics Co. Ltd).
- One such approach performs computation in the infrared (IR) spectrum.
- In the IR spectrum, hand and body tracking is robust and very accurate, although additional hardware may be integrated within such a smartphone 120 used for display of information.
- Some embodiments of device 120 use IR sensors (e.g. in an IR camera) that have been proven to work on commercially successful platforms, such as the Xbox Kinect.
- Certain embodiments of device 120 implement augmented reality (AR) applications using marker patterns, such as checkerboard for camera calibration and detection of objects within a scene of real world, followed by use of object identifiers to display information, as described herein.
- mobile device 120 may be programmed with software 141 that uses a mobile-projector system in combination with a camera.
- An embedded projector 122 is used in such embodiments to display information 119 T on everyday surfaces such as a wall, with which users interact using hand gestures.
- mobile device 120 is operatively coupled to an external IR camera 1006 that tracks an IR laser stylus (not shown), or gloves with one or more IR LEDs 1121 , 1122 ( FIG. 5 ) mounted at the finger tips (also called IR gloves).
- External IR camera 1006 is used in some embodiments in a manner similar or identical to receipt of IR images and tracking of objects within the images by use of an IR camera in a remote control device (also called "Wiimote" or "Wii Remote") for the gaming console Wii, available from Nintendo Co. Ltd. So, IR camera 1006 may be used in some embodiments as described in a section entitled "Tracking Your Fingers with the Wiimote" in a 2-page article available at http://johnnylee.net/projects/wii/ as available on Apr. 9, 2012, which is incorporated by reference herein in its entirety. Alternatively, some non-IR embodiments of device 120 use one or more normal RGB (red-green-blue) CMOS cameras 121 ( FIG. 5 ) to capture an image of scene 130 including object 132 .
- An object extractor 141 O in a mobile device 120 of the type described herein may use any known object recognition method, based on “computer vision” techniques.
- a mobile device 120 may also include means for controlling operation of a real world object 132 (which may be electronic, e.g. a toy) in response to user input of the type described above, such as an IR or RF transmitter, or a wireless transmitter enabled to receive and/or transmit one or more signals over one or more types of wireless communication networks, such as the Internet, WiFi, a cellular wireless network or another network.
- mobile device 120 may additionally include a graphics engine 1004 to generate information 119 to be output, an image processor 1005 to process image(s) 109 and/or transform information 119 , and a read only memory (ROM) 1007 to store firmware and/or constant data.
- Mobile device 120 may also include a disk 1008 to store software and/or database 199 for use by processor 100 .
- Mobile device 120 may further include a wireless transmitter and receiver 1010 and/or any other communication interfaces 1009 , sensors 1003 , a touch screen 1001 or other screen 1002 , a speaker 1111 and a microphone 1112 .
- Some embodiments of user input extractor 141 E sense user input in the form of hand gestures in images generated by an infra-red (IR) camera that tracks an IR laser stylus or gloves with IR LEDs (also called IR gloves), while certain other embodiments of user input extractor 141 E sense hand gestures using an existing camera (e.g. in a normal cell phone) that captures images of a user's fingers.
- on an external PCB board 1130 ( FIG. 5 ), an ARM Cortex processor (not shown) is interfaced with an IR camera 1006 ( FIG. 5 ) and a Bluetooth module (not shown).
- Hand tracking data from IR camera 1006 is sent via Bluetooth to a smartphone 1140 in device 120 that has a touch screen 1001 (e.g. HTC Explorer available from HTC Corporation).
- PCB board 1130 is mounted on and operatively coupled to a smartphone 1140 ( FIG. 5 ).
- PCB board 1130 includes IR camera 1006 , such as mbed LPC1768 available from Foxconn, e.g. Mbed as described at http://mbed.org/nxp/lpc1768/ and a Bluetooth chipset e.g. BlueSMiRF Gold as described at http://www.sparkfun.com/products/582.
- user input captured in images by a camera is extracted therefrom by the smartphone in device 120 performing gesture recognition on data received from an infra-red (IR) sensor, as described in an article entitled “iGesture: A General Gesture Recognition Framework” by Signer et al, In Proc. ICDAR '07, 5 pages which is incorporated by reference herein in its entirety.
- some embodiments of user input extractor 141 E operate with a user wearing infra-red (IR) gloves that are identified in another image generated by an IR camera.
- An IR camera of such embodiments may be externally coupled to a smartphone in mobile device 120 in some embodiments while in other embodiments the IR camera is built into the smartphone.
- Some embodiments operate with the user 135 using an IR laser stylus 1135 whose coordinates are detected by device 120 in any manner known in the art.
- Still other embodiments of user input extractor 141 E ( FIG. 5 ) receive user input in other forms as noted above, e.g. as audio input from microphone 1112 .
- user input extractor 141 E processes a frame of video captured by a camera to obtain user input in the form of hand gestures, by segmenting each image into one or more areas of interest, such as a user's hands. Any known method can be modified for use in user input extractor 141 E as described herein, to remove background noise, followed by identification of a portion of the image which contains the user's hand, which is then used to generate a binary image (also called a “blob”). A next step in some embodiments of user input extractor 141 E is to calculate locations (e.g. coordinates) of the user's fingers within the blob.
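The segment-then-locate pipeline above can be illustrated with a toy frame. The threshold value and the "topmost set pixel" fingertip heuristic below are assumptions for illustration only; real extractors use far more robust segmentation and finger detection.

```python
def to_blob(gray, threshold=128):
    """Binarize a grayscale frame: 1 where a (bright) hand pixel is likely."""
    return [[1 if px >= threshold else 0 for px in row] for row in gray]

def fingertip(blob):
    """Hypothetical fingertip locator: topmost set pixel (smallest row index)."""
    for y, row in enumerate(blob):
        for x, v in enumerate(row):
            if v:
                return (x, y)
    return None

frame = [
    [0,   0, 200,   0],
    [0, 210, 220, 205],
    [0, 215, 230, 210],
]
blob = to_blob(frame)
print(fingertip(blob))  # (2, 0): the raised finger's tip
```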
- An IR camera 1006 is not used in certain embodiments wherein a normal RGB camera is used instead to generate one or more images 109 which contain user input.
- the user input is extracted from images 109 by user input extractor 141 E ( FIG. 5 ) performing one of two methods as follows.
- a first method is of the type described in a 4-page article entitled "HandVu: Vision-based Hand Gesture Recognition and User Interface" at http://www.movesinstitute.org/~kolsch/HandVu/HandVu.html as available on Apr. 9, 2012, which is incorporated by reference herein in its entirety.
- a second method is of the type described in another 4-page article entitled “A Robust Method for Hand Gesture Segmentation and Recognition Using Forward Spotting Scheme in Conditional Random Fields” by Mahmoud Elmezain, Ayoub Al-Hamadi, and Bernd Michaelis, in International Conference on Pattern Recognition, 2010, which is incorporated by reference herein in its entirety.
- Such embodiments that use an existing RGB camera in a normal smartphone may use a combination of skin segmentation, graph cut and recognition of hand movement, to detect hand gestures.
- For recognition of hand gestures, some embodiments of user input extractor 141 E ( FIG. 5 ) are designed to use a supervised learning approach in an initialization phase of device 120 .
- In the supervised learning approach, user input extractor 141 E learns different gestures from input binary images (e.g. consisting of a user's hands) during initialization, and generates a mathematical model that is then used to identify gestures in images generated during normal operation (after the initialization phase), by using Support Vector Machines (SVMs) of the type known in the prior art.
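The text specifies Support Vector Machines for the trained model. As a self-contained stand-in that shows the same train-during-initialization, classify-during-operation flow, here is a minimal perceptron over hypothetical gesture features (blob area fraction, finger count); a library SVM such as scikit-learn's SVC would slot into the same place.

```python
def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Toy linear classifier standing in for the SVM named in the text."""
    w = [0.0] * (len(samples[0]) + 1)          # weights + bias (last slot)
    for _ in range(epochs):
        for x, y in zip(samples, labels):      # y in {-1, +1}
            s = w[-1] + sum(wi * xi for wi, xi in zip(w, x))
            if y * s <= 0:                     # misclassified: update weights
                for i, xi in enumerate(x):
                    w[i] += lr * y * xi
                w[-1] += lr * y
    return w

def classify(w, x):
    s = w[-1] + sum(wi * xi for wi, xi in zip(w, x))
    return +1 if s > 0 else -1

# Hypothetical features per training image: (blob area fraction, finger count)
open_hand   = [(0.6, 5), (0.55, 5), (0.65, 4)]   # label +1
closed_fist = [(0.3, 0), (0.25, 1), (0.35, 0)]   # label -1
w = train_perceptron(open_hand + closed_fist, [1, 1, 1, -1, -1, -1])
print(classify(w, (0.6, 5)), classify(w, (0.3, 0)))  # 1 -1
```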
- Some embodiments use an infrared (IR) camera 1006 ( FIG. 5 ) to extract portions of an image 109 that correspond to a user's fingers as blobs.
- a user holds an IR light source, such as a laser pointer or alternatively the user wears IR gloves.
- a user wears on a hand 138 ( FIG. 1E ) a glove (not shown) with an IR LED 1121 , 1122 ( FIG. 5 ) on each finger 136 , 137 ( FIG. 1E ). Detection of position of one IR LED 1121 ( FIG. 5 ) on left index finger 136 ( FIG. 1E ) by IR camera 1006 ( FIG.
- gesture recognition is performed by processor 100 executing software to operate as user input extractor 141 E ( FIG. 5 ) as described herein.
- co-ordinates of IR LEDs 1121 and 1122 generated by IR camera 1006 are used by processor 100 to identify a blob (e.g. human hand image 138 I in FIG.
- Some embodiments extract blobs in two-dimensional (2D) space due to limitations inherent in design (for a similar setup, see the 2-page article described above in reference to http://johnnylee.net/projects/wii/).
- Certain embodiments of user input extractor 141 E ( FIG. 5 ) perform blob detection in 3D space, using a depth camera.
- images from an IR camera 1006 are used to translate hand movements to specific co-ordinates within native applications running on a smartphone included in device 120 .
- Several embodiments implement a simple camera calibration technique, similar to the techniques described in the 2-page article at http://johnnylee.net/projects/wii/.
- Some embodiments of mobile device 120 generate a depth map of a scene by use of a 3D Time-of-Flight (TOF) camera of the type known in the prior art.
- a Time-of-Flight camera is used in certain embodiments to measure the phase difference between emitted light and the photons arriving at a sensor, which in turn provides a distance between the sensor and objects in the scene.
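For light amplitude-modulated at frequency f, the phase-to-distance relation is d = c * phi / (4 * pi * f), the factor 4*pi reflecting the round trip. A one-line sketch, with an assumed (hypothetical) 30 MHz modulation frequency:

```python
import math

C = 299_792_458.0  # speed of light, m/s

def tof_distance(phase_shift_rad, modulation_hz):
    """Distance from the phase shift of amplitude-modulated light:
    d = c * phi / (4 * pi * f). Unambiguous only for phi < 2*pi."""
    return C * phase_shift_rad / (4 * math.pi * modulation_hz)

# A 30 MHz modulated TOF camera measuring a pi/2 phase shift:
print(round(tof_distance(math.pi / 2, 30e6), 3))  # 1.249 (meters)
```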
- Some embodiments of device 120 also use Time-of-Flight cameras, e.g. as described by Freedman in US Patent Publication 2010/0118123 entitled "Depth Mapping using Projected Patterns" which is incorporated by reference herein in its entirety.
- Such embodiments of device 120 may use projector 122 ( FIG. 5 ) to shine an infrared (IR) light pattern on object 132 in the scene.
- a reflected light pattern is observed in such embodiments by a depth camera 121 , which generates a depth map.
- a depth map is used to enhance segmentation of image to identify areas that contain a user's face, the user's hand and/or one or more objects in an image received from an RGB camera 121 (which is thereby additionally used).
- a device 120 of some embodiments uses projector 122 to project information 119 T onto a surface of object 132 , followed by capture of a user's finger movements as follows.
- using a laser pointer or gloves with IR LEDs on the fingers, the IR camera implements a motion capture system.
- an IR camera 1006 is calibrated. After camera calibration, certain embodiments generate a one-to-one mapping between the screen resolution of device 120 and the user's hand movements.
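Such a one-to-one mapping can be as simple as a linear rescale from the camera coordinates observed during calibration to screen pixels. The calibration rectangle and resolutions below are hypothetical:

```python
def make_mapper(cam_rect, screen_w, screen_h):
    """Build a camera-to-screen coordinate mapper from a calibration step in
    which the user points at the four corners of the projected screen.
    cam_rect = (x0, y0, x1, y1): camera coordinates of those corners."""
    x0, y0, x1, y1 = cam_rect
    sx = screen_w / (x1 - x0)
    sy = screen_h / (y1 - y0)
    def to_screen(cx, cy):
        return (round((cx - x0) * sx), round((cy - y0) * sy))
    return to_screen

# Hypothetical calibration: the IR camera sees the projected screen
# spanning 100..500 horizontally and 80..380 vertically.
to_screen = make_mapper((100, 80, 500, 380), 800, 480)
print(to_screen(300, 230))  # hand at the center maps to (400, 240)
```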
- the IR camera 1006 captures the brightest IR point.
- the coordinates of such points are processed inside an application layer or kernel layer. Based on the processed data, the user input is determined by processor 100 .
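Locating the brightest IR point in a frame is a straightforward scan; the frame below is a toy 3x3 intensity grid standing in for an IR camera image:

```python
def brightest_point(frame):
    """Return (x, y) of the brightest pixel -- taken here to be the IR LED
    or laser spot -- in a 2D list of intensities."""
    best, best_xy = -1, None
    for y, row in enumerate(frame):
        for x, v in enumerate(row):
            if v > best:
                best, best_xy = v, (x, y)
    return best_xy

frame = [
    [10,  12, 11],
    [13, 250, 14],   # saturated pixel where the IR LED is
    [11,  12, 10],
]
print(brightest_point(frame))  # (1, 1)
```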
- one of the methods is based on tracking IR light sources, e.g. IR LEDs 1121 and 1122 ( FIG. 5 ) mounted at finger tips of a glove.
- One limitation of some embodiments is a need for external hardware (i.e. hardware not present in a conventional smartphone), such as an IR camera 1006 ( FIG. 5 ).
- Certain embodiments of mobile device 120 use one or more Infrared time-of-flight (TOF) camera(s) 1006 instead of or in addition to a CMOS infrared camera 1006 .
- background noise may be present in images 109 being captured and filtered by device 120 .
- Such embodiments may utilize a frame buffer of a screen in mobile device 120 and perform stereo correspondence to reduce such noise.
- Several embodiments of device 120 implement any known techniques to reduce background noise arising from use of stereo cameras (e.g. 3D cameras) 121 .
- Mobile device 120 of several described embodiments may also include means for remotely controlling a real world object, which may be a toy, in response to user input, e.g. by use of a transmitter in transceiver 1010 , which may be an IR or RF transmitter or a wireless transmitter enabled to transmit one or more signals over one or more types of wireless communication networks such as the Internet, WiFi, a cellular wireless network or another network.
- mobile device 120 may include other elements, such as a read-only-memory 1007 which may be used to store firmware for use by processor 100 .
- any one or more of object extractor 141 O, information identifier 141 I, information retriever 141 R, information transformer 141 T and segmentation module 141 S illustrated in FIG. 5 and described above can, but need not necessarily, include one or more microprocessors, embedded processors, controllers, application specific integrated circuits (ASICs), digital signal processors (DSPs), and the like.
- The term "processor" is intended to describe the functions implemented by the system rather than specific hardware.
- The term "memory" refers to any type of computer storage medium, including long term, short term, or other memory associated with the mobile platform, and is not to be limited to any particular type of memory or number of memories, or type of media upon which memory is stored.
- methodologies described herein may be implemented by various means depending upon the application. For example, these methodologies may be implemented in firmware in ROM 1007 ( FIG. 5 ) or software, or hardware or any combination thereof.
- the processing units may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.
- the methodologies may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein.
- software 141 may include program codes stored in memory 110 and executed by processor 100 .
- Memory 110 may be implemented within or external to the processor 100 . If implemented in firmware and/or software, the functions may be stored as one or more instructions or code on a computer-readable medium. Examples include non-transitory computer-readable storage media encoded with a data structure (such as a sequence of images) and computer-readable media encoded with a computer program (such as software 141 that can be executed to perform the methods of FIGS. 1C , 2 A, 3 A, 3 F, and 4 B- 4 D).
- Computer-readable media includes physical computer storage media.
- a storage medium may be any available medium that can be accessed by a computer.
- such computer-readable media can comprise RAM, ROM, Flash Memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store program code in the form of software instructions (also called “processor instructions” or “computer instructions”) or data structures and that can be accessed by a computer;
- Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
- Although item 120 shown in FIG. 5 of some embodiments is a mobile device, in other embodiments item 120 is implemented by use of different form factors: e.g. in certain other embodiments item 120 is a mobile platform (such as a tablet, e.g. iPad available from Apple, Inc.) while in still other embodiments item 120 is any electronic device or system.
- Illustrative embodiments of such an electronic device or system 120 may include multiple physical parts that intercommunicate wirelessly, such as a processor and a memory that are portions of a stationary computer, such as a lap-top computer, a desk-top computer, or a server computer communicating over one or more wireless link(s) with sensors and user input circuitry enclosed in a housing that is small enough to be held in a hand.
Abstract
Camera(s) capture a scene, including an object that is portable. An image of the scene is processed to segment therefrom a portion corresponding to the object, which is then identified from among a set of predetermined real world objects. An identifier of the object is used, with a set of associations between object identifiers and user identifiers, to obtain a user identifier that identifies a user at least partially from among a set of users. Specifically, the user identifier may identify a group of users that includes the user (“weak identification”) or alternatively the user identifier may identify the user uniquely (“strong identification”) in the set. The user identifier is used either alone or in combination with user input to obtain and store in memory, information to be output to the user. At least a portion of the obtained information is thereafter output, e.g. displayed by projection into the scene.
Description
- This application claims priority under 35 USC §119 (e) from U.S. Provisional Application No. 61/525,628 filed on Aug. 19, 2011 and entitled “Projection of Information Onto Real World Objects or Adjacent Thereto”, which is assigned to the assignee hereof and which is incorporated herein by reference in its entirety.
- This application is also related to U.S. application Ser. No. ______, Attorney Docket No. Q111570U2os, filed concurrently herewith, and entitled “Dynamic Selection of Surfaces In Real World For Projection of Information Thereon” which is assigned to the assignee hereof and which is incorporated herein by reference in its entirety.
- It is well known to use a projector to project information for use via hand gestures by a user. For details on such prior art, see the article by Mistry, P., Maes, P., Chang, L., "WUW—Wear Ur World—A Wearable Gestural Interface," CHI 2009, Apr. 4-9, 2009, Boston, Mass., USA, 6 pages, which is incorporated by reference herein in its entirety.
- Computer recognition of hand gestures of the type described above raises several issues, such as lighting conditions and robustness. Several such issues are addressed by use of Time of Flight cameras, e.g. as described in the article entitled “Picture Browsing and Map Interaction using a Projector Phone” by Andrew Greaves, Alina Hang, and Enrico Rukzio, MobileHCI 2008, Sep. 2-5, 2008, Amsterdam, the Netherlands 4 pages. For additional information on such background on identifying hand gestures, see Mitra and Acharya, “Gesture Recognition: A Survey”, IEEE transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews, Vol. 37, No. 3, May 2007, 14 pages that is incorporated by reference herein in its entirety.
- In prior art of the type described above, traditional approaches appear to require a user to explicitly look for information that the user desires, and which is not personalized automatically for the user. Requiring the user to explicitly look for desired information can be non-trivial when the information is being projected. Thus, what is needed is an improved way to obtain information that may be of interest to a user, as described below.
- In several embodiments, one or more cameras capture a scene that includes an object in real world that is sufficiently small to be carried by a human hand (“portable”), such as a stapler or a book. Thereafter, an image of the scene from the one or more cameras is processed to detect therein a portion corresponding to the object, which is recognized from among a set of pre-selected real world objects. An identifier of the object is then used, with a set of associations that associate object identifiers and identifiers of users, to obtain a user identifier that identifies a user at least partially among a set of users. In some embodiments, the user identifier identifies the user generically, as belonging to a particular group of users (also called “weak identification”) among several such groups. In other embodiments, the user identifier identifies a single user uniquely (also called “strong identification”), among all such users in the set. The user identifier (obtained, as noted above, by use of an association and the object identifier) is thereafter used in several such embodiments, either alone or in combination with user-supplied information, to obtain and store in memory, information to be output to the user. At least a portion of the obtained information is thereafter output, for example by projection into the scene.
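The association lookup at the heart of this method (object identifier in, user identifier out, with weak group identification or strong single-user identification) can be sketched as a table lookup. All identifiers in the table below are hypothetical:

```python
# Hypothetical association table: object identifier -> user identifier.
# A user identifier may name a group ("weak identification") or a single
# user ("strong identification").
associations = {
    "stapler":        "group:1",     # weak: any of users A, B, C
    "tape_dispenser": "group:2",     # weak: any of users X, Y, Z
    "coffee_mug":     "user:alice",  # strong: exactly one user
}

def identify_user(object_id):
    """Return (user identifier, kind of identification), or None."""
    user_id = associations.get(object_id)
    if user_id is None:
        return None
    kind = "strong" if user_id.startswith("user:") else "weak"
    return user_id, kind

print(identify_user("stapler"))     # ('group:1', 'weak')
print(identify_user("coffee_mug"))  # ('user:alice', 'strong')
```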
- FIG. 1A illustrates a portable real world object 132 (e.g. a stapler) being imaged by a camera 121, and its use by processor 100 in an electronic device 120 to cause a projection of information into scene 130 by performing acts illustrated in FIG. 1C, in several embodiments described herein.
- FIG. 1B illustrates another portable real world object 142 (e.g. a tape dispenser) being imaged by another electronic device 150 in a manner similar to electronic device 120 of FIG. 1A, to display information 119T on a screen 152 of device 150, also by performing the acts of FIG. 1C in several embodiments described herein.
- FIG. 1C illustrates, in a high-level flow chart, acts performed by one or more processors 100 (e.g. in one of electronic devices 120 and 150 of FIGS. 1A and 1B), to use an association of an object detected in an image to obtain information to display, in some embodiments described herein.
- FIG. 1D illustrates, in a high-level block diagram, a set of associations 111 in a memory 110 used by processor 100 of FIG. 1C, in some of the described embodiments.
- FIG. 1E illustrates association of portable real world object 132 with group 2 by user 135 making a hand gesture with two outstretched fingers adjacent to portable real world object 132, in certain embodiments.
- FIG. 1F illustrates use of the association of FIG. 1E to obtain and display information 188 for group 2, in certain embodiments.
- FIG. 1G illustrates changing the association of FIG. 1E by replacing group 2 with group 1, by user 135 making another hand gesture with index finger 136 pointing at portable real world object 132, in such embodiments.
- FIG. 1H illustrates use of the association of FIG. 1G to obtain and display information for group 1, in such embodiments.
- FIGS. 1I and 1J illustrate colored beams projected onto object 132 by a projector 122 in certain embodiments.
- FIG. 2A illustrates, in a high-level flow chart, acts performed by one or more processors 100 to use an association of an object detected in an image in combination with user input, to obtain information to display, in some embodiments described herein.
- FIG. 2B illustrates, in a high-level block diagram, a set of associations 211 in a memory 110 used by processor 100 of FIG. 2A, in some of the described embodiments.
- FIG. 2C illustrates real world object 132 in scene 230 imaged by camera 121, for use in projection of information in certain embodiments.
- FIG. 2D illustrates processor 100 coupled to memory 110 that contains image 109 as well as an address 291 (in a table 220) used to obtain information for display in some embodiments.
- FIG. 2E illustrates display of information in the form of a video 295 projected adjacent to object 132 in some embodiments.
- FIGS. 3A and 3F illustrate, in high-level flow charts, acts performed by one or more processors 100 to use an association of an object detected in an image with a single person, to obtain information to display specific to that person, in some embodiments described herein.
- FIG. 3B illustrates, in a high-level block diagram, a set of associations 311 in a memory 110 used by processor 100 of FIGS. 3A and 3F, in some of the described embodiments.
- FIG. 3C illustrates a user interface included in information projected adjacent to object 132, in some embodiments.
- FIGS. 3D and 3E illustrate a user's personalized information included in information projected adjacent to object 132 (in FIG. 3D) and onto object 132 (in FIG. 3E), in some embodiments.
- FIG. 4A illustrates, in a high-level block diagram, a processor 100 coupled to a memory 110 in a mobile device 120 of some embodiments.
- FIG. 4B illustrates, in a high-level flow chart, acts performed by processor 100 of FIG. 4A in projecting information into scene 130 in several embodiments.
- FIGS. 4C and 4D illustrate, in intermediate-level flow charts, acts performed by processor 100 in projecting information in certain embodiments.
FIG. 5 illustrates, in a high-level block diagram, amobile device 120 of several embodiments. - In accordance with the described embodiments, one or more device(s) 120 (
FIG. 1A ) use one or more cameras 121 (and/or sensors such as a microphone) to capture input e.g. one or more images 109 (FIG. 1B ) from a scene 130 (FIG. 1A ) that contains an object 132 (FIG. 1A ). Depending on the embodiment, object 132 can be any object in the real world (in scene 130) that is portable by a human, e.g. small enough to be carried in (and/or moved by) a human hand, such as any handheld object. Examples ofobject 132 are a stapler, a mug, a bottle, a glass, a book, a cup, etc. Also depending on the embodiment, device(s) 120 can be any electronic device that includes acamera 121, amemory 110 and aprocessor 100, such as a smartphone or a tablet (e.g. iPhone or iPad available from APPLE, Inc.). For convenience, the following description refers to asingle device 120 performing the method ofFIG. 1C , although multiple such devices can be used to individually perform any one or more of steps 101-108, depending on the embodiment. - Accordingly, one or more captured images 109 (
FIG. 1B ) are initially received from a camera 121 (as peract 101 inFIG. 1C ) e.g. via bus 1113 (FIG. 5 ) and stored in amemory 110.Processor 100 then processes (as peract 102 inFIG. 1C ) the one ormore images 109 to detect the presence of an object 132 (FIG. 1A ) e.g. on a surface of a table 131, in ascene 130 of real world outside camera 121 (seeact 102 inFIG. 1C ). For example,processor 100 may be programmed to recognize a portion ofimage 109 corresponding to portablereal world object 132, to obtain an identifier (“object identifier”) 1120 that uniquely identifiesobject 132 among a set of predetermined objects. -
Processor 100 then uses the object identifier (e.g. stapler identifier 1120 in FIG. 1D ) that is obtained in act 102 to look up a set of associations 111 in an act 103 (FIG. 1C ). The result of act 103 is an identifier of a user (e.g. user identifier 112U) who has been previously associated with portable real world object 132 (e.g. stapler). Specifically, in certain aspects of several embodiments, memory 110 that is coupled to processor 100 holds a set of associations 111 including for example an association 112 (FIG. 1D ) that associates a stapler identifier 1120 with user identifier 112U and another association 114 that associates a tape-dispenser identifier 1140 with another user identifier 114U. Such a set of associations 111 may be created in different ways depending on the embodiment, and in some illustrative embodiments an association 112 in set 111 is initialized or changed in response to a hand gesture by a user, as described below. - In certain embodiments,
user identifiers in associations 111 do not uniquely identify individual users of such devices 120. Instead, in these embodiments, user identifiers identify groups of users, e.g. a first group 1 of users A, B and C, and a second group 2 of users X, Y and Z (users A-C and X-Z are not shown in FIG. 1D ). In such embodiments (also called "weak identification" embodiments), a user identifier 112U that is obtained in act 103 (described above) is generic to several users A, B and C within the first group 1. In other embodiments (also called "strong identification" embodiments), each user identifier obtained in act 103 identifies a single user uniquely, as described below in reference to FIGS. 3A-3F . - Referring back to
FIG. 1C , processor 100 of several embodiments uses a user identifier 112U looked up in act 103 to generate an address of information 119 (FIG. 5 ) to be output to the user, and then obtains and stores in memory 110 (as per act 104 in FIG. 1C ) the information 119. In certain weak identification embodiments, wherein user identifier 112U is of a group 1, the information 119 to be output, obtained in act 104, is common to all users within that group 1 (e.g. common to users A, B and C). For example, information 119 includes the text "Score: 73" which represents a score of this group 1, in a game being played between two groups of users, namely users in group 1 and users in group 2. -
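Acts 103 and 104 of this weak identification example can be sketched as follows; the dictionary contents, function name, and score values are assumptions made only to mirror the "Score: 73" example above:

```python
# Sketch of acts 103-104 for the weak identification example above: an
# object identifier is looked up in a set of associations (111) to obtain
# a group identifier, which then selects the information to output. All
# dictionary contents and names are assumptions for illustration.

associations_111 = {
    "stapler_1120": "group_1",         # association 112
    "tape_dispenser_1140": "group_2",  # association 114
}

# Assumed game state: one score per group, shared by all users in the group.
group_scores = {"group_1": 73, "group_2": 0}

def information_for_object(object_id):
    """Act 103: map the object identifier to a (generic) user identifier.
    Act 104: use that identifier to obtain the information to output."""
    user_id = associations_111[object_id]
    return "Score: %d" % group_scores[user_id]
```

Because the identifier is generic to the whole group, every member of group 1 placing the stapler in view would see the same "Score: 73" output.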
Information 119 is optionally transformed and displayed or otherwise output to user 135, in an act 105 (see FIG. 1C ), as information 119T (FIG. 1A ). Specifically, in some embodiments, information 119 is displayed by projection of at least a portion of the information (e.g. the string of characters "Score: 73") into scene 130 by use of a projector 122 as illustrated in FIG. 1A . In other embodiments, information 119 (or a portion thereof) may be output by act 105 (FIG. 1C ) in other ways, e.g. device 150 (FIG. 1B ) displaying information 119 to user 135 directly on a screen 151 (FIG. 1B ) that also displays a live video of scene 130 (e.g. by displaying image 109), thereby to provide an augmented reality (AR) display. - In still other embodiments,
information 119 may be played through a speaker 1111 (FIG. 5 ) in device 120, or even through a headset worn by user 135. In an embodiment illustrated in FIG. 1B , device 150 is a smartphone that includes a front-facing camera 152 (FIG. 1B ) in addition to a rear-facing camera 121 that captures image 109. Front-facing camera 152 (FIG. 1B ) is used in some embodiments to obtain an image of a face of user 135, for use in face recognition in certain strong identification embodiments described below in reference to FIGS. 3A-3F . Device 150 (FIG. 1B ) may be used in a manner similar or identical to use of device 120 as described herein, depending on the embodiment. - Processor 100 (
FIG. 5 ) of certain embodiments performs act 103 after act 102 when associations 111 (FIG. 1D ) have been previously formed and are readily available in memory 110. Associations 111 may be set up in memory 110 based on information input by user 135 ("user input") in any manner, as will be readily apparent in view of this detailed description. For example, user input in the form of text including words spoken by user 135 is extracted, by some embodiments of processor 100 operating as a user input extractor 141E, from an audio signal that is generated by a microphone 1112 (FIG. 5 ) in the normal manner. In certain embodiments, user input (e.g. for use in preparing associations 111) is received by processor 100 via camera 121 in the form of still images of shapes or a sequence of frames of video of gestures of a hand 138 of a user 135 (also called social protocol), and the received gestures are compared with a library. In some "weak identification" embodiments, processor 100 is programmed to respond to user input sensed by one or more sensors in device 120 (e.g. camera 121) that detect one or more actions (e.g. gestures) by a user 135 as follows: processor 100 associates an object 132 that is selectively placed within a field of view 121F of camera 121 with an identifier (e.g. a group identifier) that depends on user input (e.g. hand gesture). - As would be readily apparent in view of this detailed description, any person,
e.g. user 135, can use their hand 138 to form a specific gesture (e.g. tapping on object 132), to provide user input via camera 121 to processor 100 that in turn uses such user input in any manner described herein. For example, processor 100 may use input from user 135 in the form of hand gestures (or hand shapes) captured in a video (or still image) by camera 121, to initialize or change a user identifier that is generic and has been associated with object 132, as illustrated in FIGS. 1E-1H and described below. As noted above, some embodiments accept user input in other forms, e.g. audio input such as a whistling sound, a drumming sound, a tapping sound, or the sound of text including the words "Group Two" spoken by user 135, any of which may be used to associate an object 132 that is imaged within image 109 with a user identifier that is generic (e.g. commonly used to identify multiple users belonging to a particular group). - In some embodiments,
user input extractor 141E is designed to be responsive to images from camera 121 of a user 135 forming a predetermined shape in a gesture 118, namely the shape of the letter "V" of the English alphabet with hand 138, by stretching out index finger 136 and stretching out middle finger 137. Specifically, camera 121 images hand 138, with the just-described predetermined shape "V", at a location in the real world that is adjacent to (or overlapping) portable real world object 132 (FIG. 1E ). Moreover, processor 100 is programmed to perform act 106 (FIG. 1C ) to extract from image 109 (FIG. 1D ) in memory 110 this hand gesture 118 (index and middle finger images 136I and 137I outstretched in human hand image 138I in FIG. 4A ). During an initialization phase, processor 100 responds to detection of such a hand gesture by forming an association (in the set of associations 111) that is thereafter used to identify person 135 (as belonging to group 2) every time this same hand gesture is recognized by processor 100. - In some embodiments, for
processor 100 to be responsive to a hand gesture, user 135 is required to position the fingers forming the hand gesture within the field of view 121F of camera 121. Note that in the just-described embodiments, when user 135 makes the same hand gesture, but outside the field of view 121F of camera 121 (FIG. 5 ), processor 100 does not detect such a hand gesture and so processor 100 does not make or change any association, even when object 132 is detected in image 109. - In some embodiments, an
act 106 is performed by processor 100 after act 102, to identify the above-described hand gesture (or any other user input depending on the embodiment), from among a library of such gestures (or other such user input) that are predetermined and available in a database 199 on a disk (see FIG. 2D ) or other non-transitory storage media. Next, in act 107, processor 100 performs a look up of a predetermined mapping 116 (FIG. 1D ) based on the hand gesture (or other such user input) detected in act 106 to obtain a user identifier from the set of associations 111 (FIG. 5 ). In the example illustrated in FIG. 1D , two-finger gesture 118 (or other user input, e.g. whistling or drumming) is related by a mapping 116 to an identifier 114U of group 2, and therefore in an act 108 this identifier (looked up from mapping 116) is used to form association 114 in the set 111. - On completion of
act 108, processor 100 of several embodiments proceeds to obtain information 119 to be output (as per act 104, described above), followed by optional transformation and output (e.g. projection as per act 105), as illustrated in FIG. 1F . In one example shown in FIG. 1A , object 132 was associated with group 1, and therefore information 119T which is output into scene 130 includes the text string "Score: 73" which represents a score of group 1, in the game being played with users in group 2. In several such embodiments, as illustrated in FIG. 1G , the same user 135 can change a previously formed association, by making adjacent to the same portable real world object 132 a second hand gesture 117 (e.g. index finger 136 outstretched) that is different from a first hand gesture 118 (e.g. index finger 136 and middle finger 137 both stretched out). As noted above, the hand gesture is made by user 135 sufficiently close to object 132 to ensure that the gesture and object 132 are both captured in a common image by camera 121. - Such a second hand gesture 117 (
FIG. 1D ) is detected by processor 100 in act 106, followed by lookup of mapping 116, followed by over-writing of a first user identifier 114U that is currently included in association 114 with a second user identifier 112U, thereby to change a previously formed association. Hence, after performance of acts 106-108, the information previously displayed (FIG. 1F ) is now replaced with new information including the text string "Score: 0" which is the score of Group 2, as shown in FIG. 1G . In some embodiments, in addition to the just-described text string, one or more additional text strings may be displayed to identify the user(s). For example, the text string "Group 2" is displayed as a portion of information 188T in FIG. 1G , while the text string "Group 1" is displayed as a portion of information 188T in FIG. 1F . In some embodiments, information 188T is optionally transformed for display relative to information 188 that is obtained for output, resulting in multiple text strings of information 188 being displayed on different surfaces, e.g. information 189A displayed on object 132 and information 189B displayed on table 131 as shown in FIG. 1H . - Although in some embodiments a mapping 116 maps hand gestures (or other user input) to user identifiers of groups, in other embodiments each hand gesture (or other user input) may be mapped to a
single user 135, thereby to uniquely identify each user ("strong identification embodiments"), e.g. as described below in reference to FIGS. 3A-3F . In several such embodiments, yet another data structure (such as an array or a linked list) identifies a group to which each user belongs, and processor 100 may be programmed to use that data structure to identify a user's group when needed for use in associations 111. - In obtaining to-be-displayed information in act 104 (
FIG. 1C ), processor 100 of some embodiments simply uses recognition of a user's hand gesture (or other user input) to select Group 2 from among two groups, namely Group 1 and Group 2. Specifically, although the presence of portable real world object 132 is required by processor 100 in an image 109 in order to recognize a gesture, the identity of object 132 is not used in some embodiments to obtain the to-be-displayed information. However, other embodiments of processor 100 do use two identifiers based on corresponding detections in image 109 as described above, namely the user identifier and the object identifier, to obtain and store in memory 110 the to-be-displayed information 119 in act 104 (FIG. 1C ). - Moreover, as will be readily apparent in view of this detailed description, the to-be-displayed information 119 (
FIG. 5 ) may be obtained by processor 100 based on recognition of (1) only a hand gesture, or (2) only the real world object, or (3) a combination of (1) and (2), depending on the embodiment. Also as noted above, a hand gesture is not required in certain embodiments of processor 100 that accept other user input, such as an audio signal (generated by a microphone) that carries sounds made by a user 135 (with their mouth and/or with their hands) and recognized by processor 100 on execution of appropriate software designed to recognize such user-made sounds as user input, e.g. in signals from microphone 1112 (FIG. 5 ). - Although in some embodiments, a user identifier with which portable
real world object 132 is associated is displayed as text, as illustrated by text string 189A in FIG. 1H, in other embodiments a user identifier may be displayed as color, as illustrated in FIGS. 1I and 1J . Specifically, for example, to begin with, object 132 in the form of a cap 132 of a bottle is selected by a user 135 to be included in an image of a scene 130 of the real world being captured by a camera 121 (FIG. 5 ). - Initially
identity 214 of bottle cap 132 is associated in a set of associations 211 (see FIG. 2B ) by default with three users that are identified as a group of friends by identity 215, and this association is shown by device 120, e.g. by projecting a beam 163 of blue color light on object 132 and in a peripheral region outside of and surrounding object 132 (denoted by the word "blue" in FIG. 1I , as colors are not shown in a black-and-white figure). In this example, the color blue has been previously associated with the group of friends of identity 215, as the group's color. Similarly, a book's identity 212 is associated in the set of associations 211 (see FIG. 2B ) by default with four users that are identified as a group (of four students) by identity 213 (e.g. John Doe, Jane Wang, Peter Hall and Tom McCue). - At this stage, a person 135 (e.g. Tom McCue) identified as a user of the group of students associates his group's identity 213 (see
FIG. 2B ) with object 132 (FIG. 1J ) by tapping table surface 131 with index finger 136 repeatedly several times in rapid succession (i.e. performs a hand gesture that processor 100 is programmed to recognize and suitably respond to), until person 135 sees a projection of beam 164 of green light on and around object 132 (denoted by the word "green" in FIG. 1J , as this figure is also a black-and-white figure). In this example, the color green was previously associated with the group of students of identity 213, as its group color. Moreover, tapping is another form of hand gesture recognized by processor 100, e.g. by processing a camera-generated image containing the gesture, optionally together with user input in the form of sound, in some illustrative embodiments. - In certain embodiments of the type illustrated in
FIG. 2A , processor 100 performs acts 201-203 that are similar or identical to acts 101-103 described above in reference to FIG. 1C . Accordingly, in act 201, one or more rear-facing cameras 121 are used to capture scene 130 (FIG. 2C ) that includes real world object 132 (in the form of a book) and store image 109 (FIG. 2D ) in memory 110 in a manner similar to that described above, although in FIGS. 2C and 2D the object 132 being imaged is a book. In performing acts 202-206 in FIG. 2A , processor 100 not only recognizes object 132 as a book in image 109 (FIG. 2D ) but additionally recognizes a text string 231A therein (FIG. 2C ), which is identified by a hand gesture. - Specifically, in act 204 (
FIG. 2A ), processor 100 operates as a user-input extractor 141E (FIG. 5 ) that obtains input from user 135 for identifying information to be obtained for display in act 205 (which is similar to act 105 described above). For example, in act 204, user input is received in processor 100 by detection of a gesture in image 109 (FIG. 2D ) in which object 132 has also been imaged by camera 121 (FIG. 2C ). In this example, user 135 (FIG. 2C ) makes a predetermined hand gesture, namely an index finger hand gesture 117, by stretching finger 136 of hand 138 to point to text string 231A in portable real world object 132. This index finger hand gesture 117 is captured in one or more image(s) 109 (FIG. 2D ) in which object 132 is also imaged (e.g. finger 136 overlaps object 132 in the same image 109). The imaged gesture is identified by use of a library of gestures, and a procedure triggered by the index finger hand gesture 117 is performed in act 204, including OCR of an image portion identified by the gesture, to obtain as user input the string of characters "Linear Algebra." - Subsequently, in act 205 (
FIG. 2A ), processor 100 operates as an object-user-input mapper 141M (FIG. 5 ) that uses both: (1) the user group identified in act 203 from the presence of object 132, and (2) the user input identified from a text string 231A (e.g. "Linear Algebra") detected by use of a gesture identified in act 204 (FIG. 2A ), to generate an address 291 (in a table 220 in FIG. 2B ). For example, a user group identified by act 203 (FIG. 2A ) may be first used by object-user-input mapper 141M (FIG. 5 ) to identify table 220 (FIG. 2B ) from among multiple such tables, and then the identified table 220 is looked up with the user-supplied information, to identify an address 291, which may be accessible on the Internet. - Such an
address 291 is subsequently used (e.g. by information retriever 141R in FIG. 5 ) to prepare a request for fetching from the Internet a video that is associated with the string 231A. For example, in act 205 (FIG. 2A ) processor 100 generates address 291 as http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/ which is then used to retrieve information 119 (FIG. 5 ). Use of table 220 as just described enables a query that is based on a single common text string 231A to be mapped to different addresses, for information to be displayed to different groups of users. For example, a processor for one user A retrieves an address of the website of the Stanford Distance Learning course (namely http://scpd.stanford.edu/coursesSeminars/seminarsAndWebinars.jsp) from one table (e.g. customized for user A) while processor 100 for another user B retrieves another address 291 for MIT's OpenCourseWare website (namely http://videolectures.net/mit_ocw/) from another table 220 (e.g. customized for user B). - Such an
address 291 that is retrieved by processor 100 using a table 220, in combination with one or more words in text string 231A, may be used in some embodiments with an Internet-based search service, such as the website www.google.com, to identify content for display to user 135. Subsequently, in act 206, processor 100 issues a request to address 291 in accordance with the HTTP protocol and obtains, as the information to be output, a video stream from the Internet, followed by optional transformation and projection of the information, as described below. - A result of performing the just-described method of
FIG. 2A is illustrated in FIG. 2E by a video 295 shown projected (after any transformation, as appropriate) on a surface of table 131 adjacent to object 132. As noted above, video 295 has been automatically selected by processor 100 and is being displayed, based at least partially on optical character recognition (OCR) to obtain from one or more images (e.g. in video 295) of object 132, a text string 231A that has been identified by an index finger hand gesture 117. In several embodiments, no additional input is needed by processor 100 from user 135 after the user makes a predetermined hand gesture to point to text string 231A and before the video is output, e.g. no further user command is needed to invoke a video player in device 120, as the video player is automatically invoked by processor 100 to play the video stream retrieved from the Internet. Other such embodiments may require user input to approve (e.g. confirm) that the video stream is to be displayed. - Note that
text string 231A is recognized from among many such strings that are included in image 109, based on the string being located immediately above a tip of index finger 136, which is recognized by processor 100 in act 204 as a portion of a predetermined hand gesture. As noted above, human finger 136 is part of a hand 138 of a human user 135 and in this example finger 136 is used in gesture 117 to identify as user input a string of text in scene 130, which is to trigger retrieval of information to be output. Instead of index finger hand gesture 117 as illustrated in FIGS. 2C and 2D , in certain alternative embodiments user 135 makes a circling motion with finger 136 around text string 231A as a different predetermined gesture that is similarly processed. Hence, in act 204, processor 100 completes recognition of real world object 132 (in the form of a book in FIG. 2C ), in this example by recognizing string 231A. Thereafter, processor 100 generates a request to a source on the Internet to obtain information to be projected in the scene for use by person 135 (e.g. based on a generic user identifier of person 135 as belonging to a group of students). - Accordingly, user interfaces in certain embodiments of the type described above in reference to
FIGS. 2A-2E automatically project information 119 on real world surfaces using a projector 122 embedded in a mobile device 120. Thus, user interfaces of the type shown in FIGS. 2A-2E reverse the flow of information of prior art user interfaces, which require a user 135 to explicitly look for information; e.g. the prior art requires manually using a web browser to navigate to a web site at which a video stream is hosted, and then manually searching for and requesting download of the video stream. Instead, user interfaces of the type shown in FIGS. 2A-2E automatically output information that is likely to be of interest to the user, e.g. by projection onto surfaces of objects in the real world, using an embedded mobile projector. - Depending on the embodiment,
mobile device 120 may be any type of electronic device with a form factor sufficiently small to be held in a human hand 138 (similar in size to object 132), which provides a new way of interacting with user 135 as described herein. For example, user 135 may use such a mobile device 120 in collaboration with other users, with contextual user interfaces based on everyday objects 132, wherein the user interfaces overlap for multiple users of a specific group, so as to provide common information to all users in that specific group (as illustrated for group 1 in FIG. 1F and group 2 in FIG. 1G ). Moreover, as described above in reference to FIGS. 2A-2E , processor 100 may be programmed in some embodiments to automatically contextualize a user interface, by using one or more predetermined techniques to identify and obtain for display information that a user 135 needs to view, when interacting with one or more of objects 132. - In some illustrative embodiments, in response to
user 135 opening a book (shown in FIG. 2C as object 132) to a specific page (e.g. page 20), processor 100 automatically processes images of the real world to identify therein a text string 231A based on its large font size relative to the rest of the text on page 20. Then, processor 100 of several embodiments automatically identifies a video on "Linear Algebra" available on the Internet, e.g. by use of a predetermined website (as described above in reference to act 205), and then seeks confirmation from user 135 that the identified video should be played. The user's confirmation may be received by processor 100 in a video stream that contains a predetermined gesture, e.g. user 135 waving an index finger in a motion to make a check mark (as another predetermined gesture that identifies user input). -
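The selection of text string 231A by its large font size relative to the rest of the page can be sketched as follows; the (text, glyph-height) tuples stand in for OCR output, and the 1.5x threshold is an assumed heuristic rather than a value from this disclosure:

```python
# Sketch of selecting a prominent text string (such as 231A) from OCR output
# by its font size relative to the rest of the page. The tuples below are
# assumed (text, height-in-pixels) OCR results, not a real OCR API.

def prominent_string(ocr_results, ratio=1.5):
    """Return the string whose glyph height exceeds the median height by
    at least the given ratio, or None if no string stands out that much."""
    heights = sorted(h for _, h in ocr_results)
    median = heights[len(heights) // 2]
    candidates = [(t, h) for t, h in ocr_results if h >= ratio * median]
    if not candidates:
        return None
    # Among strings that stand out, pick the tallest one.
    return max(candidates, key=lambda th: th[1])[0]
```

Returning None when no string stands out gives the processor a natural signal to fall back on other cues (e.g. the fingertip-pointing gesture described earlier) instead of guessing.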
Processor 100 is programmed in some embodiments to implement strong identification embodiments, by performing one or more of acts 301-309 illustrated in FIG. 3A , by use of user identifiers that uniquely identify a single user X from among all users A-Z. In examples of such embodiments that use strong identification, information 319 (FIGS. 3D and 3E ) that is obtained for display may be specific to that single user X, for example email messages that are specifically addressed to user X. - In certain embodiments of the type illustrated in
FIG. 3A , processor 100 performs acts 301 and 302 that are similar or identical to acts 101 and 102 described above in reference to FIG. 1C . Thereafter, in act 303, processor 100 uses an identifier of the portable real world object with a set of associations 311 (FIG. 3B ), to obtain an identifier that uniquely identifies a user of the portable real world object. Note that act 303 is similar to act 103 except for the set of associations 311 being used in act 303. Specifically, in set 311, an association 314 maps an object identifier 1140 (such as a bottle cap ID) to a single user 314U (such as Jane Doe), as illustrated in FIG. 3B . Accordingly, in such embodiments, user 314U is uniquely associated with object 132 in the form of a bottle cap (FIG. 1E ), i.e. no other user is associated with a bottle cap. Hence, other users may be associated with other such portable real world objects (e.g. book in FIG. 2C or stapler in FIG. 1A ), but not with a bottle cap as it has already been uniquely associated with user 314U (FIG. 3B ). - Thereafter, a user identifier which is obtained in
act 303 is used in act 304 (FIG. 3A ) to obtain and store in memory 110 information 319 that is specific to user 314U (of the portable real world object 132). This information 319 is then displayed, in act 305, similar to act 105 described above, except that the information being displayed is specific to user 314U as shown in FIGS. 3D and 3E . For example, the to-be-displayed information 319 may be obtained in act 304 from a website www.twitter.com, specific to the user's identity. In some examples, the to-be-displayed information 319 received by processor 100 is personalized for user 135, based on user name and password authentication by the website www.twitter.com. Although the personalized information 319 illustrated in one example is from www.twitter.com, other websites can be used, e.g. an email website such as http://mail.yahoo.com can be used to obtain other such information personalized for user 135. - In several strong identification embodiments of the type illustrated in
FIGS. 3A and 3B , user-supplied text (for use in preparing associations 311) is received by processor 100 via an authentication (also called login) screen. Specifically, in some embodiments an act 306 is performed by processor 100 after act 302, to display an authentication screen. For example, in act 306 authentication screen 321 is projected onto table 131 adjacent to object 132 as part of information 322, as shown in FIG. 3C . In the example illustrated in FIG. 3C , processor 100 obtains the authentication screen to be displayed in act 306 from a computer (not shown) accessible on the Internet, such as a web server at the website www.twitter.com, and this screen is of a user interface such as a dialog box that prompts the user to enter their user name and password. - In some embodiments,
processor 100 automatically includes, adjacent to such a dialog box, a layout image 333 of a keyboard in information 322 that is projected into scene 130. Note that although only a keyboard image 333 is illustrated in FIG. 3C , a mouse may additionally be included in the projected information 322. Next, as user 135 types on keyboard image 333, in act 307 processor 100 recognizes additional images (similar to image 109 described above) and generates user input by performing Optical Character Recognition (OCR), and such user input in the form of text string(s) is stored in memory 110 and then sent to the website www.twitter.com. The same user input is also used in act 308 by some embodiments of processor 100 to identify the user (i.e. by using at least a portion of the user input as a user identifier), followed by performing authentication using a table 391 in memory 110. For example, in act 308 a user name and password received as the user-supplied text may be locally checked against table 391 by processor 100. - At this stage, the user's identity is known, and a "strong" identification embodiment is implemented by
processor 100 performing act 309. Specifically, in act 309, processor 100 prepares an association 314 in set 311, to associate an identifier 114U of object 132 with an identifier 314U (e.g. name) of the user identified in the authentication screen 321 (projected adjacent to object 132), in response to user input being authenticated. Thereafter, acts 304 and 305 are performed as described above. Note that in other embodiments, a different user name and password may be used locally by processor 100. Hence, in one such example the user is authenticated two times, once by processor 100 locally (when the user enters their user name and password to log into device 120), and another time by a remote computer (or web server) that supplies information 119 (FIG. 5 ) to be output (e.g. at the website www.twitter.com). In some embodiments, a single authentication is sufficient, e.g. the user name and password that were used to log into device 120 are automatically used (directly or indirectly via a table lookup) to communicate with the remote computer (or web server), to obtain information 119 (FIG. 5 ). In some embodiments, a user name that was used to log into device 120 is also used to identify a table 220 (among multiple such tables) used in identifying address 291, for obtaining information 119 (FIG. 5 ). - User-
specific information 319 that is obtained in act 304 is typically displayed, as per act 305, at a location adjacent to object 132 (FIG. 3D ) or alternatively on object 132 itself (FIG. 3E ), in order to reduce the likelihood of snooping by users other than user 135 with whom object 132 is uniquely associated. Prior to display (e.g. by projection) in act 305, such information may be optionally transformed. A specific technique for transformation that is selected for projection of user-specific personalized information can depend on a number of factors, such as the smoothness and/or shape and size and/or a dynamically computed surface normal (or gradient) of object 132, and/or resolution and legibility of information 119 that is to be projected, etc. A transformation technique that is selected may also edit information 119 to be output, e.g. truncate or abbreviate the information, omit images, or down-scale images, etc., depending on the embodiment. - Although certain embodiments to implement strong identification described above use an authentication screen, other embodiments use two cameras to perform a method of the type illustrated in
FIG. 3F , wherein one camera is embedded with a projector 122 in device 120 and the other camera is a normal camera included in a mobile device such as a smart phone (or alternatively external thereto, in other form factors). Hence, in the method of FIG. 3F , two cameras are operated to synchronize (or use) hand gestures with a person's face, but otherwise many of the acts are similar or identical to the acts described above in reference to FIG. 3A . - Specifically, when
person 135 makes a specific hand gesture adjacent to object 132 or touches object 132 in a certain manner (e.g. taps on the object), a back-facing camera 121 (FIG. 1B ) in mobile device 150 captures an image 109 of scene 130. Detection in such an image 109 of a portion that corresponds to the specific hand gesture (as per act 396 in FIG. 3F ) triggers processor 100 to perform act 397 to operate a front-facing camera 152 (FIG. 1B ). Front-facing camera 152 (FIG. 1B ) then captures an image including a face of user 135, and the image is then segmented (e.g. by module 141S in FIG. 5 ) to obtain a portion corresponding to the face, which is used (in module 141S) by processor 100 performing act 398 to determine a user identifier (e.g. perform authentication). - Specifically, in
act 398, processor 100 of some embodiments compares the image portion corresponding to the face to a database 199 (FIG. 2D ) of faces, and on finding a match obtains from the database an identifier of the user (selected from among user identifiers of faces in the database). Hence, an image of the user's face is received by processor 100 from front-facing camera 152 and is detected as such in act 398, thereby resulting in a unique identifier for the user that may be supplied to user input extractor 141E for use in preparing an association, to associate the user identifier with an object identifier, in response to detecting a predetermined gesture adjacent to the object. Illustrative embodiments of a segmentation module 141S (FIG. 5 ) may identify users as described by Viola and Jones in an article entitled "Robust Real-Time Face Detection", 18 pages, International Journal of Computer Vision 57(2), 137-154, 2004, which is incorporated by reference herein in its entirety. Next, act 309 is performed as described above in response to detection of the gesture, to prepare an association so that object 132 (e.g. bottle cap) is identified as belonging to person 135. - In certain embodiments, an object identifier of portable
real world object 132 in image 109 (FIG. 4A) is automatically identified by processor 100 using data 445 (also called "object data", see FIG. 4A) on multiple real world objects in a database 199 that is coupled to processor 100 in the normal manner. Specifically, in an example illustrated in FIGS. 4A and 4B, processor 100 recognizes object 132 to be a bottle cap (which is identified by an identifier ID1), based on attributes in data 441D (FIG. 4A) in database 199 matching attributes of a portion 132I of image 109 (FIG. 4A) received as per act 431 (FIG. 4B). - Hence, in some embodiments,
processor 100 is programmed with software to operate as an object extractor 141O (see FIG. 5) which determines feature vectors from an image 109, and compares these feature vectors to corresponding feature vectors of objects that were previously computed and stored in a database 199, to identify an object. The comparison between feature vectors can be done differently depending on the embodiment (e.g. using Euclidean distance). On completion of the comparison, object extractor 141O identifies from database 199 an object 132 that most closely matches the feature vectors from image 109, resulting in one or more object identifiers 112O, 112U (FIG. 5). - Any type of features known to a skilled artisan may be extracted from
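The nearest-neighbor comparison performed by an object extractor of this kind can be sketched as follows; the object names, feature vectors, and distance threshold below are illustrative placeholders, not values from database 199.

```python
import math

# Hypothetical object data: object identifier -> previously computed feature vector.
OBJECT_DATABASE = {
    "ID1_bottle_cap": [0.2, 0.9, 0.1, 0.4],
    "ID2_book":       [0.7, 0.1, 0.8, 0.3],
    "ID3_cup":        [0.5, 0.5, 0.5, 0.5],
}

def identify_object(image_features, database=OBJECT_DATABASE, max_distance=1.0):
    """Return the identifier of the stored object whose feature vector is
    closest (Euclidean distance) to the features extracted from an image,
    or None when nothing in the database is close enough."""
    best_id = min(database, key=lambda oid: math.dist(image_features, database[oid]))
    if math.dist(image_features, database[best_id]) > max_distance:
        return None
    return best_id
```

For example, `identify_object([0.25, 0.85, 0.15, 0.4])` selects the bottle cap, since that query vector is closest to the stored ID1 vector; a vector far from every stored object yields no identifier.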
image 109 by object extractor 141O (FIG. 5), depending on the embodiment. For more information on the use of such features, see a 15-page article entitled "Scale-invariant feature transform" at "http://en.wikipedia.org/wiki/Scale-invariant_feature_transform" as available on Apr. 9, 2012, which is incorporated by reference herein in its entirety. In several such embodiments, processor 100 is programmed with software to identify clusters of features that vote for a common pose of an object (e.g. using the Hough transform). In several such embodiments, bins that accumulate a preset minimum number of votes (e.g. 3 votes) are identified by object extractor 141O as object 132. - Some embodiments of object extractor 141O extract SIFT features as described in the preceding paragraph, while other embodiments use a method described by Viola and Jones in a 25-page article entitled "Robust Real-time Object Detection," in the Second International Workshop On Statistical And Computational Theories Of Vision—Modeling, Learning, Computing, And Sampling, Vancouver, Canada, Jul. 13, 2001, which is incorporated by reference herein in its entirety. For more information on such a method, see a 3-page article entitled "Viola-Jones object detection framework" at "http://en.wikipedia.org/wiki/Viola%E2%80%93Jones_object_detection_framework" as available on Apr. 9, 2012, which is incorporated by reference herein in its entirety. - Accordingly, features (that are determined from an image 109 as described above) are used in some embodiments by object extractor 141O (FIG. 5) to generate a geometric description of items in image 109 that is received from a camera, such as object 132. Similar or identical software may be used by processor 100 to extract from image 109 a blob 136I (FIG. 2D) of a finger in a hand gesture (and/or to recognize a user's face as described herein). As noted above, some embodiments of processor 100 use Haar features, which consist of vector windows that are used to calculate edges, line features and center-surrounded features in an image 109. In certain embodiments, vector windows are run by processor 100 across portions of image 109 (FIG. 2D) and the resulting output is an integer value. If the value at a certain position in the image exceeds a certain threshold, such embodiments determine that there is a positive match. Depending on the embodiment, processor 100 uses such features (also called "feature vectors") differently depending upon the item to be recognized (object, face, or hand gesture). Hence, depending on the embodiment, processor 100 uses vector windows that are different for objects, hand gestures, face recognition, etc. Use of Haar features by processor 100 in certain embodiments has limitations, such as reduced robustness and low fps (frames per second), due to dependency on scaling and rotation of the Haar vector windows. - Various embodiments of
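The vote-accumulation step described above (clusters of features voting for a common pose, with bins accepted at a preset minimum of e.g. 3 votes) might be sketched as follows; the 2-D poses and the bin size are hypothetical simplifications of a full Hough transform, which would also bin over scale and orientation.

```python
from collections import Counter

def find_object_poses(feature_poses, bin_size=10, min_votes=3):
    """Accumulate Hough-style votes: each feature votes for the (x, y) pose bin
    it predicts for the object; bins reaching min_votes are reported as
    detected object poses."""
    votes = Counter(
        (int(x // bin_size), int(y // bin_size)) for x, y in feature_poses
    )
    return [bin_ for bin_, count in votes.items() if count >= min_votes]

# Five features: three agree on a pose near (42, 17), two are outliers.
poses = [(42, 17), (44, 15), (41, 19), (90, 90), (5, 60)]
```

Here `find_object_poses(poses)` returns only the bin that collected three consistent votes; the two outlier features fall into bins below the preset minimum and are discarded.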
processor 100 may be programmed with software to operate as object extractor 141O (FIG. 5) that uses other methods, such as a method described in an 8-page article entitled "Object Recognition from Local Scale-Invariant Features" by David G. Lowe, in Proceedings of the International Conference on Computer Vision, Corfu (September 1999), which is incorporated by reference herein in its entirety. In some embodiments, after one or more objects are identified by object extractor 141O (of the type described above), three dimensional (3D) surfaces of the object(s) are segmented by processor 100 into local regions with a curvature (or other such property) within a predetermined range, so that the regions are similar relative to one another. - The above-described segmentation into local regions is done by
processor 100 so that when information is projected on such object(s), processor 100 may optionally be programmed to operate as information transformer 141T (FIG. 5) to truncate or otherwise manipulate information 119, so that the information fits within local regions of object 132 identified by segmentation. Truncation or manipulation of content in information 119 by processor 100 reduces or eliminates the likelihood that projection of information 119 on to object 132 will irregularly wrap between local regions which may have surface properties different from one another. Processor 100 of some embodiments segments a 3D surface of object 132 to identify local regions therein as described in a 14-page article entitled "Partitioning 3D Surface Meshes Using Watershed Segmentation" by Alan P. Mangan and Ross T. Whitaker, in IEEE Transactions on Visualization and Computer Graphics, Vol. 5, No. 4, October-December 1999, which is incorporated by reference herein in its entirety. - In some embodiments, to project
information 119 onto local regions of object 132, processor 100 operating as information transformer 141T calibrates camera 121 (FIG. 5) using any calibration method, such as the method described in a 4-page article entitled "Camera Calibration Toolbox for Matlab" at "http://www.vision.caltech.edu/bouguetj/calib_doc/" as available on Apr. 10, 2012, which is incorporated by reference herein in its entirety. Several embodiments of information transformer 141T (FIG. 5) are programmed to determine shapes and/or surface normals of surfaces of object 132 in image 109, using one of the two methods described in the next paragraph. - A first method uses a projection of light, e.g. as described in an 8-page article entitled "Dynamic scene shape reconstruction using a single structured light pattern" by Hiroshi Kawasaki et al, IEEE Conference on Computer Vision and Pattern Recognition 2008, which is incorporated by reference herein in its entirety. A second method also uses a projection of light, e.g. as described in a 13-page article entitled "Rapid Shape Acquisition Using Color Structured Light and Multi-pass Dynamic Programming" by Li Zhang et al, in Proc. Int. Symposium on 3D Data Processing Visualization and Transmission (3DPVT), 2002, which is incorporated by reference herein in its entirety. - Referring to FIG. 4A, certain other data (FIG. 4A) of some embodiments may include attributes to be used by object extractor 141O (FIG. 5) in identifying various other objects, such as a book (FIG. 2C) having identifier ID2 (FIG. 4A) or a cup (not shown) having identifier ID3, in an image 109 analyzed by processor 100 in act 432 (FIG. 4B). As will be readily apparent to the skilled artisan in view of this detailed description, on completion of act 432, one or more of object identifiers ID1, ID2 and ID3 uniquely identify within processor 100 corresponding portable real world objects, namely a bottle cap, a book and a cup, when these objects are imaged in an image 109 of a scene 130 by a camera of device 120. - Although
image 109 illustrated in FIG. 4A includes a portion 132I that corresponds to the entirety of object 132, depending on the aspect of the described embodiments, another image that captures only a portion of object 132 may be sufficient for processor 100 to recognize object 132 (in act 432 of FIG. 4B). Moreover, depending on the aspect of the described embodiments, processor 100 may be programmed to operate as object extractor 141O to perform act 432 to recognize additional information captured from object 132, as described above. - Several weak identification embodiments use
groups 446 in associations (FIG. 4A); such associations may be used by processor 100 operating as an information retriever 141R (FIG. 5) with different types of portable real world objects and/or different information display software 141 (FIG. 4A), depending on the programming of processor 100 (FIG. 4B). - As noted above, associations 447 (
FIG. 4A) may be set up in different ways in database 199, prior to their use by processor 100, depending on the embodiment. In several embodiments, processor 100 is programmed with software to operate as information identifier 141I (FIG. 5), which extracts user input (in user input extractor 141E) and uses associations to generate an address 291 of information to be output (in object-user-input mapper 141M). Specifically, in some embodiments, processor 100 is programmed with software to operate as user input extractor 141E to process an image 109 received in act 431, or to process additional image(s) received in other such acts, or to process other information captured from scene 130 (e.g. an audio signal of a drumming sound, a tapping sound, and/or a whistling sound made by a user 135), so as to recognize a user's hand gesture in act 439A for use in initialization of an association. Optionally, after completion of act 439A, processor 100 may return to act 432 (described above). - As noted above, depending on the embodiment, image 109 (or additional such images that are later captured) may include image portions corresponding to one or more human fingers, e.g. index finger 136 (
FIG. 1E) and middle finger 137, which are parts of a human hand 138 of a person 135. Processor 100 is programmed to operate as user input extractor 141E (FIG. 5) in act 439A to use predetermined information (not shown in FIG. 4A) in database 199 to recognize, in image 109 (FIG. 4A) in memory 110, certain user gestures (e.g. index and middle finger images 136I and 137I outstretched in human hand image 138I), and then to use a recognized gesture to identify person 135 (e.g. as belonging to a specific group), followed by identification of information to be output. - Depending on the embodiment,
processor 100 may perform different acts after act 439A, to identify user 135 as per act 433, e.g. in user input extractor 141E. For example, in several embodiments, processor 100 is programmed to perform an act 439D to recognize a face of the user 135 in another image from another camera. In several such embodiments, mobile device 120 includes two cameras, namely a rear camera 121 that images object 132 and a front camera 152 that images a user's face. Moreover, such embodiments store in non-transitory storage media feature vectors for human faces ("face features") in a database, similar or identical to feature vectors for hand gestures ("gesture features"). Accordingly, processor 100 is programmed (by instructions in one or more non-transitory computer readable storage media, such as a disk or ROM) to compare a portion of an image segmented therefrom and corresponding to a face of user 138 with a database of faces. A closest match resulting from the comparison identifies to processor 100 a user identifier, from among user identifiers of multiple faces in the database. Processor 100 then associates (e.g. in act 439C in FIG. 4B) this user identifier with an object identifier, e.g. by user input extractor 141E (FIG. 5) preparing an association, in response to detection of a predetermined hand gesture adjacent to object 132. - An
act 439D (FIG. 4B) may be followed by processor 100 performing another additional act, such as act 439E, to synchronize (or otherwise use) recognition of the user's face with the user gesture. Alternatively, in other embodiments, a user gesture recognized in act 439A may be used to identify person 135 by user input extractor 141E looking up a table 451 of FIG. 4A (similar to mapping 116 in FIG. 1D) as per act 439B (FIG. 4B). After acts 439D and 439E, or alternatively after act 439B, processor 100 performs act 439C (see FIG. 4B) to associate a user identifier of person 135 with portable real world object 132. The user identifier that is used in act 439C depends on whether strong or weak identification is implemented by processor 100 for real world object 132. - Referring back to
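The association prepared in act 439C can be sketched as a simple mapping from object identifier to user identifier; the gesture name "tap" and the identifier strings are assumptions for illustration only.

```python
# Hypothetical in-memory store for associations of the kind prepared in act 439C:
# object identifier -> user identifier of the person who claimed the object.
associations = {}

def on_gesture_detected(gesture, object_id, user_id):
    """When a predetermined hand gesture (here assumed to be named "tap") is
    detected adjacent to an object, associate that object with the user
    recognized via the front-facing camera; any other gesture is ignored."""
    if gesture == "tap":
        associations[object_id] = user_id
        return True
    return False

# A tap adjacent to the bottle cap marks it as belonging to person 135.
on_gesture_detected("tap", "ID1_bottle_cap", "user_135")
```

A later lookup of `associations["ID1_bottle_cap"]` then yields the user identifier, so information retrieved for that object can be specific to that person.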
FIG. 4B, after performance of act 434 to obtain to-be-projected information 119 using at least the user identifier, processor 100 is optionally programmed to operate as information transformer 141T to perform act 435. Specifically, in act 435, processor 100 of mobile device 120 identifies, from among a group of predetermined techniques, a specific technique 461 to transform the obtained information 188 for projection into scene 130. For example, transformation technique 461 (FIG. 4A) is to project on object 132, whereas transformation technique 462 is to project adjacent to object 132, and one of these techniques is identified prior to projection of information 188. In some embodiments, the specific technique 461 is selected (and therefore identified) automatically based on one or more surface properties of real world object 132 as determined by processor 100, such as surface roughness (or smoothness), orientation of the surface normal, color, opacity, etc., whereas in other embodiments processor 100 uses user input (e.g. in the form of spoken words) that explicitly identifies a specific technique to be used. - Then, as per
act 436 in FIG. 4B, information transformer 141T uses the specific technique that was identified (e.g. on-object technique 461) to transform information 188 (or 189) in memory 110, and then supplies transformed information 188T (or 189T) resulting from use of the specific technique to a projector 122 (FIG. 1E). Next, as per act 437 in FIG. 4B, projector 122 of mobile device 120 projects on object 132 in scene 130 the transformed information 188T (or 189T), which is thereby displayed on object 132 as illustrated in FIG. 1G. - Various embodiments of
information transformer 141T (FIG. 5) may perform any steps using one or more transformation techniques to transform information 188 for output to a user. For example, as illustrated in FIG. 1H, information 189 may be transformed by another technique into a first component 189A of transformed information 189T that is projected onto object 132 (namely the text string "Group 1"), and a second component 189B of the transformed information 189T that is projected adjacent to object 132 (namely the text string "Score 0"). - Referring back to
FIG. 4B, in act 437 processor 100 operates projector 122 (FIG. 5) to project the transformed information 188T (or 189T) on to object 132 that is identified by the object identifier that was obtained in act 432 (described above). Thereafter, in act 438, processor 100 operating as user input extractor 141E receives and processes additional images in a manner similar to that described above, e.g. to receive user input 482 and store it in memory 110 (FIG. 4A), by recognizing one or more parts of an additional image that includes the transformed information projected into scene 130. The parts of the additional image that are recognized may be, for example, another hand gesture 117 in which only one finger, namely index finger 136, is outstretched as illustrated in FIG. 1H. On recognition of this hand gesture 117 (only index finger outstretched), processor 100 operating as user input extractor 141E may determine that the user is now part of Group 1, and therefore now obtains information 189 of Group 1 (see FIG. 4A) in act 434 (FIG. 4B), followed by output to the user (e.g. by transformation and projection) as per one or more of acts 435-437 described above. - In some embodiments,
processor 100 uses recognition of a bottle cap as the portable real world object 132 to invoke execution of information transformer 141T (e.g. instructions to perform acts 431-438 and 439A-439E) from among multiple such softwares. In some embodiments, software 141 is generic to multiple objects, although in other embodiments software 141 (also called information display software) is customized for and individually associated with corresponding objects, such as, for example, a book software 442S (FIG. 4A) associated with a book identified by ID2 and a cup software 443S (FIG. 4A) associated with a cup identified by ID3, as described above in reference to FIG. 4A. - Note that although table 451 is described above for use with user input in the form of a hand gesture to identify a user, such a table 451 can alternatively be looked up by
processor 100 using an identifier of object 132, depending on how table 451 is set up, in various embodiments described herein. Use of table 451 with object identifiers enables "strong" identification in some embodiments of information identifier 141I, wherein a person 135 identifies to processor 100 (ahead of time) an object 132 that is to be uniquely associated with his/her identity. Other embodiments of processor 100 use both an object identifier as well as user input to look up another table 220, which enables "weak" identification as described herein. - Some embodiments implement one or more acts of
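The distinction between "strong" and "weak" identification might be sketched with two lookup tables standing in for table 451 and table 220; every entry below is hypothetical.

```python
# Strong identification (cf. table 451): the object identifier alone
# names the unique user who pre-registered that object.
STRONG_TABLE = {"ID1_bottle_cap": "user_135"}

# Weak identification (cf. table 220): the object identifier plus user
# input (e.g. a hand gesture) names a group rather than a unique user.
WEAK_TABLE = {
    ("ID2_book", "two_fingers_gesture"): "Group 1",
    ("ID2_book", "one_finger_gesture"):  "Group 2",
}

def identify(object_id, user_input=None):
    """Return a user identifier via strong identification when the object
    was pre-registered to a user, otherwise fall back to weak (group)
    identification keyed on both the object and the user input."""
    if object_id in STRONG_TABLE:
        return STRONG_TABLE[object_id]
    return WEAK_TABLE.get((object_id, user_input))
```

So the bottle cap resolves directly to its registered owner, while the book resolves only to a group, and only when combined with a recognized gesture.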
FIG. 4B by performing one or more acts of the type described below in reference to FIGS. 4C and 4D. Note that the acts of FIGS. 4C and 4D described below can alternatively be performed in other embodiments that do not perform any act of FIG. 4B. Some embodiments of processor 100 are programmed to operate as object extractor 141O to track portable real world objects that may be temporarily occluded from view of rear-facing camera 121 that captures image 109, by performing an act 411 (FIG. 4C) to check the image 109 for presence of each object in a set of objects (in database 199 of FIG. 4A) and to identify a subset of these objects as being initially present in image 109. - Moreover in
act 412, processor 100 of some embodiments adds an identifier of an object 132 in the subset to a list 498 (FIG. 4A), and starts a timer for that identifier, and the timer starts incrementing automatically from 0 at a preset frequency (e.g. every millisecond). Therefore, if there are multiple identifiers in list 498 for multiple objects, then correspondingly multiple timers are started, by repeated performance of act 412 for each object in the subset. Next, in act 413, processor 100 checks if list 498 is empty. As list 498 was just populated, it is not empty at this stage and therefore processor 100 performs act 414 (FIG. 4C) and then returns to act 413. Whenever list 498 becomes empty, processor 100 goes from act 413 via the yes branch to act 411 (described above). - In act 414 (
FIG. 4C), additional images are captured into memory 110 and processed by processor 100 in the manner described herein. For example, processor 100 scans through list 498 to check if each object identified in the list 498 is found in the additional image. When an identifier in list 498 identifies an object 132 that is recognized to be present in the additional image, processor 100 resets the timer (started in act 412 as noted above), which starts incrementing automatically again from 0. When an identifier in list 498 identifies an object that is not recognized in the additional image, and if its timer has reached a preset limit (e.g. 10,000 milliseconds), then processor 100 removes the identifier from the list and stops the corresponding timer. - Accordingly, when an
object 132 is absent from view of camera 121 for more than the preset limit, the object 132 is no longer used in the manner described above to retrieve and display information. Thus, use of a timer as illustrated in FIG. 4C and described above reduces the likelihood that a display of information (e.g. the user's emails) is interrupted when object 132 that triggered the information display is accidentally occluded from view of camera 121, e.g. for the period of time identified in the preset limit. Hence, when a user 135 inadvertently places a hand 138 or other such item between a camera 121 and object 132, the output of information 119T is not stopped until the preset limit of time passes, which can eliminate jitter from a projection or other display of information 119T, as described herein. The preset limit of time in some embodiments is set by processor 100 based on one or more input(s) from user 135, and hence a different value can be set by user 135 depending on location, e.g. a first limit used at a user's home (or office) and a second limit (lower than the first limit) in a public location. - Referring to
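The list-and-timer bookkeeping of acts 411-414 can be sketched as follows, with explicit frame timestamps standing in for the per-identifier hardware timers; the 10,000 ms preset limit is taken from the example above, and the class and method names are illustrative.

```python
PRESET_LIMIT_MS = 10_000  # an object may be occluded this long before removal

class ObjectTracker:
    def __init__(self, limit_ms=PRESET_LIMIT_MS):
        self.limit_ms = limit_ms
        self.last_seen = {}  # cf. list 498: object identifier -> time last recognized

    def update(self, recognized_ids, now_ms):
        """Process one captured image: reset the timer of every object
        recognized in it (act 414), and drop objects that have been
        unrecognized for longer than the preset limit."""
        for obj_id in recognized_ids:
            self.last_seen[obj_id] = now_ms
        for obj_id in list(self.last_seen):
            if now_ms - self.last_seen[obj_id] > self.limit_ms:
                del self.last_seen[obj_id]

    def visible(self):
        """Objects still usable for information retrieval and display."""
        return set(self.last_seen)
```

A brief occlusion (shorter than the limit) therefore leaves the object tracked and its information display uninterrupted; only a sustained absence removes it.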
FIG. 4D, some embodiments of processor 100 are optionally programmed to operate as information transformer 141T to perform act 421 (FIG. 4D) to compute a value of a property of object 132 identified in image 109, such as size of a surface, shape of the surface, orientation of the surface normal, surface smoothness, etc. Then in act 422, processor 100 checks if the property's value satisfies a predetermined test of feasibility for projection on to object 132. For example, processor 100 may check if the surface area of a surface of object 132 is large enough to accommodate the to-be-projected information, and/or if object 132 has a color that is sufficiently neutral for use as a display, and/or if a normal at the surface of object 132 is oriented within a preset range relative to a projector 122. Such feasibility tests are designed ahead of time, and programmed into processor 100, to ensure that the to-be-projected information is displayed in a manner suitable for user 135, e.g. at a legible font size. Numerous other feasibility tests, for information projection on to an object 132 in the real world, will be readily apparent in view of this detailed description. - If the answer in
act 422 is yes, processor 100 goes to act 423 and uses a first technique 461 (FIG. 4A) to generate transformed information 119T (FIG. 5) for projection on to object 132, followed by act 437 (FIG. 4B). Depending on the embodiment, first technique 461 may transform information 119 based on an orientation (in the three angles pitch, yaw and roll) of a surface of object 132 relative to the orientation of projector 122, to ensure legibility when information 119T is rendered on object 132. A specific manner in which information 119 is transformed can be different in different embodiments, and in some embodiments there is no transformation, e.g. when the information 119 is to be displayed on a screen of mobile device 120 (as shown in FIG. 1B). If the answer in act 422 is no, processor 100 goes to act 424 and uses a second technique 462 (FIG. 4A) to generate transformed information 119T for projection adjacent to (but not on to) object 132. After performing one of acts 423 and 424, processor 100 then goes to act 437 (described above in reference to FIG. 4B). - In some of the described embodiments, one or more of acts 421-424 described above are performed as described in U.S. application Ser. No. ______, Attorney Docket No. Q111570U2os, filed concurrently herewith, and entitled "Dynamic Selection of Surfaces In Real World For Projection of Information Thereon", which has been incorporated by reference above. - Note that although the description of certain embodiments refers to processor 100 being a part of a mobile device 120 as an illustrative example, in other embodiments such a processor 100 may be partially or wholly included in one or more other processor(s) and/or other computer(s) that interoperate(s) with such a mobile device 120, e.g. by exchanging information therewith via a cellular link or a WiFi link. Moreover, although one camera 121 is shown in FIG. 1E, depending on the embodiment, one or more cameras (see FIG. 5) may be used. Hence, although certain acts illustrated in FIG. 4B are described for some embodiments as being performed by mobile device 120, some or all of the acts in FIG. 4B may be performed by use of one or more computers and/or one or more processors and/or one or more cameras. Therefore, it is to be understood that several such embodiments use one or more devices to perform such act(s), either alone or in combination with one another. -
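Returning to the feasibility test and technique selection of acts 421-424 described above, the branching might be sketched as follows; the specific thresholds (minimum surface area, maximum surface-normal angle) are illustrative assumptions, not values prescribed by the described embodiments.

```python
# Illustrative thresholds for the feasibility test of act 422.
MIN_AREA_CM2 = 20.0    # surface must be large enough for legible text
MAX_NORMAL_DEG = 30.0  # surface normal within this angle of the projector axis

def choose_technique(surface_area_cm2, normal_angle_deg, is_neutral_color):
    """Act 422: test feasibility of projecting on the object. When the test
    passes, act 423 selects the first technique (project on the object);
    otherwise act 424 selects the second technique (project adjacent to it)."""
    feasible = (
        surface_area_cm2 >= MIN_AREA_CM2
        and normal_angle_deg <= MAX_NORMAL_DEG
        and is_neutral_color
    )
    return "technique_461_on_object" if feasible else "technique_462_adjacent"
```

For example, a large, roughly front-facing, neutral-colored surface selects on-object projection, whereas a small or steeply angled surface falls back to projection adjacent to the object.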
Processor 100, which is programmed with software in memory 110 as described above in reference to FIG. 4B and/or FIGS. 4C and 4D, may be included in a mobile device 120 as noted above. Mobile device 120 may be any device that includes a projector 122 and/or a camera 121, and device 120 may include additional parts that are normally used in any hand held device, e.g. sensors, such as accelerometers, gyroscopes or the like, which may be used in one or more acts described above, e.g. in determining the pose of mobile device 120 relative to object 132 in the real world. - In performing the method of
FIG. 4B to project information into a scene as described above, different interaction metaphors may be used. User input 482 that is generated from images captured by camera 121 (FIG. 1E) allows a user to reach into scene 130 and manipulate real world object 132 directly, as opposed to on-screen based interaction, where users interact by directly touching a screen 151 of mobile device 150 (FIG. 1B). Specifically, when image-based user supplied information is obtained as input, methods of the type described above in reference to FIG. 4B enable a user to use his hands in scene 130 with information projected into the real world, as the user is supplying input which changes the information being projected into scene 130. - User interfaces in information projected into a scene as described herein can have a broad range of applications. Specifically, projected user interfaces can be used to generate user input 482 (
FIG. 4A) by projecting information 322 including screen 321 and keyboard image 333 (FIG. 3C), similar to real world typing using a real keyboard. A projected user interface allows a user to supply input to select between different software for execution and display of projected information, and/or to select between different groups of users to play games, and in general to specify various parameters to software being executed by a processor 100 that generates the information which is projected into scene 130 (e.g. see FIG. 1G). - Hence, several embodiments of
mobile device 120 as described herein reverse a flow of information between (A) user interfaces and (B) user input (relative to a conventional flow). Specifically, in several examples of the type noted above in reference to FIGS. 2A-2C, instead of users explicitly looking for information to be displayed, several embodiments of device 120 automatically obtain and display interactive information, e.g. by projection on real world surfaces using an embedded mobile projector. Other embodiments may display information as described herein on a screen that is supported on an eye-glass frame worn by a user, for example. Still other embodiments may display information as described herein on a screen that forms an integral portion of a smart phone (such as a touch screen), in the normal manner. - Various descriptions of implementation details of some embodiments of
mobile device 120 are merely illustrative and not limiting. For example, depending on the embodiment, any method may be used by mobile device 120 to receive input from a user, e.g. an IR camera may be used to receive user input in the form of hand gestures. Moreover, various types of hand gesture recognition systems may be implemented in several embodiments of a mobile device 120 as described herein. In certain embodiments, an embedded projector in mobile device 120 projects a cell phone's normal display on everyday surfaces, such as a surface of a wall or a surface of a desk, with which a user interacts using hand gestures. - It should be understood that
mobile device 120 may be any electronic device that is portable by hand, such as a cellular or other wireless communication device, personal communication system (PCS) device, personal navigation device (PND), Personal Information Manager (PIM), Personal Digital Assistant (PDA), laptop, tablet, or an eye glass frame that supports a display to be worn on a person's face, a headset, a camera, or other suitable mobile device that is capable of imaging scene 130 and/or projecting information into scene 130. In some embodiments, a single device 120 includes both camera 121 and projector 122, whereas in other embodiments one such device includes camera 121 and another such device includes projector 122, and both devices communicate with one another either directly or via a computer (not shown). - In several embodiments, a prototype of
mobile device 120 is built with a custom hardware (PCB) board taped onto the back of a smartphone (e.g. GALAXY NEXUS available from Samsung Electronics Co. Ltd). One such approach performs computation in the infrared (IR) spectrum. In some such embodiments, hand and body tracking is robust and very accurate, although additional hardware may be integrated within such a smartphone 120 used for display of information. Some embodiments of device 120 use IR sensors (e.g. in an IR camera) that have been proven to work on commercially successful platforms, such as the Xbox Kinect. Certain embodiments of device 120 implement augmented reality (AR) applications using marker patterns, such as a checkerboard, for camera calibration and detection of objects within a scene of the real world, followed by use of object identifiers to display information, as described herein. - Depending on the embodiment,
mobile device 120 may be programmed with software 141 that uses a mobile-projector system in combination with a camera. An embedded projector 122 is used in such embodiments to display information 119T on everyday surfaces such as a wall, with which users interact using hand gestures. Also in some embodiments, mobile device 120 is operatively coupled to an external IR camera 1006 that tracks an IR laser stylus (not shown), or gloves with one or more IR LEDs 1121, 1122 (FIG. 5) mounted at the fingertips (also called IR gloves). -
External IR camera 1006 is used in some embodiments in a manner similar or identical to receipt of IR images and tracking of objects within the images by use of an IR camera in a remote control device (also called "Wiimote" or "Wii Remote") for the gaming console Wii, available from Nintendo Co. Ltd. So, IR camera 1006 may be used in some embodiments as described in a section entitled "Tracking Your Fingers with the Wiimote" in a 2-page article available at "http://johnnylee.net/projects/wii/" as available on Apr. 9, 2012, which is incorporated by reference herein in its entirety. Alternatively, some non-IR embodiments of device 120 use one or more normal RGB (red-green-blue) CMOS cameras 121 (FIG. 5) to capture an image of scene 130 including object 132. - An object extractor 141O in a
mobile device 120 of the type described herein may use any known object recognition method, based on "computer vision" techniques. Such a mobile device 120 may also include means for controlling operation of a real world object 132 (which may be electronic) in response to user input of the type described above, such as a toy equipped with an IR or RF transmitter, or a wireless transmitter enabled to receive and/or transmit one or more signals over one or more types of wireless communication networks, such as the Internet, WiFi, a cellular wireless network or another network. - As illustrated in
FIG. 5, mobile device 120 may additionally include a graphics engine 1004 to generate information 119 to be output, an image processor 1005 to process image(s) 109 and/or transform information 119, and a read only memory (ROM) 1007 to store firmware and/or constant data. Mobile device 120 may also include a disk 1008 to store software and/or database 199 for use by processor 100. Mobile device 120 may further include a wireless transmitter and receiver 1010 and/or any other communication interfaces 1009, sensors 1003, a touch screen 1001 or other screen 1002, a speaker 1111 and a microphone 1112. - Some embodiments of
user input extractor 141E (FIG. 5) sense user input in the form of hand gestures in images generated by an infra-red (IR) camera that tracks an IR laser stylus or gloves with IR LEDs (also called IR gloves), while certain other embodiments of user input extractor 141E sense hand gestures using an existing camera (e.g. in a normal cell phone) that captures images of a user's fingers. Specifically, in some embodiments (e.g. IntuoIR), an external PCB board 1130 (FIG. 5) having mounted thereon an ARM Cortex processor (not shown) is interfaced with an IR camera 1006 (FIG. 5) and a Bluetooth module (not shown). Hand tracking data from IR camera 1006 is sent via Bluetooth to a smartphone 1140 in device 120 that has a touch screen 1001 (e.g. HTC Explorer available from HTC Corporation). - Hence, certain embodiments of
device 120 include PCB board 1130 mounted on and operatively coupled to a smartphone 1140 (FIG. 5). PCB board 1130 includes IR camera 1006, an ARM processor board such as the mbed LPC1768 available from Foxconn (e.g. Mbed as described at http://mbed.org/nxp/lpc1768/), and a Bluetooth chipset, e.g. BlueSMiRF Gold as described at http://www.sparkfun.com/products/582. Mbed (the ARM processor) is used in some embodiments of PCB 1130 to collect data from IR camera 1006 and transmit co-ordinates of the brightest points to smartphone 1140 (e.g. via Bluetooth link 1131). - In some embodiments, user input captured in images by a camera is extracted therefrom by the smartphone in
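The data path just described (ARM board streams brightest-point co-ordinates over a serial Bluetooth link, which the smartphone rescales to touch co-ordinates) can be sketched as follows. The 4-byte packet layout, the helper names, and the camera/screen resolutions are illustrative assumptions, not the format used in any actual embodiment:

```python
import struct

def parse_blob_packets(payload: bytes):
    """Decode a stream of hypothetical 4-byte packets, each holding the
    (x, y) co-ordinates of one bright IR point as two big-endian uint16s."""
    points = []
    for offset in range(0, len(payload) - len(payload) % 4, 4):
        x, y = struct.unpack_from(">HH", payload, offset)
        points.append((x, y))
    return points

def to_screen(point, cam_res=(1024, 768), screen_res=(480, 320)):
    """Linearly rescale a camera co-ordinate to the phone's screen resolution."""
    x, y = point
    return (x * screen_res[0] // cam_res[0], y * screen_res[1] // cam_res[1])

# Example: two bright points reported by the IR camera in one payload.
payload = struct.pack(">HHHH", 512, 384, 100, 200)
pts = parse_blob_packets(payload)  # [(512, 384), (100, 200)]
```

A real link would also need framing and checksums; the point here is only the decode-then-rescale structure.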
device 120 performing gesture recognition on data received from an infra-red (IR) sensor, as described in an article entitled “iGesture: A General Gesture Recognition Framework” by Signer et al., in Proc. ICDAR '07, 5 pages, which is incorporated by reference herein in its entirety. - Hence, some embodiments of
user input extractor 141E (FIG. 5) operate with a user wearing infra-red (IR) gloves that are identified in another image generated by an IR camera. An IR camera of such embodiments may be externally coupled to a smartphone in mobile device 120 in some embodiments, while in other embodiments the IR camera is built into the smartphone. Some embodiments operate with the user 135 using an IR laser stylus 1135 whose coordinates are detected by device 120 in any manner known in the art. Still other embodiments of user input extractor 141E (FIG. 5) receive user input in other forms as noted above, e.g. as audio input from microphone 1112. - In certain embodiments,
user input extractor 141E (FIG. 5) processes a frame of video captured by a camera to obtain user input in the form of hand gestures, by segmenting each image into one or more areas of interest, such as a user's hands. Any known method can be modified for use in user input extractor 141E as described herein, to remove background noise, followed by identification of a portion of the image which contains the user's hand, which is then used to generate a binary image (also called a “blob”). A next step in some embodiments of user input extractor 141E is to calculate locations (e.g. coordinates) of the user's fingers within the blob. - An
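The pipeline just described (suppress background, binarize the hand region into a blob, then locate finger co-ordinates within the blob) might look like the following minimal sketch. The fixed threshold and the "fingertip = topmost foreground pixel of each column" heuristic are simplifying assumptions for illustration, not the method of any particular embodiment:

```python
def binarize(frame, threshold=128):
    """Turn a grayscale frame (rows of 0-255 ints) into a binary blob:
    pixels brighter than the threshold are treated as hand, the rest as background."""
    return [[1 if px > threshold else 0 for px in row] for row in frame]

def fingertip_candidates(blob):
    """Return (row, col) of the topmost foreground pixel in each column --
    a crude stand-in for fingertip localization within the blob."""
    tips = []
    for col in range(len(blob[0])):
        for row in range(len(blob)):
            if blob[row][col]:
                tips.append((row, col))
                break
    return tips

frame = [
    [ 10,  10, 200,  10],
    [ 10, 210, 220,  10],
    [ 10, 220, 230, 200],
]
blob = binarize(frame)
tips = fingertip_candidates(blob)  # one candidate per column that has hand pixels
```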
IR camera 1006 is not used in certain embodiments wherein a normal RGB camera is used instead to generate one or more images 109 which contain user input. The user input is extracted from images 109 by user input extractor 141E (FIG. 5) performing one of two methods as follows. A first method is of the type described in a 4-page article entitled “HandVu: Vision-based Hand Gesture Recognition and User Interface” at http://www.movesinstitute.org/˜kolsch/HandVu/HandVu.html as available on Apr. 9, 2012, which is incorporated by reference herein in its entirety. A second method is of the type described in another 4-page article entitled “A Robust Method for Hand Gesture Segmentation and Recognition Using Forward Spotting Scheme in Conditional Random Fields” by Mahmoud Elmezain, Ayoub Al-Hamadi, and Bernd Michaelis, in International Conference on Pattern Recognition, 2010, which is incorporated by reference herein in its entirety. Hence, such embodiments that use an existing RGB camera in a normal smartphone may use a combination of skin segmentation, graph cut and recognition of hand movement, to detect hand gestures. - For recognition of hand gestures, some embodiments of
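Skin segmentation of the kind just mentioned is often bootstrapped with a simple per-pixel RGB rule; the rule below is one widely cited heuristic for uniform daylight illumination and is given only as an illustration, not as the test used by either cited method:

```python
def is_skin_rgb(r, g, b):
    """Per-pixel skin heuristic: skin tones are red-dominant
    with a minimum spread between the channels."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)

def skin_mask(image):
    """Apply the rule to an image given as rows of (r, g, b) tuples,
    producing a binary mask for later graph-cut refinement."""
    return [[1 if is_skin_rgb(*px) else 0 for px in row] for row in image]
```

In practice such a mask is only a seed: lighting shifts defeat fixed thresholds, which is why the cited methods follow it with learned models and graph-cut refinement.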
user input extractor 141E (FIG. 5) are designed to use a supervised learning approach in an initialization phase of device 120. In the supervised learning approach, user input extractor 141E learns different gestures from input binary images (e.g. consisting of a user's hands) during initialization, and generates a mathematical model to be used to identify gestures in images generated during normal operation (after the initialization phase) by using Support Vector Machines (SVM) of the type known in the prior art. Experimental results show that several methods of the type presented herein work well in real time and under changing illumination conditions. - As noted above, some embodiments use an infrared (IR) camera 1006 (
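The two-phase structure just described (learn a model from labelled binary images, then classify new blobs against it) can be illustrated with the toy sketch below. A nearest-centroid classifier stands in for the SVM, and the two-component feature vector is a made-up simplification; both are assumptions for illustration only:

```python
def features(blob):
    """Toy feature vector for a binary hand image: foreground fraction and
    the fraction of foreground that lies in the top half of the frame."""
    h = len(blob)
    total = sum(sum(row) for row in blob) or 1
    top = sum(sum(row) for row in blob[: h // 2])
    area = sum(len(row) for row in blob)
    return (total / area, top / total)

def train(labelled_blobs):
    """'Initialization phase': average the feature vectors per gesture label."""
    sums, counts = {}, {}
    for label, blob in labelled_blobs:
        f = features(blob)
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += f[0]; s[1] += f[1]
        counts[label] = counts.get(label, 0) + 1
    return {lbl: (s[0] / counts[lbl], s[1] / counts[lbl]) for lbl, s in sums.items()}

def classify(model, blob):
    """'Normal operation': pick the gesture whose centroid is nearest."""
    f = features(blob)
    return min(model, key=lambda lbl: (model[lbl][0] - f[0]) ** 2
                                      + (model[lbl][1] - f[1]) ** 2)

# Two labelled training blobs, one per gesture.
open_blob = [[1, 1], [0, 0]]
fist_blob = [[0, 0], [1, 1]]
model = train([("open", open_blob), ("fist", fist_blob)])
```

An SVM replaces the centroid comparison with a learned maximum-margin boundary, but the train-then-classify flow is the same.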
FIG. 5) to extract portions of an image 109 that correspond to a user's fingers as blobs. In several embodiments, a user holds an IR light source, such as a laser pointer, or alternatively the user wears IR gloves. Specifically, in certain embodiments, a user wears on a hand 138 (FIG. 1E) a glove (not shown) with an IR LED 1121, 1122 (FIG. 5) on each finger 136, 137 (FIG. 1E). Detection of the position of one IR LED 1121 (FIG. 5) on left index finger 136 (FIG. 1E) by IR camera 1006 (FIG. 5) is used with detection of another IR LED 1122 (FIG. 5) on middle finger 137 (FIG. 1E), also by the IR camera 1006 (FIG. 5), to identify as a blob a human hand image 138I (FIG. 4A) in image 109. - After
device 120 detects one or more such blob(s), gesture recognition is performed by processor 100 executing software to operate as user input extractor 141E (FIG. 5) as described herein. Specifically, in some embodiments of mobile device 120, co-ordinates of IR LEDs 1121 and 1122 generated by IR camera 1006 are used by processor 100 to identify a blob (e.g. human hand image 138I in FIG. 4A) corresponding to a hand 138 in an image 109 from an RGB camera 121 (which is thereby additionally used), and the blob is then used by user input extractor 141E to generate features (also called feature vectors) that are then matched to corresponding features in a database, to identify as user input a hand gesture (such as a “swipe” gesture or a “thumbs up” gesture) in image 109. - Some embodiments extract blobs in two-dimensional (2D) space due to limitations inherent in design (for a similar setup, see the 2-page article described above in reference to http://johnnylee.net/projects/wii/). Certain embodiments of
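As one illustration of matching tracked co-ordinates to a gesture, a "swipe" can be recognized when the net horizontal displacement of the LED track dominates the vertical one. The threshold, the 2:1 dominance ratio, and the gesture names below are illustrative assumptions, not the matching scheme of any actual embodiment:

```python
def recognize_gesture(track, min_travel=100):
    """Classify a time-ordered list of (x, y) LED co-ordinates.
    Returns 'swipe-right', 'swipe-left', or None if the movement is
    too small or not predominantly horizontal."""
    if len(track) < 2:
        return None
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    if abs(dx) >= min_travel and abs(dx) > 2 * abs(dy):
        return "swipe-right" if dx > 0 else "swipe-left"
    return None
```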
user input extractor 141E (FIG. 5) perform blob detection in 3D space, using a depth camera. In some embodiments (called “IntuoIR”), images from an IR camera 1006 are used to translate hand movements to specific co-ordinates within native applications running on a smartphone included in device 120. Several embodiments implement a simple camera calibration technique, similar to the techniques described in the 2-page article at http://johnnylee.net/projects/wii/. - Some embodiments of
mobile device 120 generate a depth map of a scene by use of a 3D Time-of-Flight (TOF) camera of the type known in the prior art. A Time-of-Flight camera is used in certain embodiments to measure a phase difference between the emitted light and the light returning to a sensor, which in turn provides a distance between the sensor and objects in the scene. - Other embodiments of
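For a continuous-wave TOF sensor, the measured phase shift φ maps to distance as d = c·φ / (4π·f_mod), the 4π (rather than 2π) accounting for the round trip. A quick numeric check, assuming a typical 20 MHz modulation frequency (the frequency is an assumption; the patent does not specify one):

```python
import math

C = 299_792_458.0  # speed of light, m/s

def tof_distance(phase_shift_rad, mod_freq_hz=20e6):
    """Distance from a continuous-wave time-of-flight phase measurement:
    d = c * phase / (4 * pi * f_mod)."""
    return C * phase_shift_rad / (4 * math.pi * mod_freq_hz)

def unambiguous_range(mod_freq_hz=20e6):
    """Maximum distance before the phase wraps past 2*pi."""
    return C / (2 * mod_freq_hz)
```

At 20 MHz the unambiguous range is about 7.5 m, comfortably covering a room-scale projection scene.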
device 120 also use depth cameras, e.g. as described by Freedman in US Patent Publication 2010/0118123 entitled “Depth Mapping using Projected Patterns” which is incorporated by reference herein in its entirety. Such embodiments of device 120 may use projector 122 (FIG. 5) to shine an infrared (IR) light pattern on object 132 in the scene. A reflected light pattern is observed in such embodiments by a depth camera 121, which generates a depth map. Hence, in some embodiments of mobile device 120, a depth map is used to enhance segmentation of an image, to identify areas that contain a user's face, the user's hand and/or one or more objects in an image received from an RGB camera 121 (which is thereby additionally used). - A
device 120 of some embodiments uses projector 122 to project information 119T onto a surface of object 132, followed by capture of a user's finger movements as follows. By using a laser pointer (or gloves with IR LEDs on the fingers) and the IR camera, such embodiments implement a motion capture system. Depending on the region where information is projected, an IR camera 1006 is calibrated. After camera calibration, certain embodiments generate a one-to-one mapping between the screen resolution of device 120 and the user's hand movements. When there is an IR light point inside a projected display area, the IR camera 1006 captures the brightest IR point. The coordinates of such points are processed inside an application layer or kernel layer. Based on the processed data, the user input is determined by processor 100. - Experimental results for a method using
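A minimal version of such a calibration maps two corners of the projected area, as seen by the IR camera, onto screen co-ordinates with an axis-aligned linear fit; a full implementation would use a homography to handle keystone distortion. The corner values and resolutions below are made-up examples:

```python
def make_calibration(cam_top_left, cam_bottom_right, screen_w, screen_h):
    """Build a camera-to-screen mapping from two calibration corners of the
    projected area, assuming the camera and projection axes are aligned."""
    (x0, y0), (x1, y1) = cam_top_left, cam_bottom_right

    def cam_to_screen(x, y):
        # Normalize within the projected area, then scale to screen pixels.
        sx = (x - x0) / (x1 - x0) * screen_w
        sy = (y - y0) / (y1 - y0) * screen_h
        return (round(sx), round(sy))

    return cam_to_screen

# Projected area seen by the IR camera between (100, 80) and (900, 680);
# phone screen is 480 x 320.
cam_to_screen = make_calibration((100, 80), (900, 680), 480, 320)
```

With this in place, the brightest-point co-ordinate reported by the IR camera feeds directly into the touch-event path of the smartphone.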
IR camera 1006 coupled to or included in device 120 of some embodiments were obtained with the number of samples or “frame rate” of the IR camera 1006 at approximately 120 samples per second, for use in gesture recognition in real time. Two factors that affect the performance of some embodiments of mobile device 120 are: distance (between IR camera 1006 and the IR LEDs 1121, 1122) and light conditions. When IR camera 1006 faces a high-intensity light source, false coordinates are generated 87.18% of the time. And when a prototype of the type described herein (FIG. 5) is present in an ambient environment, there is nearly no noise observed. More importantly, if a room is uniformly lit at any intensity, the noise is close to 0%. Several of these tests were performed with an IR light source 1121 directly facing the IR camera 1006. - The above description presented certain methods for hand gesture recognition that are used in some embodiments of the type described herein. As noted above, one of the methods (IntuoIR) is based on tracking IR light sources, e.g. IR LEDs 1121 and 1122 (
FIG. 5) mounted at the fingertips of a glove. One limitation of some embodiments is a need for external hardware (i.e. hardware not present in a conventional smartphone), such as an IR camera 1006 (FIG. 5). Certain embodiments of mobile device 120 use one or more infrared time-of-flight (TOF) camera(s) 1006 instead of or in addition to a CMOS infrared camera 1006. In some such embodiments, background noise may be present in images 109 being captured and filtered by device 120. Such embodiments may utilize a frame buffer of a screen in mobile device 120 and perform stereo correspondence to reduce such noise. Several embodiments of device 120 implement any known techniques to reduce background noise arising from use of stereo cameras (e.g. 3D cameras) 121. -
Mobile device 120 of several described embodiments may also include means for remotely controlling a real world object, which may be a toy, in response to user input, e.g. by use of the transmitter in transceiver 1010, which may be an IR or RF transmitter or a wireless transmitter enabled to transmit one or more signals over one or more types of wireless communication networks such as the Internet, WiFi, a cellular wireless network or other network. Of course, mobile device 120 may include other elements, such as a read-only memory 1007 which may be used to store firmware for use by processor 100. - Also, depending on the embodiment, various functions of the type described herein may be implemented in software (executed by one or more processors or processor cores) or in dedicated hardware circuitry or in firmware, or in any combination thereof. Accordingly, depending on the embodiment, any one or more of object extractor 141O, information identifier 141I,
information retriever 141R, information transformer 141T and segmentation module 141S illustrated in FIG. 5 and described above can, but need not necessarily, include one or more microprocessors, embedded processors, controllers, application specific integrated circuits (ASICs), digital signal processors (DSPs), and the like. The term processor is intended to describe the functions implemented by the system rather than specific hardware. Moreover, as used herein the term “memory” refers to any type of computer storage medium, including long term, short term, or other memory associated with the mobile platform, and is not to be limited to any particular type of memory or number of memories, or type of media upon which memory is stored. - Hence, methodologies described herein may be implemented by various means depending upon the application. For example, these methodologies may be implemented in firmware in ROM 1007 (
FIG. 5) or software, or hardware, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof. For a firmware and/or software implementation, the methodologies may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. - Any machine-readable medium tangibly embodying computer instructions may be used in implementing the methodologies described herein. For example, software 141 (
FIG. 5) may include program codes stored in memory 110 and executed by processor 100. Memory 110 may be implemented within or external to the processor 100. If implemented in firmware and/or software, the functions may be stored as one or more computer instructions or code on a computer-readable medium. Examples include nontransitory computer-readable storage media encoded with a data structure (such as a sequence of images) and computer-readable media encoded with a computer program (such as software 141 that can be executed to perform the method of FIGS. 1C, 2A, 3A, 3F, and 4B-4D). - Computer-readable media includes physical computer storage media. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, Flash Memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store program code in the form of software instructions (also called “processor instructions” or “computer instructions”) or data structures and that can be accessed by a computer; disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
- Although the present invention is illustrated in connection with specific embodiments for instructional purposes, the present invention is not limited thereto. Hence, although
item 120 shown in FIG. 5 of some embodiments is a mobile device, in other embodiments item 120 is implemented by use of form factors that are different, e.g. in certain other embodiments item 120 is a mobile platform (such as a tablet, e.g. iPad available from Apple, Inc.) while in still other embodiments item 120 is any electronic device or system. Illustrative embodiments of such an electronic device or system 120 may include multiple physical parts that intercommunicate wirelessly, such as a processor and a memory that are portions of a stationary computer, such as a laptop computer, a desktop computer, or a server computer, communicating over one or more wireless link(s) with sensors and user input circuitry enclosed in a housing that is small enough to be held in a hand. - Although several aspects are illustrated in connection with specific embodiments for instructional purposes, various embodiments of the type described herein are not limited thereto. Various adaptations and modifications may be made without departing from the scope of the described embodiments. Therefore, the spirit and scope of the appended claims should not be limited to the foregoing description.
Claims (24)
1. A method comprising:
receiving from a camera an image of a scene, the scene comprising an object in real world, the object being portable by hand;
processing the image using data on a plurality of predetermined portable real world objects, to detect at least a portion of the image corresponding to the object and obtain an object identifier identifying the object uniquely among the plurality;
one or more processors using the object identifier with at least an association in a set of associations to lookup a user identifier that identifies a user at least partially among a plurality of users; and
using at least the user identifier to obtain and store in computer memory, information to be output to the user identified by the user identifier.
2. The method of claim 1 wherein:
the user identifier identifies from among the plurality of users, a first group of users including the user;
the information to be output is common to all users in the first group; and
the plurality of users includes at least a second group of users different from the first group.
3. The method of claim 1 further comprising:
detecting in the image, a gesture adjacent to the object;
the one or more processors using the gesture to look up the user identifier from a mapping between hand gestures and user identifiers; and
preparing the association in the set, to associate the user identifier with the object identifier, in response to the detecting.
4. The method of claim 1 wherein:
the user identifier uniquely identifies the user among the plurality of users; and
the information is specific to the user.
5. The method of claim 1 further comprising:
displaying an authentication screen adjacent to the object in the scene;
using user input in the authentication screen to determine the user identifier; and
preparing the association in the set, to associate the user identifier with the object identifier, in response to the user input being authenticated.
6. The method of claim 1 wherein the portion is hereinafter first portion, and the image is hereinafter first image, the method further comprising:
segmenting from a second image obtained from a second camera, a second portion thereof corresponding to a face of the user;
comparing the second portion with a database of faces, to obtain the user identifier selected from among user identifiers of faces in the database; and
preparing the association in the set, to associate the user identifier with the object identifier, in response to detecting a predetermined gesture adjacent to the object.
7. The method of claim 1 further comprising:
projecting at least a portion of the information into the scene.
8. The method of claim 1 wherein said processing the image comprises:
checking the image for presence of each object among the plurality of predetermined portable real world objects, to identify a subset of the plurality of predetermined portable real world objects;
identifying in a list each object in the subset and starting a timer for the each object in the subset; and
while the list is not empty, repeatedly capturing additional images of the scene and scanning through the list to determine whether each object identified therein is present in each additional image and if so resetting the timer for the each object in the list and removing an object from the list when the timer for the object reaches a preset limit; and
when the list is empty, returning to the checking.
9. One or more non-transitory computer readable storage media comprising:
instructions to one or more processors to receive an image of a scene from a camera, the scene comprising an object in real world, the object being portable by hand;
instructions to the one or more processors to process the image using data on a plurality of predetermined portable real world objects, to obtain an object identifier uniquely identifying the object from among the plurality;
instructions to the one or more processors to use the object identifier with at least an association in a set of associations to obtain a user identifier that identifies a user of the object at least partially among a plurality of users; and
instructions to the one or more processors to obtain and store in computer memory, information to be output to the user, by using at least the user identifier.
10. The one or more non-transitory computer readable storage media of claim 9 wherein:
the user identifier identifies from among the plurality of users, a first group of users including the user; and
the information is common to all users in the first group.
11. The one or more non-transitory computer readable storage media of claim 9 further comprising:
instructions to the one or more processors to detect a gesture adjacent to the object in the scene;
instructions to the one or more processors to use the gesture to look up the user identifier from a mapping of hand gestures to user identifiers; and
instructions to the one or more processors to prepare the association in the set, to associate the user identifier with the object identifier.
12. The one or more non-transitory computer readable storage media of claim 9 wherein:
the user identifier uniquely identifies the user in the plurality of users; and
the information is unique to the user.
13. The one or more non-transitory computer readable storage media of claim 9 further comprising:
instructions to display an authentication screen adjacent to the object in the scene; and
instructions to use user input in the authentication screen to determine the user identifier; and
instructions to prepare the association in the set, to associate the user identifier with the object identifier, in response to detecting the user input to be authenticated.
14. The one or more non-transitory computer readable storage media of claim 9 wherein the portion is hereinafter first portion, and the image is hereinafter first image, the one or more non-transitory computer readable storage media further comprising:
instructions to the one or more processors to segment from a second image obtained from a second camera, a second portion thereof corresponding to a face of the user;
instructions to the one or more processors to compare the second portion with a database of faces to identify the user identifier from among user identifiers of faces in the database; and
instructions to the one or more processors to prepare the association in the set, to associate the user identifier with the object identifier, in response to detecting the user performing a predetermined gesture adjacent to the object.
15. The one or more non-transitory computer readable storage media of claim 9 wherein:
the instructions to output comprise instructions to project the portion of the information into the scene.
16. The one or more non-transitory computer readable storage media of claim 9 wherein the instructions to the one or more processors to process the image comprise:
instructions to check the image for presence of each object among the plurality of predetermined portable real world objects, to identify a subset of the plurality of predetermined portable real world objects;
instructions to identify in a list each object in the subset and starting a timer for the each object in the subset; and
while the list is not empty, instructions to repeatedly capture additional images of the scene and scanning through the list to determine whether each object identified therein is present in each additional image and if so resetting the timer for the each object in the list and removing an object from the list when the timer for the object reaches a preset limit; and
instructions responsive to the list being empty, to return to the instructions to check.
17. One or more devices comprising:
a camera;
one or more processors, operatively coupled to the camera;
memory, operatively coupled to the one or more processors; and
software held in the memory that when executed by the one or more processors, causes the one or more processors to:
receive an image of a scene from the camera, the scene comprising an object in real world, the object being portable by hand;
process the image using data on a plurality of predetermined portable real world objects, to obtain an object identifier uniquely identifying the object from among the plurality;
use the object identifier with at least an association in a set of associations to obtain a user identifier that identifies a user of the object at least partially among a plurality of users; and
obtain and store in the memory, information to be displayed, by using at least the user identifier.
18. The one or more devices of claim 17 wherein the software further causes the one or more processors to:
detect a gesture adjacent to the object in the scene;
use the gesture to look up the user identifier from a mapping of hand gestures to user identifiers; and
prepare the association in the set, to associate the user identifier with the object identifier.
19. The one or more devices of claim 17 wherein the software further causes the one or more processors to:
display an authentication screen adjacent to the object in the scene; and
use user input in the authentication screen to determine the user identifier; and
prepare the association in the set, to associate the user identifier with the object identifier, in response to detecting the user input to be authenticated.
20. The one or more devices of claim 17 wherein the portion is hereinafter first portion, and the image is hereinafter first image and the software further causes the one or more processors to:
segment from a second image obtained from a second camera, a portion thereof corresponding to a face of the user;
compare the second portion with a database of faces to identify the user identifier from among user identifiers of faces in the database; and
prepare the association in the set, to associate the user identifier with the object identifier, in response to detecting the user performing a predetermined gesture adjacent to the object.
21. The one or more devices of claim 17 wherein the software that causes the one or more processors to process the image comprises software to cause the one or more processors to:
check the image for presence of each object in the plurality of predetermined portable real world objects, to identify a subset therein as being present in the image;
add an identifier of each object in the subset to a list, starting a timer for each identifier of each object in the subset; and
while the list is not empty, repeatedly scan through the list to determine if each object identified therein is present in each additional image of the scene and if so resetting the timer for each identifier in the list and removing any identifier from the list when the timer for said any identifier reaches a preset limit; and
when the list is empty, repeat the check.
22. A system comprising a processor coupled to a memory and a camera, the system comprising:
means for processing an image of a scene received from the camera, the scene comprising an object in real world, the object being portable by hand, the image being processed by using data on a plurality of predetermined portable real world objects, to obtain an object identifier uniquely identifying the object from among the plurality;
means for using the object identifier with at least an association in a set of associations to obtain a user identifier that identifies a user of the object at least partially among a plurality of users; and
means for obtaining and storing in the memory, information to be displayed, by using at least the user identifier.
23. The system of claim 22 further comprising:
means for detecting a gesture adjacent to the object in the scene;
means for using the gesture to look up the user identifier from a mapping of hand gestures to user identifiers; and
means for preparing the association in the set, to associate the user identifier with the object identifier.
24. The system of claim 22 further comprising:
means for displaying an authentication screen adjacent to the object in the scene;
means for using user input in the authentication screen to determine the user identifier; and
means for preparing the association in the set, to associate the user identifier with the object identifier, in response to detecting the user input to be authenticated.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/549,339 US20130044912A1 (en) | 2011-08-19 | 2012-07-13 | Use of association of an object detected in an image to obtain information to display to a user |
PCT/US2012/046816 WO2013028279A1 (en) | 2011-08-19 | 2012-07-14 | Use of association of an object detected in an image to obtain information to display to a user |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201161525628P | 2011-08-19 | 2011-08-19 | |
US13/549,339 US20130044912A1 (en) | 2011-08-19 | 2012-07-13 | Use of association of an object detected in an image to obtain information to display to a user |
Publications (1)
Publication Number | Publication Date |
---|---|
US20130044912A1 true US20130044912A1 (en) | 2013-02-21 |
Family
ID=47712374
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/549,339 Abandoned US20130044912A1 (en) | 2011-08-19 | 2012-07-13 | Use of association of an object detected in an image to obtain information to display to a user |
US13/549,388 Active 2033-07-24 US9245193B2 (en) | 2011-08-19 | 2012-07-13 | Dynamic selection of surfaces in real world for projection of information thereon |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/549,388 Active 2033-07-24 US9245193B2 (en) | 2011-08-19 | 2012-07-13 | Dynamic selection of surfaces in real world for projection of information thereon |
Country Status (7)
Country | Link |
---|---|
US (2) | US20130044912A1 (en) |
EP (1) | EP2745237B1 (en) |
JP (2) | JP2014531787A (en) |
KR (1) | KR101575016B1 (en) |
CN (1) | CN103875004B (en) |
IN (1) | IN2014MN00316A (en) |
WO (2) | WO2013028279A1 (en) |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9456187B1 (en) | 2012-06-01 | 2016-09-27 | Amazon Technologies, Inc. | Edge-based pose detection |
US10528853B1 (en) | 2012-06-29 | 2020-01-07 | Amazon Technologies, Inc. | Shape-based edge detection
EP3080551A1 (en) * | 2013-12-12 | 2016-10-19 | Testo AG | Method for the positionally accurate projection of a mark onto an object, and projection apparatus |
US20150193915A1 (en) * | 2014-01-06 | 2015-07-09 | Nvidia Corporation | Technique for projecting an image onto a surface with a mobile device |
US9207780B2 (en) * | 2014-01-27 | 2015-12-08 | Fuji Xerox Co., Ltd. | Systems and methods for hiding and finding digital content associated with physical objects via coded lighting |
US20170223321A1 (en) * | 2014-08-01 | 2017-08-03 | Hewlett-Packard Development Company, L.P. | Projection of image onto object |
US9715865B1 (en) * | 2014-09-26 | 2017-07-25 | Amazon Technologies, Inc. | Forming a representation of an item with light |
US20180013998A1 (en) * | 2015-01-30 | 2018-01-11 | Ent. Services Development Corporation Lp | Relationship preserving projection of digital objects |
CN107211104A (en) * | 2015-02-03 | 2017-09-26 | Sony Corporation | Information processor, information processing method and program
CN111857332A (en) * | 2015-02-12 | 2020-10-30 | Beijing Samsung Telecommunication Technology Research Co., Ltd. | Method and device for acquiring note information
CN106033257B (en) * | 2015-03-18 | 2019-05-31 | Lenovo (Beijing) Co., Ltd. | Control method and device
WO2016151869A1 (en) * | 2015-03-23 | 2016-09-29 | Nec Corporation | Information processing apparatus, information processing method, and program |
CN104796678A (en) * | 2015-04-28 | 2015-07-22 | Lenovo (Beijing) Co., Ltd. | Information processing method and electronic device
CN108351970B (en) * | 2015-10-30 | 2023-04-04 | Unilever Intellectual Property Holdings Ltd. | Hair diameter measurement
JP6957462B2 (en) * | 2015-10-30 | 2021-11-02 | Unilever N.V. | Hair curl measurement
US9799111B2 (en) * | 2016-02-11 | 2017-10-24 | Symbol Technologies, Llc | Methods and systems for highlighting box surfaces and edges in mobile box dimensioning |
US10249084B2 (en) | 2016-06-10 | 2019-04-02 | Microsoft Technology Licensing, Llc | Tap event location with a selection apparatus |
US10720082B1 (en) * | 2016-09-08 | 2020-07-21 | Ctskh, Llc | Device and system to teach stem lessons using hands-on learning method |
KR102048674B1 (en) * | 2017-07-31 | 2019-11-26 | Konic Automation Co., Ltd. | Lighting stand type multimedia device
CN111433729A (en) | 2017-12-04 | 2020-07-17 | 惠普发展公司,有限责任合伙企业 | Peripheral display device |
DE102018203349A1 (en) * | 2018-03-07 | 2019-09-12 | BSH Hausgeräte GmbH | Interaction module |
TWI724858B (en) * | 2020-04-08 | 2021-04-11 | 國軍花蓮總醫院 | Mixed Reality Evaluation System Based on Gesture Action |
Family Cites Families (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE3578241D1 (en) | 1985-06-19 | 1990-07-19 | IBM | Method for identifying three-dimensional objects by means of two-dimensional images
JP3869897B2 (en) | 1997-01-28 | 2007-01-17 | キヤノン株式会社 | Camera control system, video receiving apparatus, control method, and storage medium |
JP2001211372A (en) | 2000-01-27 | 2001-08-03 | Nippon Telegr & Teleph Corp <Ntt> | Video projecting device |
JP4009851B2 (en) | 2002-05-20 | 2007-11-21 | セイコーエプソン株式会社 | Projection-type image display system, projector, program, information storage medium, and image projection method |
JP4591720B2 (en) | 2002-05-20 | 2010-12-01 | セイコーエプソン株式会社 | Projection-type image display system, projector, program, information storage medium, and image projection method |
US6811264B2 (en) * | 2003-03-21 | 2004-11-02 | Mitsubishi Electric Research Laboratories, Inc. | Geometrically aware projector |
US7515756B2 (en) | 2003-06-23 | 2009-04-07 | Shoestring Research, Llc. | Region segmentation and characterization systems and methods for augmented reality |
JP2005313291A (en) | 2004-04-30 | 2005-11-10 | Mitsubishi Heavy Ind Ltd | Image display method linked with robot action, and device thereof |
US20070050468A1 (en) | 2005-08-09 | 2007-03-01 | Comverse, Ltd. | Reality context menu (RCM) |
JP2007142495A (en) | 2005-11-14 | 2007-06-07 | Nippon Telegr & Teleph Corp <Ntt> | Planar projector and planar projection program |
US7905610B1 (en) | 2006-08-29 | 2011-03-15 | Nvidia Corporation | Graphics processor system and associated method for projecting an image onto a three-dimensional object |
KR100775123B1 (en) | 2006-09-15 | 2007-11-08 | 삼성전자주식회사 | Method of indexing image object and image object indexing system using the same |
TWI433052B (en) | 2007-04-02 | 2014-04-01 | Primesense Ltd | Depth mapping using projected patterns |
US8228170B2 (en) | 2008-01-10 | 2012-07-24 | International Business Machines Corporation | Using sensors to identify objects placed on a surface |
JP5258399B2 (en) | 2008-06-06 | 2013-08-07 | キヤノン株式会社 | Image projection apparatus and control method thereof |
US7954953B2 (en) * | 2008-07-30 | 2011-06-07 | Microvision, Inc. | Scanned beam overlay projection |
US8385971B2 (en) | 2008-08-19 | 2013-02-26 | Digimarc Corporation | Methods and systems for content processing |
JP2010072025A (en) | 2008-09-16 | 2010-04-02 | Nikon Corp | Electronic device with projector |
US9569001B2 (en) * | 2009-02-03 | 2017-02-14 | Massachusetts Institute Of Technology | Wearable gestural interface |
KR20110071349A (en) * | 2009-12-21 | 2011-06-29 | Samsung Electronics Co., Ltd. | Method and apparatus for controlling external output of a portable terminal
US8549418B2 (en) * | 2009-12-23 | 2013-10-01 | Intel Corporation | Projected display to enhance computer device use |
US8631355B2 (en) * | 2010-01-08 | 2014-01-14 | Microsoft Corporation | Assigning gesture dictionaries |
CN101907459B (en) * | 2010-07-12 | 2012-01-04 | Tsinghua University | Monocular video based real-time posture estimation and distance measurement method for three-dimensional rigid body object
CN102959616B (en) | 2010-07-20 | 2015-06-10 | 苹果公司 | Interactive reality augmentation for natural interaction |
WO2012020410A2 (en) * | 2010-08-10 | 2012-02-16 | Pointgrab Ltd. | System and method for user interaction with projected content |
US9560314B2 (en) * | 2011-06-14 | 2017-01-31 | Microsoft Technology Licensing, Llc | Interactive and shared surfaces |
US20130044912A1 (en) | 2011-08-19 | 2013-02-21 | Qualcomm Incorporated | Use of association of an object detected in an image to obtain information to display to a user |
- 2012
- 2012-07-13 US US13/549,339 patent/US20130044912A1/en not_active Abandoned
- 2012-07-13 US US13/549,388 patent/US9245193B2/en active Active
- 2012-07-14 EP EP12741178.3A patent/EP2745237B1/en active Active
- 2012-07-14 WO PCT/US2012/046816 patent/WO2013028279A1/en active Application Filing
- 2012-07-14 KR KR1020147007213A patent/KR101575016B1/en not_active IP Right Cessation
- 2012-07-14 WO PCT/US2012/046817 patent/WO2013028280A2/en active Application Filing
- 2012-07-14 JP JP2014526005A patent/JP2014531787A/en active Pending
- 2012-07-14 CN CN201280048118.0A patent/CN103875004B/en active Active
- 2014
- 2014-02-19 IN IN316MUN2014 patent/IN2014MN00316A/en unknown
- 2016
- 2016-10-20 JP JP2016205796A patent/JP6273334B2/en not_active Expired - Fee Related
Patent Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130272574A1 (en) * | 2000-11-06 | 2013-10-17 | Nant Holdings Ip, Llc | Interactivity Via Mobile Image Recognition |
US20090237328A1 (en) * | 2008-03-20 | 2009-09-24 | Motorola, Inc. | Mobile virtual and augmented reality system |
US20100082629A1 (en) * | 2008-09-29 | 2010-04-01 | Yahoo! Inc. | System for associating data items with context |
US20100083373A1 (en) * | 2008-09-29 | 2010-04-01 | Scott White | Methods and apparatus for determining user authorization from motion of a gesture-based control unit |
US20100153457A1 (en) * | 2008-12-15 | 2010-06-17 | Grant Isaac W | Gestural Interface Device and Method |
US8646000B2 (en) * | 2009-12-04 | 2014-02-04 | Lg Electronics Inc. | Augmented remote controller and method for operating the same |
US20110154266A1 (en) * | 2009-12-17 | 2011-06-23 | Microsoft Corporation | Camera navigation for presentations |
US20110213664A1 (en) * | 2010-02-28 | 2011-09-01 | Osterhout Group, Inc. | Local advertising content on an interactive head-mounted eyepiece |
US20120162254A1 (en) * | 2010-12-22 | 2012-06-28 | Anderson Glen J | Object mapping techniques for mobile augmented reality applications |
US20120242800A1 (en) * | 2011-03-23 | 2012-09-27 | Ionescu Dan | Apparatus and system for interfacing with computers and other electronic devices through gestures by using depth sensing and methods of use |
US20130004016A1 (en) * | 2011-06-29 | 2013-01-03 | Karakotsios Kenneth M | User identification by gesture recognition |
US20130174213A1 (en) * | 2011-08-23 | 2013-07-04 | James Liu | Implicit sharing and privacy control through physical behaviors using sensor-rich devices |
US20130050069A1 (en) * | 2011-08-23 | 2013-02-28 | Sony Corporation, A Japanese Corporation | Method and system for use in providing three dimensional user interface |
US20130050258A1 (en) * | 2011-08-25 | 2013-02-28 | James Chia-Ming Liu | Portals: Registered Objects As Virtualized, Personalized Displays |
US20130207962A1 (en) * | 2012-02-10 | 2013-08-15 | Float Hybrid Entertainment Inc. | User interactive kiosk with three-dimensional display |
US20130311329A1 (en) * | 2012-03-29 | 2013-11-21 | Digimarc Corporation | Image-related methods and arrangements |
US20130285894A1 (en) * | 2012-04-27 | 2013-10-31 | Stefan J. Marti | Processing image input to communicate a command to a remote display device |
Cited By (93)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140325459A1 (en) * | 2004-02-06 | 2014-10-30 | Nokia Corporation | Gesture control system |
US20120051625A1 (en) * | 2010-08-23 | 2012-03-01 | Texas Instruments Incorporated | Method and Apparatus for 2D to 3D Conversion Using Scene Classification and Face Detection |
US8718356B2 (en) * | 2010-08-23 | 2014-05-06 | Texas Instruments Incorporated | Method and apparatus for 2D to 3D conversion using scene classification and face detection |
US9245193B2 (en) | 2011-08-19 | 2016-01-26 | Qualcomm Incorporated | Dynamic selection of surfaces in real world for projection of information thereon |
US20150022444A1 (en) * | 2012-02-06 | 2015-01-22 | Sony Corporation | Information processing apparatus, and information processing method |
US10401948B2 (en) * | 2012-02-06 | 2019-09-03 | Sony Corporation | Information processing apparatus, and information processing method to operate on virtual object using real object |
US9671566B2 (en) | 2012-06-11 | 2017-06-06 | Magic Leap, Inc. | Planar waveguide apparatus with diffraction element(s) and system employing same |
US20140075349A1 (en) * | 2012-09-10 | 2014-03-13 | Samsung Electronics Co., Ltd. | Transparent display apparatus and object selection method using the same |
US9965137B2 (en) * | 2012-09-10 | 2018-05-08 | Samsung Electronics Co., Ltd. | Transparent display apparatus and object selection method using the same |
US20150219500A1 (en) * | 2012-09-11 | 2015-08-06 | Barco N.V. | Projection system with safety detection |
US9677945B2 (en) * | 2012-09-11 | 2017-06-13 | Barco N.V. | Projection system with safety detection |
US9801068B2 (en) * | 2012-09-27 | 2017-10-24 | Kyocera Corporation | Terminal device |
US20150208244A1 (en) * | 2012-09-27 | 2015-07-23 | Kyocera Corporation | Terminal device |
US20150227198A1 (en) * | 2012-10-23 | 2015-08-13 | Tencent Technology (Shenzhen) Company Limited | Human-computer interaction method, terminal and system |
US9836128B2 (en) * | 2012-11-02 | 2017-12-05 | Samsung Electronics Co., Ltd. | Method and device for providing information regarding an object |
US20140125580A1 (en) * | 2012-11-02 | 2014-05-08 | Samsung Electronics Co., Ltd. | Method and device for providing information regarding an object |
US20140185872A1 (en) * | 2012-12-28 | 2014-07-03 | Hyundai Motor Company | Method and system for recognizing hand gesture using selective illumination |
US9373026B2 (en) * | 2012-12-28 | 2016-06-21 | Hyundai Motor Company | Method and system for recognizing hand gesture using selective illumination |
US10540863B2 (en) | 2013-03-14 | 2020-01-21 | Martigold Enterprises, Llc | Delayed message playback methods and apparatus |
US11568717B2 (en) | 2013-03-14 | 2023-01-31 | Martigold Enterprises, Llc | Varied apparatus for message playback |
WO2014153120A1 (en) * | 2013-03-14 | 2014-09-25 | George Martin | Methods and apparatus for message playback |
US9626843B2 (en) | 2013-03-14 | 2017-04-18 | Martigold Enterprises, Llc | Varied message playback methods and apparatus |
US10277945B2 (en) * | 2013-04-05 | 2019-04-30 | Lenovo (Singapore) Pte. Ltd. | Contextual queries for augmenting video display |
US10671846B1 (en) | 2013-05-15 | 2020-06-02 | Amazon Technologies, Inc. | Object recognition techniques |
US9563955B1 (en) * | 2013-05-15 | 2017-02-07 | Amazon Technologies, Inc. | Object tracking techniques |
US11412108B1 (en) | 2013-05-15 | 2022-08-09 | Amazon Technologies, Inc. | Object recognition techniques |
US9612403B2 (en) | 2013-06-11 | 2017-04-04 | Magic Leap, Inc. | Planar waveguide apparatus with diffraction element(s) and system employing same |
US9913117B2 (en) * | 2013-06-27 | 2018-03-06 | Samsung Electronics Co., Ltd | Electronic device and method for exchanging information using the same |
US10591286B2 (en) | 2013-07-12 | 2020-03-17 | Magic Leap, Inc. | Method and system for generating virtual rooms |
US10288419B2 (en) | 2013-07-12 | 2019-05-14 | Magic Leap, Inc. | Method and system for generating a virtual user interface related to a totem |
US10866093B2 (en) | 2013-07-12 | 2020-12-15 | Magic Leap, Inc. | Method and system for retrieving data in response to user input |
US9651368B2 (en) | 2013-07-12 | 2017-05-16 | Magic Leap, Inc. | Planar waveguide apparatus configured to return light therethrough |
US11221213B2 (en) | 2013-07-12 | 2022-01-11 | Magic Leap, Inc. | Method and system for generating a retail experience using an augmented reality system |
US10295338B2 (en) | 2013-07-12 | 2019-05-21 | Magic Leap, Inc. | Method and system for generating map data from an image |
US10767986B2 (en) | 2013-07-12 | 2020-09-08 | Magic Leap, Inc. | Method and system for interacting with user interfaces |
US11656677B2 (en) | 2013-07-12 | 2023-05-23 | Magic Leap, Inc. | Planar waveguide apparatus with diffraction element(s) and system employing same |
US10641603B2 (en) | 2013-07-12 | 2020-05-05 | Magic Leap, Inc. | Method and system for updating a virtual world |
US10352693B2 (en) | 2013-07-12 | 2019-07-16 | Magic Leap, Inc. | Method and system for obtaining texture data of a space |
US20150234477A1 (en) * | 2013-07-12 | 2015-08-20 | Magic Leap, Inc. | Method and system for determining user input based on gesture |
US9857170B2 (en) | 2013-07-12 | 2018-01-02 | Magic Leap, Inc. | Planar waveguide apparatus having a plurality of diffractive optical elements |
US10408613B2 (en) | 2013-07-12 | 2019-09-10 | Magic Leap, Inc. | Method and system for rendering virtual content |
US11029147B2 (en) | 2013-07-12 | 2021-06-08 | Magic Leap, Inc. | Method and system for facilitating surgery using an augmented reality system |
US10473459B2 (en) | 2013-07-12 | 2019-11-12 | Magic Leap, Inc. | Method and system for determining user input based on totem |
US9952042B2 (en) | 2013-07-12 | 2018-04-24 | Magic Leap, Inc. | Method and system for identifying a user location |
US10228242B2 (en) * | 2013-07-12 | 2019-03-12 | Magic Leap, Inc. | Method and system for determining user input based on gesture |
US10571263B2 (en) | 2013-07-12 | 2020-02-25 | Magic Leap, Inc. | User and object interaction with an augmented reality scenario |
US11060858B2 (en) | 2013-07-12 | 2021-07-13 | Magic Leap, Inc. | Method and system for generating a virtual user interface related to a totem |
US10533850B2 (en) | 2013-07-12 | 2020-01-14 | Magic Leap, Inc. | Method and system for inserting recognized object data into a virtual world |
US10495453B2 (en) | 2013-07-12 | 2019-12-03 | Magic Leap, Inc. | Augmented reality system totems and methods of using same |
US20150193088A1 (en) * | 2013-07-15 | 2015-07-09 | Intel Corporation | Hands-free assistance |
CN105308535A (en) * | 2013-07-15 | 2016-02-03 | 英特尔公司 | Hands-free assistance |
US20150062046A1 (en) * | 2013-09-03 | 2015-03-05 | Samsung Electronics Co., Ltd. | Apparatus and method of setting gesture in electronic device |
KR20150039252A (en) * | 2013-10-01 | 2015-04-10 | 한국전자통신연구원 | Apparatus and method for providing application service by using action recognition |
US20150092981A1 (en) * | 2013-10-01 | 2015-04-02 | Electronics And Telecommunications Research Institute | Apparatus and method for providing activity recognition based application service |
US9183431B2 (en) * | 2013-10-01 | 2015-11-10 | Electronics And Telecommunications Research Institute | Apparatus and method for providing activity recognition based application service |
KR102106135B1 (en) | 2013-10-01 | 2020-05-04 | 한국전자통신연구원 | Apparatus and method for providing application service by using action recognition |
CN105993038A (en) * | 2014-02-07 | 2016-10-05 | 皇家飞利浦有限公司 | Method of operating a control system and control system therefore |
US20170177073A1 (en) * | 2014-02-07 | 2017-06-22 | Koninklijke Philips N.V. | Method of operating a control system and control system therefore |
US10191536B2 (en) * | 2014-02-07 | 2019-01-29 | Koninklijke Philips N.V. | Method of operating a control system and control system therefore |
US9560272B2 (en) | 2014-03-24 | 2017-01-31 | Samsung Electronics Co., Ltd. | Electronic device and method for image data processing |
US11221819B2 (en) * | 2014-03-31 | 2022-01-11 | Amazon Technologies, Inc. | Extendable architecture for augmented reality system |
US11543933B2 (en) * | 2014-05-14 | 2023-01-03 | Purdue Research Foundation | Manipulating virtual environment using non-instrumented physical object |
US9626804B2 (en) * | 2014-05-26 | 2017-04-18 | Kyocera Document Solutions Inc. | Article information providing apparatus that provides information of article, article information providing system, and article information provision method
US20150339860A1 (en) * | 2014-05-26 | 2015-11-26 | Kyocera Document Solutions Inc. | Article information providing apparatus that provides information of article, article information providing system, and article information provision method
US10320799B2 (en) * | 2014-09-18 | 2019-06-11 | International Business Machines Corporation | Dynamic multi-user computer configuration settings |
US20170126691A1 (en) * | 2014-09-18 | 2017-05-04 | International Business Machines Corporation | Dynamic multi-user computer configuration settings |
US20160116983A1 (en) * | 2014-10-23 | 2016-04-28 | Samsung Electronics Co., Ltd. | User input method for use in portable device using virtual input area |
US9727131B2 (en) * | 2014-10-23 | 2017-08-08 | Samsung Electronics Co., Ltd. | User input method for use in portable device using virtual input area |
US10262465B2 (en) | 2014-11-19 | 2019-04-16 | Bae Systems Plc | Interactive control station |
US10096166B2 (en) | 2014-11-19 | 2018-10-09 | Bae Systems Plc | Apparatus and method for selectively displaying an operational environment |
US9886769B1 (en) * | 2014-12-09 | 2018-02-06 | Jamie Douglas Tremaine | Use of 3D depth map with low and high resolution 2D images for gesture recognition and object tracking systems |
US9497376B2 (en) * | 2015-01-26 | 2016-11-15 | International Business Machines Corporation | Discriminating visual recognition program for digital cameras |
US9344615B1 (en) * | 2015-01-26 | 2016-05-17 | International Business Machines Corporation | Discriminating visual recognition program for digital cameras |
US10216273B2 (en) * | 2015-02-25 | 2019-02-26 | Bae Systems Plc | Apparatus and method for effecting a control action in respect of system functions |
US11073901B2 (en) * | 2015-07-07 | 2021-07-27 | Seiko Epson Corporation | Display device, control method for display device, and computer program |
US11301034B2 (en) | 2015-07-07 | 2022-04-12 | Seiko Epson Corporation | Display device, control method for display device, and computer program |
US10665019B2 (en) * | 2016-03-24 | 2020-05-26 | Qualcomm Incorporated | Spatial relationships for integration of visual images of physical environment into virtual reality |
US20170278304A1 (en) * | 2016-03-24 | 2017-09-28 | Qualcomm Incorporated | Spatial relationships for integration of visual images of physical environment into virtual reality |
US20220075856A1 (en) * | 2016-05-19 | 2022-03-10 | Payfone Inc., D/B/A Prove | Identifying and authenticating users based on passive factors determined from sensor data |
US10506221B2 (en) | 2016-08-03 | 2019-12-10 | Adobe Inc. | Field of view rendering control of digital content |
US20180039479A1 (en) * | 2016-08-04 | 2018-02-08 | Adobe Systems Incorporated | Digital Content Search and Environmental Context |
US11461820B2 (en) | 2016-08-16 | 2022-10-04 | Adobe Inc. | Navigation and rewards involving physical goods and services |
US10139961B2 (en) * | 2016-08-18 | 2018-11-27 | Microsoft Technology Licensing, Llc | Touch detection using feature-vector dictionary |
US10521967B2 (en) | 2016-09-12 | 2019-12-31 | Adobe Inc. | Digital content interaction and navigation in virtual and augmented reality |
US10430559B2 (en) | 2016-10-18 | 2019-10-01 | Adobe Inc. | Digital rights management in virtual and augmented reality |
US11110343B2 (en) | 2017-09-08 | 2021-09-07 | Niantic, Inc. | Methods and systems for generating detailed datasets of an environment via gameplay |
US10300373B2 (en) * | 2017-09-08 | 2019-05-28 | Niantic, Inc. | Methods and systems for generating detailed datasets of an environment via gameplay |
US20190076731A1 (en) * | 2017-09-08 | 2019-03-14 | Niantic, Inc. | Methods and Systems for Generating Detailed Datasets of an Environment via Gameplay |
US10896219B2 (en) * | 2017-09-13 | 2021-01-19 | Fuji Xerox Co., Ltd. | Information processing apparatus, data structure of image file, and non-transitory computer readable medium |
CN112673393A (en) * | 2018-08-29 | 2021-04-16 | Tianjie Financial Technology Co., Ltd. | System and method for providing one or more services using augmented reality display
US10825254B1 (en) * | 2019-05-30 | 2020-11-03 | International Business Machines Corporation | Augmented reality book selection-assist |
US11474671B2 (en) * | 2020-01-31 | 2022-10-18 | Salesforce.Com, Inc. | Neutralizing designs of user interfaces |
CN113577766A (en) * | 2021-08-05 | 2021-11-02 | Baidu Online Network Technology (Beijing) Co., Ltd. | Object processing method and device
Also Published As
Publication number | Publication date |
---|---|
JP6273334B2 (en) | 2018-01-31 |
JP2014531787A (en) | 2014-11-27 |
WO2013028280A2 (en) | 2013-02-28 |
US9245193B2 (en) | 2016-01-26 |
WO2013028279A1 (en) | 2013-02-28 |
CN103875004A (en) | 2014-06-18 |
CN103875004B (en) | 2017-12-08 |
KR101575016B1 (en) | 2015-12-07 |
EP2745237B1 (en) | 2022-09-07 |
US20130044193A1 (en) | 2013-02-21 |
WO2013028280A3 (en) | 2013-04-18 |
JP2017038397A (en) | 2017-02-16 |
EP2745237A2 (en) | 2014-06-25 |
IN2014MN00316A (en) | 2015-09-11 |
KR20140047733A (en) | 2014-04-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20130044912A1 (en) | Use of association of an object detected in an image to obtain information to display to a user | |
US20220334646A1 (en) | Systems and methods for extensions to alternative control of touch-based devices | |
US10761612B2 (en) | Gesture recognition techniques | |
US10936080B2 (en) | Systems and methods of creating a realistic displacement of a virtual object in virtual reality/augmented reality environments | |
US10120454B2 (en) | Gesture recognition control device | |
US9160993B1 (en) | Using projection for visual recognition | |
US9658695B2 (en) | Systems and methods for alternative control of touch-based devices | |
US8549418B2 (en) | Projected display to enhance computer device use | |
US20180292907A1 (en) | Gesture control system and method for smart home | |
US20180218545A1 (en) | Virtual content scaling with a hardware controller | |
US20170255450A1 (en) | Spatial cooperative programming language | |
US20130010207A1 (en) | Gesture based interactive control of electronic equipment | |
US8938124B2 (en) | Computer vision based tracking of a hand | |
US20200142495A1 (en) | Gesture recognition control device | |
US11709593B2 (en) | Electronic apparatus for providing a virtual keyboard and controlling method thereof | |
EP2702464B1 (en) | Laser diode modes | |
KR20210033394A (en) | Electronic apparatus and controlling method thereof | |
US11054941B2 (en) | Information processing system, information processing method, and program for correcting operation direction and operation amount | |
Bhowmik | Natural and intuitive user interfaces with perceptual computing technologies |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: QUALCOMM INCORPORATED, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KULKARNI, TEJAS DATTATRAYA;LIU, BOCONG;NANDWANI, ANKUR B;AND OTHERS;SIGNING DATES FROM 20120801 TO 20120807;REEL/FRAME:028761/0604 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |