US20170068326A1 - Imaging surround system for touch-free display control - Google Patents

Imaging surround system for touch-free display control

Info

Publication number
US20170068326A1
US20170068326A1 (application US15/348,663; US201615348663A)
Authority
US
United States
Prior art keywords
camera
cameras
processor
computer
display
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/348,663
Inventor
Morteza Gharib
David Jeon
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
California Institute of Technology CalTech
Original Assignee
California Institute of Technology CalTech
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by California Institute of Technology CalTech filed Critical California Institute of Technology CalTech
Priority to US15/348,663
Publication of US20170068326A1
Legal status: Abandoned

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/002 - Specific input/output arrangements not covered by G06F3/01 - G06F3/16
    • G06F3/005 - Input arrangements through a video camera
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02 - Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/0202 - Constructional details or processes of manufacture of the input device
    • G06F3/021 - Arrangements integrating additional peripherals in a keyboard, e.g. card or barcode reader, optical scanner
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02 - Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/0227 - Cooperation and interconnection of the input arrangement with other functional units of a computer
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03 - Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/0304 - Detection arrangements using opto-electronic means
    • G06K9/00335
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition

Definitions

  • the PLAYSTATION MOVE is reported as a motion-sensing game controller platform for the PlayStation 3 (PS3). Based on the popular game play style of Nintendo's Wii console, the PlayStation Move uses a camera to track the position of a lighted wand with inertial sensors in the wand to detect its motion.
  • Another wand/object tracking system for video game control is disclosed in U.S. Pat. Nos. 7,843,429 and 8,068,095.
  • the cameras are fixed in a position known relative to one another. Over a limited range of angles, features (such as by SIFT programming) are extracted from each scene captured from first and second cameras.
  • the feature data is combined with calibration data to extract 3D coordinates from the features and then to coordinate user interface/control based on detected motion or otherwise.
  • U.S. Pat. No. 8,111,904 describes methods and apparatus for determining the pose, e.g., position along x-, y- and z-axes, pitch, roll and yaw (or one or more characteristics of that pose) of an object in three dimensions by triangulation of data obtained from multiple images of the object.
  • a method for 3D machine vision during a calibration step, multiple cameras disposed to acquire images of the object from different respective viewpoints are calibrated to discern a mapping function that identifies rays in 3D space emanating from each respective camera's lens that correspond to pixel locations in that camera's field of view.
  • functionality associated with the cameras is trained to recognize expected patterns in images to be acquired of the object.
  • a runtime step triangulates locations in 3D space of one or more of those patterns from pixel-wise positions of those patterns in images of the object and from the mappings discerned during calibration step.
  • each of the aforementioned imaging systems is limited in some fashion.
  • the PLAYSTATION MOVE requires a wand and the KINECT system offers limited resolution (as further defined herein) and depth of field.
  • All of the stereo imaging approaches further require feature matching and depth extraction by triangulation or other known effect—all leading to computational intensity.
  • the optical axis of the imaging systems (e.g., dual cameras) and the structured-light projection system must be fixed with respect to each other with known calibration parameters. Any deviation from these fixed angles would result in poor depth reconstruction and thus poor gesture recognition.
  • the subject system hardware and methodology can combine disparate cameras into a cohesive gesture recognition environment.
  • the objective of the gesture recognition process is to render or result in an intended control function, it can be achieved by positioning two or more cameras with non-coaxial axes to detect and lock onto an object image regardless of its depth coordinate.
  • Each camera “sees” one 2D view of the gesture, and the multitude of 2D gestures are combined to infer the 3D gesture.
  • the 3D object does not need to be measured to reconstruct the 3D gesture. For example, if one camera sees an object expanding and the second camera sees it moving towards the first camera, it can be inferred that the object is moving towards the first camera without knowing anything about the 3D shape of the object.
  • this disclosure presents systems, methodology and a set of tools for dynamic object and gesture recognition for applications such as:
  • the subject motion detection scheme offers equal (in the idealized case and, otherwise, substantially equalized) sensitivity in Z, X and Y motion detection once the same object is identified in both cameras and the shape and/or motion is only analyzed in the XY planes of the respective Z.
  • the z-motion detection for the first camera is provided by the XY-motion of the object viewed by the second camera and vice versa.
  • pure differentiated plan views (e.g., XY, YZ, XZ) occur when the cameras are at 90 degrees to one another.
  • different embodiments herein work with the cameras set at various angles, so the planes referenced above and elsewhere herein may instead be “hybridized” (i.e., include components of other views).
  • a “surround” gesture recognition system as taught herein is provided with various features.
  • a first/primary camera may be located at the display plane with its Z axis parallel with a Z axis of the display (or—stated otherwise—with the camera Z axis generally perpendicular to the plane of the display).
  • the camera can be placed anywhere as long as it “sees” the object of interest.
  • a second/secondary camera is provided with a non-parallel axis with respect to the display (or the first/primary camera if it is otherwise situated). The angle between these axes would be optimally at 90 degrees.
  • the selected software may be triggered by a pre-set motion of the gesture tool.
  • the motion can be a rapid movement of the hand, fingers or any hand held object.
  • the predefined motion can be identified by a matching process (e.g. cross correlation scheme) conducted on sets of time sequenced images from the same or different cameras.
  • a look-up table of gestures can be used for efficient identification. Independent 2D gestures captured by each camera can be identified by such matching.
  • the multiple gestures may then be combined to determine the equivalent 3D gesture, without needing to determine features of/for the 3D object. More broadly, such an approach can be employed simply to calculate/determine a 3D path or trajectory of the tool. Such a determination may track like a mouse pointer for user input/interface control or may be otherwise employed.
  • the angle between the cameras need not be as great as 30 degrees because feature recognition is not required (see discussion of Lowe, below, where such activity is problematic—at best—with camera angles of less than 30 degrees). And while systems operating at over 30 degree angle between the cameras offer various advantages, meaningful/useful embodiments may be practiced with angular ranges between the primary and secondary cameras of about 10 to about 30 degrees. In these, no feature recognition is employed. Rather, a process as described above (or another) may be used.
  • the subject inventions include software adapted to manipulate data structures to embody the methodologies described. These methodologies necessarily involve computational intensity beyond human capacity. They must be carried out in connection with a processor (i.e., standard or custom/dedicated) in connection with electronic (i.e., magnetic, spintronic, etc.) memory. As such, the necessary hardware includes computer electronics.
  • the hardware further includes an optical system with one or more sensors as described above provided in an arrangement suitable for defocusing imaging.
  • the system hardware typically involves physically separate camera units.
  • the decision to separate the components is not a mere matter of design choice.
  • the pieces are separate, rather than integral, in order to allow a setup that provides adequate view-axis differentiation for proper functioning and, optionally, to do so without encumbering the working environment with gross support structure.
  • the cameras may be structurally connected as by a flex-hose or boom for an overhead camera interface, or the camera(s) may be integrated into kiosk panel(s) facing in different directions.
  • one camera may be embedded at or near the track pad of a laptop keyboard, and another in a video screen bezel.
  • one or more cameras are incorporated in audio surround-sound speakers.
  • One camera may be located on a viewing centerline as part of a display, and another camera may be located with one side-channel surround speaker.
  • cameras can be located in each of two front channel audio speakers. Other options exist as well.
  • the difference in viewpoint between the cameras may be characterized by the distance between the cameras as well as by their angle.
  • Two widely spaced cameras incorporated in speakers set up for stereo audio imaging (e.g., 2 m each apart from a center) can be used; depending on the focal position, a large planar separation can thus be observed.
  • the embodiments herein are practiced with at least one camera centered on an axis where image object movement and/or position will be concentrated.
  • The second camera can also be positioned with an orthogonal or perpendicular view. If not set up as such, the system may include capacity to account for deviation from perpendicular viewing angles.
  • the cameras comprise hardware incorporated in a pair (or trio) of smart phones. These may be set up by users (perhaps employing custom stands/braces), each bringing one to a friendly game. With the addition of a monitor, a game of virtual table-tennis, dodge-ball, dance or anything else is possible. The game may be two-player as described or a single-player challenge, etc. In any case, an aspect of the invention involves camera repurposing as described.
  • the system may comprise a combination of one, two or more smart phones, laptop computers or tablet computers with cameras or webcams (arranged with non-coaxial optical axes) and wired or wireless image or data transmission capabilities (e.g., Bluetooth or WiFi, etc.), and a display system capable of receiving signals from said computer or tablet computer with camera systems, or a computer (laptop) with a display system and software to receive images from the paired camera(s).
  • the first (or primary) camera can be that of a laptop computer or a designated camera embedded on/with a computer and/or television display.
  • the first camera can be a web cam attached to a display and having the same optical axis as that of the display.
  • the second (or secondary) camera can be any camera or camera-equipped device such as a webcam or smart cell phone (e.g., IPHONE or ANDROID) with a wireless or wired connection to the computer or television display including a processor.
  • a device with front and back cameras can have one repurposed for use in the subject approach, with the optics of one or the other re-routed to provide the second channel.
  • the given hardware setup operates according to principles as further described below.
  • the user can either use a gesture to perform a function, or actuate the interface in specific locations. For example, the user could wave his hand to scroll through icons, then select the desired icon by pointing at it directly.
  • three levels of calibration may be employed. Per above, if only relative control is needed, calibration is not necessary. The cameras only need to be pointed such that they have a common area of view where the gestures will be performed. If the coordinate system of the primary camera is known (e.g., because the camera is fixed/integrated within a dedicated display system), then only the secondary camera needs to be calibrated to offer absolute 3D gesture control. Thus, moving a known target to different points in space is enough to establish the coordinate system of the secondary camera. The apparent position of those points in both cameras can be used to determine the unique view angle of the secondary camera. If all of the cameras are unknown, then the coordinate systems of both cameras are “locked” to the display in order to deliver absolute 3D gesture control. To do this, one moves a known target towards specified targets on the display. The vector of the target motion provides an absolute reference to determine the view angle. This allows the system to calculate the coordinate systems of the cameras relative to each other and to the display.
  • inventive embodiments are provided herein. These may be individually claimed or claimed in combination. In so doing, it is specifically contemplated that they may be distinguished from the noted BACKGROUND material by way of any form of “negative” limitation, including claim-specific disclaimer of the noted technology as falling outside the subject claim scope.
  • FIG. 1 illustrates a prior art process for a typical stereo imaging approach.
  • the cameras need to be fixed relative to one another, and to not have too large of an angle between cameras that will be used to extract 3D data.
  • Limitation on angle depends on a number of factors including shadowing or obscuring of features. In any case, depending on the subject to be imaged, angles greater than 30 degrees can become quite problematic. Largely, the limitation derives from steps 102 and 104 in which a first camera is used at 102 to extract features of an object (e.g., SIFT, SURF or other programming methodology) and a second camera is used to extract the same feature. See Lowe, “Object Recognition from Local Scale-Invariant Features”, Proc. of the International Conference on Computer Vision, Corfu (Sept. 1999).
  • FIG. 2 is a flowchart illustrating operation of the subject hardware according to a method 200 . It assumes two cameras, with Camera 1 facing toward the user and Camera 2 from another angle—above/below or to the side. In contrast to the above, the features captured from Camera 1 and Camera 2 are not necessarily the same. For hand-gesture based control, the intent is to identify hand-like areas, but the recognized points need not be identical—merely related.
  • Camera 1 obtains x-y position of relevant x-y features at 202 .
  • Such features may be deemed “relevant” in that they are characterized as hand gestures, for example, in accordance with one or more of the Examples presented below.
  • Camera 2 obtains z position of different but related features or gestures. Per above, such features or gestures may be deemed “related” in that they are also associated with hand position or motion.
  • a processor manipulating the captured camera data from 202 and 204 uses shared information to help identify which features are useful.
  • One way to use calibration is to take advantage of the fact that the 2D images will (in the sense intended above) “share” a dimension. For example, a camera facing a person and one facing down from the ceiling would share the dimension running horizontal with respect to the person. Thus, once the front facing camera has captured data in a region of interest, relevant points from the other camera could be restricted to a thin horizontal band or slice of the imaged area.
  • the processor uses x-y features for x-y dimensional control, such as captured hand motion to emulate the movement of a mouse on a display screen.
  • the processor uses z features or gestures for z-dimensional control, such as captured in-out motion of a user's hand to emulate the clicking of a mouse.
  • the process may integrate x-y and z control information for desired interface control and/or display.
  • Suitable hardware for large (or small) display control, associated gaming, etc. is shown in FIGS. 3A and 3B.
  • a display 300 is shown, incorporating a camera 302 with a field of view over a range α.
  • An optical axis of the camera is indicated by axis Z1.
  • a second camera is depicted as a wireless web cam 310. It offers a field of view over a range β and an optical axis Z2.
  • Axes Z1 and Z2 may be set perpendicular or at another angle. They may be at the same level or at whatever level is convenient. The robust nature of the subject system allows for significant angular variation. This variation may be a result of either inaccurate or fairly arbitrary user setup.
  • smart phone camera could communicate with the display and any associated electronics through built-in WiFi or Bluetooth communication. It may do so in a role where it merely forwards camera image data that is processed in conjunction with hardware associated with (integrally or connected as a peripheral—not shown) the display.
  • the processor on-board the smart phone may be used for processing steps (i.e., any of acts 206, 208 and/or 210) in the method above or otherwise.
  • FIG. 4 illustrates another set of hardware options in connection with a laptop computer 400 .
  • the computer could be used for such processing.
  • optional smart phone 410 can share or handle such processing.
  • a second (or third) camera 412 may be provided overhead in association with an adjustable boom 414 .
  • Such a boom may be installed on a desk, and even integrate lamp components to disguise its presence and/or function with utility secondary to defining optical axis Z2″ as employed in the subject method(s).
  • Yet another option is to integrate an upward facing camera 420 in the hand rest 422 area of the computer. All said, a number of options exist.
  • FIGS. 5A and 5B illustrate still other hardware options.
  • a first camera 500 is provided in connection with a housing 502 set atop a non-custom display 504 .
  • a second camera 510 is integrated in a side-positioned surround-sound speaker 512 (the speaker examples illustrated as in-wall units).
  • Housing 502 advantageously includes the requisite processor and other electronic components.
  • Speaker units 512 may be wireless (for each of the camera and music-center signals) or may be connected by wiring.
  • FIGS. 5A and 5B illustrate options in this regard.
  • FIG. 5A depicts determination of a coordinate system for absolute gesture recognition for the secondary camera 510 where a coordinate system of the primary camera 500 is known. Comparing the apparent position of common objects (stars—as potentially embodied by an LED-lit wand held and “clicked” to indicate the next point by a user) in both cameras enables determination of the view angle for the secondary camera.
  • FIG. 5B depicts a final system implementation in which both cameras are calibrated to a coordinate system.
  • determination of a coordinate system is made where the coordinate system of neither of the cameras is known.
  • the system is able to determine the relative view angles of both cameras together (as indicated by inset image 520 illustrating the angle between the camera axes).
  • Camera 1 faced the user with its captured image data processed using SURF features along with Kalman Filters and processing of group flows of points to detect regions of interest.
  • The motion of keypoints in the regions of interest (e.g., a user's hand) was then transformed into the x-y motion of the mouse on the computer screen.
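  • For illustration only, a minimal sketch of such a front-camera x-y tracking loop is given below. It is not the original implementation of this Example: it substitutes the stock OpenCV Shi-Tomasi corner detector and pyramidal Lucas-Kanade optical flow for the reported SURF keypoints, and simple exponential smoothing for the Kalman filtering; all parameter values and the move_cursor hook in the usage comment are illustrative assumptions.

```python
import cv2
import numpy as np

def track_xy_motion(capture, smoothing=0.6):
    """Yield smoothed per-frame (dx, dy) motion of a tracked point group from the
    front-facing camera, suitable for driving relative cursor movement. A stand-in
    for the SURF + Kalman-filter pipeline of the Example: Shi-Tomasi corners are
    tracked with pyramidal Lucas-Kanade optical flow and the median group flow is
    exponentially smoothed."""
    ok, frame = capture.read()
    if not ok:
        return
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01, minDistance=7)
    velocity = np.zeros(2)
    while True:
        ok, frame = capture.read()
        if not ok or pts is None or len(pts) < 10:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        good = status.flatten() == 1
        flow = np.median((nxt[good] - pts[good]).reshape(-1, 2), axis=0)  # group motion
        velocity = smoothing * velocity + (1.0 - smoothing) * flow        # crude Kalman substitute
        yield velocity                                                    # map to cursor dx, dy
        prev_gray, pts = gray, nxt[good].reshape(-1, 1, 2)

# Usage (illustrative; move_cursor is a hypothetical UI hook):
#   for dx, dy in track_xy_motion(cv2.VideoCapture(0)):
#       move_cursor(dx, dy)
```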
  • Camera 2 faced down towards a desk with its captured image data processed using background subtraction and histogram back-projection, then connectedness checking to determine where the hand was positioned. It then used center-of-mass and extremum checks to detect motion, which was translated to user input as left mouse “clicks” in the interface, loosely gestured in the “z” dimension of the system.
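  • A corresponding sketch of the downward-facing camera's hand localization is shown below. It follows this Example's outline (background subtraction, histogram back-projection, connected-component selection, center of mass), but the histogram sampling step, thresholds and the press-detection note are assumptions added for illustration.

```python
import cv2
import numpy as np

def detect_hand_centroid(frame_bgr, back_sub, hand_hist):
    """Locate the hand seen by the downward-facing camera: the foreground mask from
    background subtraction is gated by a hue/saturation histogram back-projection,
    and the largest connected component supplies the hand's center of mass."""
    fg = back_sub.apply(frame_bgr)
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    proj = cv2.calcBackProject([hsv], [0, 1], hand_hist, [0, 180, 0, 256], scale=1)
    _, skin = cv2.threshold(proj, 50, 255, cv2.THRESH_BINARY)
    mask = cv2.bitwise_and(skin, fg)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
    if n <= 1:
        return None                                         # nothing but background found
    hand = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))  # skip background label 0
    return centroids[hand]                                   # (x, y) center of mass in pixels

# Setup (illustrative): background model plus a skin histogram sampled once from an
# HSV crop known to contain the hand (hsv_hand_sample is a hypothetical calibration crop).
back_sub = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
# hand_hist = cv2.calcHist([hsv_hand_sample], [0, 1], None, [30, 32], [0, 180, 0, 256])
# cv2.normalize(hand_hist, hand_hist, 0, 255, cv2.NORM_MINMAX)
# A left "click" could then be flagged when the returned centroid crosses an extremum
# band of the image, per the Example's loose "z"-dimension gesture.
```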
  • HSV (Hue, Saturation, and Value) color space can be employed for hand identification.
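  • A minimal sketch of such HSV-based hand identification is given below; the hue/saturation/value bounds are illustrative assumptions that would need tuning for the user, camera and lighting.

```python
import cv2
import numpy as np

def hand_mask_hsv(frame_bgr, lower=(0, 40, 60), upper=(25, 255, 255)):
    """Rough skin/hand segmentation in HSV space; returns a binary mask."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lower, np.uint8), np.array(upper, np.uint8))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))  # drop speckle noise
```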
  • a general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine.
  • the processor can be part of a computer system that also has a user interface port that communicates with a user interface, and which receives commands entered by a user, has at least one memory (e.g., hard drive or other comparable storage, and random access memory) that stores electronic information including a program that operates under control of the processor and with communication via the user interface port, and a video output that produces its output via any kind of video output format, e.g., VGA, DVI, HDMI, displayport, or any other form.
  • a memory e.g., hard drive or other comparable storage, and random access memory
  • a processor may also be implemented as a combination of computing devices, e.g. a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. These devices may also be used to select values for devices as described herein.
  • a software module may reside in Random Access Memory (RAM), flash memory, Read Only Memory (ROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may be integral to the processor.
  • the processor and the storage medium may reside in an ASIC.
  • the ASIC may reside in a user terminal.
  • the processor and the storage medium may reside as discrete components in a user terminal.
  • the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
  • Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another.
  • a storage media may be any available media that can be accessed by a computer.
  • such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • the memory storage can also be rotating magnetic hard disk drives, optical disk drives, or flash memory based storage drives or other such solid state, magnetic, or optical storage devices.
  • any connection is properly termed a computer-readable medium.
  • the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • DSL digital subscriber line
  • Disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • the computer readable media can be an article comprising a machine-readable non-transitory tangible medium embodying information indicative of instructions that when performed by one or more machines result in computer implemented operations comprising the actions described throughout this specification.
  • Operations as described herein can be carried out on or over a website.
  • the website can be operated on a server computer, or operated locally, e.g., by being downloaded to the client computer, or operated via a server farm.
  • the website can be accessed over a mobile phone or a PDA, or on any other client.
  • the website can use HTML code in any form, e.g., MHTML or XML, and via any form such as cascading style sheets (“CSS”) or other client-side runtime languages such as Flash, HTML5 or Silverlight.
  • CSS cascading style sheets
  • the computers described herein may be any kind of computer, either general purpose, or some specific purpose computer such as a workstation.
  • the programs may be written in C, Java, Brew or any other programming language.
  • the programs may be resident on a storage medium, e.g., magnetic or optical, e.g. the computer hard drive, a removable disk or media such as a memory stick or SD media, or other removable medium.
  • the programs may also be run over a network, for example, with a server or other machine sending signals to the local machine, which allows the local machine to carry out the operations described herein.

Abstract

The subject system hardware and methodology combine disparate cameras into a cohesive gesture recognition environment. To render an intended computer, gaming, display, etc. control function, two or more cameras with non-coaxial axes are trained on a space to detect and lock onto an object image regardless of its depth coordinate. Each camera captures one 2D view of the gesture and the plurality of 2D gestures are combined to infer the 3D input.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of U.S. patent application Ser. No. 14/319,653, filed Jun. 30, 2014, which is a continuation of PCT Application No. PCT/US2012/059077, filed Oct. 5, 2012, which claims priority to U.S. Provisional Application No. 61/583,539, filed Jan. 5, 2012, U.S. Provisional Application No. 61/589,182, filed Jan. 20, 2012, and U.S. Provisional Application No. 61/590,251, filed Jan. 24, 2012, all of which are incorporated herein by reference in their entirety for all purposes.
  • BACKGROUND
  • Imaging systems introduced in the computer gaming and associated display control field have made a tremendous impact. Systems offered by Microsoft (KINECT) and Sony (PLAYSTATION MOVE) have been disruptive in the marketplace, creating massive sales numbers of new gaming systems. The tremendous popularity of the KINECT system in particular can be traced to the expanded play and control capacity of the hardware within the game environment. Now, instead of simply manipulating a keypad, a character (avatar) within a game environment runs, jumps or dances in coordinated action with a player's own digitized body movements.
  • The PLAYSTATION MOVE is reported as a motion-sensing game controller platform for the PlayStation 3 (PS3). Based on the popular game play style of Nintendo's Wii console, the PlayStation Move uses a camera to track the position of a lighted wand with inertial sensors in the wand to detect its motion. Another wand/object tracking system for video game control is disclosed in U.S. Pat. Nos. 7,843,429 and 8,068,095.
  • Unique to the KINECT system is the ability to capture and control video (including video game) function by gesture recognition. The KINECT system is reported to employ a color camera and depth sensor, where the depth sensor employs an infrared projector and a monochrome sensor. Various patents assigned to Microsoft Corp (e.g., US Publication Nos. 20120050157; 20120047468 and 20110310007) further detail applicable hardware and software for image analysis and capture applied for the purpose of computer or video game navigation or control. US Publication No. 20110301934 addresses gesture capture for the purpose of performing sign language translation. Further examples incorporated by reference in this last publication include:
      • U.S. patent application Ser. No. 12/475,094 entitled “Environment and/or Target Segmentation”, filed on May 29, 2009; U.S. patent application Ser. No. 12/511,850, entitled “Auto Generating a Visual Representation”, filed on Jul. 29, 2009; U.S. patent application Ser. No. 12/474,655, “Gesture Tool”, filed on May 29, 2009; U.S. patent application Ser. No. 12/603,437, “Pose Tracking Pipeline”, filed on Oct. 21, 2009; U.S. patent application Ser. No. 12/475,308, “Device for Identifying and Tracking Multiple Humans Over Time”, filed on May 29, 2009, U.S. patent application Ser. No. 12/641,788, “Motion Detection Using Depth Images”, filed on Dec. 18, 2009, U.S. patent application Ser. No. 12/575,388, “Human Tracking System”, filed on Oct. 7, 2009; U.S. patent application Ser. No. 12/422,661, “Gesture Recognizer System Architecture”, filed on Apr. 13, 2009; U.S. patent application Ser. No. 12/391, 150, “Standard Gestures”, filed on Feb. 23, 2009, and U.S. patent application Ser. No. 12/474,655, “Gesture Tool”, filed on May 29, 2009.
        Whether employing a form of stereo imaging, or using the aforementioned depth sensor to map various z-axis planes on a full color captured image, none of the referenced systems contemplate hardware and software systems as provided herein.
  • Indeed, given that the commercial embodiment of the KINECT relies on structured light projection technology, its 3D depth detection sensitivity is quite limited. The system requires large movements of the hands or body in order to render correct gesture recognition—as do time-of-flight based sensors.
  • Systems such as the KINECT or others relying on a dynamic or passive stereoscopic arrangement rely on feature matching and triangulation calculations to recognize 3D coordinates of objects such as hands or arms. Furthermore, it is important to note that in order for stereo systems to perform the triangulation process, the optical axis of the imaging systems (e.g. dual cameras) and the structured-light projection system must be fixed with respect to each other with known calibration parameters. Any deviation from these fixed angles would result in poor depth reconstruction and thus poor gesture recognition.
  • Consequently, the elements of the KINECT system are arranged in a fixed position, separated across the face of a bar-shaped housing. Likewise, other systems used for stereo imaging where various hardware components are combined are designed to be connected to establish a fixed and predetermined relationship between the different camera components. See U.S. Pat. Nos. 7,102,686; 7,667,768; 7,466,336; 7,843,487; 8,068,095 and 8,111,239.
  • More generally, in a typical stereo imaging system, the cameras are fixed in a position known relative to one another. Over a limited range of angles, features (such as by SIFT programming) are extracted from each scene captured from first and second cameras. The feature data is combined with calibration data to extract 3D coordinates from the features and then to coordinate user interface/control based on detected motion or otherwise.
  • In another multi-camera system, U.S. Pat. No. 8,111,904 describes methods and apparatus for determining the pose, e.g., position along x-, y- and z-axes, pitch, roll and yaw (or one or more characteristics of that pose) of an object in three dimensions by triangulation of data obtained from multiple images of the object. In a method for 3D machine vision, during a calibration step, multiple cameras disposed to acquire images of the object from different respective viewpoints are calibrated to discern a mapping function that identifies rays in 3D space emanating from each respective camera's lens that correspond to pixel locations in that camera's field of view. In a training step, functionality associated with the cameras is trained to recognize expected patterns in images to be acquired of the object. A runtime step triangulates locations in 3D space of one or more of those patterns from pixel-wise positions of those patterns in images of the object and from the mappings discerned during calibration step.
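  • For context only, the triangulation step that such prior-art multi-camera systems depend on (and that the presently described approach avoids) amounts to intersecting calibrated rays. A generic closest-point sketch, with made-up camera positions, is shown below; it is not code from the '904 patent.

```python
import numpy as np

def triangulate_rays(origin1, dir1, origin2, dir2):
    """Closest-point triangulation of the kind prior-art multi-camera systems rely on:
    given two calibrated rays (camera center plus direction through the matched pixel),
    return the midpoint of their segment of closest approach."""
    d1 = np.asarray(dir1, float); d1 /= np.linalg.norm(d1)
    d2 = np.asarray(dir2, float); d2 /= np.linalg.norm(d2)
    o1, o2 = np.asarray(origin1, float), np.asarray(origin2, float)
    w = o1 - o2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w, d2 @ w
    denom = a * c - b * b                      # ~0 only if the rays are parallel
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    return 0.5 * ((o1 + s * d1) + (o2 + t * d2))

# Illustrative check: two cameras 1 m apart, both looking at the point (0.5, 0, 2).
p = triangulate_rays([0, 0, 0], [0.5, 0, 2], [1, 0, 0], [-0.5, 0, 2])
print(np.round(p, 3))   # -> [0.5 0.  2. ]
```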
  • Various multi-camera and/or single-camera, multi-aperture "defocusing" imaging systems by the inventor hereof are also described in the patent literature. See U.S. Pat. Nos. 6,278,847; 7,006,132; 7,612,869 and 7,612,870. These operate in a manner such that the recorded positions of matched point/feature doublets, triplets, etc. are measured in relation to one another against a fixed calibration set or otherwise known relationship between/within the image capture means to generate Z-axis values from imaged X-Y coordinate information.
  • Each of the aforementioned imaging systems is limited in some fashion. Of the commercially-available systems, the PLAYSTATION MOVE requires a wand and the KINECT system offers limited resolution (as further defined herein) and depth of field. All of the stereo imaging approaches further require feature matching and depth extraction by triangulation or other known effect—all leading to computational intensity. Furthermore, it is important to note that in order for stereo systems to perform the triangulation process, the optical axis of the imaging systems (e.g. dual cameras) and the structured-light projection system must be fixed with respect to each other with known calibration parameters. Any deviation from these fixed angles would result in poor depth reconstruction and thus poor gesture recognition. Whether based on stereo imaging triangulation or time-of-flight, such systems require careful calibration, complicated optics, and intensive computational resources to measure the 3D objects in order to capture 3D gestures. Defocusing approaches can also be computationally intensive and may in some cases require marker feature application (e.g., by contrast, projected light, etc.) for best accuracy. The multi-camera pose-based approach described in the '904 patent is perhaps the most computationally intensive approach of all.
  • In addition to the above, because of their inherent constraints, current systems do not allow for the pairing of arbitrary cameras and display systems in order to render depth measurements, thereby requiring the purchase of a system in its entirety as opposed to creation of a gesture recognition system from separate hardware components. Without the teachings herein, it is not currently possible to take advantage of a gesture recognition system using separate camera hardware components such as a computer and a smartphone or a smart television and a networked camera that can be set up in a matter of minutes with few limits on the placement of the various camera components.
  • Systems are provided that operate outside of stereo imaging principles, complex computation or calibration requirements and/or noted hardware limitations. Thus, the systems offer advantages as described below and as further may be apparent to those with skill in the art in review of the subject disclosure.
  • SUMMARY
  • Inventive aspects herein include each of hardware and various system configurations, methods of software controlling the hardware, methods of user interaction with the hardware, data structures created and manipulated with the hardware and other aspects as will be appreciated by those with skill in the art upon review of the subject disclosure.
  • In one aspect, in a manner similar to how a surround sound audio system combines disparate speakers into a cohesive audio environment, the subject system hardware and methodology can combine disparate cameras into a cohesive gesture recognition environment. As long as the objective of the gesture recognition process is to render or result in an intended control function, it can be achieved by positioning two or more cameras with non-coaxial axes to detect and lock onto an object image regardless of its depth coordinate. Each camera “sees” one 2D view of the gesture, and the multitude of 2D gestures are combined to infer the 3D gesture. Notably, the 3D object does not need to be measured to reconstruct the 3D gesture. For example, if one camera sees an object expanding and the second camera sees it moving towards the first camera, it can be inferred that the object is moving towards the first camera without knowing anything about the 3D shape of the object.
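  • The kind of inference just described reduces to very little arithmetic. The sketch below is a simplified illustration under an assumed geometry (camera 1 facing the user, camera 2 roughly 90 degrees to the side so that its horizontal image axis lies along camera 1's depth axis); the function name and numbers are illustrative, not part of the disclosure.

```python
import numpy as np

def infer_3d_motion(cam1_disp_xy, cam2_disp_zy):
    """Combine two 2D displacements of the same tracked object into a 3D motion
    estimate, without reconstructing the object's 3D shape.

    Assumed (hypothetical) geometry: camera 1 faces the user, its image spanning
    world X (right) and Y (up); camera 2 views from the side, its image spanning
    world Z (toward camera 1/display) and Y. The shared Y component is averaged;
    each camera supplies the axis the other cannot see."""
    dx, dy1 = cam1_disp_xy
    dz, dy2 = cam2_disp_zy
    dy = 0.5 * (dy1 + dy2)          # the dimension both views "share"
    return np.array([dx, dy, dz])

# Example: camera 1 sees the hand expanding with little x-y motion, while camera 2
# sees it translating toward camera 1 -> the inferred 3D motion is dominantly along Z.
print(infer_3d_motion(cam1_disp_xy=(2.0, 1.0), cam2_disp_zy=(40.0, 3.0)))   # ~[ 2.  2. 40.]
```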
  • More specifically, this disclosure presents systems, methodology and a set of tools for dynamic object and gesture recognition for applications such as:
      • touch-free control of large area displays (e.g. moving a cursor on the screen or turning TV functions on or off) through a 3D “surround” gesture recognition system;
      • creating touch-free laptop or desktop monitors by creating a micro surround gesture recognition system; and/or
      • making an interactive kiosk or gambling machine display that works without physical contact.
        Still, these examples are provided in a non-limiting sense.
  • Currently, there are many off-the-shelf image recognition programs that can identify an object from its shape regardless of its scale. Such routines work almost flawlessly as long as the initial planar (XY) location of the object is narrowed to a finite small area. In contrast, operation according to one aspect of the present invention eschews operation within such a limited area/volume. Instead, the crucial step of object recognition can be achieved via a certain predefined motion of the object as the signal for the onset of the object and gesture detection. Exemplary gestures include a rapid hand waving or a circular motion of index finger. Therefore, initial recognition of the object may be based on the type of the motion that it goes through rather than its actual shape or physical configuration. However, in some cases shape recognition of an object might be preferred for triggering the gesture recognition process.
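  • One simple way such a predefined trigger motion could be detected per camera is sketched below: it flags a "rapid hand wave" when the candidate object's horizontal image position reverses direction often enough, with enough amplitude, inside a short window. The thresholds and frame rate are illustrative assumptions.

```python
import numpy as np

def is_wave_trigger(x_positions, fps=30.0, min_reversals=4, min_amplitude=15.0, window_s=1.0):
    """Detect a 'rapid hand wave' trigger from a short history of the candidate object's
    horizontal image position: enough direction reversals with enough amplitude inside
    roughly a one-second window. All thresholds are illustrative and would be tuned."""
    n = int(fps * window_s)
    x = np.asarray(x_positions[-n:], dtype=float)
    if len(x) < 3:
        return False
    dx = np.diff(x)
    dx = dx[np.abs(dx) > 1e-3]                          # ignore frames with no motion
    reversals = int(np.sum(np.sign(dx[1:]) != np.sign(dx[:-1])))
    amplitude = float(x.max() - x.min())
    return reversals >= min_reversals and amplitude >= min_amplitude

# Example: an oscillating x track trips the trigger, a steady drift does not.
t = np.arange(30) / 30.0
print(is_wave_trigger(200 + 40 * np.sin(2 * np.pi * 3 * t)))   # -> True
print(is_wave_trigger(200 + 40 * t))                            # -> False
```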
  • In instances where shape recognition is employed, once two non-coaxially located cameras see a similar motion or shape (in 2D) from their respective Z-direction (i.e., on an XY plane normal to their Z axis), they trigger a software program that accurately categorizes the shape and its XY motion from each respective z direction. There are many scale invariant image processing software applications that can conduct the aforementioned function, once the initial approximate location of the object with a given predefined motion is identified.
  • By way of comparison, detection of motion in Z (depth) is usually a difficult task for stereo and defocusing camera systems, where the Z information is embedded in the XY plane with ambiguities that result from scale changes producing non-deterministic motion of the boundaries of the object. Considering the cross-sectional image of a cone that moves along its axis (Z-view), at any instant it would not be clear to the observer whether the expansion of the circular cross section is due to movement along the Z-axis or expansion (inflation, deformation) of the cone at that given cross section. This is why, in stereo and defocusing techniques, sensitivity in Z-motion detection is usually far inferior to XY motion detection capability.
  • However, according to the non-stereo, non-defocusing imaging approaches taught herein, this ambiguity can be resolved once a non-coaxial view is provided.
  • Instead, the subject motion detection scheme offers equal (in the idealized case and, otherwise, substantially equalized) sensitivity in Z, X and Y motion detection once the same object is identified in both cameras and the shape and/or motion is only analyzed in the XY planes of the respective Z. As contemplated, the z-motion detection for the first camera is provided by the XY-motion of the object viewed by the second camera and vice versa. Notably, in the present context it should be understood that “pure” differentiated plan views (e.g., XY, YZ, XZ) occur when the cameras are at 90 degrees to one another. However, different embodiments herein work with the cameras set at various angles, so the planes referenced above and elsewhere herein may instead be “hybridized” (i.e., include components of other views).
  • Accordingly, a “surround” gesture recognition system as taught herein is provided with various features. In it, a first/primary camera may be located at the display plane with its Z axis parallel with a Z axis of the display (or—stated otherwise—with the camera Z axis generally perpendicular to the plane of the display). However, in general, the camera can be placed anywhere as long as it “sees” the object of interest. In addition, a second/secondary camera is provided with a non-parallel axis with respect to the display (or the first/primary camera if it is otherwise situated). The angle between these axes would be optimally at 90 degrees. However, as noted above, the scheme works for many non-zero angles so long as sufficient information along the primary camera's optical axis can be obtained. In those instances where SIFT (or related shape recognition processing) is employed, the angle between cameras is preferably at least about 30 degrees, more preferably above about 45 degrees and most preferably above about 60 degrees. Visible or IR illumination is provided for the imaged volume. This illumination may be pulsed/intermittent or continuous. As such, the system may include its own provision for lighting or rely on external/ambient lighting.
  • An input tool employed can be a hand, finger or a hand-held object that is moved during system use. Optionally, a Scale-Invariant Feature Transform (SIFT) recognition software or related Speeded Up Robust Features (SURF) software equivalent may be employed to recognize the gesture “tool” in one or both camera views (or further multiple views if more than two cameras are used).
  • In any case, the selected software may be triggered by a pre-set motion of the gesture tool. The motion can be a rapid movement of the hand, fingers or any hand held object. The predefined motion can be identified by a matching process (e.g. cross correlation scheme) conducted on sets of time sequenced images from the same or different cameras. A look-up table of gestures can be used for efficient identification. Independent 2D gestures captured by each camera can be identified by such matching. The multiple gestures may then be combined to determine the equivalent 3D gesture, without needing to determine features of/for the 3D object. More broadly, such an approach can be employed simply to calculate/determine a 3D path or trajectory of the tool. Such a determination may track like a mouse pointer for user input/interface control or may be otherwise employed.
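  • A minimal sketch of such look-up-table matching is given below, assuming each gesture is stored as a short, resampled 2D trajectory and scored by normalized cross-correlation; the template names and sizes are illustrative. The per-camera matches could then be combined, per the text, to label the equivalent 3D gesture.

```python
import numpy as np

def normalize(traj):
    """Center and scale an (N, 2) trajectory so matching is position/scale invariant."""
    t = np.asarray(traj, dtype=float)
    t = t - t.mean(axis=0)
    norm = np.linalg.norm(t)
    return t / norm if norm > 0 else t

def match_gesture(observed, lookup_table):
    """Return (best_name, score): the highest correlation between the observed 2D
    trajectory and each template in the look-up table (all trajectories assumed
    resampled to the same number of points beforehand)."""
    obs = normalize(observed)
    scores = {name: float(np.sum(obs * normalize(tmpl)))   # correlation of unit-norm trajectories
              for name, tmpl in lookup_table.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical look-up table of 2D gesture templates (16 samples each).
t = np.linspace(0, 2 * np.pi, 16)
templates = {
    "circle":      np.c_[np.cos(t), np.sin(t)],
    "swipe_right": np.c_[np.linspace(-1, 1, 16), np.zeros(16)],
}
observed = np.c_[1.1 * np.cos(t) + 0.2, 1.1 * np.sin(t) - 0.1]   # scaled, shifted circle
print(match_gesture(observed, templates))                         # -> ('circle', ~1.0)
```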
  • Notably, in this example, the angle between the cameras need not be as great as 30 degrees because feature recognition is not required (see discussion of Lowe, below, where such activity is problematic—at best—with camera angles of less than 30 degrees). And while systems operating at over 30 degree angle between the cameras offer various advantages, meaningful/useful embodiments may be practiced with angular ranges between the primary and secondary cameras of about 10 to about 30 degrees. In these, no feature recognition is employed. Rather, a process as described above (or another) may be used.
  • The subject inventions include software adapted to manipulate data structures to embody the methodologies described. These methodologies necessarily involve computational intensity beyond human capacity. They must be carried out in connection with a processor (i.e., standard or custom/dedicated) in connection with electronic (i.e., magnetic, spintronic, etc.) memory. As such, the necessary hardware includes computer electronics. The hardware further includes an optical system with one or more sensors as described above provided in an arrangement suitable for defocusing imaging.
  • Additional software (optionally encrypted) may be provided to connect the various cameras. Encrypted communication between the camera(s) and the primary unit(s) may be of use to ensure user privacy and authenticity. Encrypted communication can also be used to ensure the secondary hardware (e.g., additional cameras) is approved/validated (e.g., a licensed application).
  • Concerning the hardware and various systems, many options are presented. The system hardware typically involves physically separate camera units. The decision to separate the components is not a mere matter of design choice. The pieces are separate, rather than integral, in order to allow a setup that provides adequate view-axis differentiation for proper functioning and, optionally, to do so without encumbering the working environment with gross support structure.
  • Still, in certain embodiments, the cameras may be structurally connected as by a flex-hose or boom for an overhead camera interface, or the camera(s) may be integrated into kiosk panel(s) facing in different directions. In another example, one camera may be embedded at or near the track pad of a laptop keyboard, and another in a video screen bezel.
  • In yet another example, one or more cameras are incorporated in audio surround-sound speakers. One camera may be located on a viewing centerline as part of a display, and another camera may be located with one side-channel surround speaker. Alternatively, cameras can be located in each of two front channel audio speakers. Other options exist as well.
  • The difference in viewpoint between the cameras may be characterized by the distance between the cameras as well as by their angle. Two widely spaced cameras incorporated in speakers set up for stereo audio imaging (e.g., 2 m each apart from a center) can be used in the subject imaging—which is non-stereo in nature. Depending on the focal position, a large planar separation can thus be observed.
  • However, it is more common that the embodiments herein are practiced with at least one camera centered on an axis where image object movement and/or position will be concentrated. The second camera can also be positioned with an orthogonal or perpendicular view. If not set up as such, the system may include capacity to account for deviation from perpendicular viewing angles.
  • Each of the latter considerations may be important in an advantageous embodiment in which the cameras comprise hardware incorporated in a pair (or trio) of smart phones. These may be set up by users (perhaps employing custom stands/braces), each bringing one to a friendly game. With the addition of a monitor, a game of virtual table-tennis, dodge-ball, dance or anything else is possible. The game may be two-player as described or a single-player challenge, etc. In any case, an aspect of the invention involves camera repurposing as described.
  • Such a system may comprise two or more cellphone cameras communicating through WiFi (local WiFi or cellphone service provider) or Bluetooth. Other options for wireless image transfer exist as well, including—but not limited to—WiFi Direct, WiDi, and wireless USB. One can also envision many variations and combinations of cellphones, web cams and embedded computer monitor cameras and display cameras as components of the surround gesture recognition system. The system software can be run by a general purpose computer, an Application Specific Integrated Circuit (ASIC) or by software embedded in the components included in the referenced cameras, tablets, or otherwise.
  • More generally, the system may comprise a combination of one, two or more smart phones, laptop computers or tablet computers with cameras or webcams (arranged with non-coaxial optical axes) and wired or wireless image or data transmission capabilities (e.g., Bluetooth or WiFi, etc.), and a display system capable of receiving signals from said computer or tablet computer with camera systems, or a computer (laptop) with a display system and software to receive images from the paired camera(s). The first (or primary) camera can be that of a laptop computer or a designated camera embedded on/with a computer and/or television display. Alternatively, the first camera can be a web cam attached to a display and having the same optical axis as that of the display. The second (or secondary) camera can be any camera or camera-equipped device such as a webcam or smart cell phone (e.g., IPHONE or ANDROID) with a wireless or wired connection to the computer or television display including a processor. Likewise, a device with front and back cameras can have one repurposed for use in the subject approach, with the optics of one or the other re-routed to provide the second channel. In any case, the given hardware setup operates according to principles as further described below.
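  • As a rough sketch of the "secondary camera forwards image data to the primary unit" arrangement, the code below streams length-prefixed JPEG frames over a plain TCP socket. The host, port and JPEG quality are illustrative assumptions; an actual pairing might instead use Bluetooth, WiFi Direct or the encrypted channel mentioned above.

```python
import socket
import struct
import cv2
import numpy as np

def stream_secondary_camera(host="192.168.1.50", port=5005, device=0, jpeg_quality=70):
    """Run on the secondary device (e.g., a phone or webcam host): capture frames and
    send them to the primary unit as 4-byte length-prefixed JPEG blobs over TCP."""
    cap = cv2.VideoCapture(device)
    with socket.create_connection((host, port)) as sock:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            ok, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, jpeg_quality])
            if not ok:
                continue
            data = buf.tobytes()
            sock.sendall(struct.pack(">I", len(data)) + data)   # big-endian length prefix

def _recvall(sock, n):
    """Read exactly n bytes from the socket, or None if the connection closed."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf

def recv_frame(sock):
    """Primary-unit side: read one length-prefixed JPEG frame and decode it."""
    header = _recvall(sock, 4)
    if header is None:
        return None
    (length,) = struct.unpack(">I", header)
    payload = _recvall(sock, length)
    return None if payload is None else cv2.imdecode(np.frombuffer(payload, np.uint8), cv2.IMREAD_COLOR)
```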
  • The proposed systems offer various potential advantages over other systems depending on configuration including:
      • optionally, not requiring fixed viewing angles between cameras;
      • optionally, not requiring fixed light projection angle with respect to the imaging cameras' axes;
      • optionally, not requiring calibration;
      • optionally, not requiring extensive computations associated with the triangulation schemes;
      • optionally, configuration allowing quick setup like an audio (or in conjunction with) surround sound system because exact positioning of the cameras is not required;
      • optionally, covering much larger volume than a comparable stereo imaging or defocusing imaging system; and
      • optionally, offering flexibility for combining various hardware components to create a three dimensional system (e.g., Smart phone and computer or Webcam and television in various permutations.)
        Yet, while the subject surround gesture control is sufficient for relative gestures as described above, it may be of limited use for recording/effecting/translating absolute 3D gestures. This means that while the user can move a cursor a desired amount and direction, the user may not be able to move the cursor to an absolute location. Consequently, the gesture control is more like using a mouse than a touchscreen. To gain absolute 3D gesture control, the coordinate systems of the various cameras used in the system must be locked. This can be done through calibration.
  • After calibration, the user can either use a gesture to perform a function, or actuate the interface in specific locations. For example, the user could wave his hand to scroll through icons, then select the desired icon by pointing at it directly.
  • Depending on the level of absolute control desired, three levels of calibration may be employed. Per above, if only relative control is needed, calibration is not necessary. The cameras only need to be pointed such that they have a common area of view where the gestures will be performed. If the coordinate system of the primary camera is known (e.g., because the camera is fixed/integrated within a dedicated display system), then only the secondary camera needs to be calibrated to offer absolute 3D gesture control. Thus, moving a known target to different points in space is enough to establish the coordinate system of the secondary camera. The apparent position of those points in both cameras can be used to determine the unique view angle of the secondary camera. If all of the cameras are unknown, then the coordinate systems of both cameras are “locked” to the display in order to deliver absolute 3D gesture control. To do this, one moves a known target towards specified targets on the display. The vector of the target motion provides an absolute reference to determine the view angle. This allows the system to calculate the coordinate systems of the cameras relative to each other and to the display.
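  • A sketch of the middle calibration level (primary camera known, secondary unknown) is given below: corresponding target positions observed in both frames are fit with a least-squares (Kabsch/SVD) rotation, from which the secondary camera's view angle follows. The target coordinates are synthetic and the routine is only one possible way of realizing the described step.

```python
import numpy as np

def fit_view_rotation(pts_primary, pts_secondary):
    """Kabsch-style fit: rotation R (and translation t) best aligning points expressed
    in the secondary camera's frame to the same physical points expressed in the
    primary/display frame. pts_* are (N, 3) arrays of corresponding positions."""
    P = np.asarray(pts_primary, float)
    S = np.asarray(pts_secondary, float)
    Pc, Sc = P - P.mean(axis=0), S - S.mean(axis=0)
    U, _, Vt = np.linalg.svd(Sc.T @ Pc)
    d = np.sign(np.linalg.det(U @ Vt))            # guard against a reflection solution
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    t = P.mean(axis=0) - S.mean(axis=0) @ R
    return R, t                                    # maps secondary coords: x @ R + t

# Illustrative data: the secondary camera is rotated ~90 degrees about the vertical axis.
angle = np.deg2rad(90)
R_true = np.array([[np.cos(angle), 0, np.sin(angle)],
                   [0, 1, 0],
                   [-np.sin(angle), 0, np.cos(angle)]])
pts_primary = np.random.default_rng(0).uniform(-1, 1, (6, 3))        # target points moved in space
pts_secondary = (pts_primary - [0.5, 0, 2.0]) @ R_true.T             # same points, secondary frame
R_est, _ = fit_view_rotation(pts_primary, pts_secondary)
view_angle = np.degrees(np.arccos((np.trace(R_est) - 1) / 2))
print(round(float(view_angle)))   # -> 90
```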
  • In any case, numerous inventive embodiments are provided herein. These may be individually claimed or claimed in combination. In so doing, it is specifically contemplated that they may be distinguished from the noted BACKGROUND material by way of any form of “negative” limitation, including claim-specific disclaimer of the noted technology as falling outside the subject claim scope.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The figures provided are diagrammatic and not drawn to scale. Variations from the embodiments pictured are contemplated. Accordingly, depictions in the figures are not intended to limit the scope of the invention.
  • FIG. 1 is a flowchart illustrating a general approach known for accomplishing stereo imaging; FIG. 2 is a flowchart illustrating operation of the subject method as distinguished from stereo imaging; FIGS. 3A and 3B illustrate related hardware system implementations for large field imaging; FIG. 4 illustrates hardware implementations for smaller field imaging; FIG. 5A depicts another system implementation concerning second/secondary camera calibration; and FIG. 5B depicts a final system implementation in which both/all cameras are calibrated to a coordinate system.
  • DETAILED DESCRIPTION
  • Various exemplary embodiments are described below. Reference is made to these examples in a non-limiting sense. They are provided to illustrate more broadly applicable aspects of the present inventions. Various changes may be made and equivalents may be substituted without departing from the true spirit and scope of the inventions. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, process, process act(s) or step(s) to the objective(s), spirit or scope of the present inventions. All such modifications are intended to be within the scope of the claims made herein.
  • FIG. 1 illustrates a prior art process for a typical stereo imaging approach. In method 100, as in any typical stereo system, the cameras need to be fixed relative to one another, and the angle between the cameras used to extract 3D data cannot be too large. The limitation on angle depends on a number of factors, including shadowing or obscuring of features. In any case, depending on the subject to be imaged, angles greater than 30 degrees can become quite problematic. Largely, the limitation derives from steps 102 and 104, in which a first camera is used at 102 to extract features of an object (e.g., using SIFT, SURF or another such methodology) and a second camera is used at 104 to extract the same features. See Lowe, "Object Recognition from Local Scale-Invariant Features," Proc. of the International Conference on Computer Vision, Corfu (Sept. 1999), discussing the limitation of SIFT to robust recognition of 3D objects over only about a 20 degree range of rotation. As noted above, feature match-up is required. With matching features, that data is combined with calibration data at 106 to extract 3D coordinates (data points) for those features. Then, at 108, the 3D points are employed in control or display for the user interface.
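  • For orientation only, the following is a minimal sketch of the triangulation performed at 106 in such a stereo pipeline (and avoided by the subject method), assuming OpenCV/NumPy; the projection matrices and matched pixel coordinates are illustrative placeholders standing in for real calibration data.

        import numpy as np
        import cv2

        # Illustrative intrinsics/extrinsics; in a real stereo system these come
        # from a prior calibration step and are fixed thereafter.
        K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
        P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                  # camera 1 at the origin
        P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])  # camera 2 offset 10 cm

        # Pixel locations of the SAME matched feature in each image (2xN arrays).
        pts1 = np.array([[320.0], [240.0]])
        pts2 = np.array([[300.0], [240.0]])

        # Step 106: combine matched features with calibration data to obtain 3D points.
        X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)  # homogeneous, 4xN
        X = (X_h[:3] / X_h[3]).T                         # Nx3 coordinates used at 108
        print(X)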
  • FIG. 2 is a flowchart illustrating operation of the subject hardware according to a method 200. It assumes two cameras, with Camera 1 facing toward the user and Camera 2 imaging from another angle (above, below or to the side). In contrast to the above, the features captured from Camera 1 and Camera 2 are not necessarily the same. For hand-gesture based control, the intent is to identify hand-like areas, but the recognized points need not be identical, merely related.
  • In method 200, Camera 1 obtains the x-y position of relevant features at 202. Such features may be deemed "relevant" in that they are characterized as hand gestures, for example, in accordance with one or more of the Examples presented below. At 204, Camera 2 obtains the z position of different but related features or gestures. Per above, such features or gestures may be deemed "related" in that they are also associated with hand position or motion.
  • If the cameras (or associated system hardware such as included lens optics, sensors, etc.) are calibrated, then at 206 a processor manipulating the camera data captured at 202 and 204 uses shared information to help identify which features are useful. One way to use calibration is to take advantage of the fact that the 2D images will (in the sense intended above) "share" a dimension. For example, a camera facing a person and one facing down from the ceiling would share the dimension running horizontal with respect to the person. Thus, once the front facing camera has captured data in a region of interest, relevant points from the other camera could be restricted to a thin horizontal band or slice of the imaged area.
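  • As a rough illustration of that shortcut, the sketch below (assuming NumPy, an illustrative bounding box from the front camera, and a stand-in pixel scale between the two views) restricts the overhead camera's search to the shared horizontal band.

        import numpy as np

        def shared_band(frame_cam2, roi_cam1, scale=1.0, margin=40):
            # roi_cam1 is (x, y, w, h) in front-camera pixels; only the horizontal
            # extent is shared with the overhead camera, so that is all we use.
            x, _, w, _ = roi_cam1
            lo = max(int(scale * x) - margin, 0)
            hi = min(int(scale * (x + w)) + margin, frame_cam2.shape[1])
            return frame_cam2[:, lo:hi], lo  # cropped band plus its offset in camera 2

        # Example: a 480x640 overhead frame and a hand ROI found by the front camera.
        frame2 = np.zeros((480, 640), dtype=np.uint8)
        band, offset = shared_band(frame2, roi_cam1=(200, 150, 120, 120))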
  • Regardless, at 208, the processor uses x-y features for x-y dimensional control, such as captured hand motion to emulate the movement of a mouse on a display screen. At 210, the processor uses z features or gestures for z-dimensional control, such as captured in-out motion of a user's hand to emulate the clicking of a mouse. Ultimately, at 212, the process may integrate x-y and z control information for desired interface control and/or display.
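  • A compact sketch of how acts 208-212 might merge the two streams is shown below; the function, its sensitivity parameter, and the returned "click" event are illustrative assumptions rather than the patented method itself, and a real system would forward the result to the operating system's pointer interface.

        def update_interface(xy_delta, z_gesture, cursor, sensitivity=2.0):
            # xy_delta:  (dx, dy) hand motion from Camera 1, in pixels (act 208)
            # z_gesture: True when Camera 2 reports an in/out "click" motion (act 210)
            # cursor:    current (x, y) cursor position on the display
            dx, dy = xy_delta
            new_cursor = (cursor[0] + sensitivity * dx, cursor[1] + sensitivity * dy)
            event = "click" if z_gesture else None
            return new_cursor, event  # act 212: integrated control/display update

        cursor, event = update_interface((5, -3), z_gesture=True, cursor=(400, 300))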
  • Suitable hardware for large (or small) display control, associated gaming, etc. is shown in FIGS. 3A and 3B. In each figure, a display 300 is shown, incorporating a camera 302 with a field of view over a range α. An optical axis of the camera is indicated by axis Z1. In FIG. 3A, a second camera is depicted as a wireless web cam 310. It offers a field of view over a range β and an optical axis Z2.
  • Axes Z1 and Z2 may be set perpendicular or at another angle. They may be at the same level or positioned however else is convenient. The robust nature of the subject system allows for significant angular variation. This variation may be a result of either inaccurate or fairly arbitrary user setup.
  • In connection with a web cam 310 made in a more-or-less permanent installation, relatively little angular variation may be expected. However, in a setup such as shown in FIG. 3B, variability in setup may be expected as the norm. This is because camera 2 in the setup shown in FIG. 3B is provided in connection with a smart phone 320. Such a device may simply be carried around by a user until set up spontaneously on a furniture ledge. Alternatively, it can be docked in a cradle affixed to a wall when planned for use in creating a desired game space or other display environment.
  • In any case, it is contemplated that the smart phone camera could communicate with the display and any associated electronics through built-in WiFi or Bluetooth communication. It may do so in a role where it merely forwards camera image data that is processed in conjunction with hardware associated with (integrally or connected as a peripheral—not shown) the display. Alternatively, given the robust methodology described, the processor on-board the smart phone may be used for processing steps (i.e., any of acts 206, 208, 210 and/or 212) in the method above or otherwise.
  • FIG. 4 illustrates another set of hardware options in connection with a laptop computer 400. Clearly, the computer could be used for such processing. Alternatively, optional smart phone 410 (whether located to the side with axis Z2 or with its camera facing upward with axis Z2′) can share or handle such processing.
  • Alternatively, a second (or third) camera 412 may be provided overhead in association with an adjustable boom 414. Such a boom may be installed on a desk, and may even integrate lamp components to disguise its presence and/or function, with such utility secondary to defining optical axis Z2″ as employed in the subject method(s). Yet another option is to integrate an upward facing camera 420 in the hand rest 422 area of the computer. All said, a number of options exist.
  • FIGS. 5A and 5B illustrate still other hardware options. In these, a first camera 500 is provided in connection with a housing 502 set atop a non-custom display 504. A second camera 510 is integrated in a side-positioned surround-sound speaker 512 (the speaker examples illustrated as in-wall units). Housing 502 advantageously includes the requisite processor and other electronic components. Speaker units 512 may be wireless (for each of the camera and music-center signals) or may be connected by wiring.
  • As referenced above, to gain absolute 3D gesture control, the coordinate systems of the various cameras used in the system need to be locked through calibration. FIGS. 5A and 5B illustrate options in this regard.
  • FIG. 5A depicts determination of a coordinate system for absolute gesture recognition for the secondary camera 510 where a coordinate system of the primary camera 500 is known. Comparing the apparent position of common objects (stars, as potentially embodied by an LED-lit wand held and "clicked" by a user to indicate each next point) in both cameras enables determination of the view angle for the secondary camera.
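  • A minimal sketch of that secondary-camera calibration is given below, assuming OpenCV, treating the primary camera's coordinate system as the world frame, and using illustrative wand positions and intrinsics; given the 3D locations of the "clicked" points (known via the primary camera) and where they appear in the secondary camera's image, a perspective-n-point solve recovers the secondary camera's view angle and position.

        import numpy as np
        import cv2

        # 3D wand positions in the primary camera's coordinate system (illustrative, meters).
        object_pts = np.array([[0.0, 0.0, 1.0], [0.2, 0.0, 1.0], [0.2, 0.2, 1.2],
                               [0.0, 0.2, 1.4], [0.1, 0.1, 1.1], [-0.1, 0.1, 1.3]],
                              dtype=np.float32)

        # Where those same wand "clicks" appear in the secondary camera's image (pixels).
        image_pts = np.array([[310, 260], [350, 255], [352, 210],
                              [305, 170], [330, 215], [290, 190]], dtype=np.float32)

        # Assumed intrinsics of the secondary camera.
        K2 = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])

        ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K2, distCoeffs=None)
        R2, _ = cv2.Rodrigues(rvec)  # rotation (view angle) of the secondary camera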
  • FIG. 5B depicts a final system implementation in which both cameras are calibrated to a coordinate system. In this figure, determination of a coordinate system is made where the coordinate system of neither of the cameras is known. By processing a user pointing (as indicated by arrows) at designated targets (sun icons) on the display, the system is able to determine the relative view angles of both cameras together (as indicated by inset image 520 illustrating angle γ).
  • Embodiments operating according to and/or expanding upon the principles described above are described below. These have been reduced to practice in varying degree. In any case, they are intended to present non-limiting exemplary variations within the scope of different inventive aspects.
  • EXAMPLE 1
  • Camera 1 faced the user, with its captured image data processed using SURF features along with Kalman Filters and processing of group flows of points to detect regions of interest. The motion of keypoints in the regions of interest (e.g., a user's hand) was then transformed into the x-y motion of the mouse on the computer screen.
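  • A condensed sketch of such a Camera 1 pipeline appears below, assuming OpenCV; ORB keypoints stand in for SURF (which requires the separate opencv-contrib xfeatures2d module), and the smoothed motion of the keypoint group is returned as a relative cursor delta rather than being injected into a real windowing system.

        import numpy as np
        import cv2

        orb = cv2.ORB_create(nfeatures=300)  # stand-in for SURF keypoint detection
        kf = cv2.KalmanFilter(4, 2)          # state: x, y, vx, vy; measurement: x, y
        kf.transitionMatrix = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
                                        [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
        kf.measurementMatrix = np.eye(2, 4, dtype=np.float32)
        kf.processNoiseCov = 1e-3 * np.eye(4, dtype=np.float32)
        kf.measurementNoiseCov = 1e-1 * np.eye(2, dtype=np.float32)
        kf.errorCovPost = np.eye(4, dtype=np.float32)

        prev = None
        def camera1_step(frame_gray):
            # Returns a (dx, dy) cursor delta for one front-camera frame.
            global prev
            kps = orb.detect(frame_gray, None)
            if not kps:
                return 0.0, 0.0
            center = np.float32([[np.mean([k.pt[0] for k in kps])],
                                 [np.mean([k.pt[1] for k in kps])]])
            kf.predict()
            est = kf.correct(center)[:2].ravel()  # smoothed keypoint-group position
            dx, dy = (0.0, 0.0) if prev is None else (est - prev)
            prev = est
            return float(dx), float(dy)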
  • Camera 2 faced down towards a desk, with its captured image data processed using background subtraction and a histogram back-projection, then a connectedness check to determine where the hand was positioned. Center of mass and extremum checks were then used to detect motion, loosely gestured in the "z" dimension of the system, which was translated to user input in the form of left mouse "clicks" in the interface.
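  • The sketch below, assuming OpenCV, outlines such a Camera 2 pipeline with background subtraction, a connectedness (connected-component) check, and center-of-mass/extremum tests; the histogram back-projection step is omitted here for brevity (a version of it appears after Example 2), and the "press line" threshold is an illustrative stand-in for the in/out z gesture.

        import numpy as np
        import cv2

        bg = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

        def camera2_step(frame_bgr, prev_com, press_line=200):
            # Returns (center_of_mass, clicked) for one downward-facing frame.
            fg = bg.apply(frame_bgr)                                # background subtraction
            fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
            n, labels, stats, centroids = cv2.connectedComponentsWithStats(fg)
            if n < 2:                                               # nothing but background
                return prev_com, False
            hand = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))  # largest foreground blob
            com = centroids[hand]                                   # center of mass
            ys, _ = np.where(labels == hand)
            extremum = ys.min()                                     # topmost (fingertip-most) row
            return com, bool(extremum < press_line)                 # crossing the line = "click"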
  • EXAMPLE 2
  • Motion is used as in Example 1, but only to identify regions of interest. Then, a histogram back-projection in HSV (i.e., Hue, Saturation, and Value) space is used to perform color filtering for skin color, and hand reconstruction from that data. Keypoints and motion are extracted using other techniques, such as edge detectors and background subtraction, respectively.
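  • A minimal sketch of that HSV back-projection step is shown below, assuming OpenCV and a small sample patch of known skin pixels; the histogram bin counts and the threshold are illustrative values.

        import cv2

        def skin_mask(frame_bgr, skin_patch_bgr):
            # Back-project a Hue/Saturation histogram of a known skin sample onto the frame.
            hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
            patch_hsv = cv2.cvtColor(skin_patch_bgr, cv2.COLOR_BGR2HSV)

            hist = cv2.calcHist([patch_hsv], [0, 1], None, [30, 32], [0, 180, 0, 256])
            cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

            prob = cv2.calcBackProject([hsv], [0, 1], hist, [0, 180, 0, 256], scale=1)
            _, mask = cv2.threshold(prob, 50, 255, cv2.THRESH_BINARY)
            return mask  # nonzero where pixels look like the sampled skin color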
  • EXAMPLE 3
  • With a priori knowledge that the user's hand will cover a large portion of the field of view of a camera (such as with a smart phone camera facing upwards, as in connection with the laptop embodiment above), HSV color filtering can be employed for hand identification.
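  • One simple way such a check might look in practice is sketched below, assuming OpenCV; the HSV skin bounds and the 40% coverage threshold are illustrative values that would be tuned for the particular upward-facing camera and lighting.

        import numpy as np
        import cv2

        def hand_fills_view(frame_bgr, lower=(0, 30, 60), upper=(20, 180, 255), min_fraction=0.4):
            # True when skin-colored pixels cover a large fraction of the frame.
            hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
            mask = cv2.inRange(hsv, np.array(lower, np.uint8), np.array(upper, np.uint8))
            return (mask > 0).mean() >= min_fraction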
  • Variations
  • Although only a few embodiments have been disclosed in detail above, other embodiments are possible and the inventors intend these to be encompassed within this specification. The specification describes specific examples to accomplish a more general goal that may be accomplished in another way. This disclosure is intended to be exemplary, and the claims are intended to cover any modification or alternative which might be predictable to a person having ordinary skill in the art. For example, other shapes of apertures can be used, including round, oval, triangular, and/or elongated. The above devices can be used with color filters for coding different apertures, but can also be used with polarization or other coding schemes.
  • Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. Indeed, given the type of pixel-to-pixel matching for imaged points and associated calculations required with the data structures recorded and manipulated, computer use is necessary. In imaging any object, vast sets of data are collected and stored in a data structure requiring significant manipulation in accordance with imaging principles—including defocusing principles/equations—as noted herein and as incorporated by reference.
  • To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the exemplary embodiments of the invention.
  • The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein, may be implemented or performed with a general purpose processor, a Graphics Processor Unit (GPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. The processor can be part of a computer system that also has a user interface port that communicates with a user interface, and which receives commands entered by a user, has at least one memory (e.g., hard drive or other comparable storage, and random access memory) that stores electronic information including a program that operates under control of the processor and with communication via the user interface port, and a video output that produces its output via any kind of video output format, e.g., VGA, DVI, HDMI, displayport, or any other form.
  • A processor may also be implemented as a combination of computing devices, e.g. a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. These devices may also be used to select values for devices as described herein.
  • The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), flash memory, Read Only Memory (ROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
  • In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory storage can also be rotating magnetic hard disk drives, optical disk drives, or flash memory based storage drives or other such solid state, magnetic, or optical storage devices. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. The computer readable media can be an article comprising a machine-readable non-transitory tangible medium embodying information indicative of instructions that when performed by one or more machines result in computer implemented operations comprising the actions described throughout this specification.
  • Operations as described herein can be carried out on or over a website. The website can be operated on a server computer, or operated locally, e.g., by being downloaded to the client computer, or operated via a server farm. The website can be accessed over a mobile phone or a PDA, or on any other client. The website can use HTML code in any form, e.g., MHTML or XML, and via any form such as cascading style sheets ("CSS") or other client-side runtime languages such as Flash, HTML5 or Silverlight.
  • Also, the inventors intend that only those claims which use the words "means for" are intended to be interpreted under 35 USC 112, sixth paragraph. Moreover, no limitations from the specification are intended to be read into any claims, unless those limitations are expressly included in the claims. The computers described herein may be any kind of computer, either general purpose, or some specific purpose computer such as a workstation. The programs may be written in C, Java, Brew or any other programming language. The programs may be resident on a storage medium, e.g., magnetic or optical, e.g. the computer hard drive, a removable disk or media such as a memory stick or SD media, or other removable medium. The programs may also be run over a network, for example, with a server or other machine sending signals to the local machine, which allows the local machine to carry out the operations described herein.
  • Where a specific numerical value is mentioned herein, it should be considered that the value may be increased or decreased by 5%, while still staying within the teachings of the present application, unless some different range is specifically mentioned. Where a specified logical sense is used, the opposite logical sense is also intended to be encompassed.
  • The previous description of the disclosed exemplary embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these exemplary embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (21)

1. A system for input gesture recognition comprising:
a first camera having a first optical axis; and
a second camera having a second optical axis;
the first and second cameras connected for data transmission to a computer processor,
wherein the computer processor is configured to compare images from the first and second cameras to recognize user input gestures without performing stereo imaging calculations, and
wherein the first and second optical axes are angled with respect to one another by at least about 30 degrees.
2. The system of claim 1, wherein the optical axes are angled by at least about 60 degrees.
3. The system of claim 1, wherein the optical axes are angled by about 90 degrees.
4. The system of claim 1, wherein no calculations are performed including triangulation of matched features within the images.
5. The system of claim 1, wherein the axes are set up at an angle unknown to the user.
6. The system of claim 5, wherein image comparison occurs within the processor with an unknown angle.
7. The system of claim 5, wherein the first and second camera are set in separate housings.
8. The system of claim 7, wherein the first camera is set in a housing with a primary display.
9. The system of claim 8, wherein the second camera is set within a smart phone.
10. The system of claim 8, wherein the second camera is set within a tablet computer.
11. The system of claim 7, wherein the first and second cameras are each set within a smart phone.
12. The system of claim 11, wherein the processor is a processor of one or both of the smart phones.
13. The system of claim 8, wherein the primary display is the display of a laptop computer further comprising a keyboard panel section.
14. The system of claim 13, wherein the second camera is housed in the keyboard panel section.
15. The system of claim 13, wherein the second camera is housed in a smart phone positioned on a plane parallel with and in front of the keyboard panel section.
16. The system of claim 8, wherein the primary display is the display of a laptop computer and the second camera is positioned above the laptop computer by a boom.
17. The system of claim 16, wherein the boom is connected to a laptop computer docking station.
18. The system of claim 1, wherein the cameras are wirelessly connected.
19. The system of claim 1, wherein the processor is configured to perform scale invariant feature recognition for successive images from each camera and generate an output signal corresponding to a gesture from a look-up table corresponding to the combination of the recognized features from both cameras.
20. The system of claim 1, wherein the successive images are not necessarily sequential.
21-26. (canceled)
US15/348,663 2012-01-05 2016-11-10 Imaging surround system for touch-free display control Abandoned US20170068326A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/348,663 US20170068326A1 (en) 2012-01-05 2016-11-10 Imaging surround system for touch-free display control

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201261583539P 2012-01-05 2012-01-05
US201261589182P 2012-01-20 2012-01-20
US201261590251P 2012-01-24 2012-01-24
PCT/US2012/059077 WO2013103410A1 (en) 2012-01-05 2012-10-05 Imaging surround systems for touch-free display control
US14/319,653 US9524021B2 (en) 2012-01-05 2014-06-30 Imaging surround system for touch-free display control
US15/348,663 US20170068326A1 (en) 2012-01-05 2016-11-10 Imaging surround system for touch-free display control

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US14/319,653 Continuation US9524021B2 (en) 2012-01-05 2014-06-30 Imaging surround system for touch-free display control

Publications (1)

Publication Number Publication Date
US20170068326A1 true US20170068326A1 (en) 2017-03-09

Family

ID=48745359

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/319,653 Active US9524021B2 (en) 2012-01-05 2014-06-30 Imaging surround system for touch-free display control
US15/348,663 Abandoned US20170068326A1 (en) 2012-01-05 2016-11-10 Imaging surround system for touch-free display control

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US14/319,653 Active US9524021B2 (en) 2012-01-05 2014-06-30 Imaging surround system for touch-free display control

Country Status (2)

Country Link
US (2) US9524021B2 (en)
WO (1) WO2013103410A1 (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013103410A1 (en) 2012-01-05 2013-07-11 California Institute Of Technology Imaging surround systems for touch-free display control
US10691219B2 (en) 2012-01-17 2020-06-23 Ultrahaptics IP Two Limited Systems and methods for machine control
US11493998B2 (en) 2012-01-17 2022-11-08 Ultrahaptics IP Two Limited Systems and methods for machine control
US9501152B2 (en) 2013-01-15 2016-11-22 Leap Motion, Inc. Free-space user interface and control using virtual constructs
US9679215B2 (en) 2012-01-17 2017-06-13 Leap Motion, Inc. Systems and methods for machine control
US8638989B2 (en) 2012-01-17 2014-01-28 Leap Motion, Inc. Systems and methods for capturing motion in three-dimensional space
US8693731B2 (en) 2012-01-17 2014-04-08 Leap Motion, Inc. Enhanced contrast for object detection and characterization by optical imaging
US9733713B2 (en) * 2012-12-26 2017-08-15 Futurewei Technologies, Inc. Laser beam based gesture control interface for mobile devices
US9530213B2 (en) 2013-01-02 2016-12-27 California Institute Of Technology Single-sensor system for extracting depth information from image blur
US9459697B2 (en) 2013-01-15 2016-10-04 Leap Motion, Inc. Dynamic, free-space user interactions for machine control
US9702977B2 (en) 2013-03-15 2017-07-11 Leap Motion, Inc. Determining positional information of an object in space
US9916009B2 (en) 2013-04-26 2018-03-13 Leap Motion, Inc. Non-tactile interface systems and methods
US10281987B1 (en) 2013-08-09 2019-05-07 Leap Motion, Inc. Systems and methods of free-space gestural interaction
US9261966B2 (en) * 2013-08-22 2016-02-16 Sony Corporation Close range natural user interface system and method of operation thereof
US10846942B1 (en) 2013-08-29 2020-11-24 Ultrahaptics IP Two Limited Predictive information for free space gesture control and communication
US9632572B2 (en) 2013-10-03 2017-04-25 Leap Motion, Inc. Enhanced field of view to augment three-dimensional (3D) sensory space for free-space gesture interpretation
US9996638B1 (en) 2013-10-31 2018-06-12 Leap Motion, Inc. Predictive information for free space gesture control and communication
JP2016038889A (en) 2014-08-08 2016-03-22 リープ モーション, インコーポレーテッドLeap Motion, Inc. Extended reality followed by motion sensing
CN104436642A (en) * 2014-12-17 2015-03-25 常州市勤业新村幼儿园 Kinect based children dance motion sensing game system and working method thereof
CN104623910B (en) * 2015-01-15 2016-08-24 西安电子科技大学 Dancing auxiliary specially good effect partner system and implementation method
CN104932692B (en) * 2015-06-24 2017-12-08 京东方科技集团股份有限公司 Three-dimensional tactile method for sensing, three-dimensional display apparatus, wearable device
EP3395066B1 (en) * 2015-12-25 2022-08-03 BOE Technology Group Co., Ltd. Depth map generation apparatus, method and non-transitory computer-readable medium therefor
DE102016224095A1 (en) * 2016-12-05 2018-06-07 Robert Bosch Gmbh Method for calibrating a camera and calibration system
US10037458B1 (en) * 2017-05-02 2018-07-31 King Fahd University Of Petroleum And Minerals Automated sign language recognition
US10267383B2 (en) * 2017-05-03 2019-04-23 The Boeing Company Self-aligning virtual elliptical drive
US10489639B2 (en) * 2018-02-12 2019-11-26 Avodah Labs, Inc. Automated sign language translation and communication using multiple input and output modalities
US11461907B2 (en) * 2019-02-15 2022-10-04 EchoPixel, Inc. Glasses-free determination of absolute motion

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020024500A1 (en) * 1997-03-06 2002-02-28 Robert Bruce Howard Wireless control device
US20040136083A1 (en) * 2002-10-31 2004-07-15 Microsoft Corporation Optical system design for a universal computing device
US20090231278A1 (en) * 2006-02-08 2009-09-17 Oblong Industries, Inc. Gesture Based Control Using Three-Dimensional Information Extracted Over an Extended Depth of Field
US20100138785A1 (en) * 2006-09-07 2010-06-03 Hirotaka Uoi Gesture input system, method and program

Family Cites Families (55)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4644390A (en) 1984-11-19 1987-02-17 Fuji Photo Film Co. Ltd. Photoelectric sensor array support package
JPS61135280A (en) 1984-12-06 1986-06-23 Toshiba Corp 3-dimensional image pickup element
EP0585098B1 (en) * 1992-08-24 2000-03-22 Hitachi, Ltd. Sign recognition apparatus and method and sign translation system using same
JP3481631B2 (en) 1995-06-07 2003-12-22 ザ トラスティース オブ コロンビア ユニヴァーシティー イン ザ シティー オブ ニューヨーク Apparatus and method for determining a three-dimensional shape of an object using relative blur in an image due to active illumination and defocus
AU8141198A (en) * 1997-06-20 1999-01-04 Holoplex, Inc. Methods and apparatus for gesture recognition
US20020036617A1 (en) 1998-08-21 2002-03-28 Timothy R. Pryor Novel man machine interfaces and applications
US6720949B1 (en) 1997-08-22 2004-04-13 Timothy R. Pryor Man machine interfaces and applications
JPH11132748A (en) 1997-10-24 1999-05-21 Hitachi Ltd Multi-focal point concurrent detecting device, stereoscopic shape detecting device, external appearance inspecting device, and its method
US6278847B1 (en) 1998-02-25 2001-08-21 California Institute Of Technology Aperture coded camera for three dimensional imaging
US7006132B2 (en) 1998-02-25 2006-02-28 California Institute Of Technology Aperture coded camera for three dimensional imaging
US7612870B2 (en) 1998-02-25 2009-11-03 California Institute Of Technology Single-lens aperture-coded camera for three dimensional imaging in small volumes
JPH11355624A (en) 1998-06-05 1999-12-24 Fuji Photo Film Co Ltd Photographing device
DE19825950C1 (en) 1998-06-12 2000-02-17 Armin Grasnick Arrangement for three-dimensional representation
JP2000207549A (en) 1999-01-11 2000-07-28 Olympus Optical Co Ltd Image processor
US6538255B1 (en) 1999-02-22 2003-03-25 Nikon Corporation Electron gun and electron-beam optical systems and methods including detecting and adjusting transverse beam-intensity profile, and device manufacturing methods including same
US6993179B1 (en) * 2000-08-07 2006-01-31 Koninklijke Philips Electronics N.V. Strapdown system for three-dimensional reconstruction
US6701181B2 (en) 2001-05-31 2004-03-02 Infraredx, Inc. Multi-path optical catheter
US6873868B2 (en) 2001-12-31 2005-03-29 Infraredx, Inc. Multi-fiber catheter probe arrangement for tissue analysis or treatment
US7466336B2 (en) 2002-09-05 2008-12-16 Eastman Kodak Company Camera and method for composing multi-perspective images
US20050251116A1 (en) 2004-05-05 2005-11-10 Minnow Medical, Llc Imaging and eccentric atherosclerotic material laser remodeling and/or ablation catheter
WO2006009786A2 (en) 2004-06-18 2006-01-26 Elmaleh David R Intravascular imaging device and uses thereof
KR101134208B1 (en) 2004-10-01 2012-04-09 더 보드 어브 트러스티스 어브 더 리랜드 스탠포드 주니어 유니버시티 Imaging arrangements and methods therefor
DE102005036486A1 (en) 2005-07-20 2007-01-25 Leica Microsystems (Schweiz) Ag Optical device with increased depth of field
US7742635B2 (en) 2005-09-22 2010-06-22 3M Innovative Properties Company Artifact mitigation in three-dimensional imaging
US20070078500A1 (en) 2005-09-30 2007-04-05 Cornova, Inc. Systems and methods for analysis and treatment of a body lumen
US8111904B2 (en) 2005-10-07 2012-02-07 Cognex Technology And Investment Corp. Methods and apparatus for practical 3D vision system
US7819591B2 (en) 2006-02-13 2010-10-26 3M Innovative Properties Company Monocular three-dimensional imaging
US20070189750A1 (en) 2006-02-16 2007-08-16 Sony Corporation Method of and apparatus for simultaneously capturing and generating multiple blurred images
US7711201B2 (en) 2006-06-22 2010-05-04 Sony Corporation Method of and apparatus for generating a depth map utilized in autofocusing
US7843487B2 (en) 2006-08-28 2010-11-30 Panasonic Corporation System of linkable cameras, each receiving, contributing to the encoding of, and transmitting an image
US8077964B2 (en) 2007-03-19 2011-12-13 Sony Corporation Two dimensional/three dimensional digital information acquisition and display device
JP2010526481A (en) 2007-04-23 2010-07-29 カリフォルニア インスティテュート オブ テクノロジー Single lens 3D imaging device using a diaphragm aperture mask encoded with polarization combined with a sensor sensitive to polarization
US20100094138A1 (en) 2008-07-25 2010-04-15 Morteza Gharib Imaging catheter using laser profile for plaque depth measurement
WO2010028176A1 (en) * 2008-09-03 2010-03-11 Oblong Industries, Inc. Control system for navigating a principal dimension of a data space
US8487938B2 (en) 2009-01-30 2013-07-16 Microsoft Corporation Standard Gestures
US7996793B2 (en) 2009-01-30 2011-08-09 Microsoft Corporation Gesture recognizer system architecture
US8295546B2 (en) 2009-01-30 2012-10-23 Microsoft Corporation Pose tracking pipeline
EP2243523A1 (en) * 2009-04-21 2010-10-27 Research In Motion Limited A method and portable electronic device for golf swing detection for scoring assistance
US8856691B2 (en) 2009-05-29 2014-10-07 Microsoft Corporation Gesture tool
US8744121B2 (en) 2009-05-29 2014-06-03 Microsoft Corporation Device for identifying and tracking multiple humans over time
US8379101B2 (en) 2009-05-29 2013-02-19 Microsoft Corporation Environment and/or target segmentation
US20110025689A1 (en) 2009-07-29 2011-02-03 Microsoft Corporation Auto-Generating A Visual Representation
US8564534B2 (en) 2009-10-07 2013-10-22 Microsoft Corporation Human tracking system
US20110150271A1 (en) 2009-12-18 2011-06-23 Microsoft Corporation Motion detection using depth images
JP5468404B2 (en) 2010-02-02 2014-04-09 パナソニック株式会社 Imaging apparatus and imaging method, and image processing method for the imaging apparatus
US8751215B2 (en) 2010-06-04 2014-06-10 Microsoft Corporation Machine based sign language interpreter
EP2584310A4 (en) 2010-06-17 2014-07-09 Panasonic Corp Image processing device, and image processing method
US8416187B2 (en) 2010-06-22 2013-04-09 Microsoft Corporation Item navigation using motion-capture data
US9075434B2 (en) 2010-08-20 2015-07-07 Microsoft Technology Licensing, Llc Translating user motion into multiple object responses
US8582820B2 (en) 2010-09-24 2013-11-12 Apple Inc. Coded aperture camera with adaptive image processing
KR20120041839A (en) 2010-10-22 2012-05-03 충북대학교 산학협력단 Method for obtaining 3d images using chromatic aberration of lens and 3d microscope using there of
US8564712B2 (en) 2011-03-15 2013-10-22 Sony Corporation Blur difference estimation using multi-kernel convolution
KR101801355B1 (en) 2011-03-25 2017-11-24 엘지전자 주식회사 Apparatus for recognizing distance of object using diffracting element and light source
WO2013103410A1 (en) 2012-01-05 2013-07-11 California Institute Of Technology Imaging surround systems for touch-free display control
US9530213B2 (en) 2013-01-02 2016-12-27 California Institute Of Technology Single-sensor system for extracting depth information from image blur

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020024500A1 (en) * 1997-03-06 2002-02-28 Robert Bruce Howard Wireless control device
US20040136083A1 (en) * 2002-10-31 2004-07-15 Microsoft Corporation Optical system design for a universal computing device
US20090231278A1 (en) * 2006-02-08 2009-09-17 Oblong Industries, Inc. Gesture Based Control Using Three-Dimensional Information Extracted Over an Extended Depth of Field
US20100138785A1 (en) * 2006-09-07 2010-06-03 Hirotaka Uoi Gesture input system, method and program

Also Published As

Publication number Publication date
US9524021B2 (en) 2016-12-20
US20150009149A1 (en) 2015-01-08
WO2013103410A1 (en) 2013-07-11

Similar Documents

Publication Publication Date Title
US9524021B2 (en) Imaging surround system for touch-free display control
US11099637B2 (en) Dynamic adjustment of user interface
US8660362B2 (en) Combined depth filtering and super resolution
US9491441B2 (en) Method to extend laser depth map range
US9293118B2 (en) Client device
TWI531929B (en) Identifying a target touch region of a touch-sensitive surface based on an image
TWI649675B (en) Display device
US8388146B2 (en) Anamorphic projection device
CN105493154B (en) System and method for determining the range of the plane in augmented reality environment
US20120198353A1 (en) Transferring data using a physical gesture
US10220304B2 (en) Boolean/float controller and gesture recognition system
CN111566596B (en) Real world portal for virtual reality displays
US20100201808A1 (en) Camera based motion sensing system
US20140139429A1 (en) System and method for computer vision based hand gesture identification
US11854147B2 (en) Augmented reality guidance that generates guidance markers
KR20150082358A (en) Reference coordinate system determination
JP6609640B2 (en) Managing feature data for environment mapping on electronic devices
US11582409B2 (en) Visual-inertial tracking using rolling shutter cameras
US11689877B2 (en) Immersive augmented reality experiences using spatial audio
JP2016503915A (en) Aim and press natural user input
EP2557482A2 (en) Input device, system and method
US20200311403A1 (en) Tracking System and Method
US10444825B2 (en) Drift cancelation for portable object detection and tracking
WO2018006481A1 (en) Motion-sensing operation method and device for mobile terminal
US20230367118A1 (en) Augmented reality gaming using virtual eyewear beams

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION