US20140307056A1 - Multimodal Foreground Background Segmentation - Google Patents

Multimodal Foreground Background Segmentation

Info

Publication number
US20140307056A1
Authority
US
United States
Prior art keywords
data
background
pixel
depth
foreground
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/918,747
Inventor
Alvaro Collet Romea
Bao Zhang
Adam G. Kirk
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC filed Critical Microsoft Technology Licensing LLC
Priority to US13/918,747 priority Critical patent/US20140307056A1/en
Assigned to MICROSOFT CORPORATION reassignment MICROSOFT CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: COLLET ROMEA, ALVARO, ZHANG, BAO, KIRK, Adam G.
Priority to EP14726262.0A priority patent/EP2987139A1/en
Priority to CN201480021522.8A priority patent/CN105229697B/en
Priority to PCT/US2014/033914 priority patent/WO2014172226A1/en
Publication of US20140307056A1 publication Critical patent/US20140307056A1/en
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC reassignment MICROSOFT TECHNOLOGY LICENSING, LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MICROSOFT CORPORATION
Priority to US16/214,027 priority patent/US11546567B2/en

Classifications

    • H04N13/0007
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01BMEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B11/00Measuring arrangements characterised by the use of optical techniques
    • G01B11/22Measuring arrangements characterised by the use of optical techniques for measuring depth
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00Details of colour television systems
    • H04N9/64Circuits for processing colour signals
    • H04N9/74Circuits for processing colour signals for obtaining special effects
    • H04N9/75Chroma key
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/254Fusion techniques of classification results, e.g. of results related to same input data
    • G06F18/256Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
    • G06T7/0079
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/174Segmentation; Edge detection involving the use of two or more images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/521Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • G06T7/596Depth or shape recovery from multiple images from stereo images from three or more stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/28Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/809Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
    • G06V10/811Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data the classifiers operating on different input data, e.g. multi-modal recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/98Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/271Image signal generators wherein the generated image signals comprise depth maps or disparity maps
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/272Means for inserting a foreground image in a background image, i.e. inlay, outlay
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows

Definitions

  • FIG. 1 shows an example system in which a pod 100 comprising stereo IR cameras 101 and 102, stereo RGB cameras 103 and 104, and a projector 106 (e.g., an IR laser diffracted into many thousands of dots) captures one or more frames of stereo (e.g., clean) IR images 108, RGB images 109 and depth data 110 (e.g., stereo images of the projected light pattern).
  • the exemplified pod 100 is only one example arrangement, and in other arrangements the cameras 101-104 may be arranged in any order relative to one another. Indeed, in one implementation the projector is positioned above the cameras. Further, any of the cameras and/or the projector may be separated from one another, rather than being part of any pod configuration; no pod is needed. Thus, FIG. 1 is only showing components for purposes of explanation, and no scale, relative dimensions, relative positions, combinations of devices within a housing/pod device and so on should be inferred from FIG. 1.
  • the pod 100 is coupled to (or combined with) an image capturing system or subsystem 112 .
  • the stereo cameras 101 and 102 , and 103 and 104 are generally controlled, e.g., via camera interface 114 and controller 116 , to capture stereo images synchronized in time (e.g., the cameras are “genlocked”).
  • the cameras 101 and 102 capture infrared (IR) depth data 110 , as IR is highly effective in depth estimation in varying light conditions and does not affect the visible appearance of the scene.
  • IR infrared
  • a projector 106 projects an IR pattern onto a scene, such as a pattern of spots (e.g., dots) or a line pattern, although other spot shapes and/or pattern types may be used; dots are generally described hereinafter.
  • the IR cameras 101 and 102 capture texture data as part of the infrared depth image data 110.
  • the projector 106 is shown as coupled to the controller 116 via a projector interface 118; any such control may be as simple as turning the projector on and off or using energy-saving modes; however, more complex control such as pulsing, changing dot distribution, changing intensity and/or the like is feasible.
  • the images 108 - 110 captured by the cameras 101 - 104 are provided to an image processing system (or subsystem) 120 .
  • the image processing system 120 and the image capturing system or subsystem 112 may be combined into a single device.
  • a home entertainment device may include all of the components shown in FIG. 1 (as well as others not shown).
  • parts (or all) of the image capturing system or subsystem 112, such as the cameras and projector, may be a separate device that couples to a gaming console, personal computer, mobile device, dedicated processing device and/or the like, which may include some or all of the image processing functionality.
  • the image processing system or subsystem 120 includes a processor 121 and a memory 122 containing one or more image processing algorithms, including a multimodal, multi-cue foreground background segmentation algorithm 124 as described herein.
  • the segmentation algorithm 124 outputs a set of per-pixel probability data 126 , representative of whether each pixel is likely to be a foreground or background pixel.
  • the pixel probability data 126 is input into a global binary segmentation algorithm 128 (e.g., a Graph Cuts algorithm), which uses the pixel probability data 126 as a data term to segment the image into a segmented image 130 , e.g., the foreground only as part of a stream of segmented images.
  • the stream of images 130 is generally used by another internal or external image processing component, such as for special effects.
  • also shown in FIG. 1 is an interface 132 to the image processing system or subsystem 120, such as for connecting a keyboard, game controller, display, pointing device, microphone for speech commands and/or the like, as appropriate for a user to interact with an application or the like.
  • FIG. 2 shows a plurality of pods 200₁-200₄ arranged to capture images of an object (e.g., a person) from different perspectives. Note that while four such pods are depicted in FIG. 2, it is understood that any practical number may be present in a given configuration. For example, one such studio-like configuration uses nine pods, with two sets of four pods at different heights surrounding a space plus one pod above the space.
  • the IR and RGB image data captured from each of the four (or more) pods may be used to form an RGB point cloud and an IR point cloud.
  • the point cloud data may be based upon the foreground data segmented into the image 130 ( FIG. 1 ), e.g., by combining the foreground image 130 with a similar foreground image segmented based upon the data captured at each pod.
  • the cameras capture IR and RGB images of a foreground object, e.g., person 230 (as well as the background), at each pod. Further, each pod may project the light pattern (IR dots) onto the scene. The reflected IR light is captured at each pod 200₁-200₄ as the depth data image, and may be used via known stereo matching techniques to determine a depth map.
  • each pod may have its own image processing system, or the pods may feed images to a centralized image processing system.
  • any data related to segmentation (e.g., the pixel probability data) may be communicated among the image processing systems, such as represented in FIG. 2 by data D200₁-D200₄ being sent to and from the image processing system 120.
  • the probability of each pixel for each pod is thus known in one location. The use of pixel probability data corresponding to other pods is described below.
  • the multimodal, multi-cue foreground background segmentation algorithm 124 provides a framework for combining the contributions of the different segmentation mechanisms (modalities) that are available in a given scenario. These include any contribution (D1) obtained via RGB background subtraction, any contribution (D2) obtained via chroma keying, any contribution (D3) obtained via IR background subtraction, any contribution (D4) obtained via distinguishing a frame's depth values from previously captured background depth values, and any contribution (D5) obtained via prior knowledge of the background (e.g., known background depth). In one implementation these contributions may be weighted relative to one another and summed, whereby the order of computing such contributions is irrelevant.
  • pixels are exemplified herein; however, an “element” may represent one pixel, a set of two or more pixels, and/or one or more sub-pixels that are used to obtain the contribution of each individual segmentation mechanism/modality, even if an element is different for a different segmentation mechanism/modality.
  • individual pixels are the elements in one implementation, and thus are used hereinafter as a typical example.
  • a suitable computation for determining a pixel's probability of being foreground or background is:
  • D = e^(D1 + D2 + D3 + aD4 + aD5)
  • the value may be normalized such as to be between zero and one, e.g., with closer to zero meaning the more likely a background pixel (or vice-versa).
  • alternatively, each contribution may be given its own weight, e.g., D = e^(vD1 + wD2 + xD3 + yD4 + zD5).
  • the depth-related factors may have a different weight or weights from the non-depth factors, e.g., the same weight a (which may be a fractional value) applied to both depth contributions, as in the first formula above.
  • any of the weight values may be user configurable with a default if not chosen by a user.
  • sets of weights may be provided for different scenarios, e.g., one weight set for dim visible light, another weight set for bright visible light, and so on.
  • a weight or a contribution may be set to zero, such as if no contribution is available.
  • chroma keying may not always be available for a scenario, and/or for a particular pod among many pods, such as in a studio setup.
  • the weights need not be the same between pods.
  • a pod facing a greenscreen “straight on” may have a stronger (D2) chroma keying weight than a pod that captures the greenscreen at an angle.
  • a stereo camera that computes depth data via stereo differencing using IR illumination may be given a higher weight a for D4 and D5 computations, for example, than a time-of-flight depth camera.
  • the weights for a given camera set or pod may be learned and calibrated on a per-camera set/pod basis.
  • different weights may be used based upon different conditions. For example, as visible light gets dimmer and dimmer, more and more weight may be given to the infrared-based contributions, e.g., D3, D4 and D5, than in bright light.
  • the framework thus may be adapted to whatever external decision (such as a lighting decision) is used to select parameters for the weights, to the capabilities of the cameras, to scenarios such as whether a greenscreen may be used for a given camera, and so on.
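  • As an illustrative sketch only (not the patent's implementation), the following Python code combines five per-pixel contribution maps using a scenario-dependent weight set; the function name, the weight values, the exponential mapping and the normalization constant are assumptions chosen to mirror the formula and weighting discussion above:

```python
import numpy as np

# Hypothetical weight sets for different conditions; the values are illustrative only.
WEIGHT_SETS = {
    "bright": {"d1": 1.0, "d2": 1.0, "d3": 0.5, "d4": 0.5, "d5": 0.5},
    "dim":    {"d1": 0.4, "d2": 0.4, "d3": 1.0, "d4": 1.0, "d5": 1.0},
}

def combine_contributions(d1, d2, d3, d4, d5, condition="bright"):
    """Combine per-pixel contribution maps (each in [0, 1], same shape) into a
    per-pixel value in (0, 1] indicating how likely each pixel is foreground.

    A modality that is unavailable can be given a zero weight so that it
    contributes nothing; the order of the contributions does not matter.
    """
    w = WEIGHT_SETS[condition]
    s = (w["d1"] * d1 + w["d2"] * d2 + w["d3"] * d3 +
         w["d4"] * d4 + w["d5"] * d5)
    s_max = sum(w.values())          # largest possible weighted sum (each Di <= 1)
    # Map the combined value into (0, 1] per pixel, independently of other pixels.
    return np.exp(s - s_max)

# Example usage with random stand-in contribution maps.
if __name__ == "__main__":
    maps = [np.random.rand(480, 640) for _ in range(5)]
    prob = combine_contributions(*maps, condition="dim")
```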
  • FIG. 3 shows how the contributions D1 and D3 may be obtained based upon background subtraction.
  • An initial RGB background image is captured, as well as an initial (e.g., clean) IR background image and a depth image for processing into depth data, provided the appropriate cameras are available. Rather than capturing one image per type, it is appreciated that these may be sets of stereo images.
  • Block 330 represents any or all of these possibilities.
  • a foreground object 331 is captured in a current frame (represented by 332) in RGB, IR and depth (which may be stereo images).
  • current refers to the frame being processed for segmentation, and need not be a frame of “live” video.
  • the blocks 330 and 332 in FIG. 3 show one visible image as an example, but it is understood that blocks 330 and 332 also represent any IR image and depth imaging data, as well as stereo images for each.
  • Background subtraction of RGB is a well known technique, and may be used with IR as well.
  • by performing background subtraction 334 with the before (only background) and after (background plus foreground) RGB images, which may be done on more than one before-and-after set (such as in the case of stereo), the contribution factor D1 is obtained for each pixel.
  • background subtraction 334 is performed on the before and after IR images to obtain the contribution factor D3 for each pixel.
  • the values for D1 and/or D3 need not be binary “foreground or background” results 336, but may be values that indicate some uncertainty. For example, if a pixel being evaluated is known to be in an area where the foreground and background are similar and/or blurry (e.g., as determined by a previous patch-type processing algorithm), a value between zero and one may be the result; indeed, an entire patch of pixels can be classified as uncertain. A pixel in a blurred area may have one value that differs from a value for a pixel in an area deemed similar, which may differ from an area that is deemed both blurry and similar.
  • Blur and similarity areas may be determined via the IR and/or RGB images, or a combination of both, and possibly even by processing the depth image. As can be readily appreciated, the uncertainty reduces the factor's contribution relative to the other factors (independent of other weighting).
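  • A minimal sketch, under assumed constants, of how such a soft background-subtraction contribution (D1 for RGB, D3 for IR) might be computed per pixel follows; the noise parameter and squashing function are illustrative, not taken from the patent:

```python
import numpy as np

def background_subtraction_contribution(current, background, noise_sigma=8.0):
    """Soft per-pixel contribution in [0, 1] from before/after subtraction.

    current, background: arrays of shape (H, W) or (H, W, C) on the same scale
    (e.g., 8-bit RGB or clean IR intensities).  Values near 0 suggest background,
    values near 1 suggest foreground, and intermediate values express uncertainty,
    e.g., where the foreground and background are similar or the image is blurry.
    """
    diff = np.abs(current.astype(np.float32) - background.astype(np.float32))
    if diff.ndim == 3:               # collapse color channels if present
        diff = diff.mean(axis=2)
    # Squash the absolute difference smoothly; noise_sigma controls how large a
    # difference must be before a pixel is treated as clearly foreground.
    return 1.0 - np.exp(-(diff / noise_sigma) ** 2)
```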
  • FIG. 4 shows the use of chroma keying to obtain the D2 contribution factor.
  • the a priori known values, e.g., of a greenscreen are represented as lowercase rgb (to distinguish from the current frame's RGB, represented in uppercase), and in general may be the same throughout the entire background, but may differ if desired, as long as each background pixel's color values are known.
  • the pixels behind the foreground object 441 are significantly smaller than represented, and block 440 is not intended to convey any sizes, relative sizes, number of pixels and/or the like.
  • Block 442 represents chroma key separation, with the result represented in block 444 .
  • the result need not be a binary foreground or background decision, but may include uncertainty.
  • the D2 value may represent this uncertainty, because it may be that the background changed slightly due to differences in lighting/reflection off of the foreground object, or the difference may be caused by a foreground object having a similar color, e.g., a human wearing a necktie with a pattern that includes some closely colored material.
  • this is not as significant as with chroma key separation alone, because the D2 value at any pixel is only one contributing factor to the framework.
  • the framework processes the same stream of data per image type, e.g., the RGB data need only be captured once per camera frame to be used with the RGB processing mechanisms (background subtraction and chroma keying) described herein.
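  • One possible way to turn the known key color into a soft D2 value is a simple distance-to-key heuristic, sketched below; the color space, key color and softness constant are assumptions rather than the patent's chroma keying method:

```python
import numpy as np

def chroma_key_contribution(frame_rgb, key_rgb=(0, 177, 64), softness=60.0):
    """Soft D2 contribution in [0, 1]: near 0 where a pixel closely matches the
    a priori known background (e.g., greenscreen) color, near 1 where it clearly
    differs, with a gradual ramp expressing uncertainty in between.
    """
    frame = frame_rgb.astype(np.float32)             # (H, W, 3)
    key = np.asarray(key_rgb, dtype=np.float32)
    dist = np.linalg.norm(frame - key, axis=2)       # distance to the key color
    return np.clip(dist / softness, 0.0, 1.0)
```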
  • FIG. 5 shows how the D4 factor (current computed depth versus previously captured/computed background depth) may be obtained by “background depth subtraction” 552, namely by comparing current foreground depth values (represented symbolically by “1” in block 550) against previously captured background depth values (represented by various other single-digit numbers). Note that some errors/noise may occur, e.g., there are some “1s” in the background and a “5” in the foreground. However, D4 is only one contributing factor rather than a determinative one, and thus such noise ultimately may be insignificant. Some level of uncertainty also may be indicated by a non-binary value, e.g., if the difference appears as an outlier compared to other pixels' differences, possibly in a patch-based scheme.
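  • A sketch of such a D4 “background depth subtraction” contribution is given below; the tolerance value and the handling of missing depth are illustrative assumptions:

```python
import numpy as np

def depth_background_contribution(current_depth, background_depth, tolerance=0.15):
    """Soft D4 contribution in [0, 1] from comparing current depth (in meters)
    against previously captured background depth.

    A pixel measured well in front of the stored background reads as foreground;
    differences comparable to the depth noise produce intermediate (uncertain)
    values, and pixels with missing depth contribute 0.
    """
    cur = current_depth.astype(np.float32)
    bg = background_depth.astype(np.float32)
    valid = (cur > 0.0) & (bg > 0.0)              # zero often marks missing depth
    closer_by = np.where(valid, bg - cur, 0.0)    # positive: pixel is in front of background
    return np.clip(closer_by / (2.0 * tolerance), 0.0, 1.0)
```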
  • FIG. 6 shows the use of depth data (block 660) against a known, fixed depth or threshold to make a decision (block 662) that becomes the D5 result (block 664).
  • for example, a studio may be set up such that a person is instructed to stand within 4.0 meters of a camera location. Any depth captured over 5.0 meters is considered background during the per-pixel processing. Again, there may be noise, but D5 is only one contributing factor.
  • an “uncertain” decision may be indicated in the result (block 664), or be present in the value, e.g., a pixel at 4.5 meters may be considered uncertain.
  • the actual value may be indicative of the uncertainty, e.g., a score between zero (0) and one (1) that is proportional to where the measured depth falls between 4.0 and 5.0 meters.
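  • The threshold-based D5 contribution described above (foreground expected within 4.0 meters, anything beyond 5.0 meters treated as background, with a proportional score in between) could be sketched as follows; the handling of invalid depth is an assumption:

```python
import numpy as np

def depth_threshold_contribution(current_depth, near=4.0, far=5.0):
    """Soft D5 contribution in [0, 1] from prior knowledge of the scene layout:
    depths at or inside `near` meters score 1 (foreground), depths at or beyond
    `far` meters score 0 (background), and depths in between get a proportional
    score (e.g., 4.5 m scores 0.5, i.e., uncertain)."""
    d = current_depth.astype(np.float32)
    # Note: a depth of 0 (often "missing") would score 1 here; a real
    # implementation would treat invalid depth separately.
    return np.clip((far - d) / (far - near), 0.0, 1.0)
```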
  • FIG. 7 is a flow diagram showing example steps that may be taken to obtain the contributing factors and use them for segmentation.
  • Step 702 represents capturing the background information, including RGB, IR (e.g., clean IR) and depth (IR with projected light pattern) images.
  • Step 704 computes the background depth.
  • Step 706 captures the current frame of RGB and IR (e.g., clean and for depth) images.
  • Step 708 computes the current depth.
  • Step 709 selects a pixel (e.g., the relevant pixel values at the same pixel location in each of the three images).
  • Step 710 uses the current RGB values at this pixel location to get D1 via background subtraction with a counterpart pixel in the background RGB image.
  • Step 712 represents determining whether chroma keying is active; if so, step 714 gets the D2 contribution factor value. If not, e.g., there is no greenscreen for this camera set, then the D2 value (or the corresponding weight) may be set to zero in the framework so there is no contribution from this modality. Note that any of the other modalities similarly may not be active, in which event the contribution for such a modality may be set to zero for all current pixels corresponding to that modality; however, the chroma key active-versus-inactive modality is used as an example in FIG. 7 because this modality is likely quite variable in many scenarios. Indeed, even in a carefully controlled multi-camera studio environment, a greenscreen may not entirely surround a foreground object, whereby one or more cameras may not have chroma keying active.
  • Steps 716 and 718 use IR background subtraction on the corresponding background-only and background-plus-foreground IR images, and “depth background subtraction” on the corresponding background-only and background-plus-foreground depth data, respectively. This provides values for the D3 and D4 contributions.
  • Step 720 is the measured current depth versus “threshold” depth evaluation to obtain a D5 value for this pixel, as described above. At this point the contributing factor values have been obtained for this pixel, and at step 722 they are computed into the pixel probability value D, as described above.
  • Step 724 repeats the process for the next pixel (location) in the images. Note that in one implementation, any of steps 709-724 may be done in parallel with similar steps performed on another pixel or pixels, and some of the steps may be performed in GPU hardware, which is highly parallel.
  • once the data terms have been obtained for the pixels, they may be fed into a graph cuts algorithm (with an attractive potential used for the smoothness term of Graph Cuts) or another global binary segmentation technique (e.g., maximum likelihood graphical model, Markov random field and so on).
  • the output segmented image can either be a binary segmentation into foreground/background, or a soft boundary, in which edge pixels can be partially in the foreground/background (e.g., alpha matting techniques).
  • the segmented image may be output as part of a stream, for example.
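  • As a hedged sketch of this step, the per-pixel values can be used as terminal capacities in a grid graph cut; the example assumes the third-party PyMaxflow package, a constant smoothness weight and a simple capacity assignment rather than the patent's exact energy terms, and the foreground/background polarity of the returned mask may need to be flipped:

```python
import numpy as np
import maxflow  # PyMaxflow (third-party); assumed available via `pip install PyMaxflow`

def segment_with_graph_cut(prob_fg, smoothness=0.25):
    """Global binary segmentation from a per-pixel foreground probability map
    prob_fg (float array in [0, 1]) using a grid graph with an attractive
    smoothness term: neighboring pixels prefer to take the same label.
    """
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(prob_fg.shape)
    g.add_grid_edges(nodes, smoothness)                 # pairwise (smoothness) terms
    g.add_grid_tedges(nodes, prob_fg, 1.0 - prob_fg)    # per-pixel data terms
    g.maxflow()
    return g.get_grid_segments(nodes)                   # boolean mask per pixel
```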
  • knowledge about a pixel from one or more other cameras may be available and used as part of the current pixel processing.
  • for example, a given pixel may have a highly uncertain probability value, such as close to 0.5 (halfway between background and foreground).
  • another camera with a different angle, and possibly additional information (e.g., the other camera had chroma keying active, while the one with the highly uncertain probability value did not), may have a more certain probability value for the corresponding pixel.
  • this information may be used to change or bias the uncertain probability value to a more certain value.
  • another camera can provide its full set of D1-D5 values, or some lesser set thereof.
  • depth information is needed at each other camera to leverage one or more other cameras' data.
  • One way to use such other information (e.g., the computed D probability) is as another contributing factor, e.g., as a “D6” value, with an appropriate weight.
  • the process may be iterative, as the D value corresponding to one camera may change the D value corresponding to another, which then may change the other one, and so on.
  • the iterations may be limited for practical reasons.
  • a simpler way is to use only the initial D values computed at each camera with another camera's D value, in some way that biases the initial D value. For example, consider for simplicity that there is only one other camera that provides D′ as its initially computed probability. D′ may be used once to possibly alter D, rather than iteratively.
  • FIG. 8 shows such an example, beginning at step 802 where the probability data D is computed for a current camera (corresponding to step 722 of FIG. 7).
  • This D value for this pixel is “sent” to other camera locations for their use (where “sent” in a centralized processing scenario refers to maintaining that value in association with each other camera's probability data).
  • Step 806 “receives” the other cameras' probability data (each a D′ value) for use.
  • Steps 808, 810 and 812 represent one way the other D′ values may be used. For example, if the local D is already certain above or below a threshold uncertainty range, then D is used as is. Otherwise, via steps 810 and 812, D is biased with the average of the other D′ values, or some other combination of the other D′ values, e.g., a consensus. The bias may increase or decrease the initial D value, and may be weighted to reduce or increase the influence of the other cameras. These D′ values from the other cameras may have different weights relative to one another so that all other cameras need not be treated equally.
  • an “uncertain” probability instead may be replaced by the most certain one among the other probabilities, or replaced with an average or consensus of multiple probabilities for this pixel, and so on.
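  • A small sketch of this biasing step (with a hypothetical uncertainty band and blend weight, not the patent's exact rule) might look like the following:

```python
def bias_uncertain_probability(d, other_ds, uncertain_band=(0.35, 0.65), blend=0.5):
    """If the local probability d is already confidently background or foreground,
    keep it; otherwise nudge it toward the average of the other cameras'
    probabilities (other_ds) for the corresponding pixel.  Per-camera weights
    could be folded into the average so that not all cameras count equally.
    """
    lo, hi = uncertain_band
    if d < lo or d > hi or not other_ds:   # certain enough, or no other data
        return d
    consensus = sum(other_ds) / len(other_ds)
    return (1.0 - blend) * d + blend * consensus
```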
  • a given camera may not even have any of its images processed for segmentation, but instead rely on the data (e.g., probability data) computed from other camera locations. For example, consider that in FIG. 2 three of the four cameras capture a greenscreen in the background, capture infrared data and so on, while the fourth camera does not. Indeed, at an extreme, the fourth camera may be a simple RGB camera for which no previous background data or a priori background knowledge exists. Segmentation may be performed with this camera's images using only the foreground-background data corresponding to one or more other cameras.
  • Another aspect is image processing to detect information in the image as a whole or in patches. For example, as set forth above, blur and similarity detection may be employed. Other detection, such as object recognizers, may be leveraged. For example, foreground objects are often people (even if close to the background), whereby face/person detection may be used as another factor. Certain objects, such as a company's products when capturing a commercial advertisement, may be recognized so as to bias them toward the foreground or force them into the foreground.
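  • For instance, a person or object detection mask could be folded in as one more weighted contribution (a hypothetical “D6”-style term by analogy with the factors above; the blend weight is an assumption):

```python
import numpy as np

def add_detection_bias(prob_fg, detection_mask, weight=0.3):
    """Bias the per-pixel foreground probability toward regions where a detector
    fired (e.g., a face/person detector, or a recognizer for specific objects
    that should be forced toward the foreground).

    detection_mask: array in [0, 1] with the same shape as prob_fg.
    """
    biased = (1.0 - weight) * prob_fg + weight * detection_mask.astype(np.float32)
    return np.clip(biased, 0.0, 1.0)
```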
  • FIG. 9 illustrates an example of a suitable computing and networking environment 900 into which computer-related examples and implementations described herein may be implemented, for example.
  • the computing system environment 900 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 900 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the example operating environment 900 .
  • the invention is operational with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to: personal computers, server computers, hand-held or laptop devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
  • the invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types.
  • the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in local and/or remote computer storage media including memory storage devices.
  • an example system for implementing various aspects of the invention may include a general purpose computing device in the form of a computer 910 .
  • Components of the computer 910 may include, but are not limited to, a processing unit 920 , a system memory 930 , and a system bus 921 that couples various system components including the system memory to the processing unit 920 .
  • the system bus 921 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
  • such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.
  • the computer 910 typically includes a variety of computer-readable media.
  • Computer-readable media can be any available media that can be accessed by the computer 910 and includes both volatile and nonvolatile media, and removable and non-removable media.
  • Computer-readable media may comprise computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer 910.
  • Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
  • modulated data signal means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above may also be included within the scope of computer-readable media.
  • the system memory 930 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 931 and random access memory (RAM) 932 .
  • RAM 932 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 920 .
  • FIG. 9 illustrates operating system 934 , application programs 935 , other program modules 936 and program data 937 .
  • the computer 910 may also include other removable/non-removable, volatile/nonvolatile computer storage media.
  • FIG. 9 illustrates a hard disk drive 941 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 951 that reads from or writes to a removable, nonvolatile magnetic disk 952 , and an optical disk drive 955 that reads from or writes to a removable, nonvolatile optical disk 956 such as a CD ROM or other optical media.
  • removable/non-removable, volatile/nonvolatile computer storage media that can be used in the example operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like.
  • the hard disk drive 941 is typically connected to the system bus 921 through a non-removable memory interface such as interface 940
  • magnetic disk drive 951 and optical disk drive 955 are typically connected to the system bus 921 by a removable memory interface, such as interface 950 .
  • the drives and their associated computer storage media provide storage of computer-readable instructions, data structures, program modules and other data for the computer 910 .
  • hard disk drive 941 is illustrated as storing operating system 944 , application programs 945 , other program modules 946 and program data 947 .
  • operating system 944, application programs 945, other program modules 946 and program data 947 are given different numbers herein to illustrate that, at a minimum, they are different copies.
  • a user may enter commands and information into the computer 910 through input devices such as a tablet, or electronic digitizer, 964 , a microphone 963 , a keyboard 962 and pointing device 961 , commonly referred to as mouse, trackball or touch pad.
  • Other input devices not shown in FIG. 9 may include a joystick, game pad, satellite dish, scanner, or the like.
  • These and other input devices are often connected to the processing unit 920 through a user input interface 960 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB).
  • a monitor 991 or other type of display device is also connected to the system bus 921 via an interface, such as a video interface 990 .
  • the monitor 991 may also be integrated with a touch-screen panel or the like. Note that the monitor and/or touch screen panel can be physically coupled to a housing in which the computing device 910 is incorporated, such as in a tablet-type personal computer. In addition, computers such as the computing device 910 may also include other peripheral output devices such as speakers 995 and printer 996 , which may be connected through an output peripheral interface 994 or the like.
  • the computer 910 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 980 .
  • the remote computer 980 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 910 , although only a memory storage device 981 has been illustrated in FIG. 9 .
  • the logical connections depicted in FIG. 9 include one or more local area networks (LAN) 971 and one or more wide area networks (WAN) 973 , but may also include other networks.
  • LAN local area network
  • WAN wide area network
  • Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
  • When used in a LAN networking environment, the computer 910 is connected to the LAN 971 through a network interface or adapter 970.
  • When used in a WAN networking environment, the computer 910 typically includes a modem 972 or other means for establishing communications over the WAN 973, such as the Internet.
  • the modem 972 which may be internal or external, may be connected to the system bus 921 via the user input interface 960 or other appropriate mechanism.
  • a wireless networking component 974 such as comprising an interface and antenna may be coupled through a suitable device such as an access point or peer computer to a WAN or LAN.
  • program modules depicted relative to the computer 910 may be stored in the remote memory storage device.
  • FIG. 9 illustrates remote application programs 985 as residing on memory device 981 . It may be appreciated that the network connections shown are examples and other means of establishing a communications link between the computers may be used.
  • An auxiliary subsystem 999 (e.g., for auxiliary display of content) may be connected via the user interface 960 to allow data such as program content, system status and event notifications to be provided to the user, even if the main portions of the computer system are in a low power state.
  • the auxiliary subsystem 999 may be connected to the modem 972 and/or network interface 970 to allow communication between these systems while the main processing unit 920 is in a low power state.
  • the functionality described herein can be performed, at least in part, by one or more hardware logic components.
  • illustrative types of hardware logic components include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System on chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), etc.

Abstract

The subject disclosure is directed towards a framework that is configured to allow different background-foreground segmentation modalities to contribute towards segmentation. In one aspect, pixels are processed based upon RGB background separation, chroma keying, IR background separation, current depth versus background depth and current depth versus threshold background depth modalities. Each modality may contribute as a factor that the framework combines to determine a probability as to whether a pixel is foreground or background. The probabilities are fed into a global segmentation framework to obtain a segmented image.

Description

    CROSS-REFERENCE TO RELATED APPLICATION
  • The present application claims priority to U.S. provisional patent application Ser. No. 61/812,233, filed Apr. 15, 2013.
  • BACKGROUND
  • In video processing, segmentation is used to separate foreground objects (e.g., people) from the background. As one example often used in movies and television, segmentation allows video of a foreground person to be captured and placed in front of a different background.
  • One well-known existing segmentation technique is based upon chroma key segmentation (chroma keying), where typically a screen of a known color such as green or sometimes blue is placed in the original background. When a foreground object appears in front of the screen, anything that does not match that screen color is considered foreground; (this is often referred to as “greenscreening” because a green screen is typically used in the background, whereby pixels that are not that shade of green are considered foreground pixels).
  • Another segmentation technique is based upon background subtraction, where the background is first captured without anything in the foreground, whereby when a foreground object (or objects) is present, the before and after difference is used to remove the background. Recent developments in depth sensing also have resulted in attempts to use depth data to separate foreground objects from a background.
  • However, while existing solutions provide segmentation in certain situations, they are not particularly robust. Indeed, as scenarios such as multiple camera studios are used to capture three-dimensional point clouds of a foreground object from all viewpoints, these solutions are generally inadequate. For example, chroma key segmentation generally needs very controlled conditions, whereby any change in illumination or background color hinders the performance. Further, chroma keying is limited to situations where a screen can be placed in the background, which is often not practical or possible. Background subtraction has problems in disambiguating areas in which the foreground and background are similar, and areas in which the image is imperfect (e.g., blurry). Depth data is subject to noise, and thus depth-based segmentation is not sufficient in many scenarios.
  • SUMMARY
  • This Summary is provided to introduce a selection of representative concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in any way that would limit the scope of the claimed subject matter.
  • Briefly, one or more of various aspects of the subject matter described herein are directed towards a foreground background segmentation framework, including a multimodal segmentation algorithm configured to accept contribution factors from different segmentation modalities. The multimodal segmentation algorithm processes the contribution factors to determine foreground versus background data for each element (e.g., pixel) of an image, whereby the data is useable by a segmentation algorithm to determine whether that element is a foreground or background element.
  • One or more aspects are directed towards processing a frame of image data, and processing depth data computed from a corresponding depth-related image. Background subtraction is performed on an element of the image data to obtain a background subtraction contribution factor for that element. One or more other depth-based contribution factors may be determined based upon the depth data associated with that element. A combined data term based at least in part upon a contribution from the background subtraction contribution factor and a contribution from each of the one or more other depth-based contribution factors is computed. The data term is used in conjunction with other data terms as input to a global binary segmentation mechanism to obtain a segmented image.
  • One or more aspects are directed towards steps that include selecting a pixel as a selected pixel and processing pixel data, including processing RGB pixel data of one or more images to determine one or more RGB contributing factors indicative of whether the selected pixel is likely a foreground or background pixel in a current image. Infrared pixel data of one or more infrared images may be processed to determine one or more IR contributing factors, and pixel depth data may be processed to determine one or more depth-based contributing factors. The contributing factors are combined into a data term for the selected pixel, which is maintained for the selected pixel independent of other data terms for any other pixels. The steps are repeated to obtain data terms for a plurality of pixels.
  • Other advantages may become apparent from the following detailed description when taken in conjunction with the drawings.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The present invention is illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:
  • FIG. 1 is a block diagram representing example components that may be used to perform multimodal foreground background segmentation, according to one or more example implementations.
  • FIG. 2 is a representation of how a multimodal segmentation framework may be used in a multiple camera set scenario, according to one or more example implementations.
  • FIG. 3 is a representation of how RGB and infrared background subtraction modalities may be used to obtain contribution factors related to foreground versus background pixel data, according to one or more example implementations.
  • FIG. 4 is a representation of how a chroma keying modality may be used to obtain a contribution factor related to foreground versus background pixel data, according to one or more example implementations.
  • FIG. 5 is a representation of how current image depth data versus known background depth data may be used to obtain a contribution factor related to foreground versus background pixel data, according to one or more example implementations.
  • FIG. 6 is a representation of how current image depth data versus threshold depth data may be used to obtain a contribution factor related to foreground versus background pixel data, according to one or more example implementations.
  • FIG. 7 is a flow diagram showing example steps that may be taken by a framework to combine various modality inputs into segmentation-related data according to one or more example implementations.
  • FIG. 8 is a flow diagram showing example steps that may be taken to use segmentation-related data corresponding to one or more other cameras to compute segmentation-related data of a camera, according to one or more example implementations.
  • FIG. 9 is a block diagram representing an exemplary non-limiting computing system or operating environment into which one or more aspects of various embodiments described herein can be implemented.
  • DETAILED DESCRIPTION
  • Various aspects of the technology described herein are generally directed towards a framework that allows using a combination of image-based factors, depth-based factors, and domain knowledge of a scene to perform foreground/background segmentation. Unlike existing techniques based upon single mode solutions, the framework is configured to exploit different modalities of information to achieve more robust and accurate foreground/background segmentation results.
  • In one aspect, for each frame of a video stream, a red, green and blue (RGB) image, an infrared (IR) image and a depth map for that image may be obtained. The data in the various images may be processed on a per-element (e.g., per-pixel) basis to determine a set of factors. The factors are mathematically combined into a probability value indicative of whether the element, (referred to hereinafter as a “pixel” except where otherwise noted), is in the foreground or the background.
  • Thus, instead of a single mode solution, a probability function is provided that gives a probability of a given pixel being foreground or background based upon multimodal information. The probability data for the image pixels may be fed into a Global Binary Segmentation algorithm, e.g., a graph cuts algorithm, to obtain foreground/background segmentation of an image frame that is highly robust as a result of the multimodal, multi-cue probability function.
  • It should be understood that any of the examples herein are non-limiting. For example, while RGB (red, green, blue) color component data is described, data based upon other color schemes such as CMYK, typically used in printing or 3D printing, may be used. Further, not all exemplified modalities may be present in a given configuration. As such, the present invention is not limited to any particular embodiments, aspects, concepts, structures, functionalities or examples described herein. Rather, any of the embodiments, aspects, concepts, structures, functionalities or examples described herein are non-limiting, and the present invention may be used in various ways that provide benefits and advantages in segmentation and/or image processing in general.
  • FIG. 1 shows an example system in which a pod 100 comprising stereo IR cameras 101 and 102, stereo RGB cameras 103 and 104, and a projector 106 (e.g., an IR laser diffracted into many thousands of dots) captures one or more frames of stereo (e.g., clean) IR images 108, RGB images 109 and depth data 110 (e.g., stereo images of the projected light pattern). Single images may benefit from the technology described herein, but generally a stream of images is processed for segmentation.
  • Note that the exemplified pod 100 is only one example arrangement, and that in other arrangements, the cameras 101-104 may be arranged in any order relative to one another. Indeed, in one implementation the projector is positioned above the cameras. Further, any of the cameras and/or the projector may be separated from one another, rather than being part of any pod configuration; no pod is needed. Thus, FIG. 1 is only showing components for purposes of explanation, and no scale, relative dimensions, relative positions, combinations of devices within a housing/pod device and so on should be inferred from FIG. 1.
  • In the example of FIG. 1, the pod 100 is coupled to (or combined with) an image capturing system or subsystem 112. The stereo cameras 101 and 102, and 103 and 104 are generally controlled, e.g., via camera interface 114 and controller 116, to capture stereo images synchronized in time (e.g., the cameras are “genlocked”). In one implementation the cameras 101 and 102 capture infrared (IR) depth data 110, as IR is highly effective in depth estimation in varying light conditions and does not affect the visible appearance of the scene. As can be readily appreciated and as exemplified below, in some scenarios such as studio environments, more than one such pod and image capturing system/subsystem may be present.
  • In FIG. 1, a projector 106 is shown that projects an IR pattern onto a scene, such as a pattern of spots (e.g., dots) or a line pattern, although other spot shapes and/or pattern types may be used. For purposes of brevity, dots are generally described hereinafter. By illuminating the scene with a relatively large number of distributed infrared dots, the IR cameras 101 and 102 capture texture data as part of the infrared depth image data 110. Note that the projector 106 is shown as coupled to the controller 116 via a projector interface 118; any such control may be as simple as turning the projector on and off or using energy saving modes, however more complex control such as pulsing, changing dot distribution, changing intensity and/or the like is feasible.
  • The images 108-110 captured by the cameras 101-104 are provided to an image processing system (or subsystem) 120. In some implementations, the image processing system 120 and image capturing system or subsystem 112, or parts thereof, may be combined into a single device. For example, a home entertainment device may include all of the components shown in FIG. 1 (as well as others not shown). In other implementations, parts (or all) of the image capturing system or subsystem 112, such as the cameras and projector, may be a separate device that couples to a gaming console, personal computer, mobile device, dedicated processing device and/or the like, which may include some or all of the image processing functionality.
  • The image processing system or subsystem 120 includes a processor 121 and a memory 122 containing one or more image processing algorithms, including a multimodal, multi-cue foreground background segmentation algorithm 124 as described herein. In general, the segmentation algorithm 124 outputs a set of per-pixel probability data 126, representative of whether each pixel is likely to be a foreground or background pixel. The pixel probability data 126 is input into a global binary segmentation algorithm 128 (e.g., a Graph Cuts algorithm), which uses the pixel probability data 126 as a data term to segment the image into a segmented image 130, e.g., the foreground only as part of a stream of segmented images. The stream of images 130 is generally used by another internal or external image processing component, such as for special effects.
  • Also shown in FIG. 1 is an interface 132 to the image processing system or subsystem 120, such as for connecting a keyboard, game controller, display, pointing device, microphone for speech commands and/or the like as appropriate for a user to interact with an application or the like.
  • FIG. 2 shows a plurality of pods 200 1-200 4 arranged to capture images of an object (e.g., a person) from different perspectives. Note that while four such pods are depicted in FIG. 2, it is understood that any practical number may be present in a given configuration. For example, one such studio-like configuration uses nine pods, with two sets of four pods at different heights surrounding a space plus one pod above the space.
  • In the example of FIG. 2, the IR and RGB image data captured from each of the four (or more) pods may be used to form an RGB point cloud and an IR point cloud. The point cloud data may be based upon the foreground data segmented into the image 130 (FIG. 1), e.g., by combining the foreground image 130 with a similar foreground image segmented based upon the data captured at each pod.
  • As generally represented in FIG. 2, the cameras capture IR and RGB images of a foreground object, e.g., person 230, (as well as the background) at each pod. Further, each pod may project the light pattern (IR dots) onto the scene. The reflected IR light is captured at each pod 200 1-200 4, as the depth data image, and may be used via known stereo matching techniques to determine a depth map.
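  • By way of illustration only, the stereo matching step might be sketched as follows using a standard block matcher on rectified IR images; the focal length, baseline, matcher parameters, and the function name estimate_depth_map are assumptions made for this sketch, not details taken from the patent.

```python
import cv2
import numpy as np

def estimate_depth_map(ir_left, ir_right, fx=1100.0, baseline_m=0.09):
    """Sketch of stereo matching two genlocked, rectified 8-bit IR images of the
    projected dot pattern; fx (pixels) and baseline_m (meters) are made-up values."""
    matcher = cv2.StereoBM_create(numDisparities=128, blockSize=15)
    disparity = matcher.compute(ir_left, ir_right).astype(np.float32) / 16.0
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = fx * baseline_m / disparity[valid]  # depth in meters
    return depth
```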
  • Note that each pod may have its own image processing system, or the pods may feed images to a centralized image processing system. In the former configuration, any data related to segmentation, e.g., the pixel probability data, may be communicated among the image processing systems, such as represented in FIG. 2 by data D200 1 -D200 4 being sent to and from the image processing system 120. In the latter (centralized) configuration, the probability of each pixel for each pod is known in one location. The use of pixel probability data corresponding to other pods is described below.
  • The multimodal, multi-cue foreground background segmentation algorithm 124 provides a framework for combining the contributions of the different separation mechanisms that are available in a given scenario. These include any contribution (D1) obtained via RGB background subtraction, any contribution (D2) obtained via chroma keying, any contribution (D3) obtained via IR background subtraction, any contribution (D4) obtained via distinguishing a frame's depth values from previously captured background depth values, and any contribution (D5) obtained via prior knowledge of the background (e.g., known background depth). In one implementation these contributions may be weighted relative to one another and summed, whereby the order of computing such contributions is irrelevant.
  • Note that the contributions are determined per pixel for the images obtained by a camera set (e.g., two stereo RGB cameras and two stereo IR cameras per set). However, it is feasible to compute the contributions at a different level (e.g., sets of two-by-two pixels, and so on; note that depth can be estimated at sub-pixel levels as well). Thus, as used herein, pixels are exemplified; however “element” represents one pixel, a set of two or more pixels, and/or one or more sub-pixels that are used to obtain the contribution of each individual segmentation mechanism/modality, even if an element is different for a different segmentation mechanism/modality. Notwithstanding, individual pixels are the elements in one implementation, and thus used hereinafter as a typical example.
  • A suitable computation for determining a pixel's probability of being foreground or background is:

  • D = e^(D1 + D2 + D3 + αD4 + αD5).
  • Note that the value may be normalized, such as to be between zero and one, e.g., with values closer to zero meaning the pixel is more likely a background pixel (or vice-versa).
  • As set forth above, these contributions may be individually weighted:

  • D = e^(vD1 + wD2 + xD3 + yD4 + zD5).
  • Alternatively, some of the weights may be grouped or set to one, e.g., the depth-related factors may have a different weight or weights (e.g., the same weight α for depth, which may be a fractional value) from the non-depth factors, e.g.:

  • D = e^(D1 + D2 + D3 + αD4 + αD5).
  • Note that any of the weight values (including the above depth weight α) may be user configurable, with a default if not chosen by a user. Alternatively, sets of weights may be provided for different scenarios, e.g., one weight set for dim visible light, another weight set for bright visible light, and so on.
  • In the framework, a weight or a contribution may be set to zero, such as if no contribution is available. For example, chroma keying may not always be available for a scenario, and/or for a particular pod among many pods, such as in a studio setup.
  • Further, even if present, the weights need not be the same between pods. For example, a pod facing a greenscreen “straight on” may have a stronger (D2) chroma keying weight than a pod that captures the greenscreen at an angle. A stereo camera that computes depth data via stereo differencing using IR illumination may be given a higher weight α for D4 and D5 computations, for example, than a time-of-flight depth camera. The weights for a given camera set or pod may be learned and calibrated on a per-camera set/pod basis.
  • Different sets of weights may be used based upon different conditions. For example, as visible light gets dimmer, more weight may be given to the infrared-based contributions, e.g., D3, D4 and D5, than in bright light. The framework thus may be adapted to whatever external decision (such as a lighting decision) is used to select parameters for the weights, to the capabilities of the cameras, to scenarios such as whether a greenscreen may be used for a given camera, and so on.
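  • The following is a minimal sketch of how such a weighted combination might look in code; the particular weight values, the dictionary of per-condition weight sets, the logistic squash used to normalize D into the zero-to-one range, and the function name combine_factors are all illustrative assumptions rather than the patent's implementation.

```python
import numpy as np

# Hypothetical per-condition weight sets (illustrative values only).
WEIGHT_SETS = {
    "bright": dict(v=1.0, w=1.0, x=0.5, y=0.5, z=0.5),
    "dim":    dict(v=0.5, w=0.5, x=1.0, y=1.0, z=1.0),  # lean on IR/depth cues
}

def combine_factors(d1, d2, d3, d4, d5, weights):
    """Per-pixel combination D = e^(v*D1 + w*D2 + x*D3 + y*D4 + z*D5).
    A modality that is unavailable simply contributes nothing (set its weight,
    or its factor, to zero). The exponential is squashed to [0, 1] so that
    values near zero indicate background and values near one foreground."""
    s = (weights["v"] * d1 + weights["w"] * d2 + weights["x"] * d3 +
         weights["y"] * d4 + weights["z"] * d5)
    d = np.exp(s)
    return d / (1.0 + d)
```

  • In this sketch the contribution factors are assumed to be soft values in the range of roughly −1 (background) to +1 (foreground), matching the per-modality sketches that follow.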
  • FIG. 3 shows how the contributions D1 and D3 may be obtained based upon background subtraction. An initial RGB background image is captured, as well as an initial (e.g., clean) IR background image and a depth image for processing into depth data, provided the appropriate cameras are available. Rather than capturing one image per type, it is appreciated that these may be sets of stereo images. Block 330 represents any or all of these possibilities.
  • When a foreground object 331 is captured in a current frame (represented by 332), the same types of images are captured: RGB, IR and depth, which may be stereo images. Note that “current” refers to the frame being processed for segmentation, and need not be a frame of “live” video. For viewability purposes, the blocks 330 and 332 in FIG. 3 each show one visible image as an example, but it is understood that blocks 330 and 332 also represent any IR image and depth imaging data, as well as stereo images for each.
  • Background subtraction of RGB is a well-known technique, and may be used with IR as well. Thus, by performing background subtraction 334 with the before (only background) and after (background plus foreground) RGB images, which may be done on more than one before-and-after set (such as in the case of stereo), the contribution factor D1 is obtained for each pixel. Similarly, background subtraction 334 is performed on the before and after IR images to obtain the contribution factor D3 for each pixel.
  • The values for D1 and/or D3 need not be binary “foreground or background” results 336, but may be a value that indicates some uncertainty. For example, if a pixel being evaluated is known to be in an area where the foreground and background are similar and/or blurry (e.g., as determined by a previous patch-type processing algorithm), a value between zero and one may be the result, for example; indeed, an entire patch of pixels can be classified as uncertain. A pixel in a blurred area may have one value that differs from a value for a pixel in an area deemed similar, which may differ from an area that is deemed both blurry and similar. Blur and similarity areas (or other uncertain areas) may be determined via the IR and/or RGB images, or a combination of both, and possibly even by processing the depth image. As can be readily appreciated, the uncertainty reduces the factor's contribution relative to the other factors (independent of other weighting).
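  • A soft background-subtraction contribution of this kind (usable for either D1 on RGB data or D3 on IR data) might be sketched as follows; the difference thresholds, the linear ramp for the uncertain band, and the −1 (background) to +1 (foreground) output convention are assumptions made for illustration.

```python
import numpy as np

def background_subtraction_factor(current, background, lo=10.0, hi=40.0):
    """Soft per-pixel background-subtraction contribution (RGB or IR).
    Absolute differences below `lo` read as background (-1), above `hi` as
    foreground (+1); the band in between ramps linearly to express uncertainty."""
    diff = np.abs(current.astype(np.float32) - background.astype(np.float32))
    if diff.ndim == 3:                  # e.g., RGB: collapse the color channels
        diff = diff.mean(axis=2)
    t = np.clip((diff - lo) / (hi - lo), 0.0, 1.0)
    return 2.0 * t - 1.0                # -1 = background, +1 = foreground
```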
  • FIG. 4 shows the use of chroma keying to obtain this factor's D2 contribution. In FIG. 4 the a priori known values, e.g., of a greenscreen are represented as lowercase rgb (to distinguish from the current frame's RGB, represented in uppercase), and in general may be the same throughout the entire background, but may differ if desired, as long as each background pixel's color values are known. Note that in block 440 the pixels behind the foreground object 441 are significantly smaller than represented, and block 440 is not intended to convey any sizes, relative sizes, number of pixels and/or the like.
  • Block 442 represents chroma key separation, with the result represented in block 444. As with other decisions, the result need not be a binary foreground or background decision, but may include uncertainty. For example, if a pixel's RGB values are close to what the background pixel value is known to be, but not exact, then the D2 value may represent this uncertainty, because the background may have changed slightly due to differences in lighting/reflection off of the foreground object, or the near match may be caused by a foreground object having a similar color, e.g., a human is wearing a necktie with a pattern that includes some closely colored material. Again, this is not as significant as with chroma key separation alone, because the D2 value at any pixel is only one contributing factor to the framework.
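  • A minimal sketch of such a soft chroma keying contribution appears below; the particular key color, the color-distance thresholds, and the −1/+1 output convention are illustrative assumptions.

```python
import numpy as np

def chroma_key_factor(rgb, key_rgb=(0, 177, 64), near=20.0, far=80.0):
    """Soft D2 contribution. Pixels whose color is within `near` of the known
    background (key) color lean toward background; colors farther than `far`
    lean toward foreground; distances in between express uncertainty."""
    key = np.array(key_rgb, dtype=np.float32)
    dist = np.linalg.norm(rgb.astype(np.float32) - key, axis=2)
    t = np.clip((dist - near) / (far - near), 0.0, 1.0)
    return 2.0 * t - 1.0                # -1 = background (key color), +1 = foreground
```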
  • Note that the framework processes the same stream of data per image type, e.g., the RGB data need only be captured once per camera frame to be used with the RGB processing mechanisms (background subtraction and chroma keying) described herein.
  • FIG. 5 shows how the (current computed depth versus previously captured/computed depth) D4 factor may be obtained by “background depth subtraction” 552, namely by comparing current foreground depth values (represented symbolically by “1” in block 550) against previously captured background depth values represented by various other single digit numbers. Note that some errors/noise may occur, e.g., there are some “1 s” in the background and a “5” in the foreground. However, D4 is only one contributing factor rather than a determinative one, and thus such noise ultimately may be insignificant. Some level of uncertainty also may be indicated by a non-binary value, e.g., if the difference appears as an outlier compared to other pixels' differences, possibly in a patch-based scheme.
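  • One way such a depth-based background subtraction contribution (D4) might be computed is sketched below; the tolerance values, the treatment of invalid depth readings, and the −1/+1 convention are assumptions.

```python
import numpy as np

def depth_background_factor(depth, background_depth, tol=0.05, band=0.25):
    """Soft D4 contribution: compare current depth against previously captured
    background depth. Pixels within `tol` meters of the stored background depth
    read as background; pixels more than `tol + band` meters closer read as
    foreground, with a linear ramp in between to express uncertainty."""
    closer = background_depth - depth              # positive if something moved in front
    t = np.clip((closer - tol) / band, 0.0, 1.0)
    factor = 2.0 * t - 1.0
    factor[(depth <= 0) | (background_depth <= 0)] = 0.0   # invalid depth: no contribution
    return factor
```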
  • FIG. 6 shows the use of depth data (block 660) against a known, fixed depth or threshold to make a decision (block 662) that becomes the D5 result (block 664). For example, a studio may be set up such that a person is instructed to stand within 4.0 meters relative to a camera location. Any depth captured over 5.0 meters is considered background during the per-pixel processing. Again, there may be noise, but D5 is only one contributing factor. Further, as with other decisions described herein, an “uncertain” decision may be indicated in the result (block 664), e.g., present in the value; a pixel at 4.5 meters may be considered uncertain. The actual value may be indicative of the uncertainty, e.g., a score between zero (0) and one (1) that is proportional to where the computed depth falls between 4.0 and 5.0 meters.
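  • A sketch of such a threshold-based depth contribution (D5), using the 4.0/5.0 meter example above, might look like this; the exact mapping and the handling of invalid depth are assumptions.

```python
import numpy as np

def depth_threshold_factor(depth, near=4.0, far=5.0):
    """Soft D5 contribution from a known working volume: depths under `near`
    meters read as foreground (+1), depths over `far` meters as background (-1),
    and depths in between (e.g., 4.5 m) yield a proportional, uncertain score."""
    t = np.clip((depth - near) / (far - near), 0.0, 1.0)   # 0 near the camera, 1 far
    factor = 1.0 - 2.0 * t
    factor[depth <= 0] = 0.0            # invalid depth contributes nothing
    return factor
```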
  • FIG. 7 is a flow diagram showing example steps that may be taken to obtain the contributing factors and use them for segmentation. Step 702 represents capturing the background information, including RGB, IR (e.g., clean IR) and depth (IR with projected light pattern) images. Step 704 computes the background depth.
  • Sometime later, a foreground image is captured for segmentation. Step 706 captures the current frame of RGB and IR (e.g., clean and for depth) images. Step 708 computes the current depth.
  • Step 709 selects a pixel (e.g., the relevant pixel values at the same pixel location in each of the three images). Step 710 uses the current RGB values at this pixel location to get D1 via background subtraction with a counterpart pixel in the background RGB image.
  • Step 712 represents determining whether chroma keying is active; if so, step 714 gets the D2 contribution factor value. If not (e.g., there is no greenscreen for this camera set), the D2 value (or the corresponding weight) may be set to zero in the framework so there is no contribution from this modality. Note that any of the other modalities similarly may not be active, in which event the contribution for such a modality may be set to zero for all current pixels corresponding to that modality; however the chroma key active versus inactive modality is used as an example in FIG. 7 because this modality is likely quite variable in many scenarios. Indeed, even in a carefully controlled multi-camera studio environment, a greenscreen may not entirely surround a foreground object, whereby one or more cameras may not have chroma keying active.
  • Steps 716 and 718 use IR background subtraction on the corresponding background-only and background-plus-foreground IR images, and “depth background subtraction” on the corresponding background-only and background-plus-foreground depth data, respectively. This provides values for the D3 and D4 contributions.
  • Step 720 is the measured current depth versus “threshold” depth evaluation to obtain a D5 value for this pixel, as described above. At this time, the contributing factor values have been obtained for this pixel, and are computed into the pixel probability value D (step 722), as described above.
  • Step 724 repeats for the next pixel (location) in the images. Note that in one implementation, any of steps 709-724 may be done in parallel with similar steps performed on another pixel or pixels. Note that some of the steps may be performed in GPU hardware, which is highly parallel.
  • When the pixels each have a respective D probability, at step 726 this data may be fed as data terms into a graph cuts algorithm (with an attractive potential used for the smoothness term of Graph Cuts) or into another global binary segmentation technique (e.g., a maximum likelihood graphical model, Markov random field and so on). The output segmented image can either be a binary segmentation into foreground/background, or a soft boundary, in which edge pixels can be partially in the foreground/background (e.g., alpha matting techniques). At step 728 the segmented image may be output as part of a stream, for example.
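  • As a rough illustration of how the per-pixel probabilities might be turned into the data terms consumed by such a global binary segmentation step, the sketch below converts D into unary labeling costs; the negative-log-likelihood form, the clipping constant, and the omission of the smoothness (pairwise) term and of any particular max-flow/min-cut library are assumptions of this sketch.

```python
import numpy as np

def unary_data_terms(prob_fg, eps=1e-6):
    """Turn per-pixel foreground probabilities D into unary data terms of the
    kind a graph-cuts style solver typically consumes: labeling a pixel
    background costs -log(1 - D) and labeling it foreground costs -log(D).
    A pairwise smoothness term and a min-cut solver would complete the setup."""
    p = np.clip(prob_fg, eps, 1.0 - eps)
    cost_fg = -np.log(p)          # high when the pixel strongly looks background
    cost_bg = -np.log(1.0 - p)    # high when the pixel strongly looks foreground
    return cost_fg, cost_bg
```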
  • Turning to another aspect, generally represented in FIG. 8, as set forth above, knowledge about a pixel from one or more other cameras (including the other half of a stereo pair or an entirely different camera set) may be available and used as part of the current pixel processing. For example, consider that a given pixel has a highly uncertain probability value, such as close to 0.5 (halfway between background and foreground). Another camera with a different angle and possibly additional information (e.g., the other camera had chroma keying active, while the one with the highly uncertain probability value did not) may have a far more certain probability, e.g., 0.9. This information may be used to change or bias the uncertain probability value to a more certain value. Note that instead of providing the D value, another camera can provide its full set of D1-D5 values, or some lesser set thereof. However, depth information is needed at each other camera to leverage one or more other cameras' data.
  • One way the use of such other information may be accomplished is by using the other information (e.g., the computed D probability) as another contributing factor, e.g., as a “D6” value, with an appropriate weight. There may be one other factor per other camera pixel, e.g., D6, D7, D8 and so on, or one or more may be combined; these other cameras may have their other information combined into as little as one single additional contributing D6 factor, for example. However, this implies that an initial D probability is shared for the others to use, because a final D value is not known until each other camera's probability information is obtained.
  • Thus, the process may be iterative, as the D value corresponding to one camera may change the D value corresponding to another, which then may change the other one, and so on. The iterations may be limited for practical reasons.
  • A simpler way is to use only the initial D values computed at each camera with another camera's D value, in some way that biases the initial D value. For example, consider for simplicity that there is only one other camera that provides D′ as its initially computed probability. D′ may be used once to possibly alter D, rather than iteratively.
  • FIG. 8 shows such an example, beginning at step 802 where the probability data D is computed for a current camera (corresponding to step 722 of FIG. 7). This D value for this pixel is “sent” to other camera locations for their use (where “sent” in a centralized processing scenario refers to maintaining that value in association with each other camera's probability data). Step 806 “receives” the other cameras' probability data (each a D′ value) for use.
  • Steps 808, 810 and 812 represent one way the other D′ values may be used. For example, if the local D is already certain above or below a threshold uncertainty range, then D is used as is. Otherwise, via steps 810 and 812, D is biased with the average of the other D′ values, or some other combination of the other D′ values, e.g., a consensus. The bias may increase or decrease the initial D value, and may be weighted to reduce or increase the influence of the other cameras. These D′ values from the other cameras may have different weights relative to one another so that all other cameras need not be treated equally.
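  • The non-iterative biasing scheme just described might be sketched as follows; the uncertainty band, the blend weight alpha, and the use of a simple average of the other cameras' values are assumptions for illustration.

```python
import numpy as np

def bias_with_other_cameras(d, d_others, lo=0.4, hi=0.6, alpha=0.5):
    """Bias an uncertain local probability map D using other cameras' initial
    probabilities D'. Pixels whose local D is already certain (outside the
    [lo, hi] band) are left unchanged; uncertain pixels are blended with the
    average of the other cameras' values for the same pixel."""
    d_avg = np.mean(d_others, axis=0)          # consensus of the other cameras
    uncertain = (d >= lo) & (d <= hi)
    out = d.copy()
    out[uncertain] = (1.0 - alpha) * d[uncertain] + alpha * d_avg[uncertain]
    return out
```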
  • As can be readily appreciated, there are numerous ways to use other camera data. For example, rather than (or after) biasing, an uncertain probability may be replaced by the most certain one among other probabilities, or replaced with an average or consensus thereof of multiple probabilities for this pixel, and so on.
  • Indeed, a given camera may not even have any of its images processed for segmentation, but rely on the data (e.g., probability data) computed from other camera locations. For example, consider that in FIG. 2 three of the four cameras capture a greenscreen in the background, capture infrared data and so on, while a fourth camera does not. Indeed, at an extreme, the fourth camera may be a simple RGB camera for which no previous background data or a priori background knowledge exists. Segmentation may be performed with this camera's images using only the foreground-background data corresponding to one or more other cameras.
  • Another aspect is image processing to detect information in the image as a whole or in patches. For example, as set forth above, blur and similarity detection may be employed. Other detection such as object recognizers may be leveraged. For example, foreground objects are often people (even if close to the background), whereby face/person detection may be used as another factor. Certain objects, such as a company's commercial items when capturing a commercial advertisement, may be recognized so as to bias them toward the foreground or force them into the foreground.
  • Example Operating Environment
  • FIG. 9 illustrates an example of a suitable computing and networking environment 900 into which computer-related examples and implementations described herein may be implemented, for example. The computing system environment 900 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 900 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the example operating environment 900.
  • The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to: personal computers, server computers, hand-held or laptop devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
  • The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in local and/or remote computer storage media including memory storage devices.
  • With reference to FIG. 9, an example system for implementing various aspects of the invention may include a general purpose computing device in the form of a computer 910. Components of the computer 910 may include, but are not limited to, a processing unit 920, a system memory 930, and a system bus 921 that couples various system components including the system memory to the processing unit 920. The system bus 921 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.
  • The computer 910 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 910 and includes both volatile and nonvolatile media, and removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer 910.
  • Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above may also be included within the scope of computer-readable media.
  • The system memory 930 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 931 and random access memory (RAM) 932. A basic input/output system 933 (BIOS), containing the basic routines that help to transfer information between elements within computer 910, such as during start-up, is typically stored in ROM 931. RAM 932 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 920. By way of example, and not limitation, FIG. 9 illustrates operating system 934, application programs 935, other program modules 936 and program data 937.
  • The computer 910 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 9 illustrates a hard disk drive 941 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 951 that reads from or writes to a removable, nonvolatile magnetic disk 952, and an optical disk drive 955 that reads from or writes to a removable, nonvolatile optical disk 956 such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the example operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 941 is typically connected to the system bus 921 through a non-removable memory interface such as interface 940, and magnetic disk drive 951 and optical disk drive 955 are typically connected to the system bus 921 by a removable memory interface, such as interface 950.
  • The drives and their associated computer storage media, described above and illustrated in FIG. 9, provide storage of computer-readable instructions, data structures, program modules and other data for the computer 910. In FIG. 9, for example, hard disk drive 941 is illustrated as storing operating system 944, application programs 945, other program modules 946 and program data 947. Note that these components can either be the same as or different from operating system 934, application programs 935, other program modules 936, and program data 937. Operating system 944, application programs 945, other program modules 946, and program data 947 are given different numbers herein to illustrate that, at a minimum, they are different copies. A user may enter commands and information into the computer 910 through input devices such as a tablet, or electronic digitizer, 964, a microphone 963, a keyboard 962 and pointing device 961, commonly referred to as mouse, trackball or touch pad. Other input devices not shown in FIG. 9 may include a joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 920 through a user input interface 960 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 991 or other type of display device is also connected to the system bus 921 via an interface, such as a video interface 990. The monitor 991 may also be integrated with a touch-screen panel or the like. Note that the monitor and/or touch screen panel can be physically coupled to a housing in which the computing device 910 is incorporated, such as in a tablet-type personal computer. In addition, computers such as the computing device 910 may also include other peripheral output devices such as speakers 995 and printer 996, which may be connected through an output peripheral interface 994 or the like.
  • The computer 910 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 980. The remote computer 980 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 910, although only a memory storage device 981 has been illustrated in FIG. 9. The logical connections depicted in FIG. 9 include one or more local area networks (LAN) 971 and one or more wide area networks (WAN) 973, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
  • When used in a LAN networking environment, the computer 910 is connected to the LAN 971 through a network interface or adapter 970. When used in a WAN networking environment, the computer 910 typically includes a modem 972 or other means for establishing communications over the WAN 973, such as the Internet. The modem 972, which may be internal or external, may be connected to the system bus 921 via the user input interface 960 or other appropriate mechanism. A wireless networking component 974 such as comprising an interface and antenna may be coupled through a suitable device such as an access point or peer computer to a WAN or LAN. In a networked environment, program modules depicted relative to the computer 910, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, FIG. 9 illustrates remote application programs 985 as residing on memory device 981. It may be appreciated that the network connections shown are examples and other means of establishing a communications link between the computers may be used.
  • An auxiliary subsystem 999 (e.g., for auxiliary display of content) may be connected via the user interface 960 to allow data such as program content, system status and event notifications to be provided to the user, even if the main portions of the computer system are in a low power state. The auxiliary subsystem 999 may be connected to the modem 972 and/or network interface 970 to allow communication between these systems while the main processing unit 920 is in a low power state.
  • Alternatively, or in addition, the functionally described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System on chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), etc.
  • CONCLUSION
  • While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.

Claims (20)

What is claimed is:
1. A system comprising, a foreground background segmentation framework, including a multimodal segmentation algorithm configured to accept contribution factors from different segmentation modalities and process the contribution factors to determine foreground versus background data for each element of an image that is useable to determine whether that element is a foreground or background element.
2. The system of claim 1 wherein at least one element comprises a pixel.
3. The system of claim 1 wherein the foreground versus background data comprises a probability score.
4. The system of claim 1 wherein the different segmentation modalities correspond to any of: a red, green, blue (RGB) background subtraction, chroma keying, infrared (IR) background subtraction, a current computed depth versus previously computed background depth evaluation, or a current depth versus threshold depth evaluation.
5. The system of claim 1 wherein the foreground background segmentation framework is further configured to output the foreground versus background data for each element to a global binary segmentation algorithm.
6. The system of claim 1 wherein the framework is configured to apply a weight for each contribution factor.
7. The system of claim 6 wherein the framework is configured to select a weight set from among a plurality of weight sets to apply the weight for each contribution factor.
8. The system of claim 6 wherein the framework is coupled to a multiple camera set environment, and wherein the framework is configured to apply a weight set to one camera set that is different from a weight set applied to another camera set.
9. The system of claim 1 wherein the framework is coupled to a multiple camera set environment, and wherein the framework is configured to determine the foreground versus background data based on zero or more contribution factors in conjunction with information that corresponds to other camera foreground versus background data.
10. The system of claim 1 wherein the framework is configured to determine the foreground versus background data based on zero or more contribution factors and detection information processed from an image.
11. A method, comprising, processing a frame of image data and processing depth data computed from a corresponding depth-related image, including performing background subtraction on an element of the image data to obtain a background subtraction contribution factor for that element, determining one or more other depth-based contribution factors based upon the depth data associated with that element, computing a combined data term based at least in part upon a contribution from the background subtraction contribution factor and a contribution from each of the one or more other depth-based contribution factors, and using the data term in conjunction with other data terms as input to a global binary segmentation mechanism to obtain a segmented image.
12. The method of claim 11 further comprising processing a frame of image data using chroma keying to obtain a chroma keying contribution factor for the element, and wherein computing the combined data term further comprises using a contribution from the chroma keying contribution factor.
13. The method of claim 11 wherein performing the background subtraction on an element of the image data comprises performing infrared background subtraction using captured infrared image data for a current element and previously captured background infrared image data.
14. The method of claim 11 wherein determining the one or more other depth-based contribution factors comprises evaluating a difference between currently captured depth data corresponding to the element and previously captured background depth data corresponding to the element.
15. The method of claim 11 wherein determining the one or more other depth-based contribution factors comprises evaluating currently captured depth data corresponding to the element and threshold depth data.
16. The method of claim 11 further comprising, using information corresponding to background versus foreground information corresponding to at least one other camera in computing the combined data term.
17. One or more machine-readable storage media or logic having executable instructions, which when executed perform steps, comprising:
(a) selecting a pixel as a selected pixel;
(b) processing pixel data, including:
processing red, green and blue (RGB) pixel data of one or more images to determine one or more RGB contributing factors indicative of whether the selected pixel is likely a foreground or background pixel in a current image;
processing infrared (IR) pixel data of one or more infrared images to determine one or more IR contributing factors indicative of whether the selected pixel is likely a foreground or background pixel in the current image;
processing pixel depth data to determine one or more depth-based contributing factors indicative of whether the selected pixel is likely a foreground or background pixel in the current image;
(c) combining the contributing factors into a data term for the selected pixel;
(d) maintaining the data term for the selected pixel independent of other data terms for any other pixels;
(e) selecting a different pixel as the selected pixel; and
(f) returning to step (b) for a plurality of pixels to obtain a plurality of data terms.
18. The one or more machine-readable storage media or logic of claim 17 wherein processing the RGB pixel data of the one or more images to determine the one or more RGB contributing factors comprises performing at least one of: background subtraction based on a previous RGB background image and a current RGB image, or performing chroma keying based on known background data and a current RGB image.
19. The one or more machine-readable storage media or logic of claim 17 wherein processing the IR pixel data of the one or more images to determine the one or more IR contributing factors comprises performing background subtraction based on a previous IR background image and a current IR image.
20. The one or more machine-readable storage media or logic of claim 17 wherein processing the pixel depth data to determine the one or more depth-based contributing factors comprises performing at least one of: evaluating current pixel depth data against previous background pixel data, or evaluating current pixel depth data against threshold depth data.
US13/918,747 2013-04-15 2013-06-14 Multimodal Foreground Background Segmentation Abandoned US20140307056A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US13/918,747 US20140307056A1 (en) 2013-04-15 2013-06-14 Multimodal Foreground Background Segmentation
EP14726262.0A EP2987139A1 (en) 2013-04-15 2014-04-14 Multimodal foreground background segmentation
CN201480021522.8A CN105229697B (en) 2013-04-15 2014-04-14 Multi-modal prospect background segmentation
PCT/US2014/033914 WO2014172226A1 (en) 2013-04-15 2014-04-14 Multimodal foreground background segmentation
US16/214,027 US11546567B2 (en) 2013-04-15 2018-12-07 Multimodal foreground background segmentation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361812233P 2013-04-15 2013-04-15
US13/918,747 US20140307056A1 (en) 2013-04-15 2013-06-14 Multimodal Foreground Background Segmentation

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/214,027 Continuation US11546567B2 (en) 2013-04-15 2018-12-07 Multimodal foreground background segmentation

Publications (1)

Publication Number Publication Date
US20140307056A1 true US20140307056A1 (en) 2014-10-16

Family

ID=51686526

Family Applications (3)

Application Number Title Priority Date Filing Date
US13/913,454 Active 2033-08-14 US9191643B2 (en) 2013-04-15 2013-06-09 Mixing infrared and color component data point clouds
US13/918,747 Abandoned US20140307056A1 (en) 2013-04-15 2013-06-14 Multimodal Foreground Background Segmentation
US16/214,027 Active US11546567B2 (en) 2013-04-15 2018-12-07 Multimodal foreground background segmentation

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US13/913,454 Active 2033-08-14 US9191643B2 (en) 2013-04-15 2013-06-09 Mixing infrared and color component data point clouds

Family Applications After (1)

Application Number Title Priority Date Filing Date
US16/214,027 Active US11546567B2 (en) 2013-04-15 2018-12-07 Multimodal foreground background segmentation

Country Status (11)

Country Link
US (3) US9191643B2 (en)
EP (2) EP2987140B1 (en)
JP (1) JP6562900B2 (en)
KR (1) KR102171231B1 (en)
CN (2) CN105706143B (en)
AU (1) AU2014254218B2 (en)
BR (1) BR112015025974B1 (en)
CA (1) CA2908689C (en)
MX (1) MX352449B (en)
RU (1) RU2660596C2 (en)
WO (2) WO2014172230A1 (en)

Cited By (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150186341A1 (en) * 2013-12-26 2015-07-02 Joao Redol Automated unobtrusive scene sensitive information dynamic insertion into web-page image
US20150350608A1 (en) * 2014-05-30 2015-12-03 Placemeter Inc. System and method for activity monitoring using video data
US20160105636A1 (en) * 2013-08-19 2016-04-14 Huawei Technologies Co., Ltd. Image Processing Method and Device
EP3029633A1 (en) * 2014-12-02 2016-06-08 Honeywell International Inc. System and method of foreground extraction for digital cameras
US9414016B2 (en) * 2013-12-31 2016-08-09 Personify, Inc. System and methods for persona identification using combined probability maps
US20160314369A1 (en) * 2013-12-31 2016-10-27 Personify, Inc. Transmitting video and sharing content via a network
US9485433B2 (en) 2013-12-31 2016-11-01 Personify, Inc. Systems and methods for iterative adjustment of video-capture settings based on identified persona
US20170032531A1 (en) * 2013-12-27 2017-02-02 Sony Corporation Image processing device and image processing method
US9607397B2 (en) 2015-09-01 2017-03-28 Personify, Inc. Methods and systems for generating a user-hair-color model
US9628722B2 (en) 2010-03-30 2017-04-18 Personify, Inc. Systems and methods for embedding a foreground video into a background feed based on a control input
CN107169995A (en) * 2017-05-05 2017-09-15 武汉理工大学 A kind of adaptive moving target visible detection method
US9792676B2 (en) 2010-08-30 2017-10-17 The Board Of Trustees Of The University Of Illinois System for background subtraction with 3D camera
US20170372483A1 (en) * 2016-06-28 2017-12-28 Foresite Healthcare, Llc Systems and Methods for Use in Detecting Falls Utilizing Thermal Sensing
CN107636728A (en) * 2015-05-21 2018-01-26 皇家飞利浦有限公司 For the method and apparatus for the depth map for determining image
US9881207B1 (en) 2016-10-25 2018-01-30 Personify, Inc. Methods and systems for real-time user extraction using deep learning networks
US9883155B2 (en) 2016-06-14 2018-01-30 Personify, Inc. Methods and systems for combining foreground video and background video using chromatic matching
US9916668B2 (en) 2015-05-19 2018-03-13 Personify, Inc. Methods and systems for identifying background in video data using geometric primitives
US9953223B2 (en) 2015-05-19 2018-04-24 Personify, Inc. Methods and systems for assigning pixels distance-cost values using a flood fill technique
EP3351958A1 (en) * 2017-01-19 2018-07-25 Hitachi-LG Data Storage, Inc. Object position detection apparatus
US10043078B2 (en) * 2015-04-21 2018-08-07 Placemeter LLC Virtual turnstile system and method
EP3276951A4 (en) * 2015-03-26 2018-09-12 Sony Corporation Image processing system, image processing method, and program
US10270986B2 (en) 2017-09-22 2019-04-23 Feedback, LLC Near-infrared video compositing
US20190230342A1 (en) * 2016-06-03 2019-07-25 Utku Buyuksahin A system and a method for capturing and generating 3d image
US10380431B2 (en) 2015-06-01 2019-08-13 Placemeter LLC Systems and methods for processing video streams
US10560645B2 (en) 2017-09-22 2020-02-11 Feedback, LLC Immersive video environment using near-infrared video compositing
US10674096B2 (en) 2017-09-22 2020-06-02 Feedback, LLC Near-infrared video compositing
US10692192B2 (en) * 2014-10-21 2020-06-23 Connaught Electronics Ltd. Method for providing image data from a camera system, camera system and motor vehicle
US10803596B2 (en) * 2018-01-29 2020-10-13 HypeVR Fully automated alpha matting for virtual reality systems
US10878577B2 (en) * 2018-12-14 2020-12-29 Canon Kabushiki Kaisha Method, system and apparatus for segmenting an image of a scene
US10902282B2 (en) 2012-09-19 2021-01-26 Placemeter Inc. System and method for processing image data
US11004207B2 (en) * 2017-12-06 2021-05-11 Blueprint Reality Inc. Multi-modal data fusion for scene segmentation
CN112912896A (en) * 2018-12-14 2021-06-04 苹果公司 Machine learning assisted image prediction
CN113066115A (en) * 2021-04-28 2021-07-02 北京的卢深视科技有限公司 Deep prediction network training method, device, server and readable storage medium
US20210366096A1 (en) * 2020-05-22 2021-11-25 Robert Bosch Gmbh Hazard detection ensemble architecture system and method
US11257238B2 (en) * 2019-09-27 2022-02-22 Sigma Technologies, S.L. Unsupervised object sizing method for single camera viewing
US11334751B2 (en) 2015-04-21 2022-05-17 Placemeter Inc. Systems and methods for processing video data for activity monitoring
US20220172401A1 (en) * 2020-11-27 2022-06-02 Canon Kabushiki Kaisha Image processing apparatus, image generation method, and storage medium
US20220272245A1 (en) * 2021-02-24 2022-08-25 Logitech Europe S.A. Image generating system
US11481915B2 (en) * 2018-05-04 2022-10-25 Packsize Llc Systems and methods for three-dimensional data acquisition and processing under timing constraints
US11493931B2 (en) * 2019-05-14 2022-11-08 Lg Electronics Inc. Method of extracting feature from image using laser pattern and device and robot of extracting feature thereof
EP4055564A4 (en) * 2019-12-13 2023-01-11 Sony Group Corporation Multi-spectral volumetric capture
US11800056B2 (en) 2021-02-11 2023-10-24 Logitech Europe S.A. Smart webcam system
US11819344B2 (en) 2015-08-28 2023-11-21 Foresite Healthcare, Llc Systems for automatic assessment of fall risk
US11864926B2 (en) 2015-08-28 2024-01-09 Foresite Healthcare, Llc Systems and methods for detecting attempted bed exit

Families Citing this family (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140307055A1 (en) 2013-04-15 2014-10-16 Microsoft Corporation Intensity-modulated light pattern for active stereo
US9589359B2 (en) * 2014-04-24 2017-03-07 Intel Corporation Structured stereo
CN105469447A (en) * 2014-09-11 2016-04-06 富泰华工业(深圳)有限公司 Point-cloud boundary right-angle side repairing system and method
US9519061B2 (en) * 2014-12-26 2016-12-13 Here Global B.V. Geometric fingerprinting for localization of a device
JP6642970B2 (en) * 2015-03-05 2020-02-12 キヤノン株式会社 Attention area detection device, attention area detection method, and program
CN107851322B (en) * 2015-07-13 2022-04-19 皇家飞利浦有限公司 Method and apparatus for determining a depth map for an image
KR20180101496A (en) 2016-02-18 2018-09-12 애플 인크. Head-mounted display for virtual and mixed reality with inside-out location, user body and environment tracking
CN105827998A (en) * 2016-04-14 2016-08-03 广州市英途信息技术有限公司 Image matting system and image matting method
JP6754610B2 (en) * 2016-05-18 2020-09-16 株式会社デンソーアイティーラボラトリ Arithmetic processing unit, arithmetic processing method, and program
CN106296728B (en) * 2016-07-27 2019-05-14 昆明理工大学 A kind of Segmentation of Moving Object method in the unrestricted scene based on full convolutional network
CN106200249A (en) * 2016-08-30 2016-12-07 辽宁中蓝电子科技有限公司 Structure light and RGB sensor module monoblock type integrated system 3D camera
US10701244B2 (en) * 2016-09-30 2020-06-30 Microsoft Technology Licensing, Llc Recolorization of infrared image streams
WO2018129104A1 (en) * 2017-01-03 2018-07-12 Owlii Inc. Processing holographic videos
US10943100B2 (en) 2017-01-19 2021-03-09 Mindmaze Holding Sa Systems, methods, devices and apparatuses for detecting facial expression
CN110892408A (en) * 2017-02-07 2020-03-17 迈恩德玛泽控股股份有限公司 Systems, methods, and apparatus for stereo vision and tracking
WO2018191648A1 (en) 2017-04-14 2018-10-18 Yang Liu System and apparatus for co-registration and correlation between multi-modal imagery and method for same
CN108961316B (en) * 2017-05-23 2022-05-31 华为技术有限公司 Image processing method and device and server
US10586383B2 (en) 2017-06-20 2020-03-10 Microsoft Technology Licensing, Llc Three-dimensional object scan using data from infrared sensor
US10612912B1 (en) 2017-10-31 2020-04-07 Facebook Technologies, Llc Tileable structured light projection system
CN109889799B (en) * 2017-12-06 2020-08-25 西安交通大学 Monocular structure light depth perception method and device based on RGBIR camera
US10783668B2 (en) 2017-12-22 2020-09-22 Samsung Electronics Co., Ltd. Handling duplicate points in point cloud compression
JP2021510227A (en) * 2018-01-08 2021-04-15 フォーサイト オートモーティブ リミテッド Multispectral system for providing pre-collision alerts
US11328533B1 (en) 2018-01-09 2022-05-10 Mindmaze Holding Sa System, method and apparatus for detecting facial expression for motion capture
WO2019143688A1 (en) * 2018-01-19 2019-07-25 Pcms Holdings, Inc. Multi-focal planes with varying positions
CN108537814B (en) * 2018-03-14 2019-09-03 浙江大学 A kind of three-dimensional sonar point cloud chart based on ViBe is as dividing method
US10521926B1 (en) 2018-03-21 2019-12-31 Facebook Technologies, Llc Tileable non-planar structured light patterns for wide field-of-view depth sensing
CN108648225B (en) * 2018-03-31 2022-08-02 奥比中光科技集团股份有限公司 Target image acquisition system and method
US11330090B2 (en) 2018-04-10 2022-05-10 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Bracket, input/output assembly and terminal
CN108564613A (en) * 2018-04-12 2018-09-21 维沃移动通信有限公司 A kind of depth data acquisition methods and mobile terminal
JP7211835B2 (en) * 2019-02-04 2023-01-24 i-PRO株式会社 IMAGING SYSTEM AND SYNCHRONIZATION CONTROL METHOD
CN109949347B (en) * 2019-03-15 2021-09-17 百度在线网络技术(北京)有限公司 Human body tracking method, device, system, electronic equipment and storage medium
JP2020204856A (en) * 2019-06-17 2020-12-24 株式会社バンダイナムコアミューズメント Image generation system and program
US11076111B1 (en) * 2019-11-13 2021-07-27 Twitch Interactive, Inc. Smart color-based background replacement
JP7460282B2 (en) 2020-07-02 2024-04-02 アルプスアルパイン株式会社 Obstacle detection device, obstacle detection method, and obstacle detection program
US11763570B2 (en) * 2020-07-02 2023-09-19 Alps Alpine Co., Ltd. Obstacle detection device, obstacle detection method, and storage medium storing obstacle detection program
US11574484B1 (en) * 2021-01-13 2023-02-07 Ambarella International Lp High resolution infrared image generation using image data from an RGB-IR sensor and visible light interpolation
CN113470049B (en) * 2021-07-06 2022-05-20 Jilin Tianche Technology Co., Ltd. Complete target extraction method based on structured color point cloud segmentation
KR102516008B1 (en) 2022-09-21 2023-03-30 InnoSimulation Co., Ltd. Method of downsampling voxels using point cloud data and apparatus for performing the same

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040169873A1 (en) * 2003-02-28 2004-09-02 Xerox Corporation Automatic determination of custom parameters based on scanned image data
US20070031028A1 (en) * 2005-06-20 2007-02-08 Thomas Vetter Estimating 3d shape and texture of a 3d object based on a 2d image of the 3d object
US20070036432A1 (en) * 2003-11-12 2007-02-15 Li-Qun Xu Object detection in images
US7227893B1 (en) * 2002-08-22 2007-06-05 Xlabs Holdings, Llc Application-specific object-based segmentation and recognition system
US20090315978A1 (en) * 2006-06-02 2009-12-24 Eidgenossische Technische Hochschule Zurich Method and system for generating a 3d representation of a dynamically changing 3d scene
US7680314B2 (en) * 2005-10-17 2010-03-16 Siemens Medical Solutions Usa, Inc. Devices, systems, and methods for improving image consistency
US20100165112A1 (en) * 2006-03-28 2010-07-01 Objectvideo, Inc. Automatic extraction of secondary video streams
US20110175984A1 (en) * 2010-01-21 2011-07-21 Samsung Electronics Co., Ltd. Method and system of extracting the target object data on the basis of data concerning the color and depth
US20110282140A1 (en) * 2010-05-14 2011-11-17 Intuitive Surgical Operations, Inc. Method and system of hand segmentation and overlay using depth data
US20110293180A1 (en) * 2010-05-28 2011-12-01 Microsoft Corporation Foreground and Background Image Segmentation
US20120051631A1 (en) * 2010-08-30 2012-03-01 The Board Of Trustees Of The University Of Illinois System for background subtraction with 3d camera
US8520027B2 (en) * 2010-05-14 2013-08-27 Intuitive Surgical Operations, Inc. Method and system of see-through console overlay
US20130243313A1 (en) * 2010-10-01 2013-09-19 Telefonica, S.A. Method and system for images foreground segmentation in real-time
US20140029788A1 (en) * 2012-07-26 2014-01-30 Jinman Kang Detecting objects with a depth sensor
US20140294237A1 (en) * 2010-03-01 2014-10-02 Primesense Ltd. Combined color image and depth processing
US9117281B2 (en) * 2011-11-02 2015-08-25 Microsoft Corporation Surface segmentation from RGB and depth images

Family Cites Families (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5812787A (en) 1995-06-30 1998-09-22 Intel Corporation Video coding scheme with foreground/background separation
US20030063185A1 (en) * 2001-09-28 2003-04-03 Bell Cynthia S. Three-dimensional imaging with complementary color filter arrays
GB0308943D0 (en) 2003-04-17 2003-05-28 Univ Dundee A system for determining the body pose of a person from images
US7606417B2 (en) 2004-08-16 2009-10-20 Fotonation Vision Limited Foreground/background segmentation in digital images with differential exposure calculations
US8330831B2 (en) 2003-08-05 2012-12-11 DigitalOptics Corporation Europe Limited Method of gathering visual meta data using a reference image
US7602942B2 (en) * 2004-11-12 2009-10-13 Honeywell International Inc. Infrared and visible fusion face recognition system
US7657126B2 (en) 2005-05-09 2010-02-02 Like.Com System and method for search portions of objects in images and features thereof
US7676081B2 (en) * 2005-06-17 2010-03-09 Microsoft Corporation Image segmentation of foreground from background layers
US7885463B2 (en) * 2006-03-30 2011-02-08 Microsoft Corp. Image segmentation using spatial-color Gaussian mixture models
KR100829581B1 (en) 2006-11-28 2008-05-14 Samsung Electronics Co., Ltd. Image processing method, medium and apparatus
US20080181507A1 (en) * 2007-01-29 2008-07-31 Intellivision Technologies Corp. Image manipulation for videos and still images
JP2009199284A (en) * 2008-02-21 2009-09-03 Univ Of Tokyo Road object recognition method
US8249349B2 (en) 2008-11-25 2012-08-21 Microsoft Corporation Labeling image elements
US8681216B2 (en) 2009-03-12 2014-03-25 Hewlett-Packard Development Company, L.P. Depth-sensing camera system
RU2421933C2 (en) * 2009-03-24 2011-06-20 Samsung Electronics Co., Ltd. System and method to generate and reproduce 3D video image
US20100293179A1 (en) * 2009-05-14 2010-11-18 Microsoft Corporation Identifying synonyms of entities using web search
JP5322789B2 (en) * 2009-06-15 2013-10-23 Mitsubishi Electric Corporation Model generation apparatus, model generation method, model generation program, point cloud image generation method, and point cloud image generation program
US8537200B2 (en) 2009-10-23 2013-09-17 Qualcomm Incorporated Depth map generation techniques for conversion of 2D video data to 3D video data
US8355565B1 (en) 2009-10-29 2013-01-15 Hewlett-Packard Development Company, L.P. Producing high quality depth maps
US9008457B2 (en) * 2010-05-31 2015-04-14 Personify, Inc. Systems and methods for illumination correction of an image
CN101882314B (en) * 2010-07-20 2012-06-20 Shanghai Jiao Tong University Infrared small target detection method based on overcomplete sparse representation
US9247238B2 (en) 2011-01-31 2016-01-26 Microsoft Technology Licensing, Llc Reducing interference between multiple infra-red depth cameras
GB2490872B (en) 2011-05-09 2015-07-29 Toshiba Res Europ Ltd Methods and systems for capturing 3d surface geometry
US8823745B2 (en) * 2011-06-02 2014-09-02 Yoostar Entertainment Group, Inc. Image processing based on depth information and color data of a scene
US8824797B2 (en) * 2011-10-03 2014-09-02 Xerox Corporation Graph-based segmentation integrating visible and NIR information
US20130095920A1 (en) 2011-10-13 2013-04-18 Microsoft Corporation Generating free viewpoint video using stereo imaging
US9098908B2 (en) 2011-10-21 2015-08-04 Microsoft Technology Licensing, Llc Generating a depth map
US20130177296A1 (en) 2011-11-15 2013-07-11 Kevin A. Geisner Generating metadata for user experiences
US9171393B2 (en) 2011-12-07 2015-10-27 Microsoft Technology Licensing, Llc Three-dimensional texture reprojection
US9043186B2 (en) 2011-12-08 2015-05-26 Microsoft Technology Licensing, Llc Surface normal computation on noisy sample of points
US8971612B2 (en) 2011-12-15 2015-03-03 Microsoft Corporation Learning image processing tasks from scene reconstructions
US9846960B2 (en) 2012-05-31 2017-12-19 Microsoft Technology Licensing, Llc Automated camera array calibration
US20130321564A1 (en) 2012-05-31 2013-12-05 Microsoft Corporation Perspective-correct communication window with motion parallax
CN102799856A (en) * 2012-06-15 2012-11-28 Tianjin University Human action recognition method based on two-channel infrared information fusion
CN102750533A (en) * 2012-07-05 2012-10-24 Chongqing University Infrared dim and small target detection method based on morphological component sparse representation
CN103279987B (en) 2013-06-18 2016-05-18 Xiamen University of Technology Rapid three-dimensional object modeling method based on Kinect

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7227893B1 (en) * 2002-08-22 2007-06-05 Xlabs Holdings, Llc Application-specific object-based segmentation and recognition system
US20040169873A1 (en) * 2003-02-28 2004-09-02 Xerox Corporation Automatic determination of custom parameters based on scanned image data
US20070036432A1 (en) * 2003-11-12 2007-02-15 Li-Qun Xu Object detection in images
US20070031028A1 (en) * 2005-06-20 2007-02-08 Thomas Vetter Estimating 3d shape and texture of a 3d object based on a 2d image of the 3d object
US7680314B2 (en) * 2005-10-17 2010-03-16 Siemens Medical Solutions Usa, Inc. Devices, systems, and methods for improving image consistency
US20100165112A1 (en) * 2006-03-28 2010-07-01 Objectvideo, Inc. Automatic extraction of secondary video streams
US20090315978A1 (en) * 2006-06-02 2009-12-24 Eidgenossische Technische Hochschule Zurich Method and system for generating a 3d representation of a dynamically changing 3d scene
US20110175984A1 (en) * 2010-01-21 2011-07-21 Samsung Electronics Co., Ltd. Method and system of extracting the target object data on the basis of data concerning the color and depth
US20140294237A1 (en) * 2010-03-01 2014-10-02 Primesense Ltd. Combined color image and depth processing
US20110282140A1 (en) * 2010-05-14 2011-11-17 Intuitive Surgical Operations, Inc. Method and system of hand segmentation and overlay using depth data
US8520027B2 (en) * 2010-05-14 2013-08-27 Intuitive Surgical Operations, Inc. Method and system of see-through console overlay
US8625897B2 (en) * 2010-05-28 2014-01-07 Microsoft Corporation Foreground and background image segmentation
US20110293180A1 (en) * 2010-05-28 2011-12-01 Microsoft Corporation Foreground and Background Image Segmentation
US20120051631A1 (en) * 2010-08-30 2012-03-01 The Board Of Trustees Of The University Of Illinois System for background subtraction with 3d camera
US20130243313A1 (en) * 2010-10-01 2013-09-19 Telefonica, S.A. Method and system for images foreground segmentation in real-time
US9117281B2 (en) * 2011-11-02 2015-08-25 Microsoft Corporation Surface segmentation from RGB and depth images
US20140029788A1 (en) * 2012-07-26 2014-01-30 Jinman Kang Detecting objects with a depth sensor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Harville et al.; "Foreground Segmentation Using Adaptive Mixture Models in Color and Depth"; Proceedings IEEE Workshop on Detection and Recognition of Events in Video, IEEE, US, 8 July 2001, pages 3-11. *

Cited By (68)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9628722B2 (en) 2010-03-30 2017-04-18 Personify, Inc. Systems and methods for embedding a foreground video into a background feed based on a control input
US10325360B2 (en) 2010-08-30 2019-06-18 The Board Of Trustees Of The University Of Illinois System for background subtraction with 3D camera
US9792676B2 (en) 2010-08-30 2017-10-17 The Board Of Trustees Of The University Of Illinois System for background subtraction with 3D camera
US10902282B2 (en) 2012-09-19 2021-01-26 Placemeter Inc. System and method for processing image data
US20160105636A1 (en) * 2013-08-19 2016-04-14 Huawei Technologies Co., Ltd. Image Processing Method and Device
US9392218B2 (en) * 2013-08-19 2016-07-12 Huawei Technologies Co., Ltd. Image processing method and device
US20150186341A1 (en) * 2013-12-26 2015-07-02 Joao Redol Automated unobtrusive scene sensitive information dynamic insertion into web-page image
US20170032531A1 (en) * 2013-12-27 2017-02-02 Sony Corporation Image processing device and image processing method
US10469827B2 (en) * 2013-12-27 2019-11-05 Sony Corporation Image processing device and image processing method
US9942481B2 (en) 2013-12-31 2018-04-10 Personify, Inc. Systems and methods for iterative adjustment of video-capture settings based on identified persona
US20160350585A1 (en) * 2013-12-31 2016-12-01 Personify Inc. Systems and methods for persona identification using combined probability maps
US9485433B2 (en) 2013-12-31 2016-11-01 Personify, Inc. Systems and methods for iterative adjustment of video-capture settings based on identified persona
US9740916B2 (en) * 2013-12-31 2017-08-22 Personify Inc. Systems and methods for persona identification using combined probability maps
US10325172B2 (en) * 2013-12-31 2019-06-18 Personify, Inc. Transmitting video and sharing content via a network
US20160314369A1 (en) * 2013-12-31 2016-10-27 Personify, Inc. Transmitting video and sharing content via a network
US9414016B2 (en) * 2013-12-31 2016-08-09 Personify, Inc. System and methods for persona identification using combined probability maps
US20150350608A1 (en) * 2014-05-30 2015-12-03 Placemeter Inc. System and method for activity monitoring using video data
US10432896B2 (en) * 2014-05-30 2019-10-01 Placemeter Inc. System and method for activity monitoring using video data
US10735694B2 (en) 2014-05-30 2020-08-04 Placemeter Inc. System and method for activity monitoring using video data
US10880524B2 (en) 2014-05-30 2020-12-29 Placemeter Inc. System and method for activity monitoring using video data
US10692192B2 (en) * 2014-10-21 2020-06-23 Connaught Electronics Ltd. Method for providing image data from a camera system, camera system and motor vehicle
EP3029633A1 (en) * 2014-12-02 2016-06-08 Honeywell International Inc. System and method of foreground extraction for digital cameras
US10321100B2 (en) 2014-12-02 2019-06-11 Ademco Inc. System and method of foreground extraction for digital cameras
EP3276951A4 (en) * 2015-03-26 2018-09-12 Sony Corporation Image processing system, image processing method, and program
US10043078B2 (en) * 2015-04-21 2018-08-07 Placemeter LLC Virtual turnstile system and method
US11334751B2 (en) 2015-04-21 2022-05-17 Placemeter Inc. Systems and methods for processing video data for activity monitoring
US10726271B2 (en) 2015-04-21 2020-07-28 Placemeter, Inc. Virtual turnstile system and method
US9953223B2 (en) 2015-05-19 2018-04-24 Personify, Inc. Methods and systems for assigning pixels distance-cost values using a flood fill technique
US9916668B2 (en) 2015-05-19 2018-03-13 Personify, Inc. Methods and systems for identifying background in video data using geometric primitives
TWI712990B (en) * 2015-05-21 2020-12-11 Koninklijke Philips N.V. Method and apparatus for determining a depth map for an image, and non-transitory computer readable storage medium
CN107636728A (en) * 2015-05-21 2018-01-26 Koninklijke Philips N.V. Method and apparatus for determining a depth map for an image
US10580154B2 (en) * 2015-05-21 2020-03-03 Koninklijke Philips N.V. Method and apparatus for determining a depth map for an image
US10380431B2 (en) 2015-06-01 2019-08-13 Placemeter LLC Systems and methods for processing video streams
US10997428B2 (en) 2015-06-01 2021-05-04 Placemeter Inc. Automated detection of building entrances
US11138442B2 (en) 2015-06-01 2021-10-05 Placemeter, Inc. Robust, adaptive and efficient object detection, classification and tracking
US11864926B2 (en) 2015-08-28 2024-01-09 Foresite Healthcare, Llc Systems and methods for detecting attempted bed exit
US11819344B2 (en) 2015-08-28 2023-11-21 Foresite Healthcare, Llc Systems for automatic assessment of fall risk
US9607397B2 (en) 2015-09-01 2017-03-28 Personify, Inc. Methods and systems for generating a user-hair-color model
US11100335B2 (en) 2016-03-23 2021-08-24 Placemeter, Inc. Method for queue time estimation
US20190230342A1 (en) * 2016-06-03 2019-07-25 Utku Buyuksahin A system and a method for capturing and generating 3d image
US10917627B2 (en) * 2016-06-03 2021-02-09 Utku Buyuksahin System and a method for capturing and generating 3D image
US9883155B2 (en) 2016-06-14 2018-01-30 Personify, Inc. Methods and systems for combining foreground video and background video using chromatic matching
US10453202B2 (en) * 2016-06-28 2019-10-22 Foresite Healthcare, Llc Systems and methods for use in detecting falls utilizing thermal sensing
US11276181B2 (en) * 2016-06-28 2022-03-15 Foresite Healthcare, Llc Systems and methods for use in detecting falls utilizing thermal sensing
US20170372483A1 (en) * 2016-06-28 2017-12-28 Foresite Healthcare, Llc Systems and Methods for Use in Detecting Falls Utilizing Thermal Sensing
US9881207B1 (en) 2016-10-25 2018-01-30 Personify, Inc. Methods and systems for real-time user extraction using deep learning networks
EP3351958A1 (en) * 2017-01-19 2018-07-25 Hitachi-LG Data Storage, Inc. Object position detection apparatus
CN107169995A (en) * 2017-05-05 2017-09-15 Wuhan University of Technology Adaptive moving-target visible detection method
US10674096B2 (en) 2017-09-22 2020-06-02 Feedback, LLC Near-infrared video compositing
US10560645B2 (en) 2017-09-22 2020-02-11 Feedback, LLC Immersive video environment using near-infrared video compositing
US10270986B2 (en) 2017-09-22 2019-04-23 Feedback, LLC Near-infrared video compositing
US11004207B2 (en) * 2017-12-06 2021-05-11 Blueprint Reality Inc. Multi-modal data fusion for scene segmentation
US10803596B2 (en) * 2018-01-29 2020-10-13 HypeVR Fully automated alpha matting for virtual reality systems
US11481915B2 (en) * 2018-05-04 2022-10-25 Packsize Llc Systems and methods for three-dimensional data acquisition and processing under timing constraints
US11915460B2 (en) 2018-12-14 2024-02-27 Apple Inc. Machine learning assisted image prediction
CN112912896A (en) * 2018-12-14 2021-06-04 Apple Inc. Machine learning assisted image prediction
US10878577B2 (en) * 2018-12-14 2020-12-29 Canon Kabushiki Kaisha Method, system and apparatus for segmenting an image of a scene
US11386355B2 (en) * 2018-12-14 2022-07-12 Apple Inc. Machine learning assisted image prediction
US11493931B2 (en) * 2019-05-14 2022-11-08 Lg Electronics Inc. Method of extracting feature from image using laser pattern and device and robot of extracting feature thereof
US11257238B2 (en) * 2019-09-27 2022-02-22 Sigma Technologies, S.L. Unsupervised object sizing method for single camera viewing
EP4055564A4 (en) * 2019-12-13 2023-01-11 Sony Group Corporation Multi-spectral volumetric capture
US20210366096A1 (en) * 2020-05-22 2021-11-25 Robert Bosch Gmbh Hazard detection ensemble architecture system and method
US20220172401A1 (en) * 2020-11-27 2022-06-02 Canon Kabushiki Kaisha Image processing apparatus, image generation method, and storage medium
US11800056B2 (en) 2021-02-11 2023-10-24 Logitech Europe S.A. Smart webcam system
US20220272245A1 (en) * 2021-02-24 2022-08-25 Logitech Europe S.A. Image generating system
US11659133B2 (en) 2021-02-24 2023-05-23 Logitech Europe S.A. Image generating system with background replacement or modification capabilities
US11800048B2 (en) * 2021-02-24 2023-10-24 Logitech Europe S.A. Image generating system with background replacement or modification capabilities
CN113066115A (en) * 2021-04-28 2021-07-02 Beijing Dilusense Technology Co., Ltd. Depth prediction network training method, device, server and readable storage medium

Also Published As

Publication number Publication date
JP6562900B2 (en) 2019-08-21
CN105706143B (en) 2019-02-22
BR112015025974B1 (en) 2022-01-25
KR102171231B1 (en) 2020-10-28
AU2014254218A1 (en) 2015-10-29
US20140307952A1 (en) 2014-10-16
MX352449B (en) 2017-11-24
AU2014254218A8 (en) 2016-01-14
WO2014172226A1 (en) 2014-10-23
EP2987139A1 (en) 2016-02-24
CN105706143A (en) 2016-06-22
BR112015025974A8 (en) 2020-01-14
CN105229697B (en) 2019-01-22
BR112015025974A2 (en) 2017-07-25
RU2660596C2 (en) 2018-07-06
EP2987140B1 (en) 2018-12-05
EP2987140A1 (en) 2016-02-24
RU2015143935A (en) 2017-04-19
AU2014254218B2 (en) 2017-05-25
MX2015014570A (en) 2016-06-29
CA2908689A1 (en) 2014-10-23
US9191643B2 (en) 2015-11-17
KR20150143751A (en) 2015-12-23
JP2016522923A (en) 2016-08-04
WO2014172230A1 (en) 2014-10-23
US20190379873A1 (en) 2019-12-12
CA2908689C (en) 2021-08-24
US11546567B2 (en) 2023-01-03
CN105229697A (en) 2016-01-06

Similar Documents

Publication Publication Date Title
US11546567B2 (en) Multimodal foreground background segmentation
US10867430B2 (en) Method and system of 3D reconstruction with volume-based filtering for image processing
US11115633B2 (en) Method and system for projector calibration
Sulami et al. Automatic recovery of the atmospheric light in hazy images
CN107209931B (en) Color correction apparatus and method
US20140192158A1 (en) Stereo Image Matching
US9509886B2 (en) Flicker removal for high speed video
US20110267348A1 (en) Systems and methods for generating a virtual camera viewpoint for an image
KR20190112894A (en) Method and apparatus for 3d rendering
TW201627950A (en) Method for optimizing occlusion in augmented reality based on depth camera
JP7152705B2 (en) Restoring Missing Legs of Human Objects from Image Sequences Based on Ground Detection
CN106327505B (en) Machine vision processing system, apparatus, method, and computer-readable storage medium
US20220172331A1 (en) Image inpainting with geometric and photometric transformations
KR20220117324A (en) Learning from various portraits
US10070111B2 (en) Local white balance under mixed illumination using flash photography
CN114697623A (en) Projection surface selection and projection image correction method and device, projector and medium
Li et al. Color constancy using achromatic surface
GB2543776A (en) Systems and methods for processing images of objects
US10055826B2 (en) Systems and methods for processing images of objects using coarse surface normal estimates
EP4090006A2 (en) Image signal processing based on virtual superimposition
WO2022036338A2 (en) System and methods for depth-aware video processing and depth perception enhancement
Gu et al. Shadow modelling based upon Rayleigh scattering and Mie theory
Bouzaraa et al. Dual-exposure image registration for HDR processing
Yao et al. Removing shadows from a single real-world color image
US9852352B2 (en) System and method for determining colors of foreground, and computer readable recording medium therefor

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:COLLET ROMEA, ALVARO;ZHANG, BAO;KIRK, ADAM G.;SIGNING DATES FROM 20130610 TO 20130613;REEL/FRAME:030619/0073

AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034747/0417

Effective date: 20141014

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:039025/0454

Effective date: 20141014

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION