WO2009143163A2 - Face relighting from a single image - Google Patents

Face relighting from a single image

Info

Publication number
WO2009143163A2
Authority
WO
WIPO (PCT)
Prior art keywords
face
images
novel
apparent
image
Prior art date
Application number
PCT/US2009/044533
Other languages
French (fr)
Other versions
WO2009143163A3 (en)
Inventor
Baba C. Vemuri
Angelos Barmpoutis
Arunava Benerjee
Ritwik Kailash Kumar
Original Assignee
University Of Florida Research Foundation, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University Of Florida Research Foundation, Inc. filed Critical University Of Florida Research Foundation, Inc.
Publication of WO2009143163A2 publication Critical patent/WO2009143163A2/en
Publication of WO2009143163A3 publication Critical patent/WO2009143163A3/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/50 Lighting effects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/60 Extraction of image or video features relating to illumination properties, e.g. using a reflectance or lighting model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation

Definitions

  • Our goal is to generate images of a face under various illumination conditions using a single example 2-dimensional image. This can be achieved by acquiring a reference ABRDF field once and then transferring it to new faces using their single images.
  • the ABRDF represents the response of the object at a point to light in each direction, in the presence of the rest of the scene, not merely the surface reflectivity.
  • cast shadows, which are image artifacts manifested by the presence of scene objects obstructing the light from reaching otherwise visible scene regions, can be easily captured.
  • the surface spherical harmonic basis provides a natural orthonormal basis for functions defined on a sphere.
  • the spherical harmonic bases are defined for complex-valued functions but as the apparent BRDF is a real-valued function, the real-valued spherical harmonic bases are used to represent the apparent BRDF functions.
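The real-valued spherical harmonics described above can be built from SciPy's complex `sph_harm`; the sign and normalization conventions below are the common ones but are stated here as assumptions, not the patent's own.

```python
import numpy as np
from scipy.special import sph_harm

def real_sph_harm(l, m, theta, phi):
    """Real-valued spherical harmonic of order l, degree m.
    theta: azimuthal angle, phi: polar angle (SciPy's convention).
    Uses the standard real combination of the complex harmonics."""
    if m > 0:
        return np.sqrt(2.0) * (-1) ** m * sph_harm(m, l, theta, phi).real
    if m < 0:
        return np.sqrt(2.0) * (-1) ** m * sph_harm(-m, l, theta, phi).imag
    return sph_harm(0, l, theta, phi).real
```

Odd-order harmonics flip sign under v -> -v, which is why the anti-symmetric (odd-order) subset is the natural basis for the ABRDF here.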
  • each pixel location has an associated ABRDF and across the whole face, we have a field of such ABRDFs.
  • a modulated spherical harmonic representation is used, combining the spherical harmonic basis within a single pixel with the B-spline basis across the field.
  • the B-spline basis N_{i,k+1}(t) is used, where:
  • N_{i,k+1}(t) is the spline basis of degree k, with associated knots (t_{-k}, t_{-k+1}, ..., t_{n+1}).
  • the bi-cubic spline is chosen because it is one of the most commonly used in the literature and, more importantly, it provides enough smoothness for the ABRDF field so that the discontinuities present in the field due to cast shadows are appropriately approximated, as demonstrated in the results shown below.
  • the present invention employs bi-cubic B-spline modulated anti-symmetric spherical harmonic functions for this task.
  • the ABRDF field can be estimated by minimizing the following error function:
  • the first term in the summation is the representation of the ABRDF function using modulated antisymmetric spherical harmonic functions.
  • T is the set of odd natural numbers and w_{i,j,l,m} are the unknown coefficients of the apparent BRDF function being sought.
  • the spline control grid is overlaid on the data grid (pixels) and the inner summation on i and j is over the bi-cubic B-spline basis domain.
  • This objective function is minimized using the non-linear conjugate gradient method initialized with a unit vector, for which the derivative of the error function with respect to w_{i,j,l,m} can be computed in analytic form.
  • any spherical function can be written as a continuous mixture of such functions. Accordingly, the apparent BRDF, a spherical function, can be modeled as a continuous mixture of functions S(v) as follows:
  • an N × 642 matrix A_{n,j} can be set up by evaluating Eq. 17 for every value of v_n and each of the 642 sampled directions. Then, for each pixel, the unknown weights of Eq. 19 can be estimated by solving the overdetermined system:
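The per-pixel solve described above reduces to a standard least-squares problem. The sketch below illustrates only the linear-algebra step; the matrix entries are random placeholders standing in for the Eq. 17 basis evaluations, and the dimensions (64 images, 642 sampled directions) follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 64, 642                 # N training images, 642 sampled directions
A = rng.random((N, K))         # placeholder for A[n, j] from Eq. 17
pixel_values = rng.random(N)   # this pixel's intensity in each image

# least-squares / minimum-norm solution of A w = pixel_values (per pixel)
w, *_ = np.linalg.lstsq(A, pixel_values, rcond=None)
```

`np.linalg.lstsq` returns the least-squares solution when the system is overdetermined and the minimum-norm solution otherwise, so the same call covers both regimes.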
  • the second part of the method of the present invention deals with transferring the ABRDF field from one face (reference) to another (target) and thus generating images under various novel illuminations for the target face using just one exemplar image.
  • the basic shapes of features are more or less the same on all faces and thus the optical artifacts, e.g. cast and attached shadows, created by these features are also similar on all faces. Accordingly, the nature of the ABRDFs on various faces is also similar and hence, one should be able to derive the ABRDF field of the target face using a given reference ABRDF field.
  • the non-rigid warping field between the reference and the target face images must be estimated.
  • the non-rigid warping field between the reference and the target face images can be formalized as the estimation of a non-rigid coordinate transformation T such that:
  • I_ref and I_target are the reference and target images respectively, and x is a location in the image domain.
  • an information-theoretic match measure based registration technique should be used so that the registration can be done across different faces with possibly different illuminations (e.g. Mutual Information (MI) and Cross-Cumulative Residual Entropy (CCRE) based registration, described respectively in the following publications: 1) "Alignment by maximization of mutual information," P. Viola and W. M. Wells III, IJCV, 24(2):137-154, 1997 and 2) "Non-rigid multi-modal image registration using cross-cumulative residual entropy," F.
  • MI Mutual Information
  • CCRE Cross-Cumulative Residual Entropy
  • FIG. 5 depicts the results produced by MI and CCRE for the purpose of visual comparison.
  • the first and second columns contain the reference image and the target image respectively.
  • the third and fourth columns contain deformed faces produced by CCRE and MI respectively.
  • the deformation field is used to warp the source image's apparent BRDF field coefficients to displace the apparent BRDFs into appropriate locations for the target image.
  • using the modulated spherical harmonic functions, we can obtain a continuous representation of the coefficient field, which is written explicitly as:
  • w_{l,m}(x) are the unknown coefficients for the order l and degree m spherical harmonic basis at location x.
  • T is the deformation field recovered by minimizing Eq. 21.
  • given w_{l,m}(x), the apparent BRDF field can be readily computed using the spherical harmonic basis.
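Warping the per-pixel coefficient field by the recovered deformation can be sketched with standard image resampling. This is a simplified stand-in for the spline-based continuous representation: each coefficient channel is interpolated independently with `scipy.ndimage.map_coordinates`, and the deformation convention (target-to-reference coordinates) is an assumption.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_coefficient_field(coeffs, deformation):
    """Warp a per-pixel spherical-harmonic coefficient field.

    coeffs:      (H, W, C) array, C coefficients w_{l,m} per reference pixel.
    deformation: (2, H, W) array; deformation[:, y, x] gives the (row, col)
                 reference coordinates that target pixel (y, x) maps to
                 under the recovered transformation T.
    Returns the (H, W, C) coefficient field aligned with the target face.
    """
    H, W, C = coeffs.shape
    warped = np.empty_like(coeffs)
    for c in range(C):
        # bilinear interpolation (order=1) of each coefficient channel
        warped[..., c] = map_coordinates(coeffs[..., c], deformation, order=1)
    return warped
```

An identity deformation leaves the field unchanged, which gives a quick sanity check of the coordinate convention.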
  • This discrepancy can be fixed by using the following intensity mapping technique.
  • a separate transformation can be chosen for each pixel.
  • the intensity mapping quotient, Q(x), for each location x can be defined as:
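The equation defining Q(x) is illegible in this extraction. The sketch below is offered only as an assumption: a plausible reading is the per-pixel ratio of the target intensity to the warped-reference intensity rendered under the target image's lighting direction.

```python
import numpy as np

def intensity_quotient(target, ref_rendered, eps=1e-6):
    """Hypothetical per-pixel intensity mapping quotient Q(x).

    ASSUMED definition (the original equation is illegible in this
    extraction): Q(x) = I_target(x) / I_ref(x), where I_ref is the
    reference ABRDF field rendered under the target's lighting.
    eps guards against division by zero in dark regions.
    """
    return target / (ref_rendered + eps)

q = intensity_quotient(np.full((2, 2), 0.5), np.full((2, 2), 0.5))
```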
  • the lighting directions cover the azimuth-elevation plane better; however, there is no lighting direction at an extremely high angle.
  • the training set 'C' samples the azimuth-elevation plane even better, including high-angle lighting directions along the elevation axis.
  • the ABRDF of 10 different subjects from the Extended Yale B dataset was computed under the lighting configurations described in FIG. 11, using: a) the anti-symmetric tensor spline model of the present invention of order 3 and b) the Lambertian version of our framework using 1st-order tensors.
  • the training was performed using only 9 images per subject according to the method described above; 64 facial images per subject were then synthesized by evaluating Eq. 4 for the 64 lighting directions provided in the Yale B database.
  • FIG. 13 presents the synthesized images under several different lighting directions for a randomly selected subject.
  • the images demonstrate that our proposed model approximates the underlying ABRDF well, producing realistic images.
  • the 9 input images used here are shown in FIG. 10.
  • FIG. 12 shows the average intensity error, computed as the absolute difference between the intensity values of the synthesized images and the ground-truth images in the database. Based on the reported errors, it can be concluded that the method of the present invention performs significantly better than the Lambertian model. Moreover, in all three training set configurations, the performance remained approximately the same, which demonstrates that the method of the present invention approximates the underlying ABRDF well regardless of the lighting directions of the 9 input images.
  • In FIG. 14, examples of the synthesized images using the Lambertian model and the anti-symmetric tensor spline method of the present invention are visually compared.
  • the first column shows the ground truth image from the extended Yale B dataset. Note that the ground truth images presented in FIG. 14 were not a part of the training set used for the synthesis of the images presented in the second and third columns of FIG. 14.
  • the 3rd-order tensorial model can accommodate cast shadows and approximate well the specular components of the underlying ABRDFs. In contrast, specularity and shadows are missing from the images synthesized under the Lambertian model, which demonstrates the invalidity of the Lambertian assumption.
  • FIG. 15 shows the approximated ABRDFs plotted as spherical functions in a region of interest that has specularities and shadows.
  • the shapes of the plotted functions contain up to three lobes and show complexities that cannot be approximated under the Lambertian assumption.
  • the continuous mixture of single lobed functions was employed to approximate the underlying ABRDF by using all 64 given images as the training set.
  • This model, although less efficient than the anti-symmetric tensor spline method of the present invention (since it requires a much larger training set of 64 images rather than only 9), can approximate spherical functions with a very complex structure characterized by a large number of lobes.
  • the 3rd-order anti-symmetric tensor spline model can approximate functions whose shape complexity consists of at most three lobes.
  • In FIG. 2A, we present the novel images synthesized from the learned ABRDF field using spline-modulated spherical harmonics, which clearly demonstrate that photo-realistic images can be generated by our model. Note the sharpness of the cast shadows in the last row.
  • the presented technique is capable of both extrapolating and interpolating illumination directions from the sample images provided to it [see FIG. 2B].
  • In FIG. 3, we present the estimated ABRDF field overlaid on a face; in FIG. 3 (right), the method of the present invention can be seen to capture multiple bumps with varying sharpness to account for shadows and specularities.
  • the method of the present invention's ability to capture cast shadows and specularities in images is clearly demonstrated in FIG. 4.
  • In FIG. 6, we present a set of images generated under novel illumination conditions of the target face [see 2nd row and 2nd column in FIG. 5] using just one image. It can be noted that the specularity of the nose tip and cast shadows have been captured to produce photo-realistic results.
  • In FIG. 7, we present novel images of the same subject using three different reference faces. Discounting minor artifacts, it can be noted that these images are perceptually similar.
  • results demonstrate that our technique can produce competitive results even when used with a naive classifier like Nearest Neighbor.
  • To make the results comparable to the competing methods, we used the 10 subjects from the Yale B face database. Results were averaged over 5 independent runs of the recognition algorithm.
  • the results pertaining to the other techniques are summarized in the publication, "Acquiring linear sub-spaces for face recognition under variable lighting " K. Lee, J. Ho, and D. J. Kriegman, PAMI, 27(5):684-698, 2005.
  • a second set of experiments demonstrate how the ABRDF transfer technique, which works with a single image, can be used to enhance various existing benchmark face recognition techniques [see TABLE 2]. For this, we make use of the fact that the performance of most recognition systems is improved when a better training set is present.
  • NN Nearest Neighbor
  • for Eigenfaces and Fisherfaces, we assume that only a single near-frontal illumination image of each subject is available in the gallery set.
  • for ABRDF+NN, ABRDF+Eigenfaces and ABRDF+Fisherfaces, we use this single image to generate more images and then use all of them to train the classifiers. Experiments were carried out using 10 randomly selected subjects from the Extended Yale B Database. Results were computed using 3 different reference faces (other than the 10 selected subjects) over 5 independent runs of each of the recognition algorithms and then averaged.
  • Table 2: Recognition results of various benchmark methods on the Extended Yale Face Database.
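The gallery-augmentation scheme of Table 2 can be sketched as follows. `augment_gallery` and the dummy renderer are hypothetical stand-ins (the real renderer is the ABRDF-transfer pipeline), and matching is plain Euclidean nearest neighbor on raw pixels, consistent with the naive NN classifier mentioned above.

```python
import numpy as np

def nn_classify(probe, gallery, labels):
    """Nearest-neighbor: return the label of the closest gallery image."""
    d = np.linalg.norm(gallery - probe[None, :], axis=1)
    return labels[int(np.argmin(d))]

def augment_gallery(single_images, labels, synthesize_relit, n_lights=4):
    """Expand each subject's single gallery image into n_lights relit
    images via the (hypothetical) relighting renderer."""
    imgs, labs = [], []
    for img, lab in zip(single_images, labels):
        for relit in synthesize_relit(img, n_lights):
            imgs.append(relit.ravel())
            labs.append(lab)
    return np.stack(imgs), np.array(labs)

# demo with a dummy "renderer" standing in for ABRDF transfer
rng = np.random.default_rng(1)
singles = [rng.random(16) for _ in range(3)]          # one image per subject
dummy_relight = lambda img, n: [img + 0.01 * k for k in range(n)]
gallery, labels = augment_gallery(singles, [0, 1, 2], dummy_relight)
pred = nn_classify(singles[1] + 0.005, gallery, labels)
```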

Abstract

Disclosed is a method for estimating the apparent bi-directional reflectance distribution function field of a human face using anti-symmetric tensor splines, comprising: defining the complex geometry and reflectance properties of the human face by a field of spherical functions; approximating the field of spherical functions by anti-symmetric higher-order Cartesian tensors within a single pixel; applying a B-spline basis as the weighting function of the anti-symmetric higher-order Cartesian tensors; fitting the tensor spline basis to a given set of two-dimensional facial images of a human subject with a fixed pose and associated lighting directions by minimizing the energy of an objective function with respect to the unknown tensor coefficients of said anti-symmetric higher-order Cartesian tensors; and analytically computing the derivatives of the objective function for its efficient minimization.

Description

FACE RELIGHTING FROM A SINGLE IMAGE
[001] Statement Regarding Federally Sponsored Research Or Development
[002] There is no federal government sponsorship associated with this invention.
[003] Technical Field
[004] The present invention relates to face recognition, facial relighting, and specifically to methods and techniques for synthesizing facial images under novel illumination conditions.
[005] Background Art
[006] Due to important applications like face recognition and facial relighting, synthesis of facial images under novel illumination conditions has attracted immense interest, particularly in the fields of computer vision and computer graphics. The challenge presented is the following: given only a few example images of a face, generate images of that face under novel illumination conditions. This challenge is particularly difficult when only one example image is available, which is the most common and realistic scenario in the very important application of face recognition. This special circumstance is a more difficult scenario than the typical graphics relighting problem, which generally does not have a limitation on the number of example images that can be considered. Solving this challenge is especially attractive because if multiple images under novel illumination can be generated from a single example image, the images can be used to enhance recognition performance of any learning based face recognition method.
[007] The literature is replete with various proposals to solve this challenge. However, each of these existing solutions works only under certain assumptions (e.g. the convex-Lambertian assumption) or requires specific kinds of data (e.g. 3D face scans) and/or manual intervention. Thus, it is important to compare these methods in the light of these assumptions and requirements and not just by the claimed results. The method of the present invention produces results which are better than or comparable to those of the existing methods, even though it works under a minimal set of requirements. It is a completely automatic method which works with a single 2D image, does not require any 3D information, seamlessly handles cast shadows and specularities (i.e. does not make a convex-Lambertian assumption) and does not require any specially acquired information (i.e. works well with existing benchmark databases like Extended Yale B).
[008] The convex-Lambertian assumption is inaccurate as human faces are neither exactly Lambertian nor exactly convex. It is common to see cast shadows (e.g. in the peri-nasal region due to non-convexity) and specularities (e.g. oily forehead and nose tip due to non-Lambertian reflectance) on facial images. Any method which fails to take into account these inaccuracies is clearly limited in its applicability. Furthermore, some of these methods end up using 3-dimensional information that is expensive to acquire and/or require undesirable manual intervention. Though the cost of acquiring 3-dimensional geometry is decreasing, most of the existing benchmark face databases (e.g. Extended Yale B and CMU PIE) consist of a single or multiple 2-dimensional face images and therefore, it is less pragmatic, if not less accurate, to use 3-dimensional information as input to systems dealing with facial illumination problems. Furthermore, recent systems that do use 3D information directly (based on the morphable models) require manual intervention at various stages, which is clearly undesirable. At the same time, techniques which require specially acquired 2D information or an exorbitant amount of 2D information are also not attractive. Hence a method which does not make these limiting assumptions and still produces good results is highly desirable.
[009] Disclosure of Invention
[010] The present invention provides a novel anti-symmetric higher-order Cartesian tensor spline based method for the estimation of the Apparent Bi-directional Reflectance Function (ABRDF) field for human faces that seamlessly accounts for specularities and cast shadows.
[011] Brief Description of Drawings
[012] Fig. 1 shows a plot of an ABRDF function according to varying methodologies;
[013] Fig. 2 shows images under novel illumination directions synthesized from the estimated ABRDF field;
[014] Fig. 3 depicts an estimation of facial features that arise from cast shadows and specularities according to the present invention;
[015] Fig. 4 is a comparison of the present invention with two other known methods;
[016] Fig. 5 depicts the registration results of the present invention and provides a comparison to two known methods;
[017] Fig. 6 depicts an image under novel illumination directions synthesized from a single example image according to the method of the present invention;
[018] Fig. 7 depicts the present invention as applied to several reference human faces;
[019] Fig. 8 depicts the present invention as applied to several reference human faces;
[020] Fig. 9 is a quantitative comparison of the method of the present invention according to a varying number of reference images;
[021] Fig. 10 is an illustration of the method of the present invention using a 3rd-order antisymmetric tensor spline estimation;
[022] Fig. 11 is a plot of the lighting directions of images in the training sets;
[023] Fig. 12 is a comparison of the average intensity value errors of the Lambertian model and the method of the present invention;
[024] Fig. 13 depicts synthesized images under several different lighting directions for a randomly selected subject;
[025] Fig. 14 depicts a comparison of the synthesized images using the Lambertian model and the method of the present invention;
[026] Fig. 15 depicts the approximated ABRDFs plotted as spherical functions in a region of interest that has specularities and shadows;
[027] Fig. 16 is an intensity value error comparison of the method of the present invention and several known methods;
[028] Modes of Carrying Out the Invention
[029] The present invention is composed of two stages. The first stage comprises learning the Apparent Bi-directional Reflectance Function field of a reference face using its nine images taken under different illumination conditions. The ABRDF is a spherical function that gives the image intensity value at each pixel in each illumination direction. Three novel methods are set forth below for estimating this ABRDF field from nine or, if available, more images.
[030] The second stage comprises transferring the ABRDF field from a reference face to a new target face using just one 2-dimensional image of the target face using a novel ABRDF transfer algorithm. Hence, once the reference ABRDF field has been captured, images of a novel face under a new illumination direction can be rendered by first transferring the ABRDF field and then sampling the field in the appropriate illumination direction.
[031] Learning the ABRDF Field Using Tensor Splines
[032] The present invention provides a novel anti-symmetric higher-order Cartesian tensor spline based method for the estimation of the ABRDF field for human faces that seamlessly accounts for specularities and cast shadows.
[033] Spherical Functions Modeled as Tensors
[034] In general, a spherical function can be approximated by an nth-order Cartesian tensor, which can be expressed in the following form:
T(v) = \sum_{k+l+m=n} T_{k,l,m} v_1^k v_2^l v_3^m    (1)

where v = [v_1 v_2 v_3]^T is a unit vector and T_{k,l,m} are the real-valued tensor coefficients. It should be noted that the spherical functions modeled by Eq. 1 are symmetric (i.e. T(v) = T(-v)) for even orders, and anti-symmetric (i.e. T(v) = -T(-v)) for odd orders. As a special case of Eq. 1, the 1st-order tensors take the form T(v) = T · v, where T = [T_{1,0,0} T_{0,1,0} T_{0,0,1}]^T, and the 2nd-order tensors take the form T(v) = v^T T v, where T is a 3 × 3 matrix. It should also be noted that in the case of 3rd-order tensors, there are 10 unique coefficients T_{k,l,m} in Eq. 1, while in the case of 5th-order anti-symmetric tensors, there are 21 unique coefficients T_{k,l,m}. The ability of a Cartesian tensor to approximate the complex geometry of a spherical function with multiple lobes increases with its order. A 1st-order tensor can only be used to approximate single-lobed anti-symmetric spherical functions. In order to approximate a function with more lobes, higher-order tensors are required. However, higher-order tensors can be perceived to be more sensitive to noise, simply by virtue of their ability to model high-frequency detail. In contrast, the lower-order tensors are incapable of modeling high-frequency detail. Since it is impossible to discriminate between high-frequency detail in the data and high-frequency noise in the data, it is reasonable to say that the high-order tensors possess higher noise sensitivity. Therefore, a balance between the accuracy of the approximation and the noise sensitivity must be found in determining the best-suited tensor order.
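The anti-symmetry of odd-order tensors in Eq. 1 can be checked numerically. The sketch below evaluates a 3rd-order tensor with random placeholder coefficients; the 10 unique exponent triples (k, l, m) with k + l + m = 3 match the coefficient count stated above.

```python
import numpy as np
from itertools import product

def tensor_eval(coeffs, v):
    """Evaluate T(v) = sum_{k+l+m=n} T_{k,l,m} v1^k v2^l v3^m (Eq. 1).
    coeffs maps an exponent triple (k, l, m) to T_{k,l,m}."""
    return sum(c * v[0]**k * v[1]**l * v[2]**m
               for (k, l, m), c in coeffs.items())

# the 10 unique 3rd-order coefficients: exponents with k + l + m = 3
idx3 = [klm for klm in product(range(4), repeat=3) if sum(klm) == 3]
rng = np.random.default_rng(0)
coeffs = {klm: rng.standard_normal() for klm in idx3}

v = np.array([0.3, -0.5, 0.8])
v /= np.linalg.norm(v)
```

Since every monomial has odd total degree, negating v negates every term, so T(-v) = -T(v) holds for any choice of coefficients.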
[035] Tensor Splines
[036] A tensor spline can be defined by combining the Cartesian tensor basis within a single pixel, as set forth above, with the well-known B-spline basis across the image lattice. Preferably, the degree of the spline is fixed at 3 (i.e. a cubic spline) for purposes of simplicity, since this degree of continuity is commonly used in the literature. In general, a tensor spline is defined as a B-spline on multilinear functions of any order. In a tensor spline, the multilinear functions, which are anti-symmetric tensors in the present invention, are weighted by the B-spline basis N_{i,k+1}, where:
N_{i,1}(t) = 1 if t_i <= t < t_{i+1}, and 0 otherwise    (2)

N_{i,k+1}(t) = N_{i,k}(t) (t - t_i)/(t_{i+k} - t_i) + N_{i+1,k}(t) (t_{i+k+1} - t)/(t_{i+k+1} - t_{i+1})    (3)

where the N_{i,k+1}(t) functions are polynomials of degree k and are associated with n+k+2 monotonically increasing numbers called "knots" (t_{-k}, t_{-k+1}, ..., t_{n+1}). By using the above equations, the bi-cubic (i.e. k = 3) tensor spline is given by:

S(t, v) = \sum_{i,j} N_{i,4}(t_x) N_{j,4}(t_y) T_{i,j}(v)    (4)

where t = [t_x t_y]^T, v = [v_1 v_2 v_3]^T is a unit vector, and T_{i,j}(v) is given by Eq. 1. It should be noted that in Eq. 4, there is a field of control tensors T_{i,j}(v) instead of the control points used in a regular B-spline. Below, the bi-cubic tensor splines are employed for approximating the ABRDF field of a human face given a set of fixed-pose images under different known lighting directions.
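The B-spline basis of Eqs. 2-3 is the standard Cox-de Boor recursion, which can be sketched directly. Here the second argument is the spline order (degree + 1), so N_{i,4} is the cubic basis used in Eq. 4; the uniform integer knots follow the example in paragraph [041].

```python
def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: N_{i,1} is an indicator on [t_i, t_{i+1})
    (Eq. 2); order-k bases blend two order-(k-1) bases (Eq. 3)."""
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + k - 1] != knots[i]:
        left = ((t - knots[i]) / (knots[i + k - 1] - knots[i])
                * bspline_basis(i, k - 1, t, knots))
    if knots[i + k] != knots[i + 1]:
        right = ((knots[i + k] - t) / (knots[i + k] - knots[i + 1])
                 * bspline_basis(i + 1, k - 1, t, knots))
    return left + right
```

On the interior of a uniform knot vector the cubic bases form a partition of unity, a standard sanity check of the recursion.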
[037] Apparent BRDF Approximation By Tensor Splines [038] The BRDF of a Lambertian surface is given by:
D(v \ = o (o ■ v) ^
where v is the light source direction, n is the normal vector at a particular point on the surface, and α is a constant. It is immediate that the Lambertian model is a 1st-order tensor (i.e. n = 1 in Eq. 1) with T_{1,0,0} = αn_x, T_{0,1,0} = αn_y and T_{0,0,1} = αn_z. As a 1st-order tensor, the Lambertian model is anti-symmetric and has a single peak.
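To make the correspondence concrete, a minimal sketch (our own illustration, not code from the patent) packs the Lambertian model into its three 1st-order tensor coefficients and evaluates it with the usual clamp to zero for attached shadows:

```python
import numpy as np

def lambertian_coeffs(albedo, normal):
    """1st-order tensor coefficients: T_{1,0,0} = a*n_x, T_{0,1,0} = a*n_y, T_{0,0,1} = a*n_z."""
    n = np.asarray(normal, dtype=float)
    return albedo * n / np.linalg.norm(n)

def shade(T, light):
    """Evaluate the 1st-order tensor T . v, clamping negative values
    (attached shadow), exactly as max(0, a n . v) in the Lambertian model."""
    v = np.asarray(light, dtype=float)
    return max(float(T @ (v / np.linalg.norm(v))), 0.0)

T = lambertian_coeffs(0.8, [0.0, 0.0, 1.0])
# Frontal light gives the full albedo 0.8; light from behind gives 0.
```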
[039] Human faces, however, are not exactly Lambertian, since specularity can be observed in certain regions (e.g. nose and forehead). Moreover, the non-convex shapes on the face (e.g. lips and nose) can create cast shadows. The shadows and specularities of the human face are indicative of a multi-lobed apparent BRDF. Therefore, in these cases the ABRDF cannot be modeled successfully by a 1st-order tensor, and hence higher-order anti-symmetric tensors should be employed instead.
[040] As described above, the challenge is as follows: given a set of N face images of a human subject with a fixed pose, I_n, n = 1 ... N, with associated lighting directions v_n, one wants to estimate the ABRDF field of the face using a bi-cubic tensor spline. The fitting of the tensor spline to the given data can be done by minimizing the following energy:
E = Σ_{n=1}^{N} Σ_{t_x, t_y} [ S(t, v_n) - I_n(t_x, t_y) ]²   (6)
where t_x, t_y run through the lattice of the given images. The minimization of Eq. 6 is done with respect to the unknown tensor coefficients T_{i,j;k,l,m} that correspond to the control tensors T_{i,j}(v_n).
[041] For example, uniform grid knots 1, 2, 3, ... can be used in both lattice coordinates. Accordingly, there are (M + 2) x (M + 2) control tensors, where M x M is the lattice size of each given image. Under this configuration, in the case of 3rd-order anti-symmetric tensors there are 10 unique coefficients for each control tensor. Therefore, the number of unknowns in Eq. 6 is equal to 10(M + 2)² and, in the case of a 5th-order anti-symmetric tensor, 21(M + 2)².
[042] From Eq. 6, the derivatives ∂E/∂T_{i,j;k,l,m} can be computed analytically, and thus any gradient-based functional minimization method can be used; for example, non-linear conjugate gradient with a randomly initialized control tensor coefficient field. After the tensor field has been estimated, images are synthesized under a new lighting direction v by evaluating the apparent BRDF field in the direction v, whereby each apparent BRDF is given by Eq. 4. The generated images can also be upsampled directly by evaluating Eq. 4 on a denser sampling grid, since the tensor spline is a continuous function.
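As a hedged illustration of the fitting step, the snippet below minimizes a quadratic stand-in for Eq. 6 by plain gradient descent with the analytic gradient. `Phi`, a matrix of spline-weighted basis values, and the step-size rule are our assumptions; the patent's preferred solver is non-linear conjugate gradient rather than this simpler scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical linearised problem: observed intensities I = Phi @ T_true, where
# each row of Phi stacks the spline-weighted monomial basis for one observation.
Phi = rng.standard_normal((40, 10))   # 40 observations, 10 tensor coefficients
T_true = rng.standard_normal(10)
I_obs = Phi @ T_true

T = rng.standard_normal(10)           # random initialisation, as in the text
step = 1.0 / (2.0 * np.linalg.norm(Phi, ord=2) ** 2)   # safe step for this quadratic
for _ in range(2000):
    grad = 2.0 * Phi.T @ (Phi @ T - I_obs)   # analytic dE/dT for E = ||Phi T - I||^2
    T -= step * grad

# T converges to T_true because the energy is quadratic in the coefficients.
```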
[043] Learning the ABRDF Field Using Spline Modulated Spherical Harmonics
[044] Our goal is to generate images of a face under various illumination conditions using a single example 2-dimensional image. This can be achieved by acquiring a reference ABRDF field once and then transferring it to new faces using their single images. The ABRDF represents the response of the object at a point to light in each direction, in the presence of the rest of the scene, not merely the surface reflectivity. Hence, by acquiring the ABRDF field of an object, cast shadows, which are image artifacts caused by scene objects obstructing the light from reaching otherwise visible scene regions, can be easily captured. Note that since we want to analyze the effects of changes in illumination direction, we assume the ABRDF to be a function of just the illumination direction by fixing the viewing direction, though it is sometimes defined to be a function of both the illumination and viewing directions. Below, the first part of this process, i.e. the reference ABRDF field estimation, is described using novel bi-cubic B-spline modulated anti-symmetric spherical harmonics.
[045] Surface Spherical Harmonic Basis
[046] The surface spherical harmonic basis, the analog of the Fourier basis for Cartesian signals, provides a natural orthonormal basis for functions defined on a sphere. In general, the spherical harmonic bases are defined for complex-valued functions, but as the apparent BRDF is a real-valued function, the real-valued spherical harmonic bases are used to represent the apparent BRDF functions. The spherical harmonic basis function Y_{lm} (order l, degree m), with l = 0, 1, 2, ... and -l <= m <= l, is defined as follows:
Y_{lm}(θ, φ) = sqrt( ((2l+1)/(4π)) ((l-|m|)!/(l+|m|)!) ) P_l^{|m|}(cos θ) Φ_m(φ)   (7)
where P_l^{|m|} are the associated Legendre functions and Φ_m(φ) is defined as:
Φ_m(φ) = sqrt(2) cos(mφ) if m > 0; 1 if m = 0; sqrt(2) sin(|m|φ) if m < 0   (8)
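A sketch of Eqs. 7-8 for the lowest anti-symmetric order (l = 1), with a numerical check that each basis function has unit L2 norm over the sphere; the midpoint quadrature and the function names are our own choices:

```python
import numpy as np

def phi_m(m, phi):
    """Eq. 8: sqrt(2) cos(m phi) for m > 0, 1 for m = 0, sqrt(2) sin(|m| phi) for m < 0."""
    if m > 0:
        return np.sqrt(2.0) * np.cos(m * phi)
    if m == 0:
        return np.ones_like(np.asarray(phi, dtype=float))
    return np.sqrt(2.0) * np.sin(-m * phi)

def Y1(m, theta, phi):
    """Real spherical harmonic of order l = 1 (Eq. 7), using the associated
    Legendre functions P_1^0 = cos(theta) and P_1^1 = -sin(theta)."""
    if m == 0:
        return np.sqrt(3.0 / (4.0 * np.pi)) * np.cos(theta) * phi_m(0, phi)
    return np.sqrt(3.0 / (8.0 * np.pi)) * (-np.sin(theta)) * phi_m(m, phi)

# Midpoint quadrature over the sphere: each Y1 integrates to 1 in squared norm.
n = 400
theta = (np.arange(n) + 0.5) * np.pi / n
phi = (np.arange(n) + 0.5) * 2.0 * np.pi / n
Th, Ph = np.meshgrid(theta, phi, indexing="ij")
dA = (np.pi / n) * (2.0 * np.pi / n)
norm_sq = np.sum(Y1(0, Th, Ph) ** 2 * np.sin(Th)) * dA   # approximately 1.0
```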
[047] Note that even orders of the spherical harmonic basis functions are antipodally symmetric while odd orders are anti-symmetric. Given a limited number of data samples, the ABRDFs are best approximated using only the antipodally anti-symmetric components of the spherical harmonic basis. To see this, two crucial questions must be examined: first, whether using just even- or odd-ordered bases drastically limits the approximation of the ABRDF, and second, whether symmetric or anti-symmetric bases are more suitable.
[048] With respect to the first question, even though the ABRDF is a function defined on a sphere, for the purposes of the present invention only its behavior on the frontal hemisphere is of interest. Hence, if the function's behavior at the extreme angles (0° and 180°) is ignored, once the ABRDF has been modeled accurately on the frontal hemisphere, the rear hemisphere can be filled in appropriately to make the function either symmetric or anti-symmetric. To visualize this, the polar plots in the first column of FIG. 1 show a typical ABRDF function defined on a semicircle. The second column shows the same function being approximated by an antipodally symmetric (row 1) and an antipodally anti-symmetric (row 2) function. By not using both types of components, not much approximation power is lost. For visualization, the problem has been scaled down to two dimensions, and the blue circle represents the zero value in these polar plots. A more important reason for not using the complete set of bases is that, for a fixed number of given example images, using just symmetric or anti-symmetric components allows going to the higher orders that are necessary to approximate discontinuities such as cast shadows and specularities in the image.
[049] With respect to the second question, one must observe the function's behavior at the extreme angles (0° and 180°). In reality, most facial ABRDF functions have a positive value near one of the extreme angles (as they face the light source) and a very small (« 0) value near the other extreme angle (as they go into attached shadows). Hence, the function in column 1 of FIG. 1 is very close to physical ABRDFs. Clearly, the function's behavior at 0° and 180° is neither antipodally symmetric nor anti-symmetric and hence, using just one of the two would lead to errors in approximation at these extreme angles. The error caused by symmetric approximation is perceptually very noticeable as it gives the function a positive value where it should be 0 [see FIG. 1, last column, first row and the regions marked by arrows in FIG. 1, last row as they are unnaturally bright] while the error caused by anti-symmetric approximation is not perceptually noticeable as it gives the function a negative value where it should be 0, which can be easily set to 0 as it is known that ABRDF is never negative [see FIG. 1, last column, second row and FIG. 1, last row, last two images]. Non-negativity is achieved similarly in the Lambertian model using the max function. Errors at the non-zero end of the function are not perceptually noticeable, as can be seen from the last row of FIG. 1.
[050] Bi-cubic B-spline Modulated Spherical Harmonic
[051] For a fixed pose, each pixel location has an associated ABRDF, and across the whole face we have a field of such ABRDFs. To model such a field of spherical functions (S² x R² -> R), modulated spherical harmonics are used, combining the spherical harmonic basis within a single pixel with the B-spline basis across the field. The B-spline basis N_{i,k}, where:
N_{i,1}(t) = 1 if t_i <= t < t_{i+1}, and 0 otherwise   (9)
and
N_{i,k}(t) = N_{i,k-1}(t) (t - t_i)/(t_{i+k-1} - t_i) + N_{i+1,k-1}(t) (t_{i+k} - t)/(t_{i+k} - t_{i+1})   (10)
acts as a weight on the spherical harmonic basis. Here, N_{i,k}(t) is the spline basis of degree k - 1 with associated knots (t_k, t_{k+1}, ..., t_{n+1}). Hence, the expression for the modulated spherical harmonics is given by:
Ψ_{lm}(θ, φ, x, i, j) = sqrt( ((2l+1)/(4π)) ((l-|m|)!/(l+|m|)!) ) N_{i,4}(x_1) N_{j,4}(x_2) P_l^{|m|}(cos θ) Φ_m(φ)   (11)
with Φ_m(φ) and P_l^{|m|} as defined before; x = (x_1, x_2) are the spline control points and i and j are the basis indices. The bi-cubic spline is chosen because it is one of the most commonly used in the literature and, more importantly, it provides enough smoothness for the ABRDF field so that the discontinuities present in the field due to cast shadows are appropriately approximated, as demonstrated in the results shown below.
[052] There are three distinct advantages of using this novel bi-cubic B-spline modulated spherical harmonics for ABRDF field estimation. First, the built-in smoothness provides a degree of robustness against noise, which is very common when dealing with image data. Second, it allows the use of neighborhood information while estimating the ABRDF at each pixel location. Finally, it provides a continuous representation of the spherical harmonic coefficient field, which will be exploited during the ABRDF transfer described further below.
[053] ABRDF Field Estimation
[054] If the ABRDF field is available for a face, images of the face under novel illumination directions can be rendered by simply sampling the ABRDF at each location in the appropriate directions. But in a realistic setting, only a few images of a face (samples of the ABRDF field) are given. Hence, the problem at hand is ABRDF field estimation from these few samples. Motivated by the reasoning outlined above, the present invention employs bi-cubic B-spline modulated anti-symmetric spherical harmonic functions for this task.
[055] Using S_x(θ, φ), the given data samples (intensity values) in direction (θ, φ) at location x, the ABRDF field can be estimated by minimizing the following error function:
E = Σ_x Σ_{(θ,φ)} [ Σ_{l ∈ T} Σ_{m=-l}^{l} Σ_{i,j} w_{ij;lm} Ψ_{lm}(θ, φ, x, i, j) - S_x(θ, φ) ]²   (12)
where the first term in the summation is the representation of the ABRDF function using modulated anti-symmetric spherical harmonic functions, T is the set of odd natural numbers, and w_{ij;lm} are the unknown coefficients of the apparent BRDF function being sought. Here, the spline control grid is overlaid on the data grid (pixels), and the inner summation over i and j is over the bi-cubic B-spline basis domain. This objective function is minimized using the non-linear conjugate gradient method initialized with a unit vector, for which the derivative of the error function with respect to w_{ij;lm} can be computed in analytic form as:
∂E/∂w_{ij;lm} = 2 Σ_x Σ_{(θ,φ)} [ Σ_{l' ∈ T} Σ_{m'=-l'}^{l'} Σ_{i',j'} w_{i'j';l'm'} Ψ_{l'm'}(θ, φ, x, i', j') - S_x(θ, φ) ] Ψ_{lm}(θ, φ, x, i, j)   (13)
[056] Both odd orders 3 and 5 yield sufficiently good synthesis results, with order 3 performing slightly better than order 5 because the order-5 approximation over-fits the data. Therefore, order 3 is preferred. In an order-3 (value of l) modulated anti-symmetric spherical harmonic approximation, the values of the unknown coefficients can be recovered with just 9 images under different illumination conditions. Estimation is better if the given 9 images sample the illumination directions somewhat uniformly, and improves if more images are available. As the ABRDF is a positive function, any negative values produced by the model are set to 0 (as is also done by the max function in the Lambertian model).
[057] Learning ABRDF Field Using Continuous Mixture of Single Lobed Functions
[058] The method described above can be quantitatively compared to a more general model that, in theory, can approximate spherical functions using a continuous mixture of single-lobed spherical functions. There are various spherical functions with a single lobe that can be used in a continuous mixture. For this application, it is desirable to choose a function that leads to an analytic solution, such as the following:
S(v) = e^{u·v} - 1   (14)
where u and v are unit vectors. Eq. 14 has the following desirable properties: 1) it has a single peak, and 2) S(v) = 0 for all v such that v · u = 0 (because if the viewing and illumination directions are perpendicular, zero intensity is expected). These properties also hold for the Lambertian model.
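The two properties claimed for this single-lobed function are easy to check numerically; a small sketch (the function name is ours) evaluates it at the peak and on the perpendicular great circle:

```python
import numpy as np

def single_lobe(u, v):
    """Eq. 14: S(v) = exp(u . v) - 1 for unit vectors u and v."""
    return np.exp(np.dot(u, v)) - 1.0

u = np.array([0.0, 0.0, 1.0])
peak = single_lobe(u, u)                          # e - 1: the single maximum, at v = u
perp = single_lobe(u, np.array([1.0, 0.0, 0.0]))  # 0: zero response perpendicular to u
```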
[059] Given the single-lobed function in Eq. 14, any spherical function can be written as a continuous mixture of such functions. Accordingly, the apparent BRDF, a spherical function, can be modeled as a continuous mixture of functions S(v) as follows:
B(v) = ∫_{S²} f(u) (e^{u·v} - 1) du   (15)
where the integration is over the set of all unit vectors u (i.e. the unit sphere) and f(u) is a distribution on orientations. The von Mises-Fisher distribution is chosen as the mixing density, as it is the analog of the Gaussian distribution on S². The von Mises-Fisher distribution is given by:
f(u; μ, κ) = (κ / (4π sinh κ)) e^{κ μ·u}   (16)
where μ is a unit vector defining the orientation and κ is a scalar governing the concentration of the distribution. The key observation is that substituting Eq. 16 into Eq. 15 yields an integral that is the Laplace transform of the von Mises-Fisher distribution, which can be computed analytically to be:
B(v) = κ sinh(||κμ + v||) / (sinh(κ) ||κμ + v||) - 1   (17)
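This closed form of the Laplace transform can be verified against direct quadrature of Eq. 15 with the von Mises-Fisher density of Eq. 16; a hedged numerical check, where the grid size, the test values of μ, κ and v, and the function names are arbitrary choices of ours:

```python
import numpy as np

def vmf(u, mu, kappa):
    """Eq. 16: von Mises-Fisher density on the unit sphere
    (u may hold unit vectors in its last axis)."""
    return kappa / (4.0 * np.pi * np.sinh(kappa)) * np.exp(kappa * (u @ mu))

def closed_form(mu, kappa, v):
    """Eq. 17: kappa sinh(||kappa mu + v||) / (sinh(kappa) ||kappa mu + v||) - 1."""
    w = np.linalg.norm(kappa * mu + v)
    return kappa * np.sinh(w) / (np.sinh(kappa) * w) - 1.0

def by_quadrature(mu, kappa, v, n=400):
    """Midpoint quadrature of Eq. 15 over the sphere with the density of Eq. 16."""
    theta = (np.arange(n) + 0.5) * np.pi / n
    phi = (np.arange(n) + 0.5) * 2.0 * np.pi / n
    Th, Ph = np.meshgrid(theta, phi, indexing="ij")
    U = np.stack([np.sin(Th) * np.cos(Ph), np.sin(Th) * np.sin(Ph), np.cos(Th)], axis=-1)
    integrand = vmf(U, mu, kappa) * (np.exp(U @ v) - 1.0)
    return float(np.sum(integrand * np.sin(Th)) * (np.pi / n) * (2.0 * np.pi / n))

mu = np.array([0.0, 0.0, 1.0])
v = np.array([1.0, 0.0, 0.0])
# closed_form(mu, 2.0, v) and by_quadrature(mu, 2.0, v) agree to quadrature accuracy.
```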
However, a single von Mises-Fisher distribution cannot approximate angular distributions with several peaks, such as the apparent BRDF fields of human faces. Therefore, a finite mixture of von Mises-Fisher distributions can be used, which leads to the following alternate definition of the mixing density in Eq. 15:
f(u) = Σ_i w_i f(u; μ_i, κ)   (18)
where w_i are the mixture weights.
[060] In order to use this mixture of von Mises-Fisher distributions to obtain an expression for the apparent BRDF, a dense sampling of 642 directions on the unit sphere, obtained by the 4th-order tessellation of the icosahedron, can be used. The result is the following expression:
B(v) = Σ_{i=1}^{642} w_i [ κ sinh(||κμ_i + v||) / (sinh(κ) ||κμ_i + v||) - 1 ]   (19)
Although f(u) has the form of a discrete mixture, the approximating function B(v) is still a continuous mixture of single-lobed functions as expressed by Eq. 15.
[061] Given a set of N facial images of a human subject with a fixed pose, I_n, n = 1 ... N, with associated lighting directions v_n, an N x 642 matrix A_{n,i} can be set up by evaluating Eq. 17 for every pair v_n and μ_i. Then, for each pixel, the unknown weights of Eq. 19 can be estimated by solving the overdetermined system:
AW = B   (20)
where B is an N-dimensional vector consisting of the intensities of a fixed pixel in the N given images, and W is the vector of unknown weights. This system can be solved efficiently, yielding a sparse solution, by the non-negative least squares minimization algorithm. The general model just described is used as a benchmark for quantitatively evaluating the ability of the anti-symmetric tensor spline model of the present invention to approximate the apparent BRDF of human faces.
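A sketch of this per-pixel solve of Eq. 20; since we cannot assume any particular NNLS library, the example uses a simple projected-gradient non-negative least squares, and the N x 642 design matrix is shrunk to a small random stand-in:

```python
import numpy as np

def nnls_pg(A, b, iters=20000):
    """Projected-gradient non-negative least squares: min ||A w - b||^2 s.t. w >= 0.
    A minimal stand-in for the NNLS solver mentioned in the text."""
    step = 1.0 / np.linalg.norm(A, ord=2) ** 2   # step bounded by the gradient's Lipschitz constant
    w = np.zeros(A.shape[1])
    for _ in range(iters):
        w = np.maximum(0.0, w - step * (A.T @ (A @ w - b)))
    return w

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))               # stand-in for the N x 642 matrix A_{n,i}
w_true = np.array([1.0, 0.0, 2.0, 0.0, 0.5])   # sparse, non-negative weights
b = A @ w_true                                 # intensities of one pixel across N images
w = nnls_pg(A, b)                              # recovers w_true
```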
[062] ABRDF Transfer Algorithm
[063] The second part of the method of the present invention deals with transferring the ABRDF field from one face (reference) to another (target) and thus generating images under various novel illuminations for the target face using just one exemplar image. The basic shapes of features are more or less the same on all faces and thus the optical artifacts, e.g. cast and attached shadows, created by these features are also similar on all faces. Accordingly, the nature of the ABRDFs on various faces is also similar and hence, one should be able to derive the ABRDF field of the target face using a given reference ABRDF field.
[064] First, the non-rigid warping field between the reference and the target face images must be estimated. This can be formalized as the estimation of a non-rigid coordinate transformation T such that:
Σ_{x ∈ I} M{ I_ref(T(x)), I_target(x) }   (21)
is minimized. M is a general matching criterion which depends on the registration technique, I_ref and I_target are the reference and target images respectively, and x is a location in the image domain I. Preferably, a registration technique based on an information-theoretic match measure should be used so that registration can be done across different faces with possibly different illuminations (e.g. Mutual Information (MI) and Cross-Cumulative Residual Entropy (CCRE) based registration, described respectively in: 1) "Alignment by maximization of mutual information," P. Viola and W. M. Wells III, IJCV, 24(2):137-154, 1997 and 2) "Non-rigid multi-modal image registration using cross-cumulative residual entropy," F. Wang and B. C. Vemuri, IJCV, 74(2):201-205, 2007). The CCRE registration technique works with cumulative distributions rather than probability densities and hence is more robust to noise. Therefore, CCRE produces better results when applied to faces. FIG. 5 depicts the results produced by MI and CCRE for visual comparison: the first and second columns contain the reference image and the target image respectively, and the third and fourth columns contain the deformed faces produced by CCRE and MI respectively.
[065] Once the deformation field has been recovered, it is used to warp the source image's apparent BRDF field coefficients to displace the apparent BRDFs into appropriate locations for the target image. As described above, by using modulated spherical harmonic functions, we can obtain a continuous representation of the coefficient field, which is written explicitly as:
w_{lm}(x) = Σ_{i,j} N_{i,4}(x_1) N_{j,4}(x_2) w_{ij;lm}   (22)
As defined above, w_{lm}(x) are the unknown coefficients for the order-l, degree-m spherical harmonic basis at location x. The apparent BRDF field coefficients for the target image, w'_{lm}(x), can be computed using Eq. 22 as w'_{lm}(x) = w_{lm}(T(x)), where T is the deformation field recovered by minimization of Eq. 21. Using w'_{lm}(x), the apparent BRDF field can be readily computed using the spherical harmonic basis. As can be noted from FIG. 5, although the locations of the apparent BRDFs have been changed to match the target face image, they are still the source (reference) image's apparent BRDFs, and thus the images obtained by sampling them look like the source (reference) image, as can be seen in columns three and four of FIG. 5.
[066] This discrepancy can be fixed by using the following intensity mapping technique. A separate transformation can be chosen for each pixel. Based upon the geometric transformation between the reference and the target image, the intensity mapping quotient, Q(x), for each location x can be defined as:
Q(x) = I_target(x) / I_ref(T(x))   (23)
[067] Because the images are known to be noisy and the division operation accentuates that noise, a Gaussian kernel G_σ can be used to smooth the intensity mapping quotient field. As a result, the intensity value at location x of an image of the target face under a novel illumination direction (θ, φ) can be computed as:
I(θ, φ, x) = (G_σ * Q)(x) · B(θ, φ, x)   (24)
where the argument (θ, φ, x) indicates that the apparent BRDF at location x is queried in direction (θ, φ), and the apparent BRDF B(θ, φ, x) can come from any of the three methods described above.
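The quotient-and-smooth step of Eqs. 23-24 can be sketched as below; the separable Gaussian blur stands in for whatever kernel implementation the authors used, and `eps` is our own guard against division by zero:

```python
import numpy as np

def gaussian_kernel1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth_quotient(target, ref_warped, sigma=2.0, eps=1e-6):
    """Eq. 23 followed by the smoothing G_sigma * Q: Q(x) = I_target(x) / I_ref(T(x)),
    blurred separably along rows and columns; eps guards against division by zero."""
    Q = target / (ref_warped + eps)
    k = gaussian_kernel1d(sigma)
    pad = len(k) // 2
    Qp = np.pad(Q, pad, mode="edge")
    Qp = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, Qp)
    Qp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, Qp)
    return Qp

# Relit intensity (Eq. 24) is then the smoothed quotient times the ABRDF value.
target = 2.0 * np.ones((16, 16))
ref_warped = np.ones((16, 16))
Q_smooth = smooth_quotient(target, ref_warped)   # constant field, approximately 2 everywhere
```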
[068] The intensity mapping quotient (Eq. 23) is not the same as the quotient image proposed by Riklin-Raviv and Shashua, as they make an explicit Lambertian assumption and define their quotient image as the ratio of albedos, which is clearly not the case here.
[069] Experimental Results
[070] All the experiments in this section used the Extended Yale B database, which has 64 different images per subject under known illumination directions. In order to test the sensitivity of our anti-symmetric tensor spline model to the selection of the training set, we constructed three different training sets, each consisting of 9 facial pictures per subject taken under different lighting directions. The lighting directions of the 9 images for the three selected training sets (A, B and C) are shown in FIG. 11, where the lighting direction of each image is presented as a point in the azimuth-elevation plane. Training set 'A' shows a case where the 9 lighting directions do not span the azimuth-elevation plane in a symmetric and uniform manner, and therefore the input dataset does not represent the underlying ABRDF well. In training set 'B', the lighting directions cover the azimuth-elevation plane better; however, there is no lighting direction at an extreme high angle. Finally, training set 'C' samples the azimuth-elevation plane even better, including high-angle lighting directions along the elevation axis.
[071] To see the impact of different training sets on the approximated ABRDF field, the ABRDFs of 10 different subjects from the Extended Yale B dataset were computed under the lighting configurations described in FIG. 11, using: a) the anti-symmetric tensor spline model of the present invention of order 3, and b) the Lambertian version of our framework using 1st-order tensors. After training was performed using only 9 images per subject according to the method described above, 64 facial images per subject were synthesized by evaluating Eq. 4 for the 64 lighting directions provided in the Yale B database.
[072] FIG. 13 presents the synthesized images under several different lighting directions for a randomly selected subject. The images demonstrate that our proposed model approximated well the underlying ABRDF, producing realistic images. The 9 input images used here are shown in FIG. 10.
[073] FIG. 12 shows the average error in intensity value, defined as the absolute distance between the intensity values of the synthesized images and the ground truth images in the database. Based on the reported errors, it can be concluded that the method of the present invention performs significantly better than the Lambertian model. Moreover, in all three training set configurations the performance remained approximately the same, which demonstrates that the method of the present invention approximates the underlying ABRDF well regardless of the lighting directions of the 9 input images.
[074] In FIG. 14, examples of the images synthesized using the Lambertian model and the anti-symmetric tensor spline method of the present invention are visually compared. The first column shows the ground truth image from the Extended Yale B dataset. Note that the ground truth images presented in FIG. 14 were not part of the training set used for the synthesis of the images presented in the second and third columns of FIG. 14. By visual comparison, one can conclude that the 3rd-order tensorial model can accommodate cast shadows and approximate well the specular components of the underlying ABRDFs. In contrast, specularity and shadows are missing from the images synthesized under the Lambertian model, which demonstrates the invalidity of the Lambertian assumption.
[075] FIG. 15 shows the approximated ABRDFs plotted as spherical functions in a region of interest that has specularities and shadows. The shapes of the plotted functions contain up to three lobes and show complexities that cannot be approximated under the Lambertian assumption.
[076] Next, the continuous mixture of single lobed functions was employed to approximate the underlying ABRDF by using all 64 given images as the training set. This model, although less efficient (since it requires a much larger training set of 64 images) than the anti-symmetric tensor spline method of the present invention (which uses only 9 images), can approximate spherical functions with a very complex structure characterized by a large number of lobes. In contrast, the 3rd-order anti-symmetric tensor spline model can approximate functions whose shape complexity consists of at most three lobes. By comparing the performance of the continuous mixture of exponential functions with that of the anti-symmetric tensor spline, both presented in FIG. 16, one can conclude that they yield similar intensity values. This quantitatively demonstrates that in spite of the limitations of the 3rd-order anti-symmetric tensor spline model, we can still capture and approximate the shape of the underlying facial ABRDFs.
[077] In FIG. 2A, we present the novel images synthesized from the learnt ABRDF field using spline modulated spherical harmonics, which clearly demonstrate that photo-realistic images can be generated by our model. Note the sharpness of the cast shadows in the last row. The presented technique is capable of both extrapolating and interpolating illumination directions from the sample images provided to it [see FIG. 2B]. In FIG. 3 (left), we present the estimated ABRDF field overlaid on a face, and in FIG. 3 (right), the method of the present invention can be seen to capture multiple bumps with varying sharpness to account for shadows and specularities. The method's ability to capture cast shadows and specularities in images is clearly demonstrated in FIG. 4.
[078] In FIG. 6, we present a set of images generated under novel illumination conditions of the target face [see 2nd row and 2nd column in FIG. 5] using just one image. It can be noted that the specularity of the nose tip and cast shadows have been captured to produce photo-realistic results. Next, in FIG. 7, we present novel images of the same subject using three different reference faces. Discounting minor artifacts, it can be noted that these images are perceptually similar.
[079] In the next set of experiments, we demonstrate the robustness and versatility of the method of the present invention. First, we demonstrate that we can produce good results even when parts of the face in the target image are occluded [see FIG. 8]. This is accomplished by setting the intensity mapping quotient to unity and performing a histogram equalization in the occluded regions. The results show that our framework can handle larger occlusion than what was demonstrated recently by Wang et al. Second, even though we do not use any 3- dimensional information, the technique of the present invention is capable of generating photorealistic images of faces in poses different from that of the reference face under novel illumination directions. At this stage, our framework can handle poses that differ up to 12°. In FIG. 8, we look at the quantitative error introduced by our method as a function of the number of images used for the ABRDF field estimation. We compare the synthesized novel images to the ground truth images present in the Extended Yale B database. We observe that the quantitative error increases with the harshness of illumination direction, which we attribute to the lack of accurate texture information for extreme illumination directions.
[080] Finally, we present two sets of results for the application of the proposed techniques to face recognition. First, using a simple Nearest Neighbor classifier, we compare the results of our ABRDF estimation techniques using 9 sample images with those of existing techniques which use multiple (from 4 to 9) images [see TABLE I]. For this experiment, we assume that 9 gallery images with known illumination directions per person are available (from subsets 1 and 2). We estimate the ABRDF field of each face using the techniques described above (Tensor Splines and Spline Modulated Spherical Harmonics), generate a number of images under novel illumination directions (defined on a grid), and then use all of them in our Nearest Neighbor classifier as gallery images. To make the results comparable to the competing methods, we used the 10 subjects from the Yale B face database. Results were averaged over 5 independent runs of the recognition algorithm. The results pertaining to the other techniques are summarized in the publication "Acquiring linear sub-spaces for face recognition under variable lighting," K. Lee, J. Ho, and D. J. Kriegman, PAMI, 27(5):684-698, 2005. The results demonstrate that the method of the present invention can produce competitive results even when used with a naive classifier like Nearest Neighbor.
Table I: Recognition results of various existing techniques on the Yale B Face Database. [table reproduced as an image in the original]
[081] A second set of experiments demonstrates how the ABRDF transfer technique, which works with a single image, can be used to enhance various existing benchmark face recognition techniques [see TABLE 2]. For this, we make use of the fact that the performance of most recognition systems improves when a better training set is present. We present results for Nearest Neighbor (NN), Eigenfaces and Fisherfaces, where we assume that only a single near-frontal illumination image of each subject is available in the gallery set. For ABRDF+NN, ABRDF+Eigenfaces and ABRDF+Fisherfaces, we use this single image to generate more images and then use all of them to train the classifiers. Experiments were carried out using 10 randomly selected subjects from the Extended Yale B Database. Results were computed using 3 different reference faces (other than the 10 selected subjects) over 5 independent runs of each recognition algorithm and then averaged.
Table 2: Recognition results of various benchmark methods on the Extended Yale Face Database. [table reproduced as an image in the original]
[082] Attached as Exhibits to the instant application, and incorporated by reference hereto are the following articles authored by the inventors herein:
- Beyond the Lambertian Assumption: A generative model for Apparent BRDF fields of Faces using Anti-Symmetric Tensor Splines
- From one to many: A generative model for face image synthesis under varying illumination
[083] Accordingly, it will be understood that embodiments of the present invention have been disclosed by way of example and that other modifications and alterations may occur to those skilled in the art without departing from the scope and spirit of the above description or appended claims.

Claims

We claim:
1. A method for estimating the apparent bi-directional reflectance distribution function field of a human face using anti-symmetric tensor splines, comprising:
defining the complex geometry and reflectance properties of the human face by a field of spherical functions;
approximating the field of spherical functions by anti-symmetric higher-order Cartesian tensors within a single pixel;
applying a B-spline basis as the weighting function of the anti-symmetric higher- order Cartesian tensors;
fitting the tensor spline basis to a given set of two-dimensional facial images of a human subject with a fixed pose and associated lighting directions by minimizing the energy of an objective function with respect to the unknown tensor coefficients of said antisymmetric higher-order Cartesian tensors; and
analytically computing the derivatives of the objective function for its efficient minimization.
2. The method of claim 1 wherein the order of the anti-symmetric higher-order Cartesian tensor approximation is of any order that can be used depending on the amount of data available.
3. The method of claim 1 wherein the method for minimizing the energy of the objective function is any gradient-based functional minimization method.
4. The method of claim 1, further comprising synthesizing images under new lighting directions by evaluating the estimated apparent bi-directional reflectance distribution function field in the new lighting directions.
5. A method for estimating the apparent bi-directional reflectance distribution function field of a human face using spline modulated spherical harmonics, comprising:
defining the complex geometry and reflectance properties of the human face by a field of spherical functions;
approximating the field of spherical functions by spline modulated spherical harmonics wherein the spherical harmonics are modulated with B-spline functions;
fitting the spline modulated spherical harmonics to a given set of two-dimensional facial images of a human subject with a fixed pose and associated lighting directions by minimizing the energy of an objective function with respect to the unknown coefficients of the spline modulated spherical harmonics basis; and
analytically computing the derivatives of the objective function for its efficient minimization.
6. The method of claim 5 wherein the spline modulated spherical harmonics approximation may be of any order, chosen according to the amount of data available.
7. The method of claim 5 wherein the method for minimizing the energy of the objective function is any gradient-based functional minimization method.
8. The method of claim 5, further comprising synthesizing images under new lighting directions by evaluating the estimated apparent bi-directional reflectance distribution function field in the new lighting directions.
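Claims 5 through 8 follow the same per-pixel fitting scheme with a spherical-harmonic basis. A minimal sketch using the nine real spherical harmonics of orders 0 through 2, without the B-spline modulation of the claims (function names are illustrative):

```python
import numpy as np

def real_sh_basis(v):
    """Nine real spherical harmonics of orders 0-2 evaluated at a unit
    vector v = (x, y, z), with the standard real-SH normalization."""
    x, y, z = v
    return np.array([
        0.282095,
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ])

def fit_sh_abrdf_pixel(intensities, directions):
    """Least-squares fit of spherical-harmonic coefficients at one pixel,
    in the spirit of claim 5 but without the B-spline modulation."""
    A = np.stack([real_sh_basis(d) for d in directions])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(intensities, float), rcond=None)
    return coeffs
```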
9. A method for estimating the apparent bi-directional reflectance distribution function field of a human face using a continuous mixture of single lobed functions, comprising:
defining the complex geometry and reflectance properties of the human face by a field of spherical functions;
approximating the field of spherical functions by a continuous mixture of single lobed functions;
using a finite mixture of von Mises-Fisher distributions as the mixing density in the continuous mixture; and
fitting a field of continuous mixtures to a given set of two-dimensional facial images of a human subject with a fixed pose and associated lighting directions by minimizing the energy of an objective function with respect to the unknown weights of the continuous mixture.
10. The method of claim 9 wherein the method for determining the mixture weights is any non-negative least squares solving method.
11. The method of claim 9, further comprising synthesizing images under new lighting directions by evaluating the estimated apparent bi-directional reflectance distribution function field in the new lighting directions.
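One concrete reading of claims 9 and 10 is to discretize the mixing density over a fixed grid of lobe directions and solve for non-negative weights. A sketch under that assumption, using unnormalized von Mises-Fisher lobes and SciPy's non-negative least squares; the grid, the concentration parameter kappa, and the function names are illustrative choices, not taken from the patent:

```python
import numpy as np
from scipy.optimize import nnls

def vmf_lobe(mu, v, kappa=5.0):
    """Unnormalized single-lobed von Mises-Fisher kernel exp(kappa * mu.v)."""
    return np.exp(kappa * float(np.dot(mu, v)))

def fit_vmf_mixture_pixel(intensities, directions, lobe_centers, kappa=5.0):
    """Non-negative least-squares fit of the mixture weights over a fixed
    grid of lobe directions, per the discretized reading of claims 9-10."""
    A = np.array([[vmf_lobe(mu, d, kappa) for mu in lobe_centers]
                  for d in directions])
    weights, _residual = nnls(A, np.asarray(intensities, float))
    return weights
```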
12. The method of claim 1, further comprising a method of face relighting, comprised of:
estimating the apparent bi-directional reflectance distribution function field of a face from nine or more example images;
generating a novel image of the face under one or more novel illumination conditions by sampling the apparent bi-directional reflectance distribution function field in the novel direction; and
generating one or more face images under complicated illuminations by computing a weighted sum of the images obtained with illumination in the individual point source directions.
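The last step of claim 12 is a plain linear superposition: because image formation is linear in the illumination, an image under a complicated illumination can be approximated by a weighted sum of point-source renderings. A one-function sketch (the function name is hypothetical):

```python
import numpy as np

def composite_relight(point_source_images, weights):
    """Weighted sum of images relit under individual point-source
    directions, approximating a complicated illumination (claim 12)."""
    stack = np.stack([np.asarray(im, float) for im in point_source_images])
    # Contract the weight vector against the leading (image-index) axis.
    return np.tensordot(np.asarray(weights, float), stack, axes=1)
```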
13. The method of claim 12, further comprising a method of face recognition, comprised of:
estimating the apparent bi-directional reflectance distribution function field of the known faces;
generating novel images of those faces under novel illumination conditions; and
using the generated novel images to train any learning-based face classification technique.
14. The method of claim 13 wherein the face classification technique can be any face recognition method that uses examples to classify test face images.
15. The method of claim 5, further comprising a method of face relighting, comprised of:
estimating the apparent bi-directional reflectance distribution function field of a face using nine or more example images;
generating a novel image of the face under one or more novel illumination conditions by sampling the apparent bi-directional reflectance distribution function field in the novel direction; and
generating one or more face images under complicated illuminations by computing a weighted sum of the images obtained with illumination in the individual point source directions.
16. The method of claim 15, further comprising a method of face recognition, comprised of:
estimating the apparent bi-directional reflectance distribution function field of the known faces;
generating novel images of those faces under novel illumination conditions; and
using the generated novel images to train any learning-based face classification technique.
17. The method of claim 16 wherein the face classification technique can be any face recognition method that uses examples to classify test face images.
18. The method of claim 9, further comprising a method of face relighting, comprised of:
estimating the apparent bi-directional reflectance distribution function field of a face using nine or more example images;
generating a novel image of the face under one or more novel illumination conditions by sampling the apparent bi-directional reflectance distribution function field in the novel direction; and
generating face images under complicated illuminations by computing a weighted sum of the images obtained with illumination in the individual point source directions.
19. The method of claim 18, further comprising a method of face recognition, comprised of:
estimating the apparent bi-directional reflectance distribution function field of the known faces;
generating novel images of those faces under novel illumination conditions; and
using the generated novel images to train any learning-based face classification technique.
20. The method of claim 19 wherein the face classification technique can be any face recognition method that uses examples to classify test face images.
21. The method of claim 1, further comprising a method for generating novel images of a target human face under various novel illumination conditions using a single two-dimensional image of the target face, comprised of:
obtaining the apparent bi-directional reflectance distribution function of a reference face;
estimating a non-rigid coordinate transformation between the reference face image and the target face image using an image registration technique;
displacing the apparent bi-directional reflectance distribution functions of the reference image into appropriate locations for the target image using the deformation field obtained from the non-rigid registration; and
applying an intensity mapping quotient to the apparent bi-directional reflectance distribution functions at each location of the target image.
22. The method of claim 21 wherein the registration technique uses an information-theoretic distance measure such as Mutual Information or Cross-Cumulative Residual Entropy.
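As one concrete example of the information-theoretic measure named in claim 22, mutual information between two images can be estimated from their joint intensity histogram; a registration would search for the deformation that maximizes it. A minimal sketch (bin count and function name are illustrative):

```python
import numpy as np

def mutual_information(img_a, img_b, bins=32):
    """Histogram estimate of the mutual information between two images,
    the kind of similarity measure a non-rigid registration maximizes."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = joint / joint.sum()                 # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal over img_b bins
    py = pxy.sum(axis=0, keepdims=True)       # marginal over img_a bins
    nz = pxy > 0                              # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```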
23. The method of claim 21, further comprising a method of face relighting, comprised of:
estimating the apparent bi-directional reflectance distribution function field of a face using a single two-dimensional example image;
generating a novel image of the face under one or more novel illumination conditions by sampling the apparent bi-directional reflectance distribution function field in novel directions; and
generating face images under complicated illuminations by computing a weighted sum of the images obtained with illumination in the individual point source directions.
24. The method of claim 23, further comprising a method of face recognition, comprised of:
estimating the apparent bi-directional reflectance distribution function field of the known faces from the single two-dimensional image;
generating novel images of those faces under novel illumination conditions; and
using the generated novel images to train any learning-based face classification technique.
25. The method of claim 24 wherein the face classification technique can be any face recognition method that uses examples to classify test face images.
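Claims 21 through 24 transfer a reference face's apparent-BRDF field onto a target face through the registration's deformation field, then correct intensities with a quotient map. The sketch below assumes a precomputed dense warp from target pixels to reference pixels and applies the quotient multiplicatively to the per-pixel coefficients; the `warp` function and this exact form of the intensity mapping quotient are illustrative assumptions, not the patent's specification.

```python
import numpy as np

def transfer_abrdf(ref_coeffs, ref_image, target_image, warp):
    """Displace the reference face's per-pixel apparent-BRDF coefficients
    through a dense warp (target pixel -> reference pixel) and rescale by
    the target/reference intensity quotient, per claim 21."""
    h, w = target_image.shape
    out = np.zeros((h, w) + ref_coeffs.shape[2:])
    for r in range(h):
        for c in range(w):
            rr, cc = warp(r, c)  # matching location in the reference image
            quotient = target_image[r, c] / max(float(ref_image[rr, cc]), 1e-6)
            out[r, c] = ref_coeffs[rr, cc] * quotient
    return out
```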
26. The method of claim 5, further comprising a method for generating novel images of a target human face under various novel illumination conditions using a single two-dimensional image of the target face, comprised of:
obtaining the apparent bi-directional reflectance distribution function of a reference face;
estimating a non-rigid coordinate transformation between the reference face image and the target face image using an image registration technique;
displacing the apparent bi-directional reflectance distribution functions of the reference image into appropriate locations for the target image using the deformation field obtained from the non-rigid registration; and
applying an intensity mapping quotient to the apparent bi-directional reflectance distribution functions at each location of the target image.
27. The method of claim 26 wherein the registration technique uses an information-theoretic distance measure such as Mutual Information or Cross-Cumulative Residual Entropy.
28. The method of claim 26, further comprising a method of face relighting, comprised of:
estimating the apparent bi-directional reflectance distribution function field of a face using a single two-dimensional example image;
generating a novel image of the face under one or more novel illumination conditions by sampling the apparent bi-directional reflectance distribution function field in novel directions; and
generating face images under complicated illuminations by computing a weighted sum of the images obtained with illumination in the individual point source directions.
29. The method of claim 28, further comprising a method of face recognition, comprised of:
estimating the apparent bi-directional reflectance distribution function field of the known faces from the single two-dimensional image;
generating novel images of those faces under novel illumination conditions; and
using the generated novel images to train any learning-based face classification technique.
30. The method of claim 29 wherein the face classification technique can be any face recognition method that uses examples to classify test face images.
31. The method of claim 9, further comprising a method for generating novel images of a target human face under various novel illumination conditions using a single two-dimensional image of the target face, comprised of:
obtaining the apparent bi-directional reflectance distribution function of a reference face;
estimating a non-rigid coordinate transformation between the reference face image and the target face image using an image registration technique;
displacing the apparent bi-directional reflectance distribution functions of the reference image into appropriate locations for the target image using the deformation field obtained from the non-rigid registration; and
applying an intensity mapping quotient to the apparent bi-directional reflectance distribution functions at each location of the target image.
32. The method of claim 31 wherein the registration technique uses an information-theoretic distance measure such as Mutual Information or Cross-Cumulative Residual Entropy.
33. The method of claim 31, further comprising a method of face relighting, comprised of:
estimating the apparent bi-directional reflectance distribution function field of a face using a single two-dimensional example image;
generating a novel image of the face under one or more novel illumination conditions by sampling the apparent bi-directional reflectance distribution function field in novel directions; and
generating face images under complicated illuminations by computing a weighted sum of the images obtained with illumination in the individual point source directions.
34. The method of claim 33, further comprising a method of face recognition, comprised of:
estimating the apparent bi-directional reflectance distribution function field of the known faces from the single two-dimensional image;
generating novel images of those faces under novel illumination conditions; and
using the generated novel images to train any learning-based face classification technique.
35. The method of claim 34 wherein the face classification technique can be any face recognition method that uses examples to classify test face images.
PCT/US2009/044533 2008-05-21 2009-05-19 Face relighting from a single image WO2009143163A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US5500208P 2008-05-21 2008-05-21
US61/055,002 2008-05-21

Publications (2)

Publication Number Publication Date
WO2009143163A2 true WO2009143163A2 (en) 2009-11-26
WO2009143163A3 WO2009143163A3 (en) 2012-04-26

Family

ID=41340824

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2009/044533 WO2009143163A2 (en) 2008-05-21 2009-05-19 Face relighting from a single image

Country Status (1)

Country Link
WO (1) WO2009143163A2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102163330A (en) * 2011-04-02 2011-08-24 Xidian University Multi-view face synthesis method based on tensor resolution and Delaunay triangulation
CN105447906A (en) * 2015-11-12 2016-03-30 Zhejiang University Method for calculating lighting parameters and carrying out relighting rendering based on image and model
CN105447829A (en) * 2015-11-25 2016-03-30 Xiaomi Inc. Image processing method and device
CN105447829B (en) * 2015-11-25 2018-06-08 Xiaomi Inc. Image processing method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060280342A1 (en) * 2005-06-14 2006-12-14 Jinho Lee Method and system for generating bi-linear models for faces
US20070014435A1 (en) * 2005-07-13 2007-01-18 Schlumberger Technology Corporation Computer-based generation and validation of training images for multipoint geostatistical analysis
US20070031028A1 (en) * 2005-06-20 2007-02-08 Thomas Vetter Estimating 3d shape and texture of a 3d object based on a 2d image of the 3d object
US7215802B2 (en) * 2004-03-04 2007-05-08 The Cleveland Clinic Foundation System and method for vascular border detection


Also Published As

Publication number Publication date
WO2009143163A3 (en) 2012-04-26

Similar Documents

Publication Publication Date Title
Wang et al. Face relighting from a single image under arbitrary unknown lighting conditions
Yamaguchi et al. High-fidelity facial reflectance and geometry inference from an unconstrained image
JP5136965B2 (en) Image processing apparatus, image processing method, and image processing program
Wang et al. Face re-lighting from a single image under harsh lighting conditions
JP3818369B2 (en) Method for selecting an image from a plurality of three-dimensional models most similar to an input image
Zhang et al. Recognizing rotated faces from frontal and side views: An approach toward effective use of mugshot databases
Zeng et al. Examplar coherent 3D face reconstruction from forensic mugshot database
Muhammad et al. Spec-Net and Spec-CGAN: Deep learning models for specularity removal from faces
Savva et al. Geometry-based vs. intensity-based medical image registration: A comparative study on 3D CT data
WO2011162352A1 (en) Three-dimensional data generating apparatus, three-dimensional data generating method, and three-dimensional data generating program
Tu et al. Automatic face recognition from skeletal remains
Ferková et al. Age and gender-based human face reconstruction from single frontal image
WO2009143163A2 (en) Face relighting from a single image
Bannister et al. A deep invertible 3-D facial shape model for interpretable genetic syndrome diagnosis
Dai et al. 3D morphable models: The face, ear and head
Fooprateepsiri et al. A general framework for face reconstruction using single still image based on 2D-to-3D transformation kernel
Barmpoutis et al. Beyond the lambertian assumption: A generative model for apparent brdf fields of faces using anti-symmetric tensor splines
Ma et al. A lighting robust fitting approach of 3D morphable model for face reconstruction
Maghari et al. Adaptive face modelling for reconstructing 3D face shapes from single 2D images
Colaianni et al. A pose invariant statistical shape model for human bodies
Kakadiaris et al. Face recognition using 3D images
Aldrian et al. Inverse rendering in suv space with a linear texture model
Mazumdar et al. Forgery detection in digital images through lighting environment inconsistencies
Letenkov et al. Method for Generating Synthetic Images of Masked Human Faces
Basri et al. Illumination modeling for face recognition

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09751391

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09751391

Country of ref document: EP

Kind code of ref document: A1