WO2009143163A2

WO2009143163A2 - Face relighting from a single image

Info

Publication number: WO2009143163A2
Application number: PCT/US2009/044533
Authority: WO
Inventors: Baba C. Vemuri; Angelos Barmpoutis; Arunava Benerjee; Ritwik Kailash Kumar
Original assignee: University Of Florida Research Foundation, Inc.
Priority date: 2008-05-21
Filing date: 2009-05-19
Publication date: 2009-11-26
Also published as: WO2009143163A3

Abstract

Disclosed is a method for estimating the apparent bi-directional reflectance distribution function field of a human face using anti-symmetric tensor splines, comprising: defining the complex geometry and reflectance properties of the human face by a field of spherical functions; approximating the field of spherical functions by anti-symmetric higher-order Cartesian tensors within a single pixel; applying a B-spline basis as the weighting function of the anti-symmetric higher-order Cartesian tensors; fitting the tensor spline basis to a given set of two-dimensional facial images of a human subject with a fixed pose and associated lighting directions by minimizing the energy of an objective function with respect to the unknown tensor coefficients of said anti-symmetric higher-order Cartesian tensors; and analytically computing the derivatives of the objective function for its efficient minimization.

Description

FACE RELIGHTING FROM A SINGLE IMAGE

[001] Statement Regarding Federally Sponsored Research Or Development

[002] There is no federal government sponsorship associated with this invention.

[003] Technical Field

[004] The present invention relates to face recognition, facial relighting, and specifically to methods and techniques for synthesizing facial images under novel illumination conditions.

[005] Background Art

[006] Due to important applications like face recognition and facial relighting, synthesis of facial images under novel illumination conditions has attracted immense interest, particularly in the fields of computer vision and computer graphics. The challenge presented is the following: given only a few example images of a face, generate images of that face under novel illumination conditions. This challenge is particularly difficult when only one example image is available, which is the most common and realistic scenario in the very important application of face recognition. This special circumstance is a more difficult scenario than the typical graphics relighting problem, which generally does not have a limitation on the number of example images that can be considered. Solving this challenge is especially attractive because if multiple images under novel illumination can be generated from a single example image, the images can be used to enhance recognition performance of any learning based face recognition method.

[007] The literature is replete with proposals to solve this challenge. However, each existing solution works only under certain assumptions (e.g. the convex-Lambertian assumption) or requires specific kinds of data (e.g. 3D face scans) and/or manual intervention. Thus, it is important to compare these methods in light of their assumptions and requirements and not just by their claimed results. The method of the present invention produces results that are better than or comparable to those of the existing methods, even though it works under a minimal set of requirements. It is a completely automatic method that works with a single 2D image, does not require any 3D information, seamlessly handles cast shadows and specularities (i.e. does not make a convex-Lambertian assumption) and does not require any specially acquired information (i.e. works well with existing benchmark databases like Extended Yale B).

[008] The convex-Lambertian assumption is inaccurate as human faces are neither exactly Lambertian nor exactly convex. It is common to see cast shadows (e.g. in the peri-nasal region due to non-convexity) and specularities (e.g. oily forehead and nose tip due to non-Lambertian reflectance) on facial images. Any method which fails to take these effects into account is clearly limited in its applicability. Furthermore, some of these methods end up using 3-dimensional information that is expensive to acquire and/or require undesirable manual intervention. Though the cost of acquiring 3-dimensional geometry is decreasing, most of the existing benchmark face databases (e.g. Extended Yale B and CMU PIE) consist of single or multiple 2-dimensional face images and therefore, it is less pragmatic, if not less accurate, to use 3-dimensional information as input to systems dealing with facial illumination problems. Furthermore, recent systems that do use 3D information directly (based on morphable models) require manual intervention at various stages, which is clearly undesirable. At the same time, techniques which require specially acquired 2D information or an exorbitant amount of 2D information are also not attractive. Hence a method which does not make these limiting assumptions and still produces good results is highly desirable.

[009] Disclosure of Invention

[010] The present invention provides a novel anti-symmetric higher-order Cartesian tensor spline based method for the estimation of the Apparent Bi-directional Reflectance Distribution Function (ABRDF) field for human faces that seamlessly accounts for specularities and cast shadows.

[011] Brief Description of Drawings

[012] Fig. 1 shows a plot of an ABRDF function according to varying methodologies;

[013] Fig. 2 shows images under novel illumination directions synthesized from the estimated ABRDF field;

[014] Fig. 3 depicts an estimation of facial features that arise from cast shadows and specularities according to the present invention;

[015] Fig. 4 is a comparison of the present invention with two other known methods;

[016] Fig. 5 depicts the registration results of the present invention and provides a comparison to two known methods;

[017] Fig. 6 depicts an image under novel illumination directions synthesized from a single example image according to the method of the present invention;

[018] Figs. 7 and 8 depict the present invention as applied to several reference human faces;

[019] Fig. 8 depicts the present invention as applied to several reference human faces;

[020] Fig. 9 is a quantitative comparison of the method of the present invention according to a varying number of reference images;

[021] Fig. 10 is an illustration of the method of the present invention using a 3rd-order anti-symmetric tensor spline estimation;

[022] Fig. 11 is a plot of the lighting directions of images in the training sets;

[023] Fig. 12 is a comparison of the average intensity value errors of the Lambertian model and the method of the present invention;

[024] Fig. 13 depicts synthesized images under several different lighting directions for a randomly selected subject;

[025] Fig. 14 depicts a comparison of the synthesized images using the Lambertian model and the method of the present invention;

[026] Fig. 15 depicts the approximated ABRDFs plotted as spherical functions in a region of interest that has specularities and shadows;

[027] Fig. 16 is an intensity value error comparison of the method of the present invention and several known methods;

[028] Modes of Carrying Out the Invention

[029] The present invention is composed of two stages. The first stage comprises learning the Apparent Bi-directional Reflectance Distribution Function (ABRDF) field of a reference face using nine images of that face taken under different illumination conditions. The ABRDF is a spherical function that gives the image intensity value at each pixel in each illumination direction. Three novel methods are set forth below for estimating this ABRDF field from nine or, if available, more images.

[030] The second stage comprises transferring the ABRDF field from a reference face to a new target face using just one 2-dimensional image of the target face using a novel ABRDF transfer algorithm. Hence, once the reference ABRDF field has been captured, images of a novel face under a new illumination direction can be rendered by first transferring the ABRDF field and then sampling the field in the appropriate illumination direction.

[031] Learning the ABRDF Field Using Tensor Splines

[032] The present invention provides a novel anti-symmetric higher-order Cartesian tensor spline based method for the estimation of the ABRDF field for human faces that seamlessly accounts for specularities and cast shadows.

[033] Spherical Functions Modeled as Tensors

[034] In general, a spherical function can be approximated by an nth-order Cartesian tensor, which can be expressed in the following form:

$$T(\mathbf{v}) = \sum_{k+l+m=n} T_{k,l,m}\, v_1^k\, v_2^l\, v_3^m \qquad (1)$$

where $\mathbf{v} = [v_1\ v_2\ v_3]^T$ is a unit vector and $T_{k,l,m}$ are the real-valued tensor coefficients. It should be noted that the spherical functions modeled by Eq. 1 are symmetric (i.e. $T(\mathbf{v}) = T(-\mathbf{v})$) for even orders, and anti-symmetric (i.e. $T(\mathbf{v}) = -T(-\mathbf{v})$) for odd orders. As a special case of Eq. 1, the 1st-order tensors take the form $T(\mathbf{v}) = \mathbf{T}^T\mathbf{v}$, where $\mathbf{T} = [T_{1,0,0}\ T_{0,1,0}\ T_{0,0,1}]^T$, and the 2nd-order tensors take the form $T(\mathbf{v}) = \mathbf{v}^T\mathbf{T}\mathbf{v}$, where $\mathbf{T}$ is a 3 × 3 matrix. It should also be noted that in the case of 3rd-order tensors, there are 10 unique coefficients $T_{k,l,m}$ in Eq. 1, while in the case of 5th-order anti-symmetric tensors, there are 21 unique coefficients $T_{k,l,m}$. The ability of a Cartesian tensor to approximate the complex geometry of a spherical function with multiple lobes increases with its order. A 1st-order tensor can only be used to approximate single-lobed anti-symmetric spherical functions. In order to approximate a function with more lobes, higher-order tensors are required. However, higher-order tensors can be perceived to be more sensitive to noise, simply by virtue of their ability to model high frequency detail. In contrast, the lower order tensors are incapable of modeling high frequency detail. Since it is impossible to discriminate between high frequency detail in the data and high frequency noise in the data, it is reasonable to say that the high order tensors possess higher noise sensitivity. Therefore, a balance between the accuracy in the approximation and the noise sensitivity must be found in determining the best suited tensor order.
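The coefficient counts and the parity property stated above can be checked with a minimal sketch. The function names and the numeric coefficient values below are hypothetical and chosen only for illustration; the code simply enumerates the exponent triples (k, l, m) with k + l + m = n and evaluates the polynomial of Eq. 1.

```python
def exponents(order):
    """All (k, l, m) with k + l + m == order; one unique coefficient per triple."""
    return [(k, l, order - k - l)
            for k in range(order + 1)
            for l in range(order - k + 1)]


def eval_tensor(coeffs, v):
    """Evaluate T(v) = sum over (k, l, m) of T_klm * v1^k * v2^l * v3^m."""
    v1, v2, v3 = v
    return sum(c * v1 ** k * v2 ** l * v3 ** m
               for (k, l, m), c in coeffs.items())


# Hypothetical 3rd-order coefficients, one per exponent triple.
coeffs3 = {e: 0.1 * (i + 1) for i, e in enumerate(exponents(3))}

v = (0.6, 0.0, 0.8)                 # a unit vector
neg_v = tuple(-x for x in v)        # its antipode

count3 = len(exponents(3))          # unique coefficients for order 3
count5 = len(exponents(5))          # unique coefficients for order 5
parity_gap = eval_tensor(coeffs3, v) + eval_tensor(coeffs3, neg_v)
```

For an odd order every monomial changes sign under v → −v, so `parity_gap` vanishes up to rounding, which is the anti-symmetry the text relies on.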

[035] Tensor Splines

[036] A tensor spline can be defined by combining the Cartesian tensor basis within a single pixel, as set forth above, with the well-known B-spline basis across the image lattice. Preferably, the degree of the spline is fixed at 3 (i.e. a cubic spline) for simplicity, since this degree of continuity is commonly used in the literature. In general, a tensor spline is defined as a B-spline on multilinear functions of any order. In a tensor spline, the multilinear functions, which are anti-symmetric tensors in the present invention, are weighted by the B-spline basis $N_{i,k+1}$, where:

$$N_{i,1}(t) = \begin{cases} 1 & \text{if } t_i \le t < t_{i+1} \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

and

$$N_{i,k}(t) = N_{i,k-1}(t)\,\frac{t - t_i}{t_{i+k-1} - t_i} + N_{i+1,k-1}(t)\,\frac{t_{i+k} - t}{t_{i+k} - t_{i+1}} \qquad (3)$$

where the $N_{i,k+1}(t)$ functions are polynomials of degree k and are associated with n+k+2 monotonically increasing numbers called "knots" ($t_{-k}, t_{-k+1}, \ldots, t_{n+1}$). By using the above equations, the bi-cubic (i.e. k = 3) tensor spline is given by:

$$S(\mathbf{t}, \mathbf{v}) = \sum_{i,j} N_{i,4}(t_x)\, N_{j,4}(t_y)\, T_{i,j}(\mathbf{v}) \qquad (4)$$

where $\mathbf{t} = [t_x\ t_y]^T$, $\mathbf{v} = [v_1\ v_2\ v_3]^T$ is a unit vector, and $T_{i,j}(\mathbf{v})$ is given by Eq. 1. It should be noted that in Eq. 4, there is a field of control tensors $T_{i,j}(\mathbf{v})$ instead of the control points used in a regular B-spline. Below, the bi-cubic tensor splines are employed for approximating the ABRDF field of a human face given a set of fixed-pose images under different known lighting directions.
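The B-spline weighting described above can be sketched with the Cox-de Boor recursion. This is an illustrative sketch, not the inventors' implementation: it assumes the second index of the basis is the spline order (so order 1 is piecewise constant and order 4 is cubic), and the function names and the uniform knot vector are hypothetical.

```python
def bspline_basis(i, k, t, knots):
    """Order-k B-spline basis N_{i,k}(t) via the Cox-de Boor recursion.

    k = 1 is piecewise constant; k = 4 is the cubic basis of a bi-cubic spline.
    """
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + k - 1] - knots[i]
    if d1 > 0:
        left = (t - knots[i]) / d1 * bspline_basis(i, k - 1, t, knots)
    d2 = knots[i + k] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + k] - t) / d2 * bspline_basis(i + 1, k - 1, t, knots)
    return left + right


def tensor_spline_weights(tx, ty, knots):
    """Non-zero products N_{i,4}(tx) * N_{j,4}(ty) that weight the control tensors."""
    n = len(knots) - 4                       # number of cubic basis functions
    wx = [bspline_basis(i, 4, tx, knots) for i in range(n)]
    wy = [bspline_basis(j, 4, ty, knots) for j in range(n)]
    return {(i, j): wx[i] * wy[j]
            for i in range(n) for j in range(n) if wx[i] * wy[j] > 0.0}


knots = list(range(10))                      # hypothetical uniform knots 0..9
weights = tensor_spline_weights(4.3, 5.7, knots)
```

At any interior lattice position only a 4 × 4 neighbourhood of control tensors has non-zero weight, and those weights sum to one, which is why each pixel's spherical function is a local blend of nearby control tensors.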

[037] Apparent BRDF Approximation By Tensor Splines

[038] The BRDF of a Lambertian surface is given by:

$$B(\mathbf{v}) = \alpha\,(\mathbf{n} \cdot \mathbf{v}) \qquad (5)$$

where $\mathbf{v}$ is the light source direction, $\mathbf{n}$ is the normal vector at a particular point of the surface and α is a constant. It is immediate that the Lambertian model is a 1st-order tensor (i.e. n = 1 in Eq. 1) with $T_{1,0,0} = \alpha n_x$, $T_{0,1,0} = \alpha n_y$ and $T_{0,0,1} = \alpha n_z$. As a 1st-order tensor, the Lambertian model is anti-symmetric and has a single peak.

[039] Human faces, however, are not exactly Lambertian since specularity can be observed in certain regions (e.g. nose and forehead). Moreover, the non-convex shapes on the face (e.g. lips and nose) can create cast shadows. The shadows and specularities of the human face are indicative of a multi-lobed apparent BRDF. Therefore, in these cases the ABRDF cannot be modeled successfully by a 1st-order tensor and hence higher-order anti-symmetric tensors should be employed instead.

[040] As described above, the challenge is as follows: given a set of N face images of a given human subject with a fixed pose, $I_n$, n = 1 ... N, with associated lighting directions $\mathbf{v}_n$, one wants to estimate the ABRDF field of the face using a bi-cubic tensor spline. The fitting of the tensor spline to the given data can be done by minimizing the following energy:

$$E = \sum_{n=1}^{N} \sum_{t_x, t_y} \Big( \sum_{i,j} N_{i,4}(t_x)\, N_{j,4}(t_y)\, T_{i,j}(\mathbf{v}_n) - I_n(t_x, t_y) \Big)^2 \qquad (6)$$

where $t_x$, $t_y$ run through the lattice of the given images. The minimization of Eq. 6 is done with respect to the unknown tensor coefficients $T_{i,j,k,l,m}$ that correspond to the control tensor $T_{i,j}(\mathbf{v}_n)$.

[041] For example, uniform grid knots 1, 2, 3 ... can be used in both lattice coordinates. Accordingly, there are (M + 2) × (M + 2) control tensors, where M × M is the lattice size of each given image. Under this configuration, in the case of 3rd-order anti-symmetric tensors, there are 10 unique coefficients for each control tensor. Therefore, the number of unknowns in Eq. 6 is equal to 10(M + 2)², and in the case of a 5th-order anti-symmetric tensor, the number of unknowns is 21(M + 2)².
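As a worked example of this count, the sketch below computes the number of unknowns for a hypothetical 64 × 64 lattice (the lattice size is an assumption made only for illustration; the function name is likewise hypothetical). The per-tensor coefficient count is the number of monomials of the given order in three variables.

```python
def num_unknowns(lattice_size, order):
    """Unknown coefficients: (coefficients per control tensor) * (M + 2)^2."""
    coeffs_per_tensor = (order + 1) * (order + 2) // 2   # 10 for order 3, 21 for order 5
    return coeffs_per_tensor * (lattice_size + 2) ** 2


unknowns_order3 = num_unknowns(64, 3)   # 10 * 66^2
unknowns_order5 = num_unknowns(64, 5)   # 21 * 66^2
```

For M = 64 this gives 43,560 unknowns at order 3 and 91,476 at order 5, which illustrates why analytic derivatives matter for the minimization.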

[042] From Eq. 6, the derivatives $\partial E / \partial T_{i,j,k,l,m}$ are analytically computed and thus, any gradient-based functional minimization method can be used. For example, a non-linear conjugate gradient method with a randomly initialized control tensor coefficient field can be used. After having estimated the tensor field, images are synthesized under a new lighting direction $\mathbf{v}$ by evaluating the apparent BRDF field in the direction $\mathbf{v}$, whereby each apparent BRDF is given by Eq. 4. The generated images can also be upsampled directly by evaluating Eq. 4 on a denser sampling grid since the tensor spline is a continuous function.
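To illustrate what an analytically computed derivative looks like here, the following sketch treats a simplified one-pixel slice of the energy: a single 3rd-order control tensor with a fixed spline weight, fitted to a handful of samples. The sample directions, intensities, coefficient values and function names are all hypothetical, and the full method sums such terms over the lattice and the 4 × 4 spline neighbourhood.

```python
def monomials(v, order=3):
    """Monomials v1^k * v2^l * v3^m with k + l + m == order, in a fixed ordering."""
    v1, v2, v3 = v
    return [v1 ** k * v2 ** l * v3 ** (order - k - l)
            for k in range(order + 1)
            for l in range(order - k + 1)]


def energy(coeffs, samples, weight=1.0):
    """Sum of squared residuals (weight * T(v_n) - I_n)^2 over the samples."""
    total = 0.0
    for v, intensity in samples:
        value = sum(c * b for c, b in zip(coeffs, monomials(v)))
        total += (weight * value - intensity) ** 2
    return total


def gradient(coeffs, samples, weight=1.0):
    """Analytic derivative of the energy with respect to each tensor coefficient."""
    grad = [0.0] * len(coeffs)
    for v, intensity in samples:
        basis = monomials(v)
        residual = weight * sum(c * b for c, b in zip(coeffs, basis)) - intensity
        for a, b in enumerate(basis):
            grad[a] += 2.0 * residual * weight * b
    return grad


# Hypothetical (lighting direction, observed intensity) pairs for one pixel.
samples = [((0.0, 0.0, 1.0), 0.9), ((0.6, 0.0, 0.8), 0.7),
           ((0.0, 0.6, 0.8), 0.5), ((-0.6, 0.0, 0.8), 0.2)]
coeffs = [0.05 * (a + 1) for a in range(10)]
grad = gradient(coeffs, samples)
```

Because the energy is quadratic in the coefficients, each gradient entry is just twice the residual times the corresponding weighted monomial, and it can be handed directly to any gradient-based minimizer.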

[043] Learning the ABRDF Field Using Spline Modulated Spherical Harmonics

[044] Our goal is to generate images of a face under various illumination conditions using a single example 2-dimensional image. This can be achieved by acquiring a reference ABRDF field once and then transferring it to new faces using their single images. The ABRDF represents the response of the object at a point to light in each direction, in the presence of the rest of the scene, not merely the surface reflectivity. Hence, by acquiring the ABRDF field of an object, cast shadows, which are image artifacts manifested by the presence of scene objects obstructing the light from reaching otherwise visible scene regions, can be easily captured. Note that since we want to analyze the effects of the illumination direction change, we assume the ABRDF to be a function of just the illumination direction by fixing the viewing direction, though sometimes it is defined to be a function of both the illumination and viewing directions. Below, the first part of this process, i.e. the reference ABRDF field estimation, is described using novel bi-cubic B-Spline modulated anti-symmetric spherical harmonics.

[045] Surface Spherical Harmonic Basis

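[046] The surface spherical harmonic basis, the analog to the Fourier basis for Cartesian signals, provides a natural orthonormal basis for functions defined on a sphere. In general, the spherical harmonic bases are defined for complex-valued functions but as the apparent BRDF is a real-valued function, the real-valued spherical harmonic bases are used to represent the apparent BRDF functions. The spherical harmonics basis function $\psi_l^m$ (order = l, degree = m), with l = 0, 1, 2, ... and −l ≤ m ≤ l, is defined as follows:

$$\psi_l^m(\theta, \phi) = \sqrt{\frac{2l+1}{4\pi}\,\frac{(l-|m|)!}{(l+|m|)!}}\; P_l^{|m|}(\cos\theta)\, \Phi_m(\theta, \phi) \qquad (7)$$

where $P_l^{|m|}$ are the associated Legendre functions and $\Phi_m(\theta, \phi)$ is defined as:

$$\Phi_m(\theta, \phi) = \begin{cases} \sqrt{2}\cos(m\phi) & m > 0 \\ 1 & m = 0 \\ \sqrt{2}\sin(|m|\phi) & m < 0 \end{cases} \qquad (8)$$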

[047] Note that even orders of the spherical harmonics basis functions are antipodally symmetric while odd orders are anti-symmetric. Perceptually speaking, given a limited number of data samples, the ABRDFs are best approximated using only the antipodally anti-symmetric components of the spherical harmonic bases. To see why, two crucial questions must be examined: first, whether using just even or just odd ordered bases drastically limits the approximation of the ABRDF, and second, whether symmetric or anti-symmetric bases are more suitable.
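The parity claim can be checked numerically with a short sketch. It assumes the standard real spherical harmonic convention (orthonormal normalization, associated Legendre functions with the Condon-Shortley phase); that convention and all function names below are assumptions made for illustration, since the exact normalization used by the inventors is not recoverable from the text.

```python
import math


def assoc_legendre(l, m, x):
    """Associated Legendre function P_l^m(x) for 0 <= m <= l (Condon-Shortley phase)."""
    pmm = 1.0
    if m > 0:
        s = math.sqrt(max(0.0, 1.0 - x * x))
        for k in range(1, m + 1):
            pmm *= -(2 * k - 1) * s
    if l == m:
        return pmm
    pm1 = x * (2 * m + 1) * pmm              # P_{m+1}^m
    if l == m + 1:
        return pm1
    for ll in range(m + 2, l + 1):
        pmm, pm1 = pm1, ((2 * ll - 1) * x * pm1 - (ll + m - 1) * pmm) / (ll - m)
    return pm1


def real_sh(l, m, theta, phi):
    """Real spherical harmonic of order l and degree m at polar angle theta, azimuth phi."""
    am = abs(m)
    norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                     * math.factorial(l - am) / math.factorial(l + am))
    p = assoc_legendre(l, am, math.cos(theta))
    if m > 0:
        return math.sqrt(2.0) * norm * p * math.cos(m * phi)
    if m < 0:
        return math.sqrt(2.0) * norm * p * math.sin(am * phi)
    return norm * p


def antipode(theta, phi):
    """Angles of the direction opposite to (theta, phi)."""
    return math.pi - theta, phi + math.pi


# Odd order: value flips sign at the antipode; even order: value is unchanged.
odd_sum = real_sh(3, 2, 0.7, 1.9) + real_sh(3, 2, *antipode(0.7, 1.9))
even_diff = real_sh(2, 1, 0.7, 1.9) - real_sh(2, 1, *antipode(0.7, 1.9))
```

Both `odd_sum` and `even_diff` are zero up to rounding, reflecting the general rule that a harmonic of order l picks up a factor of (−1)^l at the antipodal point.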

[048] With respect to the first question, even though the ABRDF is a function defined on a sphere, for the purposes of the present invention, the interest lies only in its behavior on the frontal hemisphere. Hence, if the function's behavior at very extreme angles (0° and 180°) is ignored, once the ABRDF has been modeled accurately on the frontal hemisphere, the rear hemisphere can be filled in appropriately to make the function either symmetric or anti-symmetric. To visualize this, polar plots in the first column of FIG. 1 show a typical ABRDF function defined on a semicircle. The second column shows the same function being approximated by an antipodally symmetric (row 1) and an antipodally anti-symmetric (row 2) function. By not using both types of components, not much is lost in approximation power. For visualization, the problem has been scaled down to two dimensions and the blue circle represents the zero value in these polar plots. A more important reason for not using the complete set of bases is that, for a fixed number of given example images, using just symmetric or just anti-symmetric components allows higher orders to be reached, which are necessary to approximate discontinuities like cast shadows and specularities in the image.

[049] With respect to the second question, one must observe the function's behavior at the extreme angles (0° and 180°). In reality, most facial ABRDF functions have a positive value near one of the extreme angles (as they face the light source) and a very small (≈ 0) value near the other extreme angle (as they go into attached shadows). Hence, the function in column 1 of FIG. 1 is very close to physical ABRDFs. Clearly, the function's behavior at 0° and 180° is neither antipodally symmetric nor anti-symmetric and hence, using just one of the two would lead to errors in approximation at these extreme angles. The error caused by symmetric approximation is perceptually very noticeable as it gives the function a positive value where it should be 0 [see FIG. 1, last column, first row and the regions marked by arrows in FIG. 1, last row as they are unnaturally bright] while the error caused by anti-symmetric approximation is not perceptually noticeable as it gives the function a negative value where it should be 0, which can be easily set to 0 as it is known that the ABRDF is never negative [see FIG. 1, last column, second row and FIG. 1, last row, last two images]. Non-negativity is achieved similarly in the Lambertian model using the max function. Errors at the non-zero end of the function are not perceptually noticeable, as can be seen from the last row of FIG. 1.

[050] Bi-cubic B-spline Modulated Spherical Harmonic

[051] For a fixed pose, each pixel location has an associated ABRDF and across the whole face, we have a field of such ABRDFs. To model such a field of spherical functions ($S^2 \times \mathbb{R}^2 \to \mathbb{R}$), modulated spherical harmonics are used, combining the spherical harmonic basis within a single pixel and the B-spline basis across the field. The B-spline basis $N_{i,k}$, where:

$$N_{i,1}(t) = \begin{cases} 1 & \text{if } t_i \le t < t_{i+1} \\ 0 & \text{otherwise} \end{cases} \qquad (9)$$

and

$$N_{i,k}(t) = N_{i,k-1}(t)\,\frac{t - t_i}{t_{i+k-1} - t_i} + N_{i+1,k-1}(t)\,\frac{t_{i+k} - t}{t_{i+k} - t_{i+1}} \qquad (10)$$

acts as a weight to the spherical harmonic basis. Here, $N_{i,k}(t)$ is the spline basis of degree k − 1 with associated knots ($t_{-k}, t_{-k+1}, \ldots, t_{n+1}$). Hence, the expression for the modulated spherical harmonics is given by:

$$\Psi_l^m(\theta, \phi, \mathbf{x}, i, j) = \sqrt{\frac{2l+1}{4\pi}\,\frac{(l-|m|)!}{(l+|m|)!}}\; N_{i,4}(x_1)\, N_{j,4}(x_2)\, P_l^{|m|}(\cos\theta)\, \Phi_m(\theta, \phi) \qquad (11)$$

with $\Phi_m(\theta, \phi)$ and $P_l^{|m|}$ as defined before, $\mathbf{x} = (x_1, x_2)$ are the spline control points, and i and j are the basis indices. The bi-cubic spline is chosen because it is one of the most commonly used in the literature and, more importantly, it provides enough smoothness for the ABRDF field so that the discontinuities present in the field due to cast shadows are appropriately approximated, as demonstrated in the results shown below.

[052] There are three distinct advantages of using this novel bi-cubic B-spline modulated spherical harmonics for ABRDF field estimation. First, the built-in smoothness provides a degree of robustness against noise which is very common when dealing with image data. Second, it allows for using neighborhood information while estimating the ABRDF at each pixel location. Finally, it provides a continuous representation of the spherical harmonic coefficient field, which will be exploited during the ABRDF transfer that is defined further below.

[053] ABRDF Field Estimation

[054] If the ABRDF field is available for a face, images of the face under novel illumination directions can be rendered by simply sampling the ABRDF at each location in the appropriate directions. But in a realistic setting, only a few images of a face (samples of the ABRDF field) are given. Hence, the problem at hand is that of ABRDF field estimation from these few samples. Motivated by the reasoning outlined above, the present invention employs bi-cubic B-spline modulated anti-symmetric spherical harmonic functions for this task.

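[055] Using $S_{\mathbf{x}}(\theta, \phi)$, the given data samples (intensity values) in the (θ, φ) direction at location $\mathbf{x}$, the ABRDF field can be estimated by minimizing the following error function:

$$E = \sum_{\mathbf{x}} \sum_{(\theta,\phi)} \Big( \sum_{l \in T} \sum_{m=-l}^{l} \sum_{i,j} w_{ijlm}\, \Psi_l^m(\theta, \phi, \mathbf{x}, i, j) - S_{\mathbf{x}}(\theta, \phi) \Big)^2 \qquad (12)$$

where the first term in the summation is the representation of the ABRDF function using modulated anti-symmetric spherical harmonic functions. T is the set of odd natural numbers and $w_{ijlm}$ are the unknown coefficients of the apparent BRDF function that is being sought. Here, the spline control grid is overlaid on the data grid (pixels) and the inner summation on i and j is over the bi-cubic B-spline basis domain. This objective function is minimized using the non-linear conjugate gradient method initialized with a unit vector, for which the derivative of the error function with respect to $w_{ijlm}$ can be computed in analytic form as:

$$\frac{\partial E}{\partial w_{ijlm}} = 2 \sum_{\mathbf{x}} \sum_{(\theta,\phi)} \Big( \sum_{l' \in T} \sum_{m'=-l'}^{l'} \sum_{i',j'} w_{i'j'l'm'}\, \Psi_{l'}^{m'}(\theta, \phi, \mathbf{x}, i', j') - S_{\mathbf{x}}(\theta, \phi) \Big)\, \Psi_l^m(\theta, \phi, \mathbf{x}, i, j) \qquad (13)$$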

[056] Both odd orders 3 and 5 yield sufficiently good synthesis results, with order 3 performing slightly better than order 5 because the order 5 approximation over-fits the data. Therefore, order 3 is preferred. In an order 3 (value of l) modulated anti-symmetric spherical harmonic approximation, the values of the unknown coefficients can be recovered with just 9 images under different illumination conditions. Estimation is better if the given 9 images sample the illumination directions somewhat uniformly, and it improves if more images are present. As the ABRDF is a non-negative function, any negative values produced by the model are set to 0 (as is also done by the max function in the Lambertian model).
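As a small worked example, the number of spherical harmonic coefficients per location when only odd orders are kept can be counted directly, since order l contributes 2l + 1 basis functions. The helper names below are hypothetical; the clamp mirrors the rule of setting negative model values to 0.

```python
def num_odd_sh_coeffs(max_order):
    """Number of basis functions for odd orders 1, 3, ..., max_order (2l + 1 each)."""
    return sum(2 * l + 1 for l in range(1, max_order + 1, 2))


def clamp_abrdf(value):
    """The ABRDF is never negative, so negative model outputs are set to 0."""
    return max(0.0, value)


coeffs_order3 = num_odd_sh_coeffs(3)    # orders 1 and 3: 3 + 7
coeffs_order5 = num_odd_sh_coeffs(5)    # orders 1, 3 and 5: 3 + 7 + 11
```

The counts 10 and 21 coincide with the numbers of unique coefficients of the 3rd- and 5th-order Cartesian tensors, so the two representations have the same number of degrees of freedom per location.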

[057] Learning ABRDF Field Using Continuous Mixture of Single Lobed Functions

[058] The method described above can be quantitatively compared to a more general model that, in theory, can approximate spherical functions using a continuous mixture of single-lobed spherical functions. There are various spherical functions with a single lobe that can be used in a continuous mixture. For this application, it is desirable to choose a function that leads to an analytic solution, such as the following:

$$S(\mathbf{v}) = e^{-\mathbf{v} \cdot \mathbf{u}} - 1 \qquad (14)$$

where $\mathbf{u}$ and $\mathbf{v}$ are unit vectors. Eq. 14 has the following desirable properties: 1) it has a single peak, and 2) $S(\mathbf{v}) = 0$ for all $\mathbf{v}$ such that $\mathbf{v} \cdot \mathbf{u} = 0$ (because if the viewing and illumination directions are perpendicular, we expect zero intensity). These properties are also valid for the Lambertian model.

[059] Given the single-lobed function in Eq. 14, any spherical function can be written as a continuous mixture of such functions. Accordingly, the apparent BRDF, a spherical function, can be modeled as a continuous mixture of functions S(v) as follows:

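$$B(\mathbf{v}) = \int_{S^2} f(\mathbf{u}) \left( e^{-\mathbf{v} \cdot \mathbf{u}} - 1 \right) d\mathbf{u} \qquad (15)$$

where the integration is over the set of all unit vectors $\mathbf{u}$ (i.e. the unit sphere) and $f(\mathbf{u})$ is a distribution on orientations. The von Mises-Fisher distribution is chosen as the mixing density as it is the analog of the Gaussian distribution on $S^2$. The von Mises-Fisher distribution is given by:

$$f(\mathbf{u}; \boldsymbol{\mu}, \kappa) = \frac{\kappa}{4\pi \sinh(\kappa)}\, e^{\kappa\, \boldsymbol{\mu} \cdot \mathbf{u}} \qquad (16)$$

where $\boldsymbol{\mu}$ is a unit vector defining the orientation and κ is a scalar governing the concentration of the distribution. The important observation is made that by substituting Eq. 16 into Eq. 15, an integral is derived that is the Laplace transform of the von Mises-Fisher distribution, which is analytically computed to be:

$$B(\mathbf{v}) = \frac{\kappa}{\sinh(\kappa)}\, \frac{\sinh\left(\lVert \kappa \boldsymbol{\mu} - \mathbf{v} \rVert\right)}{\lVert \kappa \boldsymbol{\mu} - \mathbf{v} \rVert} - 1 \qquad (17)$$

However, the single von Mises-Fisher distribution model cannot approximate angular distributions with several peaks, such as the human face apparent BRDF fields. Therefore, a finite mixture of von Mises-Fisher distributions can be used, which leads to the following alternate definition of the mixing density in Eq. 15:

$$f(\mathbf{u}) = \sum_{i} w_i\, f(\mathbf{u}; \boldsymbol{\mu}_i, \kappa) \qquad (18)$$

where $w_i$ are the mixture weights.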

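[060] In order to use this mixture of von Mises-Fisher distributions to obtain an expression for the apparent BRDF, a dense sampling of 642 directions of the unit sphere obtained by the 4th-order tessellation of the icosahedron can be used. The result is the following expression:

$$B(\mathbf{v}) = \sum_{i=1}^{642} w_i \left( \frac{\kappa}{\sinh(\kappa)}\, \frac{\sinh\left(\lVert \kappa \boldsymbol{\mu}_i - \mathbf{v} \rVert\right)}{\lVert \kappa \boldsymbol{\mu}_i - \mathbf{v} \rVert} - 1 \right) \qquad (19)$$

Although $f(\mathbf{u})$ has the form of a discrete mixture, the approximating function $B(\mathbf{v})$ is still a continuous mixture of single-lobed functions expressed by Eq. 15.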
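The analytic step in this construction can be checked numerically. The sketch below assumes that the single-lobed function is S(v) = exp(−v·u) − 1 and that the mixing density is the standard von Mises-Fisher density on the sphere; under those assumptions the integral has the closed form (κ / sinh κ) · sinh(‖κμ − v‖) / ‖κμ − v‖ − 1. The function names, the test directions and the value of κ are hypothetical, and the quadrature is included only as a cross-check.

```python
import math


def vmf_density(u, mu, kappa):
    """von Mises-Fisher density on the unit sphere S^2."""
    dot = sum(a * b for a, b in zip(u, mu))
    return kappa / (4.0 * math.pi * math.sinh(kappa)) * math.exp(kappa * dot)


def lobe_closed_form(v, mu, kappa):
    """Closed form of the integral of f(u; mu, kappa) * (exp(-v.u) - 1) over the sphere."""
    r = math.sqrt(sum((kappa * m - x) ** 2 for m, x in zip(mu, v)))
    ratio = math.sinh(r) / r if r > 1e-12 else 1.0   # sinh(r)/r -> 1 as r -> 0
    return kappa / math.sinh(kappa) * ratio - 1.0


def lobe_numeric(v, mu, kappa, n_theta=200, n_phi=400):
    """Midpoint-rule quadrature of the same integral, used only for checking."""
    total = 0.0
    dth, dph = math.pi / n_theta, 2.0 * math.pi / n_phi
    for a in range(n_theta):
        th = (a + 0.5) * dth
        st, ct = math.sin(th), math.cos(th)
        for b in range(n_phi):
            ph = (b + 0.5) * dph
            u = (st * math.cos(ph), st * math.sin(ph), ct)
            s = math.exp(-sum(x * y for x, y in zip(v, u))) - 1.0
            total += vmf_density(u, mu, kappa) * s * st * dth * dph
    return total


def mixture_abrdf(v, weights, mus, kappa):
    """Weighted sum of closed-form lobes, one per mixture direction mu_i."""
    return sum(w * lobe_closed_form(v, mu, kappa) for w, mu in zip(weights, mus))


v = (0.6, 0.0, 0.8)
mu = (0.0, 0.0, -1.0)
closed = lobe_closed_form(v, mu, 2.0)
numeric = lobe_numeric(v, mu, 2.0)
```

The closed form is what makes the model practical: each entry of the per-pixel system can be evaluated with a few arithmetic operations instead of a spherical integral.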

[061] Given a set of N facial images of a given human subject with a fixed pose, $I_n$, n = 1 ... N, associated with lighting directions $\mathbf{v}_n$, an N × 642 matrix $A_{n,i}$ can be set up by evaluating Eq. 17 for every $\mathbf{v}_n$ and $\boldsymbol{\mu}_i$. Then, for each pixel, the unknown weights of Eq. 19 can be estimated by solving the overdetermined system:

$$\mathbf{A}\mathbf{W} = \mathbf{B} \qquad (20)$$

where $\mathbf{B}$ is an N-dimensional vector that consists of the intensities of a fixed pixel in the N given images, and $\mathbf{W}$ is the vector of the unknown weights. This system can be solved efficiently to obtain a sparse solution by the non-negative least squares minimization algorithm. The general model just described is used as a benchmark for quantitatively evaluating the ability of the anti-symmetric tensor spline model of the present invention to approximate the apparent BRDF of human faces.
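A non-negative least squares solve can be sketched in a few lines. The text does not say which NNLS algorithm is used, so the sketch below substitutes plain projected gradient descent, which is simple to state but is not the classical active-set method; the tiny 3 × 2 system and the function name are hypothetical stand-ins for the N × 642 per-pixel system.

```python
def nnls_projected_gradient(A, b, iters=500):
    """Minimize ||A w - b||^2 subject to w >= 0 by projected gradient descent."""
    rows, cols = len(A), len(A[0])
    # 1 / ||A||_F^2 is a conservative step size (the Frobenius norm bounds the
    # largest eigenvalue of A^T A from above), so the iteration cannot diverge.
    step = 1.0 / sum(a * a for row in A for a in row)
    w = [0.0] * cols
    for _ in range(iters):
        residual = [sum(A[n][j] * w[j] for j in range(cols)) - b[n]
                    for n in range(rows)]
        grad = [sum(A[n][j] * residual[n] for n in range(rows))
                for j in range(cols)]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, w[j] - step * grad[j]) for j in range(cols)]
    return w


# Hypothetical small system whose unconstrained solution has a negative weight.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, -1.0, 0.5]
w = nnls_projected_gradient(A, b)
```

In this example the unconstrained least squares solution would assign a negative second weight; the constrained solve pins it at zero, which is also how the non-negativity constraint tends to produce sparse weight vectors.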

[062] ABRDF Transfer Algorithm

[063] The second part of the method of the present invention deals with transferring the ABRDF field from one face (reference) to another (target) and thus generating images under various novel illuminations for the target face using just one exemplar image. The basic shapes of features are more or less the same on all faces and thus the optical artifacts, e.g. cast and attached shadows, created by these features are also similar on all faces. Accordingly, the nature of the ABRDFs on various faces is also similar and hence, one should be able to derive the ABRDF field of the target face using a given reference ABRDF field.

[064] First, the non-rigid warping field between the reference and the target face images must be estimated. This can be formalized as the estimation of a non-rigid coordinate transformation T such that:

$$\sum_{\mathbf{x} \in I} M\big( I_{ref}(T(\mathbf{x})),\, I_{target}(\mathbf{x}) \big) \qquad (21)$$

is minimized. M is a general matching criterion which depends on the registration technique, $I_{ref}$ and $I_{target}$ are the reference and target images respectively, and $\mathbf{x}$ is a location on the image domain I. Preferably, a registration technique based on an information theoretic match measure should be used so that the registration can be done across different faces with possibly different illuminations (e.g. Mutual Information (MI) and Cross-Cumulative Residual Entropy (CCRE) based registration, described respectively in the following publications: 1) "Alignment by maximization of mutual information," P. Viola and W. M. Wells III, IJCV, 24(2):137-154, 1997 and 2) "Non-rigid multi-modal image registration using cross-cumulative residual entropy," F. Wang and B. C. Vemuri, IJCV, 74(2):201-205, 2007). The CCRE registration technique works with cumulative distributions rather than probability densities and hence is more robust to noise. Therefore, CCRE is able to produce better results when applied to faces. FIG. 5 depicts the results produced by MI and CCRE for the purpose of visual comparison. The first and second columns contain the reference image and the target image respectively. The third and fourth columns contain deformed faces produced by CCRE and MI respectively.
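To make the information theoretic match measure concrete, the following sketch estimates mutual information between two images from a joint intensity histogram. It only evaluates the measure for a fixed pair of images; it does not perform the non-rigid optimization over T, and it is not the algorithm of either cited publication. The function name, the bin count and the flat-list image format are assumptions for illustration.

```python
import math
from collections import Counter


def mutual_information(img_a, img_b, bins=8, max_val=256):
    """Histogram estimate of MI (in nats) between two equally sized grayscale images.

    Images are flat lists of intensities in [0, max_val).
    """
    qa = [min(bins - 1, int(p * bins / max_val)) for p in img_a]
    qb = [min(bins - 1, int(p * bins / max_val)) for p in img_b]
    n = len(qa)
    joint = Counter(zip(qa, qb))             # joint histogram counts
    pa, pb = Counter(qa), Counter(qb)        # marginal histogram counts
    # MI = sum p(a, b) * log(p(a, b) / (p(a) * p(b))), written in terms of counts.
    return sum(c / n * math.log(c * n / (pa[a] * pb[b]))
               for (a, b), c in joint.items())


image = [0, 64, 128, 192] * 4                # hypothetical 16-pixel image
flat = [100] * 16                            # constant image carries no information
mi_self = mutual_information(image, image, bins=4)
mi_flat = mutual_information(image, flat, bins=4)
```

Because the measure depends only on the statistical relationship between intensities and not on their absolute values, it tolerates the illumination differences between reference and target faces that would defeat a simple intensity difference.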

[065] Once the deformation field has been recovered, it is used to warp the source image's apparent BRDF field coefficients to displace the apparent BRDFs into appropriate locations for the target image. As described above, by using modulated spherical harmonic functions, we can obtain a continuous representation of the coefficient field, which is written explicitly as:

$$w_{lm}(\mathbf{x}) = \sum_{i,j} N_{i,4}(x_1)\, N_{j,4}(x_2)\, w_{ijlm} \qquad (22)$$

As defined above, $w_{lm}(\mathbf{x})$ are the coefficients for the order l and degree m spherical harmonic basis at location $\mathbf{x}$. The apparent BRDF field coefficients for the target image, $\tilde{w}_{lm}(\mathbf{x})$, can be computed using Eq. 22 as $\tilde{w}_{lm}(\mathbf{x}) = w_{lm}(T(\mathbf{x}))$, where T is the deformation field recovered by minimization of Eq. 21. Using $\tilde{w}_{lm}(\mathbf{x})$, the apparent BRDF field can be readily computed using the spherical harmonic basis. As can be noted from FIG. 5, though the locations of the apparent BRDFs have been changed to match the target face image, they are still the source (reference) image's apparent BRDFs and thus, the images obtained by sampling them appear like the source (reference) image, as can be seen in columns three and four of FIG. 5.

[066] This discrepancy can be fixed by using the following intensity mapping technique. A separate transformation can be chosen for each pixel. Based upon the geometric transformation between the reference and the target image, the intensity mapping quotient, Q(x), for each location x can be defined as:

$$Q(\mathbf{x}) = \frac{I_{target}(\mathbf{x})}{I_{ref}(T(\mathbf{x}))} \qquad (23)$$

[067] Because the images are known to be noisy and the division operation accentuates that noise, a Gaussian kernel $G_\sigma$ can be used to smooth the intensity mapping quotient field. As a result, the intensity value at location $\mathbf{x}$ of an image of the target face under novel illumination direction (θ, φ) can be computed as:

$$I(\theta, \phi, \mathbf{x}) = (G_\sigma * Q)(\mathbf{x}) \sum_{l \in T} \sum_{m=-l}^{l} \tilde{w}_{lm}(\mathbf{x})\, \psi_l^m(\theta, \phi) \qquad (24)$$

where the argument (θ, φ, $\mathbf{x}$) indicates that the apparent BRDF at location $\mathbf{x}$ is being queried in direction (θ, φ), and the apparent BRDF factor, written here in its spherical harmonic form, can be obtained from any of the three methods described above.
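The intensity mapping step can be sketched as follows, assuming the reference image and the reference ABRDF rendering have already been warped into the target's coordinates. The function names, the guard value `eps` against division by zero, and the border handling of the Gaussian smoothing are assumptions introduced for this illustration only.

```python
import math


def smooth_1d(values, sigma, radius):
    """Gaussian smoothing of a 1-D list, renormalized near the borders."""
    out = []
    for c in range(len(values)):
        num = den = 0.0
        for k in range(-radius, radius + 1):
            if 0 <= c + k < len(values):
                w = math.exp(-k * k / (2.0 * sigma * sigma))
                num += w * values[c + k]
                den += w
        out.append(num / den)
    return out


def smooth_2d(img, sigma=1.0, radius=2):
    """Separable Gaussian smoothing: rows first, then columns."""
    rows = [smooth_1d(r, sigma, radius) for r in img]
    cols = [smooth_1d(list(c), sigma, radius) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def intensity_quotient(target, warped_ref, eps=1e-6):
    """Per-pixel quotient of the target image over the warped reference image."""
    return [[t / max(r, eps) for t, r in zip(t_row, r_row)]
            for t_row, r_row in zip(target, warped_ref)]


def relight(target, warped_ref, warped_render, sigma=1.0):
    """Smoothed quotient times the warped reference ABRDF sampled in a new direction."""
    q = smooth_2d(intensity_quotient(target, warped_ref), sigma)
    return [[qv * a for qv, a in zip(q_row, a_row)]
            for q_row, a_row in zip(q, warped_render)]


# Hypothetical 2 x 3 images: the target is uniformly twice as bright as the reference.
warped_ref = [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]
target = [[2.0 * p for p in row] for row in warped_ref]
warped_render = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
relit = relight(target, warped_ref, warped_render)
```

With a uniformly brighter target the smoothed quotient is 2 everywhere, so the relit image is simply twice the warped reference rendering; in practice the quotient varies across the face and carries the target's own appearance.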

[068] The intensity mapping quotient (Eq. 23) is not the same as the Quotient image proposed by Riklin-Raviv and Shashua, as they make an explicit Lambertian assumption and define their quotient image to be the ratio of the albedos, which is clearly not the case here.

[069] Experimental Results

[070] All the experiments in this section used the Extended Yale B database, which has 64 different images per subject under known illumination directions. In order to test the sensitivity of our anti-symmetric tensor spline model to the selection of the training set, we constructed three different training sets, each consisting of 9 facial pictures per subject taken under different lighting directions. The lighting directions of the 9 images for the three selected training sets (A, B and C) are shown in FIG. 11. In FIG. 11, the lighting direction of each image is presented as a point in the azimuth-elevation plane. The training set 'A' shows a case where the 9 lighting directions do not span the azimuth-elevation plane in a symmetric and uniform manner, and therefore the input dataset does not represent the underlying ABRDF well. In the training set 'B', the lighting directions cover the azimuth-elevation plane better; however, there is no lighting direction at an extreme high angle. Finally, the training set 'C' samples the azimuth-elevation plane even better, including high-angle lighting directions along the elevation axis.

[071] To see the impact of different training sets on the approximated ABRDF field, the ABRDF of 10 different subjects from the Extended Yale B dataset was computed under the lighting configurations described in FIG. 11, using: a) the anti-symmetric tensor spline model of the present invention of order 3 and b) the Lambertian version of our framework using 1^st-order tensors. After the training was performed using only 9 images per subject according to the method described above, 64 facial images per subject were synthesized by evaluating Eq. 4 for the 64 lighting directions provided in the Yale B database.
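As an illustration of what evaluating the model in a lighting direction involves, the sketch below evaluates a 3rd-order anti-symmetric Cartesian tensor at a single pixel. Eq. 4 is not reproduced in this section, so the sketch assumes the per-pixel spherical function is a homogeneous cubic polynomial in the unit lighting vector with 10 coefficients; the B-spline weighting across pixels is omitted, and the azimuth-elevation axis convention and all names are hypothetical.

```python
import math

# Exponent triples (k, l, m) with k + l + m = 3: the 10 monomials of a
# 3rd-order Cartesian tensor (assumed form, see lead-in).
EXPONENTS = [(k, l, 3 - k - l) for k in range(4) for l in range(4 - k)]

def direction(azimuth_deg, elevation_deg):
    """Unit lighting vector; this axis convention is an assumption."""
    a = math.radians(azimuth_deg)
    e = math.radians(elevation_deg)
    return (math.cos(e) * math.sin(a), math.sin(e), math.cos(e) * math.cos(a))

def eval_tensor(coeffs, v):
    """Value of the cubic form sum_c c * v1^k * v2^l * v3^m at direction v."""
    return sum(c * v[0] ** k * v[1] ** l * v[2] ** m
               for c, (k, l, m) in zip(coeffs, EXPONENTS))

# Example: hypothetical coefficients for one pixel, queried in one direction.
coeffs = [0.3, -0.1, 0.2, 0.05, 0.0, 0.4, -0.2, 0.1, 0.0, 0.6]
v = direction(25.0, 10.0)
value = eval_tensor(coeffs, v)
```

Because every monomial has odd total degree, negating the direction negates the value, which is the anti-symmetry property the model's name refers to.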

[072] FIG. 13 presents the synthesized images under several different lighting directions for a randomly selected subject. The images demonstrate that our proposed model approximated well the underlying ABRDF, producing realistic images. The 9 input images used here are shown in FIG. 10.

[073] FIG. 12 shows the average error in the intensity value, defined as the absolute distance between the intensity values of the synthesized images and the ground truth images in the database. Based on the reported errors, it can be concluded that the method of the present invention performs significantly better than the Lambertian model. Moreover, in all three training set configurations, the performance remained approximately the same, which conclusively demonstrates that the method of the present invention approximates well the underlying ABRDF regardless of the lighting directions of the 9 input images.
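The error measure of paragraph [073] can be sketched as a mean absolute intensity difference. Whether the average in FIG. 12 is taken over pixels, images, subjects, or all three is not stated here, so averaging over every element of the supplied arrays is an assumption, as is the function name.

```python
import numpy as np

def mean_abs_error(synthesized, ground_truth):
    """Average absolute intensity difference between two image stacks."""
    s = np.asarray(synthesized, dtype=float)
    g = np.asarray(ground_truth, dtype=float)
    if s.shape != g.shape:
        raise ValueError("image stacks must have the same shape")
    return float(np.mean(np.abs(s - g)))

# Toy 2 x 2 example: absolute differences are 2, 0, 3, 0.
err = mean_abs_error([[10, 20], [30, 40]], [[12, 20], [27, 40]])
```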

[074] In FIG. 14, examples of the synthesized images using the Lambertian model and the anti-symmetric tensor spline method of the present invention are visually compared. The first column shows the ground truth image from the Extended Yale B dataset. Note that the ground truth images presented in FIG. 14 were not a part of the training set used for the synthesis of the images presented in the second and third columns of FIG. 14. By visual comparison, one can conclude that the 3^rd-order tensorial model can accommodate cast shadows and approximate well the specular components of the underlying ABRDFs. In contrast, specularity and shadows are missing from the images synthesized under the Lambertian model, which demonstrates the invalidity of the Lambertian assumption.

[075] FIG. 15 shows the approximated ABRDFs plotted as spherical functions in a region of interest that has specularities and shadows. The shapes of the plotted functions contain up to three lobes and show complexities that cannot be approximated under the Lambertian assumption.

[076] Next, the continuous mixture of single lobed functions was employed to approximate the underlying ABRDF by using all 64 given images as the training set. This model, although less efficient (since it requires a much larger training set of 64 images) than the anti-symmetric tensor spline method of the present invention (which uses only 9 images), can approximate spherical functions with a very complex structure characterized by a large number of lobes. In contrast, the 3^rd-order anti-symmetric tensor spline model can approximate functions whose shape complexity consists of at most three lobes. By comparing the performance of the continuous mixture of exponential functions with that of the anti-symmetric tensor spline, both presented in FIG. 16, one can conclude that they yield similar intensity values. This quantitatively demonstrates that in spite of the limitations of the 3^rd-order anti-symmetric tensor spline model, we can still capture and approximate the shape of the underlying facial ABRDFs.

[077] In FIG. 2A, we present the novel images synthesized from the learnt ABRDF field using spline modulated spherical harmonics, which clearly demonstrate that photo-realistic images can be generated by our model. Note the sharpness of the cast shadows in the last row. The presented technique is capable of both extrapolating and interpolating illumination directions from the sample images provided to it [see FIG. 2B]. In FIG. 3 (left), we present the estimated ABRDF field overlaid on a face, and in FIG. 3 (right), the method of the present invention can be seen to capture multiple bumps with varying sharpness to account for shadows and specularities. The ability of the method of the present invention to capture cast shadows and specularities in images is clearly demonstrated in FIG. 4.

[078] In FIG. 6, we present a set of images generated under novel illumination conditions of the target face [see 2^nd row and 2^nd column in FIG. 5] using just one image. It can be noted that the specularity of the nose tip and cast shadows have been captured to produce photo-realistic results. Next, in FIG. 7, we present novel images of the same subject using three different reference faces. Discounting minor artifacts, it can be noted that these images are perceptually similar.

[079] In the next set of experiments, we demonstrate the robustness and versatility of the method of the present invention. First, we demonstrate that we can produce good results even when parts of the face in the target image are occluded [see FIG. 8]. This is accomplished by setting the intensity mapping quotient to unity and performing a histogram equalization in the occluded regions. The results show that our framework can handle larger occlusions than what was demonstrated recently by Wang et al. Second, even though we do not use any 3-dimensional information, the technique of the present invention is capable of generating photorealistic images of faces in poses different from that of the reference face under novel illumination directions. At this stage, our framework can handle poses that differ up to 12°. In FIG. 8, we look at the quantitative error introduced by our method as a function of the number of images used for the ABRDF field estimation. We compare the synthesized novel images to the ground truth images present in the Extended Yale B database. We observe that the quantitative error increases with the harshness of illumination direction, which we attribute to the lack of accurate texture information for extreme illumination directions.

[080] Finally, we present two sets of results for the application of the proposed techniques to face recognition. First, using a simple Nearest Neighbor classifier, we compare the results of our ABRDF estimation techniques using 9 sample images with those of existing techniques which use multiple (from 4 to 9) images [see TABLE I]. For this experiment, we assume that 9 gallery images with known illumination directions per person are available (from subset 1 and 2). We estimate the ABRDF field of each face using the techniques (Tensor Spline and Spline Modulated Spherical Harmonics) described, generate a number of images under novel illumination directions (defined on a grid) and then use all of them in our Nearest Neighbor classifier as gallery images. To make the results comparable to the competing methods, we used the 10 subjects from the Yale B face database. Results were averaged over 5 independent runs of the recognition algorithm. The results pertaining to the other techniques are summarized in the publication, "Acquiring linear sub-spaces for face recognition under variable lighting," K. Lee, J. Ho, and D. J. Kriegman, PAMI, 27(5):684-698, 2005. The results demonstrate that the method of the present invention can produce competitive results even when used with a naive classifier like the Nearest Neighbor classifier.

Table I: Recognition results of various existing techniques on the Yale B Face Database.
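The gallery-expansion experiment of paragraph [080] can be illustrated with a minimal Nearest Neighbor sketch. The gallery is assumed to already contain the images synthesized from the estimated ABRDF field; the Euclidean distance on raw pixel intensities, the function name, and the toy data are assumptions rather than details taken from the experiments.

```python
import numpy as np

def nearest_neighbor(gallery, labels, probe):
    """Label of the gallery image closest to the probe in Euclidean distance.

    gallery : array of shape (N, H, W), original plus synthesized images
    labels  : sequence of N subject identifiers
    probe   : array of shape (H, W)
    """
    g = np.asarray(gallery, dtype=float).reshape(len(gallery), -1)
    p = np.asarray(probe, dtype=float).reshape(-1)
    distances = np.linalg.norm(g - p, axis=1)
    return labels[int(np.argmin(distances))]

# Toy gallery: two subjects, each with two (hypothetical) relit images.
gallery = np.array([
    [[0, 0], [0, 0]],
    [[1, 1], [1, 1]],
    [[9, 9], [9, 9]],
    [[10, 10], [10, 10]],
])
labels = ["subject_1", "subject_1", "subject_2", "subject_2"]
predicted = nearest_neighbor(gallery, labels, np.array([[8, 8], [8, 9]]))
```

The classifier itself is unchanged by the relighting step; only the number of gallery images per subject grows, which is why the approach can be combined with any example-based recognition method.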

[081] A second set of experiments demonstrates how the ABRDF transfer technique, which works with a single image, can be used to enhance various existing benchmark face recognition techniques [see TABLE 2]. For this, we make use of the fact that the performance of most recognition systems is improved when a better training set is present. We present results for Nearest Neighbor (NN), Eigenfaces and Fisherfaces, where we assume that only a single near-frontal illumination image of each subject is available in the gallery set. For ABRDF+NN, ABRDF+Eigenfaces and ABRDF+Fisherfaces, we use this single image to generate more images and then use all of them to train the classifiers. Experiments were carried out using 10 randomly selected subjects from the Extended Yale B Database. Results were computed using 3 different reference faces (other than the 10 selected subjects) over 5 independent runs each of the recognition algorithms and then averaged.

Table 2: Recognition results of various benchmark methods on the Extended Yale Face Database.
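The ABRDF+Eigenfaces configuration of paragraph [081] can be sketched as principal component analysis on the augmented training set followed by nearest-neighbor matching in the projected space. This is a generic Eigenfaces illustration under stated assumptions, not the experimental code: the training array is assumed to already contain the images generated from each subject's single gallery image, and the number of components, function names, and toy data are hypothetical.

```python
import numpy as np

def fit_eigenfaces(train, n_components):
    """Mean face and leading principal directions of the training images."""
    x = np.asarray(train, dtype=float).reshape(len(train), -1)
    mean = x.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(images, mean, basis):
    """Coordinates of the images in the eigenface subspace."""
    x = np.asarray(images, dtype=float).reshape(len(images), -1)
    return (x - mean) @ basis.T

def classify(train, labels, probe, n_components=2):
    """Nearest neighbor in eigenface coordinates."""
    mean, basis = fit_eigenfaces(train, n_components)
    coords = project(train, mean, basis)
    p = project(np.asarray(probe)[None], mean, basis)[0]
    return labels[int(np.argmin(np.linalg.norm(coords - p, axis=1)))]

# Toy data: each subject has one "real" and one "synthesized" 2 x 2 image.
train = np.array([
    [[1.0, 0.0], [0.0, 0.0]],
    [[1.2, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
    [[0.0, 0.0], [0.0, 1.2]],
])
labels = ["A", "A", "B", "B"]
predicted = classify(train, labels, [[0.0, 0.0], [0.0, 1.05]])
```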

[082] Attached as Exhibits to the instant application, and incorporated by reference hereto are the following articles authored by the inventors herein:

- Beyond the Lambertian Assumption: A generative model for Apparent BRDF fields of Faces using Anti-Symmetric Tensor Spline

- From one to many: A generative model for face image synthesis under varying illumination

[083] Accordingly, it will be understood that embodiments of the present invention have been disclosed by way of example and that other modifications and alterations may occur to those skilled in the art without departing from the scope and spirit of the above description or appended claims.

Claims

We claim:

1. A method for estimating the apparent bi-directional reflectance distribution function field of a human face using anti-symmetric tensor splines, comprising:

defining the complex geometry and reflectance properties of the human face by a field of spherical functions;

approximating the field of spherical functions by anti-symmetric higher-order Cartesian tensors within a single pixel;

applying a B-spline basis as the weighting function of the anti-symmetric higher-order Cartesian tensors;

fitting the tensor spline basis to a given set of two-dimensional facial images of a human subject with a fixed pose and associated lighting directions by minimizing the energy of an objective function with respect to the unknown tensor coefficients of said anti-symmetric higher-order Cartesian tensors; and

analytically computing the derivatives of the objective function for its efficient minimization.

2. The method of claim 1 wherein the order of the anti-symmetric higher-order Cartesian tensor approximation is of any order that can be used depending on the amount of data available.

3. The method of claim 1 wherein the method for minimizing the energy of the objective function is any gradient-based functional minimization method.

4. The method of claim 1, further comprising synthesizing images under new lighting directions by evaluating the estimated apparent bi-directional reflectance distribution function field in the new lighting directions.

5. A method for estimating the apparent bi-directional reflectance distribution function field of a human face using spline modulated spherical harmonics, comprising:

approximating the field of spherical functions by spline modulated spherical harmonics wherein the spherical harmonics are modulated with B-spline functions;

fitting the spline modulated spherical harmonics to a given set of two-dimensional facial images of a human subject with a fixed pose and associated lighting directions by minimizing the energy of an objective function with respect to the unknown coefficients of the spline modulated spherical harmonics basis; and

6. The method of claim 5 wherein the order of the spline modulated spherical harmonics approximation is of any order that can be used depending on the amount of data available.

7. The method of claim 5 wherein the method for minimizing the energy of the objective function is any gradient-based functional minimization method.

8. The method of claim 5, further comprising synthesizing images under new lighting directions by evaluating the estimated apparent bi-directional reflectance distribution function field in the new lighting directions.

9. A method for estimating the apparent bi-directional reflectance distribution function field of a human face using a continuous mixture of single lobed functions, comprising:

defining the complex geometry and reflectance properties of the human face by a field of spherical functions;

approximating the field of spherical functions by a continuous mixture of single lobed functions;

using a finite mixture von Mises-Fisher distribution as the mixing density in the continuous mixture;

fitting a field of continuous mixtures to a given set of two-dimensional facial images of a human subject with a fixed pose and associated lighting directions by minimizing the energy of an objective function with respect to the unknown weights of the continuous mixture.

10. The method of claim 9 wherein the method for determining the mixture weights is any non-negative least squares solving method.

11. The method of claim 9, further comprising synthesizing images under new lighting directions by evaluating the estimated apparent bi-directional reflectance distribution function field in the new lighting directions.

12. The method of claim 1, further comprising a method of face relighting, comprised of:

estimating the apparent bi-directional reflectance distribution function field of a face from nine or more example images;

generating a novel image of the face under one or more novel illumination conditions by sampling the apparent bi-directional reflectance distribution function field in the novel direction; and

generating one or more face images under complicated illuminations by computing a weighted sum of the images obtained with illumination in the individual point source directions.

13. The method of claim 12, further comprising a method of face recognition, comprised of:

estimating the apparent bi-directional reflectance distribution function field of the known faces;

generating novel images of those faces under novel illumination conditions; and

using the generated novel images to train any learning based face classification technique.

14. The method of claim 13 wherein the face classification technique can be any face recognition method that uses examples to do classification of test face images.

15. The method of claim 5, further comprising a method of face relighting, comprised of:

estimating the apparent bi-directional reflectance distribution function field of a face using nine or more example images;

16. The method of claim 15, further comprising a method of face recognition, comprised of:

estimating the apparent bi-directional reflectance distribution function field of the known faces;

generating novel images of those faces under novel illumination conditions; and

17. The method of claim 16 wherein the face classification technique can be any face recognition method that uses examples to do classification of test face images.

18. The method of claim 9, further comprising a method of face relighting, comprised of:

generating face images under complicated illuminations by computing a weighted sum of the images obtained with illumination in the individual point source directions.

19. The method of claim 18, further comprising a method of face recognition, comprised of:

generating novel images of those faces under novel illumination conditions; and

20. The method of claim 19 wherein the face classification technique can be any face recognition method that uses examples to do classification of test face images.

21. The method of claim 1, further comprising a method for generating novel images of a target human face under various novel illumination conditions using a single two-dimensional image of the target face, comprised of:

obtaining the apparent bi-directional reflectance distribution function of a reference face;

estimating a non-rigid coordinate transformation between the reference face image and the target face image using an image registration technique;

displacing the apparent bi-directional reflectance distribution functions of the reference image into appropriate locations for the target image using the deformation field obtained from the non-rigid registration; and

applying an intensity mapping quotient to the apparent bi-directional reflectance distribution functions at each location of the target image.

22. The method of claim 21 wherein the registration technique uses an information theoretic distance measure like Mutual Information or Cross-Cumulative Residual Entropy.

23. The method of claim 21, further comprising a method of face relighting, comprised of:

estimating the apparent bi-directional reflectance distribution function field of a face using a single two-dimensional example image;

generating a novel image of the face under one or more novel illumination conditions by sampling the apparent bi-directional reflectance distribution function field in novel directions; and

24. The method of claim 23, further comprising a method of face recognition, comprised of:

estimating the apparent bi-directional reflectance distribution function field of the known faces from the single two-dimensional image;

generating novel images of those faces under novel illumination conditions; and

using the generated novel images to train any learning based face classification technique.

25. The method of claim 24 wherein the face classification technique can be any face recognition method that uses examples to do classification of test face images.

26. The method of claim 5, further comprising a method for generating novel images of a target human face under various novel illumination conditions using a single two-dimensional image of the target face, comprised of:

obtaining the apparent bi-directional reflectance distribution function of a reference face;

27. The method of claim 26 wherein the registration technique uses an information theoretic distance measure like Mutual Information or Cross-Cumulative Residual Entropy.

28. The method of claim 26, further comprising a method of face relighting, comprised of:

estimating the apparent bi-directional reflectance distribution function field of a face using a single two-dimensional example image;

generating a novel image of the face under one or more novel illumination conditions by sampling the apparent bi-directional reflectance distribution function field in novel directions; and

29. The method of claim 28, further comprising a method of face recognition, comprised of:

generating novel images of those faces under novel illumination conditions; and

30. The method of claim 29 wherein the face classification technique can be any face recognition method that uses examples to do classification of test face images.

31. The method of claim 9, further comprising a method for generating novel images of a target human face under various novel illumination conditions using a single two-dimensional image of the target face, comprised of:

displacing the apparent bi-directional reflectance distribution functions of the reference image into appropriate locations for the target image using the deformation field obtained from the non-rigid registration; and

applying an intensity mapping quotient to the apparent bi-directional reflectance distribution functions at each location of the target image.

32. The method of claim 31 wherein the registration technique uses an information theoretic distance measure like Mutual Information or Cross-Cumulative Residual Entropy.

33. The method of claim 31, further comprising a method of face relighting, comprised of:

34. The method of claim 33, further comprising a method of face recognition, comprised of:

generating novel images of those faces under novel illumination conditions; and

35. The method of claim 34 wherein the face classification technique can be any face recognition method that uses examples to do classification of test face images.