WO2011031932A1

WO2011031932A1 - Media control and analysis based on audience actions and reactions

Info

Publication number: WO2011031932A1
Application number: PCT/US2010/048375
Authority: WO
Inventors: Miguel I. Peschiera
Original assignee: Home Box Office, Inc.
Priority date: 2009-09-10
Filing date: 2010-09-10
Publication date: 2011-03-17

Abstract

In one embodiment, a video-display and audience-monitoring (VD/ AM) system includes a display for displaying a first program to a first audience of one or more viewers, a camera subsystem for generating monitoring video of the first audience, and a control subsystem for controlling the display and for processing the monitoring video to characterize emotional reactions of the first audience to the first program. The control subsystem stores information recording the emotional reactions of the first audience to the first program. When a second audience views the first program, the second audience's emotional reactions to the first program are characterized and recorded. If the first audience has reactions to the first program sufficiently similar to the corresponding reactions of the second audience, then a reaction group having the first and second audiences as members is formed, wherein the reaction group is used for recommending other programs to group members.

Description

MEDIA CONTROL AND ANALYSIS

BASED ON AUDIENCE ACTIONS AND REACTIONS

This application claims the benefit of the filing date of U.S. Provisional Application No. 61/241,141 filed on September 10, 2009, the teachings of which are incorporated herein by reference in their entirety.

BACKGROUND OF THE INVENTION

Field of the Invention

The current invention relates to the presentation and sensation of audio-visual media, and more specifically, but not exclusively, to presentation control of media and audience-reaction analysis based on actions and reactions of one or more users.

Description of the Related Art

One area of the entertainment industry is the provision of audio- visual content to viewers' display devices. Audio-visual content may be provided to users' devices, for example, via dedicated cable systems, fiber-optic systems, or through wired or wireless communication networks such as the Internet. Note that, unless otherwise indicated, the term "user" is interchangeable with the term "viewer," as used herein. Content providers generally want to know how viewers react to the content in order to aid the content providers in recommending and/or creating content that viewers will want to continue to watch. In addition, viewers generally wish to be able to control the content they view and its presentation and also appreciate receiving recommendations for additional content. Many systems and methods are known for facilitating these tasks for content providers and for viewers.

One relatively simple way to assess viewer reactions to a program is to ask viewers for feedback. This method is, however, (1) time consuming and cumbersome for viewers and content providers and (2) of uncertain accuracy for the content providers. Another way to assess viewer reactions, typically used by marketing researchers, is to unobtrusively observe a viewer and the viewer's reactions as the viewer watches the program. Marketing researchers sometimes use one-way mirrors to observe viewers. This method is, however, inherently limited to a small sample size that can be brought into a testing facility and personally observed. Increasingly, automated computerized systems are used to monitor the reactions to various content and products of viewers and other consumers.

U.S. Pat. App. Pub. No. 2009/0195392 to Zalewski ("Zalewski"), incorporated herein by reference in its entirety, describes the passive collection of emotional responses to a media presentation by a system having a microphone and/or a camera. Zalewski teaches correlating a tracked emotional response with the stimuli that caused it and then aggregating the resulting response data across a population of individuals. Zalewski also teaches simultaneously and individually tracking the emotional response of multiple individuals during a media presentation, where face and voice detection are used to match responses to particular individuals. Zalewski' s system maintains a timeline for the media presentation, where the timeline is annotated each time an individual's response is detected. In other words, Zalewski teaches annotating the timeline only when detectible emotional responses are identified. Zalewski teaches identifying key aspects of a program once the program is completed using the information collected. Zalewski suggests using the collected information to judge the quality of the content and to judge the nature of various individuals to assist in selecting future content to be provided to them or to those similarly situated demographically. However, Zalewski does not provide specific implementation details for the proposed systems and methods.

U.S. Pat. App. Pub. No. 2005/0289582 to Tavares et al. ("Tavares"), incorporated herein by reference in its entirety, describes a system and method for capturing facial expressions of one or more persons in an audience to determine an audience's reaction to movie content. The Tavares system can interpret the captured information to determine the human emotions and/or emotional levels (degree or probability) as feedback or reaction to movie content and add the information and reactions to a database. Specifically, the Tavares system stores in a database the score of each emotion for a predetermined time of particular content. The Tavares system further discloses categorizing segments of content based on viewers' reactions to the segments.

U.S. Pat. App. Pub. No. 2003/0093784 to Dimitrova et al. ("Dimitrova"), incorporated herein by reference in its entirety, discloses a system and method for collecting, analyzing, and using sensory reactions of members of a viewing audience. The Dimitrova system uses a plurality of sensors to monitor the viewer or viewers for recognizable evidence of an emotional response that can be associated with a discrete program segment. The Dimitrova system further monitors subsequent programs for the opportunity to notify the viewer or automatically present, or avoid presenting, a subsequent program based on associations of responses to types of program content.

U.S. Pat. App. Pub. No. 2009/0285456 to Moon et al. ("Moon"), incorporated herein by reference in its entirety, describes measuring human emotional response to visual stimuli based on a person's facial expressions. Moon teaches methods and systems to locate a person's face, determine the locations of facial features, estimate facial-muscle actions using a learning machine, and translate recognized facial actions into emotional states.

U.S. Pat. No. 6,404,900 to Qian et al. ("Qian"), incorporated herein by reference in its entirety, teaches a method for finding and tracking multiple faces in an image from a color video sequence. Such a method may be useful for tracking individuals within a co-located assemblage of individuals.

U.S. Pat. No. 7,512,255 to Kakadiaris et al. ("Kakadiaris"), incorporated herein by reference in its entirety, teaches a method for identifying an individual's face. Kakadiaris 's method uses visible and infrared cameras in generating a geometric model template of the face, extracting first metadata from the template, storing the first metadata and subsequently comparing the first metadata to similarly generated second metadata to determine their degree of similarity and, consequently, whether the first and second metadata match or not.

U.S. Pat. No. 5,774,591 to Black et al. ("Black"), incorporated herein by reference in its entirety, teaches systems and methods for dynamically tracking facial expressions. Black teaches tracking a computer user's head and eye motions to determine where the user is looking and adjusting the appearance of graphical objects on the computer monitor to correspond with changes in the user's gaze. Black also teaches passively observing a family watching television to determine, for example, who is watching the screen, who is smiling, who is laughing, and who is crying in order to provide feedback to networks and producers about their products.

U.S. Pat. App. Pub. No. 2003/0081834 to Philomin et al. ("Philomin"), incorporated herein by reference in its entirety, discloses an intelligent television room that automatically adjusts viewing conditions, such as volume, based on processing a viewer's observed facial expressions. Philomin's system includes a controller able to increase or decrease the output power of several electronic appliances, such as lamps, fans, and air conditioners, where the controller can vary the output based on (1) an analysis of the audio and video signals of the incoming content or (2) an analysis of the facial expressions of the viewer.

U.S. Pat. No. 6,990,453 to Wang et al. ("Wang"), incorporated herein by reference in its entirety, discloses a system and method for recognizing sounds, even when subject to background noise and other distortions. The Wang system relies on a database indexing a large set of original recordings, each represent by a so-called fingerprints. The unknown audio sample is processed to generate its fingerprint, which is then compared to the indexed fingerprints in the database. The unknown audio sample is identified is identified if its fingerprint is successfully matched to an indexed fingerprint in the database.

U.S. Pat. App. Pub. No. 2005/0288954 to McCarthy et al. ("McCarthy"), incorporated herein by reference in its entirety, discloses a method and system including observing a user's facial expression and gestures in order to determine the user's emotional state and better target advertisements and web content to the user.

U.S. Pat. No. 5,638,176 to Hobbs et al. ("Hobbs"), incorporated herein by reference in its entirety, discloses a non-video eye-tracking system that determines the gaze point of an eye. A video-based eye-tracking system is described by Zafer Savas in "TrackEye: Real-Time Tracking of Human Eyes Using a Webcam" ("Savas"), incorporated herein by reference in its entirety. Another video-based eye-tracking system is a computer with an infra-red-capable camera running the ITU Gaze Tracker program from the IT University of Copenhagen, Denmark. Such systems may be used to determine where the gazes of one or more users are focused.

U.S. Pat. No. 7,274,803 to Sharma et al. ("Sharma"), incorporated herein by reference in its entirety, discloses a method and system for detecting and analyzing body- motion patterns of an individual observed by the system's camera. The system provides feedback to aid in interaction between the individual and the system. Sharma' s system allows multiple people to be present in front of the system while only one active user makes selections on the corresponding display. Sharma does not disclose using the gestures of the so-called bystanders.

U.S. Pat. No. 6,788,809 to Grzeszczuk et al. ("Grzeszczuk"), incorporated herein by reference in its entirety, discloses a method and system for gesture recognition using stereo imaging and color vision. The system estimates a hand pose, which is then used to determine the type of gesture.

U.S. Pat. App. Pub. No. 2007/0252898 to Delean ("Delean"), incorporated herein by reference in its entirety, discloses a video processor for recognizing an individual's hand gestures and using the recognized hand gestures to control audio- visual equipment. For example, Delean teaches using hand gestures to vary device volume, change the channel, and control video playback.

In "A Human Model for Detecting People in Video from Low Level Features" by Harrasse, Bonaud, and Desvignes ("Harrasse"), incorporated herein by reference in its entirety, a system is described for determining the presence of multiple people in a video. The system uses skin-tone determination and statistical analysis to determine whether and where people are located in a video sequence.

In "Automatic Recognition of Facial Actions in Spontaneous Expressions" by Bartlett et al. ("Bartlett"), incorporated herein in its entirety, a system is described which automatically detects faces appearing in a video and encodes each video frame with respect to a variety of facial action units that can be used to determine an instantaneous emotional state of a detected face. Bartlett' s system is trained with a database of expressions and is then used to detect facial actions in real time.

Affective Interfaces, Inc. ("Al"), of San Francisco, California, provides an application programming interface (API) that allows for emotion analysis of a face appearing in a video. The analysis automatically and simultaneously measures expressed levels of joy, sadness, fear, anger, disgust, surprise, engagement, emotional valence, drowsiness, and pain/distress. Similarly, the eMotion program from Visual Recognition, of Amsterdam, the Netherlands, processes video of a face to provide an assessment of simultaneous levels of expressed neutrality, happiness, surprise, anger, disgust, fear, and sadness. A system such as AI's or Visual Recognition's may be useful for providing multiple sequences of numerical values for various reactions to units of content based on video of observed viewers. Viewers sometimes watch programs alone and sometimes in multi-viewer assemblages. New systems and methods of integrating observed information from viewers, whether alone or in an assemblage, may be used to enhance the viewing experience of viewers.

SUMMARY OF THE INVENTION

One embodiment of the invention can be a system for outputting media content and monitoring audience reactions. The system comprises: a display adapted to output a first media-content program to a first audience of one or more users, a camera subsystem adapted to generate monitoring video of the first audience, and a control subsystem adapted to process the monitoring video from the camera system and control provision of media content to the display. The control subsystem processes the monitoring video to characterize reactions of the first audience to the first program. The control subsystem characterizes similarity of (i) the reactions of the first audience to the first program to (ii) reactions of a second audience to the first program. The control subsystem includes the first audience and the second audience in a first reaction group if the similarity characterization satisfies a specified similarity condition.

Another embodiment of the invention can be a method for outputting media content and monitoring audience reactions. The method comprises: outputting a first media-content program to a first audience of one or more users, generating monitoring video of the first audience, processing the monitoring video from the camera system to characterize reactions of the first audience to the first program, controlling provision of the first program to the display, characterizing similarity of (i) the reactions of the first audience to the first program to (ii) reactions of a second audience to the first program, and including the first audience and the second audience in a first reaction group if the similarity characterization satisfies a specified similarity condition.

Yet another embodiment of the invention can be a system for outputting media content and monitoring audience reactions. The system comprises: means for outputting a first media-content program to a first audience of one or more users, means for generating monitoring video of the first audience, means for processing the monitoring video from the camera system to characterize reactions of the first audience to the first program, means for controlling provision of the first program to the display, means for characterizing similarity of the reactions of (i) the first audience to the first program to (ii) reactions of a second audience to the first program, and means for including the first audience and the second audience in a first reaction group if the similarity characterization satisfies a specified similarity condition.

BRIEF DESCRIPTION OF THE DRAWINGS

Other aspects, features, and advantages of the present invention will become more fully apparent from the following detailed description, the appended claims, and the accompanying drawings in which like reference numerals identify similar or identical elements.

FIG. 1 shows a simplified block diagram of a video-display and audience- monitoring (VD/AM) system in accordance with one embodiment of the present invention.

FIG. 2 shows an exemplary user-data table stored by the remote server of

FIG. 1.

FIG. 3 shows a flowchart for an exemplary process performed by the VD/AM system controller of FIG. 1.

FIG. 4 shows a flowchart of another exemplary process performed by the VD/AM system controller of FIG. 1.

FIG. 5 shows an exemplary multi-user party (MUP) tracking table for assemblages of users from the table of FIG. 2.

FIG. 6 shows an exemplary setting-tracking table containing information about the display and location of FIG. 1.

FIG. 7 shows an exemplary program-tracking table containing information about programs watched on the display of FIG. 1.

FIG. 8 shows an exemplary reaction-data table containing information about audience reactions to segments of programs viewed on, for example, the display of FIG. 1.

FIG. 9 shows a simplified block diagram of a collaborative VD/AM system in accordance with one embodiment of the present invention. FIG. 10 shows an exemplary reaction-group tracking table for reaction groups of audiences.

FIG. 11 shows an exemplary reaction-group-summary table for groups from the table of FIG. 10.

FIG. 12 shows an exemplary audience-comparison table, showing the result of a comparison of the emotional-reaction responses of two audiences.

FIG. 13 shows a flowchart for an exemplary process for forming and updating reaction groups that is performed by the VD/AM system of FIG. 1. DETAILED DESCRIPTION

Various exemplary embodiments of the invention are described below. The descriptions are generally organized into embodiments dealing with (i) individual viewers, (ii) multiple co-located viewers, and (iii) the processing of the data generated to, for example, create and update preference groups.

FIG. 1 shows a simplified block diagram of video-display and audience- monitoring (VD/AM) system 100 in accordance with one embodiment of the present invention. VD/AM system 100 comprises VD/AM system controller 101, which is connected to display 102, camera system 103, and network cloud 104, which are also part of VD/AM system 100. Network cloud 104, in turn, is connected to remote server 105, which is also part of VD/AM system 100. VD/AM system controller 101, display 102, and camera system 103 are all located at a user's location 106, such as the user's living room. Display 102 may be a television monitor, computer monitor, projector, handheld device, speaker, or any other visual display and/or audio device. Network cloud 104 comprises a data- transmission network, such as, for example, the Internet or a cable system.

Camera system 103 comprises at least one camera used to monitor one or more viewers of display 102 co-located at location 106 and provide video of the viewers to VD/AM system controller 101 for processing. Camera system 103 may also comprise microphones (not shown) to provide audio information to VD/AM system controller 101. VD/AM system controller 101 receives multimedia content from, and provides feedback data and requests to, remote server 105 via network cloud 104. VD/AM system controller 101 controls display 102 to display multimedia content received from remote server 105. VD/AM system controller 101 may comprise a media player, such as a DVD player, allowing VD/AM system controller 101 to show multimedia content on display 102 without relying on remote server 105 for the multimedia content. VD/AM system controller 101 processes the video received from camera system 103 to determine the number of viewers present at user location 106 and assess their reactions to the multimedia content shown on display 102. VD/AM system controller 101 may incorporate a combination of one or more facial- expression-analysis and gesture-recognition systems, such as those described above.

In one embodiment, VD/AM system controller 101 uses a facial-expression- analysis system to record, for every frame of video content shown on display 102, emotional-reaction levels for a plurality of emotional reactions for each viewer present and watching display 102. In this embodiment, VD/AM system controller 101 uses a gesture-recognition system to allow the viewers present at location 106 to use body gestures to modify the content shown on display 102 and/or the presentation of the content.

Remote server 105 provides multimedia content for viewing on display 102. Remote server 105 may, for example, be part of a cable-television- system headend or be a network server connected to the Internet. Remote server 105 stores face- descriptive data describing viewers and the viewers' faces, which data is used to control access to content provided by remote server 105. When a user accesses programming using VD/AM system 100, remote server 105 receives data about the user's face, as captured by camera system 103, from VD/AM system controller 101 and compares the received face-descriptive data to the stored face-descriptive data in order to verify the user's identity.

In one embodiment, access to programming requires further video verification comprising a particular facial expression or a particular hand gesture, which functions as a password. The facial expression or hand gesture is captured by camera system 103 and identified and categorized by VD/AM system controller 101, which transmits data representing the categorized expression or gesture to remote server 105. A password hand gesture may be a pantomime of a secret handshake or a series of hand movements. In one implementation, a password comprises a particular sequence of both hand gestures and facial expressions. In one implementation, the complexity of the requisite password may depend on the category of content requested from remote server 105. For example, simple face-recognition is sufficient authentication for a user requesting regular programming while (i) the user's request for pay-per-view programming requires a combination of a facial expression and a hand gesture for authentication and (ii) the user's request to change personal settings requires a sequence of multiple facial expressions and hand gestures for authentication. Once a user is authenticated, VD/AM system controller 101 retrieves the user' s preferences and viewing privileges from remote server 105. Note that the user's preferences and viewing privileges, stored on remote server 105, may also be retrievable from devices and locations other than VD/AM system controller 101 at location 106.

FIG. 2 shows exemplary user-data Table 200 stored by remote server 105 of FIG. 1. Remote server 105 includes a relational database system for storing, updating, and analyzing data. Table 200 shows exemplary identification, preference, and demographic data for several sample users. For each user record, Table 200 comprises fields for user ID, last name, first name, address, age, gender, ethnicity, language, income level, education level, content authorization level, drama preference, and comedy preference. Note that values for demographic information in a user-data record may be unknown or blank. Also note that, as is typical in a relational database system, some of the demographic data is encoded as index values where additional tables (not shown) stored by remote server 105 contain information explaining the corresponding codes. The user ID is an index value used to uniquely identify each user. Each user record of Table 200 is also associated with information describing the user's face (not shown). Each user record of Table 200 may also be associated with expression and/or gesture password information, as described above. As would be appreciated by a person of ordinary skill in the art, variations of Table 200 may contain more or fewer fields or be broken up into multiple tables.

VD/AM system 100 of FIG. 1 is adapted to provide collaborative services to a multi-user party (MUP), which is an audience of two or more viewers co-located with display 102 and camera system 103 at location 106. When multiple viewers are seen by camera system 103 to be viewing display 102, VD/AM system controller 101 reacts to the reactions and gestures of the individual viewers and considers the presence and actions, if any, of the other viewers in processing those reactions and gestures.

VD/AM system 100 of FIG. 1 allows for collaborative control of the content shown on display 102, as well as of the presentation of the content. Initially, it is determined whether collaborative control is to be permitted. This determination can be based on a default setting, can be set by a first user, or by the user with the highest authorization level present (e.g. , a parent when watching content with children). If collaborative control is permitted, then VD/AM system controller 101 determines the number of viewers in the multi-user party and tracks their gestures and/or facial expressions.

FIG. 3 shows exemplary flowchart 300 for determining whether to raise the volume of display 102 of FIG. 1. In this implementation, a viewer raising a hand is interpreted as a request to raise the volume of display 102. The process starts at step 301. VD/AM system controller 101 identifies and tracks the people watching at location 106 (step 302). Multiple users may be identified using, for example, one or more of the user-counting systems described above. In order to discount people present but not viewing display 102, VD/AM system controller 101 determines which of the multiple users present are actually watching display 102 by tracking the users' faces and eyes using, for example, one or more of the above-described eye- and face- tracking systems. Note that the identification of viewers may be as basic as assigning temporary handles to each user or as complex as correlating each user to a corresponding record in Table 200 of FIG. 2 based on the user' s face.

If, at any moment, it is determined that more than half (i.e., a majority) of the viewers in an audience are raising their hands (step 303), then the volume is raised (step 304); otherwise, the process returns to step 302. Note that if the audience at location 106 consists of only one user, then that user is automatically the majority. VD/AM system controller 101 dynamically reacts to viewers joining or leaving location 106. As would be appreciated by a person of ordinary skill in the art, a similar process can be used to control other aspects of content presentation on display 102, where different gestures are used to control different corresponding features (e.g. , (i) lowering hand to lower volume or (ii) moving hand left or right to vary the channel down or up, respectively). FIG. 4 shows flowchart 400 of another exemplary process performed by VD/AM system controller 101 of FIG. 1, where focusing attention on an on-screen menu item shown on display 102 is interpreted to be a request for the selection of that menu item. The process starts when it is determined that multiple people are watching display 102 of FIG. 1 (step 401). VD/AM system controller 101 tracks the viewing focus of each of the multiple viewers watching display 102 (step 402). Focus tracking may be performed by tracking the users' eyes and/or heads using one or more of the gaze-tracking systems described above. If a viewer is focusing on a menu item, then that menu item is highlighted. If it is determined that a majority of users are focusing on a particular menu item (step 403), then that menu item's highlighting is enhanced (step 404); otherwise, the process returns to step 402.

As would be appreciated by a person of ordinary skill in the art, there are various ways in which to highlight and enhance the highlight of menu items for selection. In one implementation, each viewer is associated with a different corresponding frame color for highlighted menu items. This provides both (i) feedback to each viewer about the system's determination of which item the viewer is focusing on and (ii) an indication to all viewers of what menu items are being focused on by other viewers. In addition, if a viewer's focus stabilizes on a particular item, then the corresponding frame may change by, for example, getting thicker or bolder in color, to indicate the more-intense focus. If the viewer's focus wanders, then the frames may change in an opposite way to indicate the waning attention. If several viewers focus on the same menu item, then their respective frames may be shown concentrically, or the frames may be combined in some other manner. Other ways to highlight menu items include, for example, using arrows to point at or holding the text of the corresponding menu items.

If a majority of viewers indicate approval of the selection of the menu item (step 405) by, for example, nodding their heads or maintaining focus for more than a threshold time, then the menu item is selected and a corresponding action is executed (step 406); otherwise, the process returns to step 402. If, for example, a majority of viewers are looking at a menu item, then VD/AM system controller 101 determines that it is an item of interest and presents more information about the selected menu item. For example, if a majority of viewers nod while looking at a program listing in an interactive program guide, then VD/AM system controller 101 interprets that as a signal to select the corresponding program for viewing or reveal a list of options pertaining to that program, such as (i) record the program, (ii) provide a list of similar programs, or (iii) provide additional broadcast times for the program.

FIG. 5 shows exemplary multi-user party (MUP) tracking Table 500 for assemblages of users from Table 200 of FIG. 2. Records in Table 500 comprise two fields: MUP ID and user ID. Every MUP has as many records as viewers, each record indicating that a particular viewer is a member of the corresponding MUP. For example, Table 500 shows that MUP "00023" includes the viewers whose user IDs are "00005" and "00006." Table 500 provides a simple way to track viewer memberships in MUPs. Note that MUPs may also be created for assemblages of viewers that are not co-located but watch a program together via a communicative connection as, for example, described below for VD/AM system 900 of FIG. 9. As would be appreciated by a person of ordinary skill in the art, there are alternative ways to track this information in a relational database system.

FIG. 6 shows exemplary setting-tracking Table 600 containing information about display 102 and location 106 of FIG. 1. Records in Table 600 comprise four fields: audience ID, setting ID, device ID, and location text. Every audience ID has as many records as different relevant settings. An audience ID is either a user ID or a MUP ID. It should be noted that, in this embodiment, user IDs and MUP IDs are obtained from a single stream and, consequently, if a certain ID is used as a user ID it cannot also be used as a MUP ID and vice-versa. The single stream is used in order to allow MUPs to be treated as unique users in certain circumstances. As would be appreciated by a person of ordinary skill in the art, alternative database rules and structures may be used to achieve similar ends. It should also be noted that each different setting ID is unique to the corresponding audience ID. In other words, setting ID "0001" for audience ID "00005" might not refer to the same setting as setting ID "0001" for audience ID "00006."

The device ID uniquely identifies a particular display type. A corresponding device-tracking table (not shown) has information for each device ID such as device type, make, model, and screen size. Note that the device ID may alternatively be used to uniquely identify a particular instance of a display type, similar to the way a vehicle identification number (VIN) uniquely identifies a unique instance of a vehicle. The location text has a user- selected description of location 106 for the corresponding setting ID. Note, for example, that setting ID "0002" for audience ID "00023," which comprises users "00005" and "00006," has the same device type and location text as setting ID "0001" for audience ID "00005." In other words, both user "00005" and MUP "00023" have a setting that includes a display of type "0951753" located in a living room. This is because, in this example, one viewing setting for MUP "00023" is the living room and display device of user Jane Doe, whose user ID is "00005."

FIG. 7 shows exemplary program-tracking Table 700, containing information about programs watched on display 102 of FIG. 1. Records in Table 700 comprise six fields: program ID, program type, program name, episode number, broadcast date and time, and length in hours, minutes, and seconds. The program ID field is an index to uniquely identify programs provided as content. The other fields provide self- explanatory descriptive information about the corresponding program.

FIG. 8 shows exemplary reaction-data Table 800 containing information about audiences' reactions to segments of programs viewed on, for example, display 102 of FIG. 1. Records in Table 800 comprise audience ID, MUP ID, program ID, setting ID, frame number, happiness, laughter, anger, tension, disgust, surprise, fear, engagement, gesture, and gesture target. The combination of audience ID, MUP ID, program ID, setting ID, and frame number serve as an index to uniquely identify a reaction record. The audience ID field identifies either (i) a particular user from Table 200 or (ii) a particular MUP from Table 500, associated with the particular reaction-data record. The MUP ID field identifies (i) for a user, the MUP, if any, that the user belongs to for the particular record or (ii) for a MUP, the MUP ID, which is the same as the audience ID for the corresponding record. Note that a MUP ID of "00000" is used as a special ID to indicate that the corresponding audience is a user watching the program alone and not as part of a multi-user party.

The program ID identifies the program, from Table 700, that is being viewed. The setting ID identifies the setting, from Table 600, for the viewing of the program. The frame number identifies the frame, using a frame timecode, for a frame of the corresponding program. The happiness, laughter, anger, tension, disgust, surprise, fear, and engagement fields indicate a determined level, on a scale of 0-9, of the corresponding emotional reaction, of the corresponding user or MUP, to the corresponding frame, based on analysis of video of the audience. Note that the data for a MUP is the average of the data for the users that make up the MUP. The gesture field indicates a detected gesture by a user, such as a raised hand, a high five, clapping, hitting, or hugging. Gestures may also include static positions, such as standing, lying down, or sitting. In addition, the gesture ID may indicate combinations of active an passive gestures, such as clapping while sitting or sitting without clapping. The gesture target identifies a user, if any, determined to be the target of the gesture. Note that (i) a gesture of "00" is used as a special ID to indicate no gesture and (ii) a gesture target of "00000" is used as a special ID to indicate no gesture target.

For example, if two users in a MUP are sitting and hugging during a particular frame, then the corresponding record shows each user as performing a hug gesture having the other user as the gesture target. Note that gestures determined to be directed to VD/AM system 100 may be identified as having either nothing (e.g. , "00000") or the VD/AM system 100 as the gesture target. Also note that gestures performed by viewers may be evaluated independently as an emotional reaction or can be taken into account by the facial-expression-analysis system in determining an emotional-reaction level to a frame of program. Note that the emotional-reaction fields of Table 800 are exemplary and fewer or greater number of reactions may be tracked by alternative implementations.

Table 800 shows several sets of sample data for (i) exemplary users Jane Doe and John Brown, whose user IDs are "00005" and "00006," respectively, and (ii) exemplary MUP "00023." Three records for Jane by herself (as indicated by a MUP ID of "00000") show her reactions to three frames of program "000020." Three records for user John by himself show his reaction to the same three frames of program "000020." Nine records show the reactions of users Jane and John to those same three frames that they also happened to watch together, as part of MUP "00023." For each included frame of program "000020," there is (i) a record for Jane's reaction, (ii) a record for John' s reaction, and (iii) a record showing the average reaction levels for all the members of MUP "00023." The records for Jane and John also show that each performed gesture "03" (e.g. , a hug) whose target was the other person.

A user's determined emotional reactions to a program may be used to generate a personal categorization for that program, which is maintained in the relational database of VD/AM system 100. For example, if a user's dominant reaction to the program is laughter, then the program is labeled a comedy for the user. If, for example, the user laughs for parts of the program, but shows concern, stress, or sadness for most of the other scenes, then the program is labeled as a drama for that user. If a user shows increasing amounts of fatigue, distraction, or low levels of engagement, then the show is given a low interest score for that user. The information generated may be used to determine what types of programs the user finds most engaging and use that determination in making programming recommendations to the user or other users deemed similar. If the user falls asleep or leaves location 106 while a program is playing on display 102, then VD/AM system controller 101 pauses program playback and may also power down display 102.

In one implementation, a user has the option of training VD/AM system 100 with facial expressions and/or body gestures when starting to use VD/AM system 100. Another option allows a user to review the determinations of the facial- expression-analysis and gesture-recognition systems of VD/AM system 100 and provide corrections anytime after a user finishes watching a program, or portion of a program. VD/AM system 100 may also seek input from the user for facial expressions and/or gestures that VD/AM system 100 is unable to categorize with acceptable certainty. The user may interact with VD/AM system 100 by any available means, such as, for example, scrolling through options, spelling by pointing to letters on display 102, typing on a physical keyboard, or using a virtual keyboard.

In one implementation, VD/AM system 100 uses aggregate data and/or feedback from multiple users to learn and thereby improve the recognition capabilities of VD/AM system 100. If VD/AM system 100 records the video of viewers that is used for emotional and gesture recognition, then VD/AM system 100 may re-evaluate and re-process previously analyzed programs for users after learning adjustments or user corrections are made. In one implementation, a user may program VD/AM system controller 101 to perform a specified action in response to a determination of particular emotional reactions, gestures, or combinations of reactions and/or gestures. For example, VD/AM system controller 101 may be programmed to pause a program and automatically bring up an interactive program guide on display 102 if (i) it is determined that the user is exhibiting a below-threshold level of engagement or happiness for more than a threshold amount of time or (ii) the user makes a thumbs- down, or similar, hand gesture.

FIG. 9 shows a simplified block diagram of collaborative VD/AM system 900 in accordance with one embodiment of the present invention. Collaborative VD/AM system 900 comprises all the elements of VD/AM system 100 of FIG. 1 and further comprises display 902, VD/AM system controller 901, and camera system 903 at location 904, where the elements at location 904 are configured, connected, and behave in substantially the same way as the corresponding elements at location 106. Collaborative VD/AM system 900 allows for interaction among viewers located at different locations that are simultaneously watching the same programming on their respective displays.

VD/AM system controller 101 shares information about the reactions of viewers at location 106 with VD/AM system controller 901 at location 904, and vice- versa. For example, if viewers at locations 106 and 904 are watching the same football game, then display 102 shows representations of the viewers at location 904, and display 902 shows representations of the viewers at location 106. The representations may be, for example, images of the respective users' faces or avatars of the users' faces. The representations are shown on the edges of the respective displays. Alternatively, the representations may be shown on other display devices at the corresponding locations. The display of the representations of the viewers at the other location allows users to see how their friends at the other location are reacting to a particular program. VD/AM system 900 may add visual effects to the representations, which depend on viewers' assessed emotional reactions and/or gestures. If, for example, a viewer at location 904 is upset because his team is losing, then his representation on display 102 at location 106 may be shown with thunder clouds around it. The intensity of the visual effects may be varied in proportion to the intensity of a particular emotional reaction.

In one implementation, VD/AM system 100 tracks the presentation parameters of display 102 along with the above-described emotional-reaction information. The presentation parameters include audio and visual parameters. Audio parameters include levels of one or more of volume, bass, treble, balance, and reverb. Visual parameters include levels of one or more of brightness, contrast, saturation, and color balance. As would be appreciated by one of ordinary skill in the art, there are various ways to record and correlate these parameters with users and MUPs watching a particular program. One way would be to simply add fields to the records of Table 800 of FIG. 8 for tracking the desired parameters by frame.

VD/AM system 100 may store a preferred set of parameters for a user (i) in response to the user's explicit instructions, (ii) automatically based on the user's historical average settings, or (iii) automatically based on an analysis of the user's emotional reactions correlated to variations in the parameters. VD/AM system 100 may then retrieve the preferred set of parameters and apply them to display 102 when the user is viewing display 102. VD/AM system 100 may analyze aggregate user emotional-reaction data in relation to display parameters by display type and make display-parameter recommendations or automatically set display parameters based on the results of the analysis in order to enhance users' viewing experiences. Note that VD/AM system 100 may also determine and track audio and visual parameters of the program itself, as distinguished from its presentation on display 102, and similarly analyze and use that information as described above.

In one implementation, VD/AM system 100 tracks ambient parameters at location 106 along with the above-described emotional-reaction information. The ambient parameters include background noise and ambient light parameters. Background noise parameters include levels of one or more of noise volume, tone, pitch, and source location. Ambient light parameters include levels of one or more of light intensity, color, and source location. As would be appreciated by one of ordinary skill in the art, there are various ways to record and correlate these parameters with audiences watching a particular program. One way would be to simply add fields for tracking the desired parameters by frame to the records of Table 800 of FIG. 8.

VD/AM system 100 may recommend, or automatically take, action to mitigate deleterious effects of ambient noise and light. For example, some background noise may be drowned out by turning on a source of white noise, such as a fan (not shown), and raising the volume of display 102. If VD/AM system 100 detects background noise of particular parameter levels, then VD/AM system 100 may automatically raise the volume of display 102 and either have a fan at location 106 turned on or recommend to the viewer or viewers present to turn on a fan or similar device.

VD/AM system 100 may analyze aggregate audience emotional-reaction data in relation to ambient parameters and make ambient-parameter recommendations or automatically control noise-suppression and/or lighting devices based on the results of the analysis in order to enhance users' viewing experiences. VD/AM system 100 may fine tune its recommendations and/or automatic adjustments for display and/or ambient parameters for a particular audience by giving greater weight to reaction data generated by audiences having preferences and/or reactions similar to those of the particular audience.

In one implementation, if VD/AM system 100 determines that a viewer's facial expression indicates confusion or difficulty in hearing, then VD/AM system 100 raises the volume of display 102 or suggests a volume increase to the viewer. If VD/AM system 100 further detects a steady noise source, then VD/AM system 100 determines the dominant frequencies of the steady noise source and increases the volume of display 102 at the determined dominant frequencies.

In one implementation, VD/AM system 100 analyzes the generated emotional reaction and ambient parameters data by program genre to determine whether a significant correlation exists between any emotional reaction level and any ambient parameter for any genre. This analysis may be done for audiences or for groups. If such a correlation is found for a particular genre, then VD/AM system 100 recommends or automatically sets ambient parameters for future presentations of programs of the particular genre to enhance desired emotional reactions.

For example, if a user shows more attention to horror movies in bright light, then VD/AM system 100 will either turn on more lights at location 106 or suggest that the viewer do so whenever the user watches a horror movie at location 106. For example, if a user laughs more watching comedies in hot weather when an air conditioner is operating than when the air conditioner is off, then, when presenting future comedies, VD/AM system 100 retrieves current weather information and, if the temperature is above a certain threshold, then VD/AM system 100 turns on the air conditioner at location 106 or suggests that the air conditioner be turned on. Note that VD/AM system 100 may determine the operating status of the air conditioner by analyzing the audio signal picked up by a microphone at location 106 for audio characteristics of an operating air conditioner. VD/AM system 100 may store any determined preferences in the relational database of VD/AM system 100. Note that similar analysis and action may also be performed for the above-described display parameters.

In one implementation, if multiple viewers having stored preferences for display and/or ambient parameters are watching a program on display 102 at location 106, then VD/AM system 100 uses the average parameter values for the multiple viewers in making adjustments and recommendations for the viewing of the program at location 106. If, for example, three viewers at location 106 have volume-level preferences, then the volume is set to the average volume level of the preferences. If, for example, one of the viewers leaves, then VD/AM adjusts the parameters in accordance with the preferences of the remaining viewers at location 106.

In one implementation, if a majority of viewers in a multi-user party (MUP) have a clear preference for a particular program genre, then VD/AM system 100 recommends programs in that genre. If the majority have a preference for one genre but also have an interest in common with the minority, then VD/AM system 100 recommends programming from that genre as being able to potentially please all the viewers in the MUP.

In one implementation, VD/AM system monitors and verifies all the viewers at location 106 to ensure that no unauthorized users are viewing the program presented on display 102. If, for example, VD/AM system 100 identifies one of the users at location 106 as a child, then the presentation of adult material on display 102 is prevented. VD/AM system 100 might identify a user as a child by matching the user's observed facial information, body information, and/or movement information to (i) information stored in the child's profile on remote server 105, if the child has such a profile or (ii) general characteristics for children. If an adult user having an account on VD/AM system 100 is misidentified as a child by VD/AM system 100, then the adult can correct the misidentification by performing an expression or gesture password, as described above, to log into the corresponding account and verify his or her information. Alternatively, VD/AM system 100 may be set up to recognize and collect information from presented photographic identification forms, thereby allowing the misidentified adult to present a photo ID to camera system 103 and have VD/AM system 100 verify the adult's age.

Note that VD/AM system 100 may be set up to require all viewers to verify their age before watching adult content by logging into their accounts or presenting photo IDs. Also note that VD/AM system 100 may further comprise, at location 106, an identification scanner (not shown), such as one made by Biometric Solution, a division of FSS, Inc., of Altoona, PA. The identification scanner may scan encoded information on an ID card or may scan a finger, hand, or eye or other user-body-part for biometric identification and provide that information to VD/AM system controller 101 for further processing, such as correlating to user information stored or retrieved by VD/AM system 100.

If, for example, a content provider designates certain programs as displayable to only authorized viewers, such as paying subscribers, then VD/AM system 100 may be used to enforce such restrictions. If, for example, a restricted program is viewed at location 106 by (i) a viewer authorized to watch the restricted program and (ii) other viewers not so authorized, then VD/AM system 100 identifies the other viewers and determines whether any of the other viewers are authorized to watch the restricted program. If VD/AM system 100 determines that more than a threshold number of unauthorized viewers are viewing the restricted program at location 106, then VD/AM system controller 101 may do one or more of: (i) stop playback of the restricted program, (ii) shut down display 102, (iii) send a notification to the content provider that the threshold number of unauthorized viewers was exceeded at location 106, and (iv) provide notification to the authorized viewer of the unauthorized viewing event.

VD/AM system 100 uses the emotional-reaction data, for both particular programs and program genres, collected from multiple audiences to create groups whose members (individuals or MUPs) share similar emotional response to particular program genres. If a viewer or MUP reacts favorably to a program, then, if the viewer or MUP belongs to a group for the program's genre, then VD/AM system 100 recommends that program to the other members of that group. Note that VD/AM system 100 may also recommend program segments, rather than whole programs, based on the reactions of other group members to the segments. Also note that a viewer' s MUP might belong to a reaction group to which the individual viewer does not belong.

VD/AM system 100 dynamically adds and removes members of genre groups as user reactions to programs are analyzed. Genre groups may be based on single emotional-reaction fields or on combinations of emotional-reaction fields. For example, laughter groups may be based solely on laughter levels, while comedy groups may be based on a combination of laughter and happiness levels, as indicated, for example, in Table 800 of FIG. 8. Note that the combinations of emotional- reaction fields may provide equal weight to each component emotional-reaction type or unequal weights to the component emotional-reaction types.

FIG. 10 shows exemplary reaction- group tracking Table 1000 for reaction groups of audiences. A reaction group is a genre group based on a single emotional- reaction field. Records in Table 1000 comprise three fields: group ID, group type, and audience ID. Every group has as many records as audiences, each record indicating that the particular user or MUP is a member of the corresponding reaction group. The group ID is simply an index for Table 1000. The group type describes the reaction type for the particular group, and the audience ID identifies the corresponding user or MUP as a member of the corresponding reaction group. For example, Table 1000 shows that group "00012" is a laughter group comprising users Jane Doe and Mary Johnson, whose user IDs are "00005" and "00049," respectively. The presence of Jane Doe and Mary Johnson in the same laughter group means that their laughter reactions to programs they have both watched have been determined by VD/AM system 100 to be similar. VD/AM system 100 generates records summarizing the reactions of users to segments of programs in order to facilitate maintenance of reaction-group information. FIG. 11 shows exemplary reaction-group-summary Table 1100 for groups from Table 1000 of FIG. 10. Records in Table 1100 comprise five fields: group ID, program ID, start frame, end frame, and reaction average. The group and program IDs identify the corresponding group and program. The start frame and end frame fields define a particular segment of the corresponding program. The reaction average shows the average emotional-reaction level of group members over the corresponding segment of the program. The particular emotional reaction average is determined by the group type of the corresponding group. For example, Table 1100 shows average reaction values for four one-minute segments of program "000020." Note that, while typically the start-frame and end-frame fields will define regular-length and consecutive segments, the program segments are not so limited. Program segments in Table 1100 may be non-contiguous or overlapping. Segments may also be of irregular length. For example, a content provider may wish to set a program's segments to correspond to selected scenes in the program in order to make comparisons based on scenes rather than regular time intervals. Note that, since groups can correspond to composite emotions that are combinations of the basic responses recorded in Table 800 of FIG. 8, corresponding reaction- average levels in Table 1100 would also be averages of composite emotions. Also note that, generally, user reactions may be compared to either group averages or the reactions of other users or MUPs.

FIG. 12 shows exemplary audience-comparison Table 1200, showing the result of a comparison of the emotional-reaction responses of two audiences over a segment of a program, where, as noted above, an audience may consist of either an individual user or a MUP. Records in Table 1200 comprise thirteen fields: audience ID 1, audience ID 2, program ID, start frame, end frame, and average square differences for happiness, laughter, anger, tension, disgust, surprise, fear, and engagement. The audience ID fields identify the two audiences whose reactions are compared in the record. The program ID identifies the corresponding program. The start frame and end frame identify the corresponding segment of the program. The average square difference fields indicate, for each emotional-reaction type, the average value for the squares of the difference between the emotional-reaction values of the two audiences over the frames of the corresponding segment. For example, Table 1200 shows various sample average squares of differences between reactions of Jane Doe and John Brown to the entirety of program "000020," as indicated by a start frame of "00:00:00:00" and an end frame of "00:27:29:29." The averages of squares of differences are obtained by calculating, for each frame of program "000020," the differences, for each reaction type, between reactions of user "00005" and user "00006." Then the differences are squared. Then the average value for each reaction type is calculated over all the frames of the defined segment. Note that the actual steps do not need to be performed in the manner described since, for example, running averages may be used and updated as each frame is processed so as not to require caching records for multitudes of frames before calculating the requisite averages.

FIG. 13 shows flowchart 1300 for an exemplary process for forming and updating reaction groups that is performed by VD/AM system 100 of FIG. 1. The process starts when some audience, referred to generically here as Audience A, finishes watching a program, referred to generically here as Program 1 (step 1301). VD/AM system 100 then determines whether any other audiences have watched Program 1 (step 1302). If no other audiences have watched Program 1 (step 1302), then the process ends (step 1303). In other words, if Audience A is the first audience to watch Program 1, then there are no comparisons to be made with other audiences' reactions to Program 1. Note that Audience A's reaction information is automatically stored in Table 800 of FIG. 8 as Audience A watches Program 1 and that information will be available for comparison with the reactions of future viewers of Program 1. Note that, if multiple audiences simultaneously watch a broadcast program and, consequently, simultaneously finish watching the program, then the audiences are organized into a sequence and processed in order. The sequencing may be arbitrary or may be based on some predetermined formula.

If, in step 1302, it is determined that other audiences have watched Program 1, then VD/AM system 100 generates a Group List comprising the reaction groups that have reaction information for Program 1 (step 1304). Note that the Group List includes both (i) groups to which Audience A belongs and (ii) groups to which Audience A does not belong. Each set of groups is treated differently - VD/AM system 100 determines, based on Audience A's reactions to Program 1, (i) whether Audience A should be removed from any groups to which Audience A belongs and (ii) whether Audience A should be added to any groups to which Audience A does not belong. Note that the groups of the Group List to which Audience A belongs are processed in section 1321, while the groups of the Group List to which Audience A does not belong are processed in section 1322.

After the Group List is generated, VD/AM system 100 determines whether Audience A is a member of any groups appearing in the Group List (step 1305). If yes (step 1305), then VD/AM system 100 goes through the groups of the Group List that Audience A belongs to and determines whether to remove Audience A from any of the groups. The variable "Group X" is used to designate a particular group considered by VD/AM system 100 in an iteration of a process loop. Group X is set to the next group in the Group List that Audience A belongs to (step 1306). Consequently, the first time step 1306 is performed, Group X is set to be the first group in the Group List that Audience A belongs to. The second time step 1306 is performed, Group X is set to the second group in the Group List that Audience A belongs to, if such a group exists.

If there is no next group in the Group List that Audience A belongs to, then the process moves to step 1312; otherwise, the group represented by Group X is removed from the Group List (step 1306). This removal is done so that, after all the groups in the Group List that Audience A belongs to are processed, the Group List contains only the groups of the Group List that Audience A did not belong to, thereby both (i) facilitating the performance of operations for the groups in the Group List that Audience A does not belong to and (ii) avoiding determining in section 1322 whether Audience A should be added to groups that Audience A was removed from in section 1321. As would be appreciated by one of ordinary skill in the art, other methods are usable to achieve similar ends.

It is then determined whether more than N_\% and N₂ of the audiences of Group X watched Program 1, where N_\ and N₂ are threshold number values (step 1307). This is done in order to avoid removing Audience A from a group where either too small a number or too small a percentage of group members have watched Program 1. If step 1307 determines that less than N_\% of audiences of Group X and fewer than /V₂ audiences of Group X watched Program 1, then the process returns to step 1306; otherwise, the process proceeds to step 1308. For every segment tracked for Program 1 and Group X (e.g. , a record of Table 1100 of FIG. 11), VD/AM system 100 retrieves the corresponding reaction average, determines the reaction average for Audience A's corresponding reactions during the corresponding segment of Program 1, calculates the square of the differences of the retrieved and determined reaction values, and then determines the average value of the squares of the differences for all the segments of Program 1 (step 1308).

If the average-value result of step 1308 is not more than threshold value N₃, then the process returns to step 1306; otherwise, the process proceeds to step 1310 (step 1309). If the average of the squares of differences is greater than threshold N , then Audience A is removed from Group X (step 1310) since the difference indicates that Audience A' s reaction levels are no longer sufficiently similar to those of other audiences of Group X. Following the addition or removal of an audience from a reaction group, a member purge, which is explained below, is performed for the group. Consequently, following step 1310, a member purge is performed for Group X (step 1311), followed by a return to step 1306.

If the determination of step 1305 is negative or if all the groups of the Group List to which Audience A belongs have been processed, then the process moves to step 1312 for processing of groups in the Group List to which Audience A does not belong in order to determine whether, based on Audience A's reactions to Program 1, Audience A should be added to any group of the Group List to which Audience A does not belong. Note that, as explained above, the Group List now contains the groups that Audience A did not belong to before the processing of section 1321. Again, "Group X" is used as a variable to designate the particular group being considered by VD/AM system 100 in an iteration of a processing loop. VD/AM system 100 sets Group X to the next group in the Group list. If there is no next group, then the process moves to step 1316 (step 1312). Otherwise, for every segment tracked for Program 1 and Group X, VD/AM system 100 retrieves the corresponding reaction average, determines the reaction average for Audience A' s corresponding reactions during the corresponding segment of Program 1, calculates the square of the differences of the retrieved and determined reaction values, and then determines the average value of the squares of the differences for all the segments of Program 1 (step 1312).

If the average-value result of step 1312 is not less than threshold N₄, then the process returns to step 1312; otherwise, the process proceeds to step 1314 (step 1313). If the average of the squares of differences is less than threshold N₄, then Audience A is added to Group X (step 1314) since the below-threshold difference indicates that Audience A's reaction levels are sufficiently similar to those of other members of Group X. Following step 1314, a member purge is performed for Group X (step 1315), followed by a return to step 1312.

Once all the groups of the Group List are processed, the process moves to step

1316 to determine whether any new groups should be formed. VD/AM system 100 determines whether Audience A is now a member of any groups (step 1316), as can be determined, for example, by a query of Table 1000 of FIG. 10. If it is determined that Audience A belongs to one or more groups, then the process goes to step 1303 to terminate. Otherwise, VD/AM system 100 generates an Audience List of audiences that have watched Program 1 and who are not members of any reaction groups (step 1317). The variable "Audience X" is used to designate a particular audience considered by VD/AM system AM in an iteration of a processing loop.

Audience X is set to the next audience in the Audience List, where, if there is no next audience in the Audience List, then the process goes to step 1303 for termination (step 1318). For each reaction type of each frame of Program 1, VD/AM system 100 calculates the squares of the differences of the values between Audience A and Audience X and then determines the average of the squares of the differences for each reaction type for Program 1 (step 1319) to generate a record such as the sample record in Table 1200 of FIG. 12. For each reaction type, if the average of squares of differences is less than a threshold N5, then a new group is formed for the corresponding reaction type, where the group comprises Audience A and Audience X (step 1320). The process then returns to step 1318. For example, if user "00005" is Audience A, program "000020" is Program 1, users "00005" and "00006" do not belong to any groups, N5 is set to 4.0, and the sample record in Table 1200 represents the result of step 1319, then four new groups comprising users "00005" and "00006" would be created - for happiness, laughter, fear, and engagement. In one implementation, VD/AM system 100 excludes from comparison frames of Program 1 where Audience A was not present or otherwise not paying attention to display 102. In one implementation, the process shown in flowchart 1300 of FIG. 13 is modified so that, if Audience A would be eligible to join more than one group for a particular reaction type, Audience A joins only the reaction group that has the smallest average difference from Audience A' s reactions. In other words, Audience A joins only the group most similar to Audience A and not every group Audience A is sufficiently similar to. In another implementation, audiences are placed in the respective reaction groups most similar to them, regardless of whether the differences between an audience's reactions and the group' s reactions are below a threshold level. Note that, after additions or removals of members from a group, the group average values, such as shown in Table 1100 of FIG. 11, are recalculated to reflect the new average member values.

As noted above, when members are added or removed from a group, a member purge is performed. A member purge starts with a determination of whether the group has more than a threshold number Ne of members. If not, then the purge terminates; otherwise, a next member is selected, and the member' s reactions corresponding to the reaction type and programs of the group are compared to the average group values for the reaction type and programs of the group. An average of differences for the program segments is determined and if that average is greater than a threshold value N , then the first member is removed from the group. The process then returns to the above-described determination of whether the group has more than N members to determine whether further purging should be performed.

An alternative purge process calculates the average difference for each member from the group averages, then orders the members by decreasing average difference, and only then determines whether to purge members. In this way, if only some of the purge-eligible members can be purged before the group becomes too small, then the purge-eligible members having the greatest average difference from the group-average values are the ones purged.

In one implementation, VD/AM system 100 performs normalization of audiences' reaction levels. For example, if an audience' s range of recorded reaction level in a particular category after a threshold number of frames is 0-6 where the scale goes from 0 to 9, then the audience's recorded reactions are normalized by multiplying each recorded reaction level for the particular category by 1.5 (= 9/6). Any group averages for groups the audience belongs to are then recalculated using the newly normalized values for the audience.

In one implementation, VD/AM system 100 has general groups in addition to the reaction- specific groups, where membership in general groups is based on similarities across a combination of reaction types.

Embodiments of the invention have been described where emotional reactions are determined based on facial expressions and/or body gestures. In an alternative embodiment, other biometric measures, such as heart rate, magnetic -resonance imaging (MRI) images, or other body scan images are used in conjunction with or instead of the above-described facial-expression and/or body-gesture recognition systems.

Remote server 105 of FIG. 1 has been described as performing various functions. It should be noted that remote servers 105 may comprise a single computer or may comprise multiple computers working in conjunction, where the multiple computers may be co-located or located at different sites. Also, viewer-reaction assessment has been described as performed by VD/AM system controller 101, but, in an alternative implementation, may instead be performed by remote server 105, where VD/AM system controller 101's processing comprises the transmission of video data from camera system 103, or a derivative of that video data, to remote server 105 for viewer-reaction assessment.

In one alternative embodiment, VD/AM system 100 of FIG. 1 keeps track of additional location information for location 106, such as location address, ZIP code, or latitude and longitude. Also, VD/AM system 100 may track location information differently for mobile displays than for stationary displays.

Embodiments of the invention have been described as using a relational database comprising certain tables. In one alternative embodiment, tables having different structures and/or altogether different tables are used. In another alternative embodiment, another system for storing, retrieving, and updating data is used.

Embodiments of the invention have been described where the media presented to audiences includes video. In an alternative embodiment, VD/AM system 100 of FIG. 1 provides audio-only content for which display 102 only outputs sound. In one implementation, display 102 outputs only audio content and does not output video content.

An embodiment of the invention has been described where VD/AM system 100 of FIG. 1 tracks a collection of emotional reactions for every frame of content. In one alternative embodiment, intervals other than frames are used to record emotional reactions. In another alternative embodiment, composite emotions, based on combinations of the emotional reactions, are determined and recorded along with or instead of the individual emotional reactions.

An embodiment of the invention has been described where Table 800 of FIG.

8 records, for MUPs, both the reactions of the members of the MUP and the composite reaction for the MUP as a unique audience. In one alternative embodiment, Table 800 records only the individual MUP members' reactions, and VD/AM system 100 of FIG. 1 calculates the corresponding composite MUP reaction as necessary. In a different alternative embodiment, Table 800 records only the composite reaction for the MUP and VD/AM system 100 neither uses nor maintains the individual MUP members' reactions.

References herein to the verb "to set" and its variations in reference to values of fields do not necessarily require an active step and may include leaving a field value unchanged if its previous value is the desired value. Setting a value may nevertheless include performing an active step even if the previous or default value is the desired value.

Unless indicated otherwise, the term "determine" and its variants as used herein refer to obtaining a value through measurement and, if necessary, transformation. For example, to determine an electrical-current value, one may measure a voltage across a current-sense resistor, and then multiply the measured voltage by an appropriate value to obtain the electrical-current value. If the voltage passes through a voltage divider or other voltage-modifying components, then appropriate transformations can be made to the measured voltage to account for the voltage modifications of such components and to obtain the corresponding electrical- current value. As used herein in reference to data transfers between entities in the same device, and unless otherwise specified, the terms "receive" and its variants can refer to receipt of the actual data, or the receipt of one or more pointers to the actual data, wherein the receiving entity can access the actual data using the one or more pointers.

Exemplary embodiments have been described wherein particular entities

(a.k.a. modules) perform particular functions. However, the particular functions may be performed by any suitable entity and are not restricted to being performed by the particular entities named in the exemplary embodiments.

Exemplary embodiments have been described with data flows between entities in particular directions. Such data flows do not preclude data flows in the reverse direction on the same path or on alternative paths that have not been shown or described. Paths that have been drawn as bidirectional do not have to be used to pass data in both directions.

References herein to the verb "to generate" and its variants in reference to information or data do not necessarily require the creation and/or storage of new instances of that information. The generation of information could be accomplished by identifying an accessible location of that information. The generation of information could also be accomplished by having an algorithm for obtaining that information from accessible other information.

As used herein in reference to an element and a standard, the term

"compatible" means that the element communicates with other elements in a manner wholly or partially specified by the standard, and would be recognized by other elements as sufficiently capable of communicating with the other elements in the manner specified by the standard. The compatible element does not need to operate internally in a manner specified by the standard.

The present invention may be implemented as circuit-based processes, including possible implementation as a single integrated circuit (such as an ASIC or an FPGA), a multi-chip module, a single card, or a multi-card circuit pack. As would be apparent to one skilled in the art, various functions of circuit elements may also be implemented as processing steps in a software program. Such software may be employed in, for example, a digital signal processor, micro-controller, or general-purpose computer. The present invention can be embodied in the form of methods and apparatuses for practicing those methods. The present invention can also be embodied in the form of program code embodied in tangible media, such as magnetic recording media, optical recording media, solid state memory, floppy diskettes, CD- ROMs, hard drives, or any other non-transitory machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. The present invention can also be embodied in the form of program code, for example, stored in a non-transitory machine-readable storage medium including being loaded into and/or executed by a machine, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits.

It will be further understood that various changes in the details, materials, and arrangements of the parts which have been described and illustrated in order to explain the nature of this invention may be made by those skilled in the art without departing from the scope of the invention as expressed in the following claims.

Reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments necessarily mutually exclusive of other embodiments. The same applies to the term "implementation."

Unless explicitly stated otherwise, each numerical value and range should be interpreted as being approximate as if the word "about" or "approximately" preceded the value of the value or range. As used in this application, unless otherwise explicitly indicated, the term "connected" is intended to cover both direct and indirect connections between elements.

For purposes of this description, the terms "couple," "coupling," "coupled," "connect," "connecting," or "connected" refer to any manner known in the art or later developed in which energy is allowed to be transferred between two or more elements, and the interposition of one or more additional elements is contemplated, although not required. The terms "directly coupled," "directly connected," etc., imply that the connected elements are either contiguous or connected via a conductor for the transferred energy.

The use of figure numbers and/or figure reference labels in the claims is intended to identify one or more possible embodiments of the claimed subject matter in order to facilitate the interpretation of the claims. Such use is not to be construed as limiting the scope of those claims to the embodiments shown in the corresponding figures.

The embodiments covered by the claims in this application are limited to embodiments that (1) are enabled by this specification and (2) correspond to statutory subject matter. Non-enabled embodiments and embodiments that correspond to nonstatutory subject matter are explicitly disclaimed even if they fall within the scope of the claims.

Although the steps in the following method claims are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those steps, those steps are not necessarily intended to be limited to being implemented in that particular sequence.

Claims

CLAIMS We claim:

1. A system for outputting media content and monitoring audience reactions, the system comprising:

a display (e.g., 102) adapted to output a first media-content program to a first audience of one or more users;

a camera subsystem (e.g., 103) adapted to generate monitoring video of the first audience; and

a control subsystem (e.g., 101, 105) adapted to process the monitoring video from the camera system and control provision of media content to the display, wherein: the control subsystem processes the monitoring video to characterize reactions of the first audience to the first program;

the control subsystem characterizes similarity of (i) the reactions of the first audience to the first program to (ii) reactions of a second audience to the first program; and

the control subsystem includes the first audience and the second audience in a first reaction group if the similarity characterization satisfies a specified similarity condition.

2. The system of claim 1, wherein:

the control subsystem controls subsequent media content based on subsequent reactions to a second media-content program of one or more audiences, other than the first audience, of the first reaction group; and

the subsequent media content provided to the display for the first audience comprises or refers to the second program.

3. The system of claim 1, wherein at least one of the first and second audiences comprises at least two users.

4. The system of claim 1, wherein:

the first audience comprises at least two users; and

the control subsystem processes the monitoring video to: separately characterize reactions of each of the at least two users; and average the separately characterized reactions of each of the at least two users to characterize the reactions of the first audience.

5. The system of claim 1, wherein:

the system further comprises, at a remote location (e.g., 904), a second display (e.g., 902) and a second camera subsystem (e.g., 903);

the second display is adapted to output, simultaneously with the display, the first program to at least one remote user of the first audience, the at least one remote user located at the remote location;

the second camera subsystem is adapted to generate monitoring video of the at least one remote user;

the control subsystem is further adapted to:

process the monitoring video from the second camera system to characterize reactions of the at least one remote user to the first program;

control the display based on the characterization of the reactions of the at least one remote user;

control the second display based on the characterization of the reactions of the users of the first audience not at the remote location.

6. The system of claim 1, wherein:

the media content is one of: (i) audio-visual content, (ii) audio-only content, and (iii) visual-only content.

7. The system of claim 1, wherein:

the control subsystem comprises a database;

the control subsystem simultaneously measures reaction parameters for at least two different reaction types for the first audience;

the database comprises an audience-reaction table (e.g., 800);

the media content comprises consecutive segments; and

the control subsystem's processing of the monitoring video comprises adding at least one record to the audience-reaction table for every media-content segment, the at least one record indicating the measured at least two reaction parameters for the first audience.

8. The system of claim 7, wherein, if the first audience comprises two or more users, then the control subsystem's processing of the monitoring video comprises adding one record to the audience-reaction table for every media-content segment, the one record indicating the respective averages of the measured at least two reaction parameters for the two or more users of first audience.

9. The system of claim 7, wherein, if the first audience comprises M users, where is two or greater, then the control subsystem's processing of the monitoring video comprises adding at least M records to the audience-reaction table for every media- content segment, each of the M records indicating the measured at least two reaction parameters for a corresponding different user of the first audience.

10. The system of claim 7, wherein:

the records of the audience-reaction table for the first audience and the segments of the first program form a first record set;

the audience-reaction table comprises a second record set comprising records for the second audience and each segment of the first program, each record indicating simultaneously measured reaction parameters for the at least two different reaction types for the second audience;

the similarity characterization comprises:

for each segment of the first program and for each of the at least two reaction parameters, determining the square of the difference between the corresponding reaction parameters in the first record set and the second record set; and

then, for each of the at least two reaction types, determining the average of the determined corresponding squares of the differences.

11. The system of claim 1, wherein the control subsystem removes the first audience from a second reaction group, which comprises a plurality of audiences, if a second similarity characterization of (i) the reactions of the first audience to the first program to (ii) the averaged reactions of the second-reaction-group of a plurality of audiences to the first program satisfies a second specified similarity condition.

12. The system of claim 11, wherein, if the second reaction group satisfies a first group condition, then the control subsystem performs a reaction- group purge of the second reaction group after removing the first audience from the second reaction group.

13. The system of claim 1, wherein the control subsystem adds the first audience to a third reaction group, which comprises a plurality of audiences, if a third similarity characterization of (i) the reactions of the first audience to the first program to (ii) the averaged reactions of the third-reaction- group plurality of audiences to the first program satisfies a third specified similarity condition.

14. The system of claim 13, wherein, if the third reaction group satisfies a second group condition, then the control subsystem performs a reaction- group purge of the third reaction group after adding the first audience to the third reaction group.

15. The system of claim 1, wherein:

the reactions of the first audience include emotional reactions and body gestures; emotional reactions include at least one of happiness, laughter, anger, tension, disgust, surprise, fear, and engagement; and

body gestures include at least one of focusing on a point on the display, raising a hand, lowering a hand, sitting, standing, lying, sleeping, clapping, high-five-giving, hugging, and hitting.

16. The system of claim 15, wherein:

the display has output characteristics including power, volume, and brightness; the control subsystem modifies the display's output characteristics if the control subsystem characterizes one of the reactions of the first audience to the first program as a body gesture matching a first body-gesture condition.

17. The system of claim 16, wherein, if the first audience comprises at least two users, then the first body-gesture condition includes the sub-condition that the control subsystem determine that a majority of the at least two users are making substantially the same body gesture for modifying the display's output characteristics.

18. The system of claim 15, wherein:

the display has output characteristics including power, volume, and brightness; the control subsystem modifies the display's output characteristics if the control subsystem characterizes one of the reactions of the first audience to the first program as an emotional reaction matching a first emotional-reaction condition.

19. A method for outputting media content and monitoring audience reactions, the method comprising:

outputting a first media-content program to a first audience of one or more users; generating monitoring video of the first audience;

processing the monitoring video from the camera system to characterize reactions of the first audience to the first program;

controlling provision of the first program to the display;

characterizing similarity of (i) the reactions of the first audience to the first program to (ii) reactions of a second audience to the first program; and

including the first audience and the second audience in a first reaction group if the similarity characterization satisfies a specified similarity condition.

20. A system for outputting media content and monitoring audience reactions, the system comprising:

means for outputting a first media-content program to a first audience of one or more users;

means for generating monitoring video of the first audience;

means for processing the monitoring video from the camera system to characterize reactions of the first audience to the first program;

means for controlling provision of the first program to the display; means for characterizing similarity of the reactions of (i) the first audience to the first program to (ii) reactions of a second audience to the first program; and

means for including the first audience and the second audience in a first reaction group if the similarity characterization satisfies a specified similarity condition.