WO2001099325A2 - Interactive video and watermark enabled video objects - Google Patents

Info

Publication number
WO2001099325A2
Authority
WO
WIPO (PCT)
Prior art keywords
video
information
watermark
objects
user
Prior art date
Application number
PCT/US2001/019254
Other languages
French (fr)
Other versions
WO2001099325A3 (en
Inventor
Tyler J. Mckinley
Kenneth L. Levy
Geoffrey B. Rhoads
Original Assignee
Digimarc Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/597,209 external-priority patent/US6411725B1/en
Application filed by Digimarc Corporation filed Critical Digimarc Corporation
Priority to AU2001266949A priority Critical patent/AU2001266949A1/en
Publication of WO2001099325A2 publication Critical patent/WO2001099325A2/en
Publication of WO2001099325A3 publication Critical patent/WO2001099325A3/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/0021Image watermarking
    • G06T1/0085Time domain based watermarking, e.g. watermarks spread over several images
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46Embedding additional information in the video signal during the compression process
    • H04N19/467Embedding additional information in the video signal during the compression process characterised by the embedded information being invisible, e.g. watermarking

Definitions

  • the invention relates to multimedia signal processing, and in particular relates to encoding information into and decoding information from video objects.
  • Steganography refers to methods of hiding auxiliary information in other information.
  • Audio and video watermarking are examples of steganography.
  • Digital watermarking is a process for modifying media content to embed a machine-readable code into the data content.
  • a media signal such as an image or audio signal, is modified such that the embedded code is imperceptible or nearly imperceptible to the user, yet may be detected through an automated detection process.
  • digital watermarking is applied to media such as images, audio signals, and video signals. However, it may also be applied to other types of data, including documents (e.g., through line, word or character shifting), software, multi-dimensional graphics models, and surface textures of objects.
  • Digital watermarking systems have two primary components: an embedding component that embeds the watermark in the media content, and a reading component that detects and reads the embedded watermark.
  • the embedding component embeds a watermark by altering data samples of the media content.
  • the reading component analyzes content to detect whether a watermark is present. In applications where the watermark encodes information, the reader extracts this information from the detected watermark.
  • the invention provides methods and systems for associating video objects in a video sequence with object specific actions or information using auxiliary information embedded in video frames or audio tracks.
  • a video object refers to a spatial and temporal portion of a video signal that depicts a recognizable object, such as a character, prop, graphic, etc.
  • Each frame of a video signal may have one or more video objects.
  • the auxiliary information is embedded in video or audio signals using "steganographic" methods, such as digital watermarks.
  • the watermarks transform video objects into "watermark enabled" video objects that provide information, actions or links to additional information or actions during playback of a video or audio-visual program.
  • a similar concept may be applied to audio objects, i.e. portions of audio that are attributable to a particular speaker, character, instrument, artist, etc.
  • One aspect of the invention is a method for encoding substantially imperceptible auxiliary information about a video object into a video signal that includes at least one video object.
  • the method steganographically encodes object specific information about the video object into the video signal. Some examples of this information include identifiers and screen locations of corresponding video objects.
  • the method associates the object specific information with an action. This action is performed automatically or in response to user selection of the video object through a user interface while the video signal is playing.
  • Another aspect of the invention is a method for encoding substantially imperceptible auxiliary information into physical objects so that the information survives the video capture process and links the video to an action.
  • This method steganographically encodes auxiliary information in a physical object in a manner that enables the auxiliary information to be decoded from a video signal captured of the physical object.
  • One example is to place a watermarked image on the surface of the object.
  • the method associates the auxiliary information with an action so that the video signal captured of the physical object is linked to the action.
  • an action is retrieving and displaying information about the object.
  • the watermark may act as a dynamic link to a web site that provides information about the object.
  • Another aspect of the invention is a method for using a watermark that has been encoded into a video signal or in an audio track accompanying the video signal.
  • the watermark conveys information about a video object in the video signal.
  • the method decodes the information from the watermark, receives a user selection of the video object, and executes an action associated with the information about the video object.
  • One example of an action is to retrieve a web site associated with the video object via the watermark.
  • the watermark may include a direct (e.g., URL or network address) or indirect link (e.g., object identifier) to the web site.
  • the object identifier may be used to look up a corresponding action, such as issuing a request to a web server at a desired URL.
  • Object information returned to the user (e.g., web page) may be rendered and superimposed on the same display as the one displaying the video signal, or a separate user interface.
  • the system includes an encoder for encoding a watermark in a video sequence or accompanying audio track corresponding to a video object or objects in the video sequence. It also includes a database system for associating the watermark with an action or information such that the watermark is operable to link the video object or objects to a related action or information during playback of the video sequence.
  • Another aspect of the invention is a system for processing a watermark enabled video object in a video signal.
  • the system comprises a watermark decoder and rendering system.
  • the watermark decoder decodes a watermark carrying object specific information from the video signal and links the object specific information to an action or information.
  • the rendering system renders the action or information.
  • Another aspect of the invention is a method for encoding substantially imperceptible auxiliary information into an audio track of a video signal including at least one video object.
  • This method steganographically encodes object specific information about the video object into the audio track. It also associates the object specific information with an action, where the action is performed in response to user selection of the video object through a user interface while the video signal is playing. Alternatively, the action can be performed automatically as the video is played.
  • Fig. 1A is a flow diagram depicting a process for encoding and decoding watermarks in content to convey auxiliary information 100 about video objects in the content.
  • Fig. 1B illustrates a framework outlining several alternative implementations of linking video objects with actions or information.
  • Fig. 2 is a flow diagram depicting a video creation process in which physical objects are pre-watermarked in a manner that survives video capture and transmission.
  • Fig. 3 is a flow diagram of a video creation process that composites watermarked video objects with a video stream to create a watermarked video sequence.
  • Fig. 4 illustrates an embedding process for encoding auxiliary information about video objects in a video stream.
  • Fig. 5 is a diagram depicting yet another process for encoding auxiliary information about video objects in a video stream.
  • Fig. 6 depicts an example watermark encoding process.
  • Fig. 7 is a diagram depicting decoding processes for extracting watermark information from video content and using it to retrieve and render external information or actions.
  • Fig. 8 illustrates an example configuration of a decoding process for linking video objects to auxiliary information or actions.
  • Fig. 9 illustrates another example configuration of a decoding process for linking video objects to auxiliary information or actions.
  • Fig. 10 is an overview of a personalized interactive video system.
  • Fig. 11 is an overview of a network-enabled embodiment.
  • Fig. 12 is an overview of a dumb terminal embodiment.
  • Fig. 13 is an overview of an intelligent PD embodiment.
  • a video object refers to a video signal depicting an object of a scene in a video sequence.
  • the video object is recognizable and distinguishable from other imagery in the scene.
  • the video object exists in a video sequence for some duration, such as a contiguous set of video frames.
  • a single image instance in a frame corresponding to the object is a video object layer.
  • the video object may comprise a sequence of natural images that occupy a portion of each frame in a video sequence, such as a nearly static talking head or a moving athlete.
  • the video object may be a computer generated rendering of a graphical object that is layered with other renderings or natural images to form each frame in a video sequence.
  • the video object may encompass an entire frame.
  • Fig. 1A is a flow diagram depicting a process for encoding and decoding watermarks in content to convey auxiliary information 100 about video objects in the content.
  • An embedding process 102 encodes the auxiliary information into a watermark embedded in the video content.
  • a transmitter 104 then distributes the content to viewers, via broadcast, electronic file download over a network, streaming delivery over a network, etc.
  • a receiver 106 captures the video content and places it in a format from which a watermark decoder 108 extracts the auxiliary information.
  • a display 110 displays the video to a viewer.
  • a user interface 114 executes and provides visual, audio, or audio-visual information to the user indicating that the video is embedded with auxiliary information or actions.
  • This user interface may be implemented by superimposing graphical information over the video on the display 110.
  • the decoder can pass auxiliary object information to a separate device, which in turn, executes a user interface. In either case, the user interface receives input from the user, selecting a video object. In response, it performs an action associated with the selected object using the auxiliary object information decoded from the watermark.
  • the watermark may carry information or programmatic action. It may also link to external information or an action, such as retrieval and output of information stored elsewhere in a database, website, etc.
  • Watermark linking enables the action associated with the watermark to be dynamic. In particular, the link embedded in the content may remain the same, but the action or information it corresponds to may be changed.
  • Watermark linking of video objects allows a video object in a video frame to trigger retrieval of information or other action in response to selection by a user. Watermark embedding may be performed at numerous and varied points of the video generation process.
  • the watermark can be embedded immediately into a video object layer after a graphical model is rendered to the video object layer, allowing a robust and persistent watermark to travel from the encoding device or computer to any form of playback of video containing the video object.
  • video of an actor filmed against a green screen can be watermarked directly after the film is transferred to digital format for effects generation, avoiding the need to later extract the actor from the background to embed only his image.
  • the ubiquitous pop-up screen that appears next to the news anchor's head can be watermarked before the newscast, allowing the viewer to click on that image to reach extra information from a website.
  • Watermarks may be embedded in broadcast video objects in real time.
  • An example is watermarking NBA basketball players as a game is broadcast, allowing the viewer to click on players and receive more information about them.
  • a decoding process may be inserted to decode information about the video object from a watermark embedded in the video signal. This information may then be used to trigger an action, such as fetching graphics and displaying them to the user.
  • the watermark information may be forwarded to a database, which associates an action with the watermark information.
  • One form of such a database is detailed in co-pending application 09/571,422, which is hereby incorporated by reference. This database looks up an action associated with watermark information extracted from content.
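The dynamic link described above can be sketched as a small lookup table: the identifier carried by the watermark stays fixed, while the action bound to it can be changed on the server side. This is a minimal illustration only; the `ActionRegistry` class, the identifier value and the example URLs are hypothetical and are not taken from the referenced application.

```python
from typing import Dict, Optional


class ActionRegistry:
    """Hypothetical server-side table mapping watermark identifiers to actions."""

    def __init__(self) -> None:
        self._actions: Dict[int, str] = {}

    def bind(self, object_id: int, url: str) -> None:
        # Re-binding an identifier changes the action without touching the video.
        self._actions[object_id] = url

    def resolve(self, object_id: int) -> Optional[str]:
        # Returns None when no action is registered for the identifier.
        return self._actions.get(object_id)


registry = ActionRegistry()
registry.bind(42, "https://example.com/players/42")        # assumed URL
before = registry.resolve(42)
registry.bind(42, "https://example.com/players/42/stats")  # same ID, new action
after = registry.resolve(42)
```

The embedded identifier (42 here) never changes; only the table entry does, which is what makes the link dynamic.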
  • Fig. 1B illustrates a system architecture outlining several alternative implementations of linking video objects with actions or information.
  • This diagram divides the system into a creation side, where content is created and encoded, and an end user side, where content and watermark enabled information or actions are rendered.
  • the diagram shows examples of three watermark types and two watermark protocols.
  • In type one, the watermark is embedded in a physical object before it is recorded in a video signal.
  • In type two, the watermark is encoded in a video object after it is recorded but before it is broadcast, possibly during a video editing process.
  • this type of watermark may be encoded in a video object of an actor captured in front of a greenscreen as he moves through a scene.
  • In type three, the watermark is added as the video is being captured for a live event, such as watermarking a video object depicting the jersey of a basketball player as a video stream is being captured of a game.
  • In protocol one, the watermark is encoded in the video frame area of the desired object, such as where the jersey of the basketball player appears on the video display screen.
  • In protocol two, the watermark is encoded throughout a video frame or corresponding segment of an audio track, and includes information about the object and its location. For example, during the basketball game, the watermark is embedded in the audio track and includes location, size and identification for player 1, then player 2, then player 3, and back to player 1 if he is still in the scene or onto player 2 or 3, etc.
  • Internet connectivity can be included in the video display device or associated set-top box or in a portable display device, such as a personal laptop.
  • the rendering of the linked information can occur on the video display, possibly using picture-in-picture technology so others can still see the original video, or in the portable display device, such as a laptop since Internet browsing can be a personal experience.
  • User interaction with the system such as selecting the object to find linked information can happen with the video display, such as pointing with a remote, or with a portable display device, such as using a mouse on a laptop.
  • Specific implementations can include a variety of combinations of these components.
  • the embedding process encodes one or more watermarks into frames of a video sequence, or in some cases, an audio track that accompanies the video sequence. These watermarks carry information about at least one video object in the sequence, and also create an association between a video object and an action or external information.
  • the association may be formed using a variety of methods.
  • One method is to encode an object identifier in a watermark. On the decoding side, this identifier is used as a key or index to an action or information about a video object.
  • the identifier may be a direct link to information or actions (e.g., an address of the information or action), or be linked to the information or actions via a server database.
  • Another method is to encode one object identifier per frame, either in the frame or corresponding audio track segment. Then, the system sends a screen location selected by a user and the identifier to the server.
  • the object identifier plays a role similar to that in the previous method, namely, it identifies the object.
  • the location information may be used along with the object identifier to form an index into a database to look up a database entry corresponding to a video object.
  • the watermark may contain several identifiers and corresponding locations defining the screen location of a related video object.
  • the screen location selected by the user determines which identifier is sent to the server for linked information or actions.
  • a process at the end-user side maps the location of the user selection to an identifier based on the locations encoded along with the identifiers in the content.
  • a segment of the audio track that is intended to be played with a corresponding video frame or frame sequence may include a watermark or watermarks that carry one or more pairs of identifier and locations. These watermarks may be repeated in audio segments synchronized with video frames that include corresponding linked video objects.
  • the identifier closest to the location of the user interaction is used.
  • a modification includes providing bounding locations in the watermark and determining whether the user's selection is within this area, as opposed to using the closest watermark location to the user's selection.
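The two selection rules just described (nearest encoded location versus a bounding area) can be compared side by side. The sketch below assumes the decoder has already extracted identifier and location pairs from the watermark; the `Region` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    """One identifier/location pair decoded from the watermark (assumed layout)."""
    object_id: int
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height

    def center(self) -> tuple:
        return (self.x + self.width / 2, self.y + self.height / 2)


def select_by_bounds(regions: List[Region], px: int, py: int) -> Optional[int]:
    # Bounding-area rule: the selection must fall inside a region.
    for region in regions:
        if region.contains(px, py):
            return region.object_id
    return None


def select_nearest(regions: List[Region], px: int, py: int) -> Optional[int]:
    # Closest-location rule: always picks a region if any exist.
    if not regions:
        return None
    nearest = min(regions, key=lambda r: hypot(r.center()[0] - px, r.center()[1] - py))
    return nearest.object_id


regions = [Region(1, 100, 50, 80, 120), Region(2, 300, 60, 80, 120)]
```

A click between the two objects, for example at (250, 100), selects nothing under the bounding rule but selects object 2 under the nearest rule, which illustrates the trade-off between the two approaches.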
  • context information available at decoding time may be used to create an association between a video object in a frame and a corresponding action or information in a database.
  • the frame number, screen coordinates of a user selection, time or date may be used in conjunction with information extracted from the watermark to look up a database entry corresponding to a video object in a video sequence.
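One way to picture this context-based lookup is a table whose entries combine the identifier decoded from the watermark with a frame range and a screen area. The sketch below is an assumption about how such an index could be organized; the `Entry` fields and the action strings are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    program_id: int                  # identifier decoded from the watermark
    frames: Tuple[int, int]          # inclusive frame range
    box: Tuple[int, int, int, int]   # x0, y0, x1, y1 (x1 and y1 exclusive)
    action: str


def lookup(entries: List[Entry], program_id: int, frame: int, x: int, y: int) -> Optional[str]:
    """Combine watermark data with decode-time context to find an action."""
    for entry in entries:
        x0, y0, x1, y1 = entry.box
        in_frames = entry.frames[0] <= frame <= entry.frames[1]
        in_box = x0 <= x < x1 and y0 <= y < y1
        if entry.program_id == program_id and in_frames and in_box:
            return entry.action
    return None


entries = [
    Entry(7, (0, 299), (100, 50, 180, 170), "show profile for player 1"),
    Entry(7, (300, 599), (100, 50, 180, 170), "show profile for player 2"),
]
```

Here the same watermark identifier and screen position resolve to different actions depending on the frame number supplied by the player.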
  • the manner in which the embedded data is used to create an association between video objects and related information or actions impacts how that data is embedded into each frame. For example, if the watermark includes location information, an object identifier can be embedded throughout the frame in which the corresponding object resides, rather than being located in a portion of the frame that the object overlaps. If the frame includes two or more linked video objects, the watermark conveys an object identifier and location for each of the video objects.
  • Additional decoding side issues impacting the encoding process include: 1) enabling the user to select video objects during playback; and 2) mapping a user's input selecting a video object to the selected video object.
  • the user can select a video object in various ways. For example, gestural input devices, such as a mouse, touch screen, etc. enable the user to select a video object by selecting a screen location occupied by that object. The selected location can then be mapped to information extracted from a watermark, such as an object identifier. The object identifier of a video object that overlaps the selected location can be looked up based on location codes embedded in the watermark or by looking up the object identifier extracted from a watermark in a video object layer at the selected location.
  • a user interface on the decoding side provides additional information about watermarked video objects, like graphical icons, menus, etc.
  • the user can select a video object by selecting a graphic, menu item, or some other user interface element associated with that object.
  • there are many ways to select graphics or menu items, including gestural input devices, keyboards, speech recognition, etc. This approach creates an additional requirement that the decoding side extract watermark information and use it to construct a graphical icon or menu option for the user.
  • the decoding process may derive the information needed for this user interface from the video content, from a watermark in the content, or from out-of-band auxiliary data. In the latter two cases, the embedding process encodes information into the content necessary to generate the user interface on the decoding side.
  • An example will help illustrate an encoding process to facilitate user selection of video objects on the decoding side.
  • a watermark encoder encodes a short title (or number) and location of marked video objects into the video stream containing these objects.
  • the decoding process can extract the title and location information, and display titles at the locations of the corresponding video objects.
  • the display of this auxiliary information can be implemented using small icons or numbers superimposed on the video during playback, or it can be transmitted to a separate device from the device displaying the video.
  • the video receiver can decode the information from the video stream and send it via wireless transmission to an individual user's hand held computer, which in turn, displays the information and receives the user's selection.
  • FIGs. 2-5 illustrate some examples.
  • physical objects 200 are pre-watermarked in a manner that survives the video capture process 202.
  • the watermark must survive digital to analog conversion (e.g., printing a digital image on a physical object) and analog to digital conversion (e.g., capture via a video camera). Watermarking methods of this kind are described in US Patent 5,862,260 and in co-pending patent application 09/503,881, filed February 14, 2000.
  • These approaches are particularly conducive but not limited to applications where the objects are largely flat and stationary, such as billboards, signs, etc.
  • the video capture process records the image on the surface of these objects, which is encoded with a watermark.
  • the resulting video is then transmitted or broadcast 204.
  • a video creation process composites watermarked video objects 300 with a video stream 302 to create a watermarked video sequence.
  • the watermark may be encoded into video object layers. Examples of watermark encoding and decoding technology are described in US Patent 5,862,260, and in co-pending applications 09/503,881, filed February 14, 2000, and WO 99/10837.
  • a compositing operation 304 overlays each of the video objects onto the video stream in depth order.
  • each of the objects has depth and transparency information (e.g., sometimes referred to as translucency, opacity or alpha).
  • the depth information indicates the relative depth ordering of the layers from a viewpoint of the scene (e.g., the camera position) to the background.
  • the transparency indicates the extent to which pixel elements in a video object layer allow a layer with greater depth to be visible.
  • the video sequence generated by the compositing operation is broadcast or transmitted to viewers.
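The depth-ordered compositing described above can be illustrated for a single pixel. The sketch below uses the standard "over" blend applied from the farthest layer to the nearest; it assumes each layer supplies an opacity value `alpha` in [0, 1] (1 minus its transparency), and the function name is hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]


def composite_pixel(background: Pixel, layers: List[Tuple[float, Pixel, float]]) -> Pixel:
    """Blend layers given as (depth, color, alpha); larger depth is farther away."""
    result = background
    # Paint back to front so nearer layers cover farther ones.
    for _, color, alpha in sorted(layers, key=lambda layer: layer[0], reverse=True):
        result = tuple(alpha * c + (1.0 - alpha) * r for c, r in zip(color, result))
    return result


layers = [
    (1.0, (1.0, 0.0, 0.0), 0.5),  # near layer: half-transparent red
    (2.0, (0.0, 0.0, 1.0), 1.0),  # far layer: opaque blue
]
blended = composite_pixel((0.0, 0.0, 0.0), layers)
```

The half-transparent near layer lets the opaque far layer show through, giving an even mix of red and blue at that pixel.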
  • the video objects may be encoded with watermarks as part of a compression process.
  • the MPEG 4 video coding standard specifies a video compression codec in which video object layers are compressed independently.
  • the video object layers need not be composited before they are transmitted to a viewer.
  • an MPEG 4 decoder decompresses the video object layers and composites them to reconstruct the video sequence.
  • the watermark may be encoded into compressed video object layers by modulating DCT coefficients of intra or interframe macroblocks. This watermark can be extracted from the DCT coefficients before the video objects are fully decompressed and composited.
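As a rough illustration of modulating DCT coefficients in the compressed domain, the sketch below forces the parity of one quantized coefficient in an 8x8 block to carry a single bit. This parity scheme is a simple stand-in chosen for clarity, not the method of the cited patents; the coefficient position and the function names are assumptions, and a practical watermark would spread the signal over many coefficients for robustness.

```python
from typing import List, Tuple

Block = List[List[int]]


def embed_bit(block: Block, bit: int, pos: Tuple[int, int] = (2, 1)) -> Block:
    """Return a copy of a quantized DCT block whose coefficient at pos has parity == bit."""
    row, col = pos
    out = [r[:] for r in block]
    coeff = out[row][col]
    if (coeff & 1) != bit:
        # Step one quantization level away from zero to flip the parity.
        coeff += 1 if coeff >= 0 else -1
    out[row][col] = coeff
    return out


def extract_bit(block: Block, pos: Tuple[int, int] = (2, 1)) -> int:
    """Read the bit back without any inverse DCT or full decompression."""
    row, col = pos
    return block[row][col] & 1


block = [[0] * 8 for _ in range(8)]
block[0][0] = 52   # DC term
block[2][1] = -3   # a mid-frequency coefficient (assumed embedding position)
marked = embed_bit(block, 0)
```

Because extraction only inspects the coefficient value, the bit can be read before the block is fully decoded, which mirrors the point made above about extracting the watermark before decompression and compositing.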
  • Fig. 4 illustrates another embedding process for encoding auxiliary information about video objects in a video stream 400.
  • a user designates a video object and the auxiliary information to be encoded in the video object via a video editing tool 402.
  • a watermark encoding process 404 encodes the auxiliary information into the content.
  • a transmitter 406 then transmits or broadcasts the watermarked content to a viewer.
  • the watermark encoder may encode auxiliary information throughout the entire video frame in which at least one marked video object resides.
  • the user may specify via the editing tool the location of two or more video objects by drawing a boundary around the desired video objects in a video sequence.
  • the encoding process records the screen location information for each object in the relevant frames and associates it with the auxiliary information provided by the user, such as an object identifier.
  • the encoder then creates a watermark message for each frame, including the screen location of an object for that frame and its object identifier. Next, it encodes the watermark message repeatedly throughout the frame.
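The per-frame message and its repetition throughout the frame might look like the following. The field layout (five unsigned 16-bit values for identifier, x, y, width and height) and the tiling by fixed-size blocks are assumptions made for this sketch; the source does not specify a message format.

```python
import struct
from typing import Dict, Tuple

MESSAGE_FORMAT = ">HHHHH"  # assumed layout: object id, x, y, width, height


def pack_message(object_id: int, x: int, y: int, width: int, height: int) -> bytes:
    return struct.pack(MESSAGE_FORMAT, object_id, x, y, width, height)


def unpack_message(data: bytes) -> Tuple[int, ...]:
    return struct.unpack(MESSAGE_FORMAT, data)


def tile_message(message: bytes, frame_w: int, frame_h: int, block: int) -> Dict[Tuple[int, int], bytes]:
    """Assign the same message to every whole block of the frame."""
    return {
        (col * block, row * block): message
        for row in range(frame_h // block)
        for col in range(frame_w // block)
    }


message = pack_message(42, 100, 50, 80, 120)
tiles = tile_message(message, 640, 480, 128)
```

Any tile the decoder manages to read yields the full identifier and location, which is the benefit of repeating the message across the frame.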
  • An alternative approach is to encode auxiliary information for an object in the screen location of each frame where a video object layer for that object resides (described fully in figure 6 below).
  • Fig. 5 is a diagram depicting yet another process for embedding auxiliary information about video objects in a video stream. This process is similar to the one shown in Fig. 4, except that the position of video objects is derived from transmitters 500-504 attached to the real world objects depicted in the video scene and attached to video cameras.
  • the transmitters emit a radio signal, including an object identifier.
  • Radio receivers 506 at fixed positions capture the radio signal and provide information to a pre-processor 508 that triangulates the position of each transmitter, including the one on the active camera, and calculates the screen location of each transmitter in the video stream captured by the active camera.
  • the active camera refers to the camera that is currently generating the video stream 510 to be broadcast or transmitted live (or recorded for later distribution). In a typical application, there may be several cameras, yet only one is selected to provide the video stream 510 at a given time.
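Once the pre-processor has a 3D position for a transmitter and for the active camera, mapping that position to a screen location is a projection step. The sketch below uses a simplified pinhole model with the camera looking along the +z axis and no rotation or lens distortion; those simplifications, the focal length in pixels and the function name are all assumptions, and the triangulation itself is not shown.

```python
from typing import Optional, Tuple

Point3 = Tuple[float, float, float]


def project_to_screen(point: Point3, camera: Point3, focal_px: float,
                      width: int, height: int) -> Optional[Tuple[float, float]]:
    """Project a world point to pixel coordinates for a camera looking along +z."""
    dx = point[0] - camera[0]
    dy = point[1] - camera[1]
    dz = point[2] - camera[2]
    if dz <= 0:
        return None  # behind the camera, so not visible
    u = width / 2 + focal_px * dx / dz
    v = height / 2 - focal_px * dy / dz  # screen y grows downward
    return (u, v)


location = project_to_screen((2.0, 1.0, 10.0), (0.0, 0.0, 0.0), 1000.0, 1920, 1080)
```

A production system would also need the orientation and zoom of the active camera, which is presumably one reason the camera carries its own transmitter.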
  • an encoding process 512 selects video objects for which auxiliary information is to be embedded in the video stream.
  • the selection process may be fully or partially automated.
  • a programmed computer selects objects whose screen location falls within a predetermined distance of the 2D screen extents of a video frame, and whose location does not conflict with the location of other objects in the video frame.
  • a conflict may be defined as one where two or more objects are within a predetermined distance of each other in screen space in a video frame. Conflicts are resolved by assigning a priority to each object identifier that controls which video object will be watermark enabled in the case of a screen location conflict.
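The priority-based conflict rule can be sketched as a greedy pass over the candidate objects in priority order. The greedy strategy, the data layout and the function name are assumptions; the source only states that a priority per object identifier decides which object is watermark enabled.

```python
from math import hypot
from typing import Dict, List, Tuple


def resolve_conflicts(locations: Dict[int, Tuple[float, float]],
                      priority: Dict[int, int],
                      min_distance: float) -> List[int]:
    """Keep the highest-priority objects that are not within min_distance of a kept one."""
    kept: List[int] = []
    for object_id in sorted(locations, key=lambda oid: priority[oid], reverse=True):
        x, y = locations[object_id]
        conflict = any(
            hypot(x - locations[k][0], y - locations[k][1]) <= min_distance for k in kept
        )
        if not conflict:
            kept.append(object_id)
    return kept


locations = {1: (100.0, 100.0), 2: (110.0, 105.0), 3: (400.0, 300.0)}
priority = {1: 5, 2: 9, 3: 1}
enabled = resolve_conflicts(locations, priority, min_distance=50.0)
```

Objects 1 and 2 are too close together, so only the higher-priority object 2 is enabled, while the distant object 3 is unaffected.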
  • the user may select one or more video objects in frames of the video stream to be associated with embedded watermark information via a video editing system 514.
  • the video editing system may be implemented in computer software that buffers video frame data and associated screen location information, displays this information to the user, and enables the user to edit the screen location information associated with video objects and select video objects for watermark encoding.
  • a watermark encoding process 516 proceeds to encode an object identifier for each selected object.
  • the watermark may be encoded in screen locations and frames occupied by a corresponding video object.
  • object identifiers and corresponding screen location information may be encoded throughout the video frames (or in the audio track of an audio visual work).
  • a transmitter 518 transmits or broadcasts the video stream to viewers.
  • the video stream may also be stored, or compressed and stored for later distribution, transmission or broadcast.
  • the watermarks carrying object identifiers, and other object information, such as screen location information, may be encoded in uncompressed video or audio, or in compressed video or audio.
  • Fig. 6 depicts an example watermark encoding process that may be used in some of the systems described in this document. Depending on the implementation, some of the processing is optional or performed at different times.
  • the watermark encoding process operates on a video stream 600. In some cases the stream is compressed, segmented into video object layers, or both compressed and segmented into video objects as in some video content in MPEG 4 format.
  • the encoder buffers frames of video, or segmented video objects (602).
  • the encoder embeds a different watermark payload into different portions of video frames corresponding to the screen location of the corresponding video objects. For example, in frames containing video object 1 and video object 2, the encoder embeds a watermark payload with an object identifier for object 1 in portions of the frames associated with object 1 and a watermark payload with object identifier for object 2 in portions of the frames associated with object 2.
  • the watermark protocol including the size of the payload, control bits, error correction coding, and orientation/synchronization signal coding can be the same throughout the frame. The only difference in the payloads in this case is the object specific data.
  • a variation of this method may be used to encode a single watermark payload, including identifiers and screen locations for each watermark enabled object, throughout each frame. While this approach increases the payload size, there is potentially more screen area available to embed the payload, at least in contrast to methods that embed different payloads in different portions of a frame.
  • the encoder optionally segments selected video object instances from the frames in which the corresponding objects reside.
  • An input to this process includes the screen locations 606 of the objects.
  • the screen locations may be provided by a user via a video editing tool, or may be calculated based on screen location coordinates derived from transmitters on real world objects.
  • the screen extents may be in a coarse form, meaning that they do not provide a detailed, pixel by pixel definition of the location of a video object instance.
  • the screen extents may be as coarse as a bounding rectangle or a polygonal shape entered by drawing a boundary around an object via a video editing tool.
  • Automated segmentation may be used to provide a refined shape, such as a binary mask.
  • Several video object segmentation methods have been published, particularly in connection with object based video compression.
  • the implementer may select a suitable method from among the literature that satisfies the demands of the application. Since the watermark encoding method may operate on blocks of pixels and does not need to be precise to the pixel level due to human interaction, the segmentation method need not generate a mask with stringent, pixel level accuracy.
  • video objects are provided in a segmented form.
  • Some examples of these implementations are video captured of a physical object (e.g., actor, set, etc.) against a green screen, where the green color of the screen helps distinguish and define the object shape (e.g., a binary mask where a given green color at a spatial sample in a frame indicates no object, otherwise, the object is present).
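  • A minimal sketch of the green screen case follows, assuming frames are lists of rows of (R, G, B) tuples. The key color, the tolerance value and the function name `green_screen_mask` are illustrative assumptions; a production keyer would be considerably more careful about lighting and spill.

```python
def green_screen_mask(frame, key=(0, 255, 0), tolerance=60):
    """Return a binary mask: 0 where the pixel is close to the key color, else 1."""
    def is_key(pixel):
        # Sum of absolute channel differences from the assumed key color.
        return sum(abs(c - k) for c, k in zip(pixel, key)) <= tolerance
    return [[0 if is_key(pixel) else 1 for pixel in row] for row in frame]
```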
  • the encoder computes a bounding region for each object (608), if not already available.
  • the bounding region of a video object instance refers to a bounding rectangle that encompasses the vertical and horizontal screen extents of the instance in a frame.
  • the encoder expands the extents to an integer multiple of a watermark block size (610).
  • the watermark block size refers to a two dimensional screen space in which the watermark corresponding to a video object, or set of objects, is embedded in a frame at a given encoding resolution.
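  • The bounding region computation (608) and the expansion to a multiple of the block size (610) might be sketched as follows. The sketch assumes a binary mask given as a list of rows and a block grid anchored at the frame origin; `bounding_rect` and `expand_to_blocks` are hypothetical names.

```python
def bounding_rect(mask):
    """Return (left, top, right, bottom) of nonzero mask cells, right/bottom exclusive."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        return None  # the object is not present in this frame
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)

def expand_to_blocks(rect, block):
    """Snap the rectangle outward so each side lies on a multiple of the block size."""
    left, top, right, bottom = rect
    return (
        (left // block) * block,
        (top // block) * block,
        -(-right // block) * block,   # ceiling division
        -(-bottom // block) * block,
    )
```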
  • the watermark encoder then proceeds to embed a watermark in non-transparent blocks of the bounding region.
  • a non-transparent block is a block within the bounding region that is at least partly overlapped by the video object instance corresponding to the region.
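  • One way to pick the blocks to embed in is sketched below. It assumes that a block counts as non-transparent when the object's binary mask covers at least one of its pixels, and that the bounding region has already been aligned to the block size; the function name is hypothetical.

```python
def non_transparent_blocks(mask, rect, block):
    """Return the (x, y) origins of blocks in rect that the object mask overlaps."""
    left, top, right, bottom = rect
    height, width = len(mask), len(mask[0])
    selected = []
    for by in range(top, bottom, block):
        for bx in range(left, right, block):
            covered = any(
                mask[y][x]
                for y in range(by, min(by + block, height))
                for x in range(bx, min(bx + block, width))
            )
            if covered:
                selected.append((bx, by))
    return selected
```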
  • the watermark for each block includes an object specific payload, such as an object identifier, as well as additional information for error correction and detection, and signal synchronization and orientation.
  • the synchronization and orientation information can include message start and end codes in the watermark payload as well as a watermark orientation signal used to synchronize the detector and compensate for changes in scaling, translation, aspect ratio changes, and other geometric distortions.
  • an object specific watermark may be encoded throughout a bounding rectangle of the object.
  • This approach simplifies encoding to some extent because it obviates the need for more complex segmentation and screen location calculations. However, it reduces the specificity with which the screen location of the watermark corresponds to the screen location of the video object that it is associated with.
  • Another alternative that gives fine screen location detail, yet simplifies watermark encoding is to embed a single payload with object identifiers and detailed location information for each object. This payload may be embedded repeatedly in blocks that span the entire frame, or even in a separate audio track.
  • the watermark signal may create visible artifacts if it remains the same through a sequence of frames.
  • One way to combat this is to make the watermark signal vary from one frame to the next using a frame dependent watermark key to generate the watermark signal for each block.
  • Image adaptive gain control may also be used to reduce visibility.
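  • The frame dependent key mentioned above can be illustrated with a small sketch that derives a pseudo-random block pattern from a secret key and a frame index. The use of SHA-256 for seeding, the +1/-1 pattern values and the name `block_pattern` are assumptions made for the example, not the watermark signal construction used by any particular encoder.

```python
import hashlib
import random

def block_pattern(secret_key: bytes, frame_index: int, size: int = 8):
    """Generate a size x size pattern of +1/-1 values that changes with the frame index."""
    seed_material = secret_key + frame_index.to_bytes(8, "big")
    seed = int.from_bytes(hashlib.sha256(seed_material).digest(), "big")
    rng = random.Random(seed)  # deterministic for a given key and frame
    return [[rng.choice((-1, 1)) for _ in range(size)] for _ in range(size)]
```

  • A detector that knows the key and the frame index can regenerate the same pattern, while the pattern seen by the viewer differs from frame to frame.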
  • 1) extracting auxiliary information embedded in a watermark in the video content 700, 702
  • 2) user selection of watermark enabled information or actions 704
  • 3) determining information or actions associated with a video object 706
  • 4) rendering watermark enabled information or actions to the user 708
  • Rendering may include generating visual, audio or audio-visual output to present information and options for selecting more information or actions to the user, executing a program or machine function, or performing some other action in response to the watermark data.
  • the first process extracts auxiliary information, such as object identifiers and screen locations, from the video stream or an accompanying audio track.
  • the next process implements a user interface to indicate to the user that the video has watermark enabled objects and to process user input selecting watermark enabled information or actions.
  • the third process determines the information or action associated with a selected video object.
  • the fourth renders watermark enabled information or actions to the user.
  • a decoder may operate continuously or in response to a control signal to read auxiliary information from a watermark, look up related information or actions, and display it to the user.
  • Continuous decoding tends to be less efficient because it may require a watermark decoder to operate on each frame of video or continuously screen an audio track.
  • a more efficient approach is to implement a watermark screen that invokes a watermark decoder only when watermark data is likely to be present.
  • a control signal sent in or with the video content can be used to invoke a watermark decoder.
  • the control signal may be an in-band signal embedded in the video content, such as a video or audio watermark.
  • a watermark detector may look for the presence of a watermark, and when detected, initiate a process of decoding a watermark payload, accessing information or actions linked via an object identifier in the payload, and displaying the linked information or actions to the user.
  • the control signal may be one or more control bits in a watermark payload decoded from a watermark signal.
  • the control signal may also be an out-of-band signal, such as a tag in a video file header, or a control signal conveyed in a sub-carrier of a broadcast signal.
  • the control signal can be used to reduce the overhead of watermark decoding operations to instances where watermark enabled objects are present.
  • the decoder need only attempt a complete decoding of a complete watermark payload when the control signal indicates that at least one video object (e.g., perhaps the entire frame) is watermark enabled.
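  • The gating role of the control signal can be sketched as below, assuming the control signal arrives as an integer of control bits and that bit 0 marks watermark enabled content. The flag value and the function name are hypothetical.

```python
ENABLED_FLAG = 0x01  # assumed bit meaning "at least one object is watermark enabled"

def maybe_decode(control_bits, decode_payload):
    """Run the (expensive) full payload decode only when the control bit is set."""
    if control_bits & ENABLED_FLAG:
        return decode_payload()
    return None
```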
  • the control signal may trigger the presentation of an icon or some other visual or audio indicator alerting the user that watermark enabled objects are present. For example, it may trigger the display of a small logo superimposed over the display of the video. The viewer may then select the icon to initiate watermark decoding.
  • the watermark decoder proceeds to detect watermarks in the video stream and decode watermark payloads of detected watermarks. Additionally, when watermark payloads for one or more objects are detected, the user interface can present object specific indicators alerting the user about which objects are enabled. The user can then select an indicator to initiate the processes of determining related information or actions and presenting the related information or actions to the user. Another way to reduce watermark decoding overhead is to invoke watermark decoding on selected portions of the content in response to user selection.
  • the decoder may be invoked on portions of frames, a series of frames, or a portion of audio content in temporal or spatial proximity to user input.
  • the decoding process may focus a watermark decoding operation on a spatial region around a screen location of a video display selected by the user.
  • the user might issue a command to look for enabled content, and the decoding process would initiate a watermark detector on frames of video or audio content in temporal proximity to the time of the user's request.
  • the decoding process may buffer frames of the most recently received or played audio or video for the purpose of watermark screening in response to such requests.
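  • A minimal sketch of such a buffer follows, assuming frames are pushed with a numeric timestamp and that "temporal proximity" means a fixed window around the time of the request. The class and method names are illustrative.

```python
from collections import deque

class RecentFrameBuffer:
    """Keep only the most recently received frames for on-demand watermark screening."""

    def __init__(self, capacity):
        self._frames = deque(maxlen=capacity)  # oldest frames are dropped automatically

    def push(self, timestamp, frame):
        self._frames.append((timestamp, frame))

    def near(self, request_time, window):
        """Return buffered frames whose timestamp is within window of request_time."""
        return [f for t, f in self._frames if abs(t - request_time) <= window]
```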
  • One configuration is a video player with an interactive user interface that displays video content and implements watermark enabled features.
  • the player decodes the watermark, displays video content, and enables the user to select video objects via its interactive user interface.
  • the player may have a local database for looking up the related information or action of an identifier extracted from a video object.
  • Fig. 8 illustrates an example configuration of a decoding process for linking video objects to auxiliary information or actions.
  • a local processing system 800 (e.g., PC, set-top box, stand-alone device)
  • a router 802 that communicates with the local processing system via a network 803 such as the Internet
  • a web server 804 that communicates with the local processing system and the router via the network.
  • the local processing system may be implemented in a variety of consumer electronic devices such as a personal computer (PC), set-top box, wireless telephone handset, television, etc.
  • the router and web server may similarly be implemented in a variety of systems.
  • the router and web server are implemented in server computers. For these applications, communication of data among the local processing system, router and server may be performed using network protocols, such as TCP/IP, and other application level protocols such as XML, HTTP, and HTML.
  • the local processing system 800 receives a video stream 806 via a receiver 808.
  • the type of receiver depends on the nature of the video transmission, such as Internet download or streaming delivery, satellite broadcast, cable television broadcast, television broadcast, playback from portable storage device such as VHS tape, DVD, etc.
  • an appropriate device such as network adapter, satellite dish, tuner, DVD driver, etc. receives the content and converts it to a video signal.
  • This process may also include decompressing a compressed video file.
  • the watermark may be encoded and decoded from compressed video or audio, such as MPEG 4 video objects or audio.
  • the local processing system renders the video content 810.
  • the rendering process includes converting the video signal to a format compatible with the video controller in the computer and writing the video to video memory in the video controller 812.
  • the video controller 812 then displays the video signal on a display device 814.
  • the local processing system buffers frames (816) of audio or video for watermark detecting and decoding.
  • the buffering may be integrated with rendering the video to video memory or may be implemented as a separate process (e.g., allocating separate video buffers in main memory or video memory).
  • the buffer may store frames of compressed video content or decompressed video content from which watermarks are detected and decoded.
  • a watermark detector screens the buffered content for the presence of a watermark (818). If a watermark is present, it sends a message to a user interface application 820, which in turn generates a graphical logo or other visual or audio signal that indicates the presence of watermark enabled video objects.
  • a watermark decoder 822 reads one or more watermark payloads from the content. As noted above, the decoder may be triggered by one or more of the following events: 1) the detector finding the presence of a watermark; 2) an out-of-band control signal instructing the decoder to detect and decode a watermark; 3) user selection of the graphical logo, etc.
  • In addition to displaying an indicator of watermark enabled objects, the user interface 820 also manages input from the user for selecting video objects and for controlling the display of information associated with selected video objects.
  • the user interface can be implemented as an interactive display with graphics that respond to input from a gestural input device, such as a mouse or other cursor control device, touch screen, etc. This interactive display is superimposed on the display of the video stream.
  • the user selects a video object by placing a cursor over the video object on the display and entering input, such as clicking on a mouse.
  • the watermark payload contains information for each watermark enabled object in the video content, along with location codes specifying screen locations of the objects.
  • the decoder preferably decodes the watermark payload in response to detecting presence of a watermark and stores the payload for the most recently displayed video content.
  • the decoder receives the coordinates of the user selection and finds the corresponding location code in the watermark payload information that defines a screen area including those coordinates.
  • the location code is specified at a reference frame resolution, and the user selection coordinates are normalized to this reference resolution.
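  • The mapping from a user selection to a location code might look like the following sketch. It assumes each location code is a rectangle at the reference resolution, keyed by object identifier; the data layout and the names `normalize` and `object_at` are assumptions.

```python
def normalize(x, y, display_size, reference_size):
    """Scale display coordinates to the reference frame resolution."""
    display_w, display_h = display_size
    ref_w, ref_h = reference_size
    return (x * ref_w / display_w, y * ref_h / display_h)

def object_at(x, y, display_size, reference_size, location_codes):
    """Return the identifier whose rectangle contains the selection, or None."""
    nx, ny = normalize(x, y, display_size, reference_size)
    for object_id, (left, top, right, bottom) in location_codes.items():
        if left <= nx < right and top <= ny < bottom:
            return object_id
    return None
```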
  • where video frames contain one or more watermarks, the payloads in those watermarks are specific to the video objects in which they are embedded.
  • One approach to decoding the video frame is to decode watermark payloads for each watermark detected in the frame, and then store screen location data indicating the location of the watermark containing that payload. The screen coordinates of a user's selection can then be mapped to a payload, and specifically to the object identifier in the payload, based on the screen location data of the watermark.
  • Another approach to decoding is to execute a decode operation on a specific temporal and spatial region in proximity to the temporal and spatial coordinates of a user selection.
  • the temporal coordinates correspond to a frame or set of frames
  • the spatial coordinates correspond to a two-dimensional region in the frame or set of frames.
  • the watermark decoder can enhance the user's chances of selecting a watermark enabled object by providing graphical feedback in response to user selection of the video frame or object within the frame. For example, the decoder can give the user interface the screen coordinates of areas where a watermark has been detected. Screen areas that correspond to different watermark payloads or different object locations as specified within a watermark payload can be highlighted in a different color or with some other graphical indicator that distinguishes watermark enabled objects from unmarked objects and from each other.
  • the decoder forwards an object identifier (824) for the video object at the selected location to the server 802 via a network interface 826.
  • the decoder may also provide additional information from the watermark or context information from the local processing system. For Internet applications, the decoder sends a message including this information to the server in XML format using HTTP. Before forwarding the message, the user interface may be designed to prompt the user with a dialog box requesting the user to confirm that he or she does want additional information.
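  • For the Internet case, the message could be assembled along the lines of the sketch below, which builds an XML body and an HTTP POST request object without sending it. The element names (`selection`, `objectId`, `frameId`, `context`) and the URL used in the usage note are invented for illustration; this document does not specify a schema.

```python
import urllib.request
import xml.etree.ElementTree as ET

def build_message(object_id, frame_id=None, context=None):
    """Serialize the selection as XML bytes (hypothetical element names)."""
    root = ET.Element("selection")
    ET.SubElement(root, "objectId").text = str(object_id)
    if frame_id is not None:
        ET.SubElement(root, "frameId").text = str(frame_id)
    for name, value in (context or {}).items():
        ET.SubElement(root, "context", name=name).text = str(value)
    return ET.tostring(root, encoding="utf-8")

def build_request(url, body):
    """Wrap the XML body in an HTTP POST request; the caller decides when to send it."""
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/xml"}, method="POST"
    )
```

  • A caller would pass the result of `build_request("http://example.com/lookup", body)` to `urllib.request.urlopen` only after the user has confirmed the dialog described above.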
  • the network interface 826 forwards the message to the server 802 over the network.
  • the network interface corresponds to the device and accompanying programming that sends and receives data over a communication link.
  • the network interface may be a cellular telephone transceiver.
  • the network interface may be a satellite dish.
  • combinations of technologies may be used for transmitting and receiving functions, such as sending data via telephone network using a modem or network adapter, and receiving data via a satellite dish.
  • the server in response to receiving the message (828), parses it and extracts an index used to look up a corresponding action in a database (830) that associates many such indices to corresponding actions.
  • the index may include the object identifier and possibly other information, such as time or date, a frame identifier of the selected object, its screen location, user information (geographic location, type of device, and demographic information), etc.
  • Several different actions may be assigned to an index. Different actions can be mapped to an object identifier based on context information, such as the time, date, location, user, etc. This enables the server to provide actions that change with changing circumstances of the viewer, content provider, advertiser, etc.
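  • Context dependent mapping of an object identifier to an action can be sketched as a rule table that is scanned in order. The table contents, the date-based rule and the function name `lookup_action` are hypothetical; a real server would likely consult a database and richer context.

```python
from datetime import date

# Hypothetical rule table: each identifier maps to rules tried in order.
ACTIONS = {
    "obj-17": [
        {"until": date(2001, 6, 30), "action": "http://example.com/promo"},
        {"until": None, "action": "http://example.com/info"},
    ],
}

def lookup_action(table, object_id, today):
    """Return the first action whose validity period covers today, or None."""
    for rule in table.get(object_id, []):
        if rule["until"] is None or today <= rule["until"]:
            return rule["action"]
    return None
```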
  • Some examples include returning information and hyperlinks to the user interface 820 (e.g., a web page), forming and forwarding a message to another server (e.g., re-directing an HTTP request to a web server), recording a transaction event with information about the selected object and user in a transaction log, downloading to the local processing system other media such as still image, video or audio content for playback, etc.
  • the transaction server may enable the user to purchase a physical object depicted in the video object via an electronic transaction. It may also enable the user to enter into a contract electronically to obtain usage rights in the video content or related content.
  • the server 802 looks up the address of a web server associated with the index (830). It then forwards an HTTP request (832) to the web server 804 at this address and provides the IP address of the local processing system 800. In addition, it may also include in the HTTP request information that the web server may use to tailor a response to the local processing system, such as the object identifier, frame identifier, user demographics, etc.
  • the web server receives the request (834) and returns information to the local processing system (836).
  • This information may include hyperlinks to other information and actions, programs that execute on the local processing system, multimedia content (e.g., music, video, graphics, images), etc.
  • One way to deliver the information is in the form of an HTML document, but other formats may be used as well.
  • the local processing system receives the information from the server 804 through the network and the network interface 826.
  • the decoder operates in conjunction with the user interface application such that the information is addressed to the user interface.
  • a TCP/IP connection is established between the user interface application and the network.
  • the server forwards the information to the IP address of the user interface application.
  • the user interface then formats the information for display and superimposes it onto the video display.
  • the user interface application parses the HTML and formats it for display on display device 814.
  • the rendered HTML is layered onto the video frames in the video memory.
  • the video controller 812 then displays a composite of the HTML and the video data.
  • the user interface processes inputs to these links in a similar fashion as an Internet browser program.
  • the user interface may also implement a set of rules that govern how it presents content returned from the network based on context information. For example, the user interface may keep track of information that a user has seen before and change it or tailor it based on user information or user preferences entered by the user. For example, the user can configure the user interface to display information about certain topics (news categories like sports, business, world affairs, local affairs, entertainment, etc.) or actions (e.g., links to certain categories of electronic buying transactions, video or music downloads, etc.).
  • when the user interface receives information and links to actions, it filters the information and links based on user preferences and provides only information and links that match those preferences.
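  • A small sketch of preference based filtering follows, assuming each returned link carries a category label. The dictionary keys and the function name are assumptions for illustration.

```python
def filter_links(links, preferred_categories):
    """Keep only links whose category the user has opted into."""
    return [link for link in links if link["category"] in preferred_categories]
```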
  • One potential drawback of the above configuration is that it may create conflicts among viewers. People often watch TV in a shared environment, whereas they work on the Internet in a personal environment. This environment creates a conflict when one viewer selects an object to get information that interferes with another viewer's enjoyment of the video program.
  • the system may be configured to have the decoding process in a TV, set-top box, or other receiver 900 of a video stream.
  • the decoder may then transmit watermark IDs, locations, and potentially other context information to the PD 902.
  • the decoder may be located in the PD.
  • the PD may be equipped with a microphone that captures the audio signal emitted from the speaker of the television.
  • the PD digitizes the audio signal and extracts watermarks from it, which include object information used to link video objects to information or actions.
  • the object information may include object identifiers and location codes for video objects in the video program.
  • the PD may also include a camera, and perform similar actions on watermarks in the video frames.
  • Two parts of this configuration are: 1) a transmitting device like the television 900 shown in Fig. 9, set-top box, etc., and 2) a receiving PD 902 such as a personal digital assistant (PDA) with a wireless connection to the Internet, or a remote control.
  • the receiving PD can perform the functions of enabling the user to select a video object, retrieving the linked information or actions for the selected object, and rendering them on its user interface.
  • a PD with a communication link (e.g., infrared, radio, etc.)
  • the receiving PD acts solely as a user control device of the transmitting device that enables the user to select an object and communicates the selection back to the transmitting device.
  • the transmitting device in response to the user selection, retrieves linked information or actions for the selected object and renders them.
  • a remote control with a user interface (e.g., display and cursor control device for selecting objects) and a two-way communication link with the transmitting device (e.g., infrared, radio, etc.).
  • the transmitter could be a stand-alone device or part of a set-top box that already exists for your TV.
  • the stand-alone device can be a small transmitter that attaches to coaxial cable and transmits a video object identifier and its location during the TV show. If this stand-alone device is connected before the channel has been chosen, it can transmit the IDs and locations for all channels, and the receiving PD can be used to choose the channel you are watching.
  • the receiving PD can transmit an identifier of the channel you are watching to the transmitting device, so it, in turn, only transmits the information for the desired channel.
  • this stand-alone device can be OEM hardware that is added inside the TV by the manufacturer or as a post buying solution (i.e. retro-fit).
  • the set-top box solution may use a Web, Cable or Digital TV set-top box, especially if the existing box is already interactive. Otherwise, OEM hardware could be provided for the set-top box manufacturer.
  • the transmission scheme can use any method, such as IR or radio waves (e.g., Bluetooth wireless communication), to transmit this minimal amount of information.
  • IR ports are advantageous because most laptops and PDAs already have IR ports. If the set-top box already has a transmission protocol, the transmission scheme should use that scheme. If this scheme is not applicable with an existing receiving PD, a special attachment can be developed and fed into the receiving PD via existing input devices, such as IR, serial, parallel, USB, or IEEE firewire inputs.
  • Receiving PD The receiving PD may be a laptop computer, Palm pilot, digital cell phone, or an
  • This PD would display the links in their relative location on a screen matching the TV screen's aspect ratio. Then, using the PD, you can select the desired link, possibly by clicking on the link, pressing the appropriate number key relating to the link number, or saying the link number and using speech recognition (906). Next, the PD sends information about the selected link to a database (e.g., a web server that converts the information into a web page URL and directs the server at this URL to return the corresponding web page to the PD) (908). A user interface application running in the PD then renders the web page (910) on its display. Using this approach, the links are dynamic and the data required to describe a link is minimal. This makes the watermarking and transmitting process easier. Most importantly, fewer bits need to be transmitted since only an ID, and not the complete link, is required.
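  • Displaying links at their relative locations on a PD screen of a different shape could be done as sketched below. The sketch assumes link positions are given as fractions (0 to 1) of the TV picture and letterboxes the TV aspect ratio inside the PD display; `fit_area` and `place_link` are hypothetical names.

```python
def fit_area(aspect_w, aspect_h, pd_width, pd_height):
    """Return (x, y, width, height) of the largest centered area with the TV aspect ratio."""
    if pd_width * aspect_h > pd_height * aspect_w:
        # PD screen is wider than the TV picture: use the full height.
        height = pd_height
        width = height * aspect_w / aspect_h
    else:
        # PD screen is narrower (or equal): use the full width.
        width = pd_width
        height = width * aspect_h / aspect_w
    return ((pd_width - width) / 2, (pd_height - height) / 2, width, height)

def place_link(rel_x, rel_y, area):
    """Map a relative link position (0..1) into PD screen coordinates."""
    x, y, width, height = area
    return (x + rel_x * width, y + rel_y * height)
```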
  • new and hot information can automatically be pushed to the receiving PD, rather than requiring the user to click on the link. For example, if you are watching a basketball game, the current stats of the player with the ball can be pushed. Or, if you are watching a concert, the location on the tour can be presented. This push feature can be always-on or controlled by the user.
  • the configuration shown in Fig. 9 differs from the one shown in Fig. 8 in that decoding of a watermark payload and user selection of a link associated with that payload are performed on separate devices.
  • the functions of receiving and rendering video content, decoding watermark from the content, and linking to information and actions based on the watermark payload can be performed on separate devices.
  • Many of the features and applications detailed in connection with Fig. 8 also apply to the configuration shown in Fig. 9.
  • Segmented video streams such as those supported in MPEG 4 allow the film or video editor to extract a video scene element from the background and embed the isolated video object.
  • the watermark encoder marks a video object layer corresponding to the object in some or all frames in which the object is visible.
  • the editor keys in that frame, defines a new element again and begins a batch embedding along each frame of the time sequence.
  • the viewer will watch the movie on DVD, VHS, or some other video signal format and be able to link directly to the Internet or other database online or offline by selecting a watermark enabled video object.
  • the embedding process may embed a live character that has been shot against a greenscreen. This enables a video editor to embed the actor without first extracting him from the background. This video object will later be composited with computer graphics or other live action shot at another time.
  • Watermark embedding technology described above can be integrated with commercially available video compositing software from Discreet Logic, Adobe or Puffin Designs.
  • Watermarks may also be embedded in two dimensional image renderings of still or animated 3D graphical objects.
  • the embedded object can be composited with a video stream to form a video program, such as a movie or television programming. This embedded object stays in the video content when converted to other formats such as DVD or VHS without an additional watermark embedding.
  • graphical objects that link to information or electronic commerce transactions can be added to a video product, such as a movie, when it is converted from one format to another.
  • the video content can be watermark enabled when it is placed on a DVD or VHS for mass distribution.
  • Another application is to embed video objects that are static like the basketball backboard or the sportscaster's table or the Jumbotron. This entails masking out the static video object layer in each frame to isolate it from the background in the video sequence. This may be accomplished by creating two separate video feeds from the same camera using one to create the mask for each "frame" and using the other for the actual broadcast signal. The masked area is marked and the two signals are combined and broadcast.
  • the sportscaster's table could also have a watermark on the actual artwork that scrolls in front of it. This persistent watermark would need no additional masking.
  • Another application is to embed video objects such as the players of a game. Using video object segmentation, this application extracts video objects from the background and embeds them in the video stream before broadcast or other distribution.
  • Another method is to generate different video streams, each potentially including a different watermark or watermark payload linking video objects in the corresponding video stream to actions or information.
  • a watermark is embedded in the video captured from a camera that focuses on a particular character, player, or object.
  • a technician selects the video feed from this camera from among feeds from one or more other cameras to be part of the final video program. For example, a camera following a particular player is encoded with an object identifier associated with that player.
  • the technician selects the video feed from this camera (e.g., the Kobe pattern isolated on the Lakers' Kobe Bryant) at intervals during a game, and the feed carries the watermark, enabling the user to click the frame and access a page of a web site like NBA.com, Lakers.com, etc. that provides information about that player. Also, a transparent frame could be overlaid on this camera feed that the viewer could not see, but the detector could. Just enough pixels would be sent to detect the image.
  • Yet another method is to compute video objects dynamically at video capture by deriving video object position and screen extents (bounding box, binary mask, shape, etc.) from the real world objects being captured.
  • Watermarks may be inserted into graphical objects in 3D animation used in video games to link characters and other objects to information or actions.
  • Dreamcast, Playstation 2, and PC CD-ROM games all have Internet access. Images that are rendered on the fly can be embedded with the watermark. Canned animation and cut scenes are rendered previously with the watermark in them. These can activate special website interaction, or for playing online, this could allow extra interaction between players.
  • the score area on the bottom of the screen is an excellent place to mark before transmission of the video broadcast.
  • Another opportunity to mark is when a player's statistics are shown on the NFL game between plays or during a timeout.
  • the screen cuts from the live broadcast to canned animation that includes a composite of the player's picture and his stats. This is an excellent opportunity for watermark embedding.
  • one method is to embed a watermark or watermarks in relatively static portions of the background (e.g., watermarking portions of video frames depicting the turf of a playing field). This method would work well since it is stationary and usually fills a large part of the TV screen.
  • Graphics used in news broadcasts can be linked to information and actions via watermarks.
  • CNN, ABC, NBC, CBS, etc. have used keyed images over the anchor's shoulder for years. They are canned graphics that are composited during the broadcast. These canned graphics can be embedded with watermarks as described above.
  • the virtual billboards displayed advertising from the typical broadcast advertiser. These images can be watermarked to link the virtual billboards to information or actions, like electronic buying opportunities.
  • The History Channel, MTV, VH1, TLC, and TNN all have logos that advertise the channel. These logos are sometimes shown throughout the program hour. These logos can be linked to external actions or information by embedding a watermark in either the video signal or the accompanying audio track.
  • Watermarks may be embedded in the images on large physical objects, such as outdoor signs. These outdoor signs could conceivably be marked and detected onscreen. A typical example would be billboards inside a baseball park or football stadium. When video is captured of these physical objects, the watermarked images on these objects are recorded in the video signal. The watermark is later decoded from the video signal and used to link the video signal to an action or information.
  • Video objects representing advertising or promotions may be watermark enabled.
  • an advertiser such as Ford would produce a watermark enabled ad that would pop up specifically for users to click.
  • the promo could be "NFL on ESPN...Brought to You By FORD" and while that logo or graphic spins there for twenty seconds Ford is offering a promotional discount or freebie for all the people that click on it to visit their site during that time.
  • the video programmer could run the video objects many times so people who miss it could get another chance.
  • the watermark decoding system may employ a user interface to enable the user to control activation of watermark enabled features.
  • the decoding process may default to an "alert off" status, where the watermark decoder does not alert the user to watermark enabled features unless he or she turns it on.
  • a watermark detector or decoder may alert the user that there are watermark enabled objects present on screen if he/she so chooses.
  • the decoding system may be programmed to allow the user to determine whether or not he/she is alerted to watermark enabled features, and how often.
  • the decoding system may enable the user to set preferences for certain types of information, like sports, news, weather, advertisements, promotions, electronic transactions.
  • the decoding system then sets up a filter based on preferences entered by the user, and alerts the user to watermark enabled features only when those features relate to the user's preferences.
  • Watermark enabled video objects may be linked to electronic commerce and advertising available on the Internet or from some other information server.
  • video objects may be linked to opportunities to rent or buy the content currently being viewed or related content.
  • a watermark enabled logo may be overlaid on a video signal (e.g., from a DVD or other video source) to allow the user to access a website to review the movie, purchase the movie (rent to own), rent/buy the sequel, alert the web site that the rented movie has been viewed to help manage inventory, etc.
  • the program may be transformed into an interactive experience.
  • a sitcom program could include watermark enabled video objects at selected points in the broadcast or at the opener that alerted the viewer to get online.
  • Video advertising of products may be watermark enabled to link video objects representing a product or service to additional information or actions, such as electronic buying transactions.
  • a clothing manufacturer could enable all their broadcast ads.
  • Each piece of clothing on the actor may be watermark enabled and linked to the page on the web site to buy the article.
  • Fig. 5 allows watermark tracking by placing locator devices in physical objects.
  • locator devices are placed in physical objects.
  • One example is to place these locators inside the shoes and on the uniforms of professional athletes during games.
  • These locator chips emit a signal that is received and triangulated by detectors on courtside. Each chip has a unique ID to the player.
  • the signal is passed through a computer system integrated into the production room switcher that embeds watermarks into the video stream captured of the player.
  • the players wear at least two transmitters to give location information relative to the camera position. Using this information, a preprocessor derives the screen location of the corresponding video objects. If transmitters get too close to distinguish a video object, the preprocessor prioritizes each video object based on the producer's prior decision.
  • the player's jersey could be watermarked, and used like a pre-marked static object.
  • just as audio or video watermarks can be used to link video objects to information or actions, so can they link audio objects to related information or actions.
  • portions of the signal are distinguishable and recognizable as representing a particular audio source, such as a person's voice or vocal component of a song, an instrument, an artist, composer, songwriter, etc. Each of these distinguishable components represents an audio object.
  • Watermarks in the audio or accompanying video track can be used to link audio objects to information or actions pertaining to the action.
  • the user selects a portion of the audio signal that includes a watermark enabled audio object, such as by pressing a button when an audio object of interest is currently playing.
  • a watermark linking process maps the user selection to a corresponding audio object.
  • the solution is to provide consumers with their own personal, network-capable device (PD) for rendering the auxiliary information and interacting with it.
  • One implementation is to have the TV, set-top box, or proprietary device transmit the auxiliary information to the PD for rendering.
  • This auxiliary information may consist of identifiers that link the user to information via a network server, web links (i.e. URLs) for direct web access, web searches, and so on.
  • the PD handles the network access. It has a network interface, which may include a combination of hardware, firmware and software, to establish a connection with a remote device and to transfer information to and from the PD.
  • the network interface may include, for example, a computer network interface, a wireless telephone transceiver, a cable modem, satellite dish, etc.
  • Another implementation is to have the device receiving the video stream (e.g., a TV tuner) handle the network access and use the PD only for rendering interactive content associated with the video stream. Yet another implementation is to have the PD receive the auxiliary information directly, thus handling all of the interactivity. There are many usages for this solution, such as finding statistics during sporting events, playing along with a game-show, linking auxiliary entertainment to a video, buying advertised items.
  • One potential drawback of the prior interactive video schemes is that they may create conflicts among viewers. People often watch TV in a shared environment, whereas they interact, such as working on the Internet, in a personal environment. This environment creates a conflict when one viewer selects an object to get information that interferes with another viewer's enjoyment of the video program.
  • Interactive video includes any type of entertainment that is watched, including broadcast TV, webcasts, and pre-recorded video such as video programs recorded on VHS and DVDs.
  • auxiliary information refers to the information in the video (or audio) that is designed for interactivity.
  • the auxiliary information may be embedded in the video, possibly as a watermark or within VBI frames, or as side band or out-of-band information.
  • the auxiliary information may be identifiers for network server lookups, web page links (i.e. URLs) for direct web access, web searches, raw code such as XML or HTML, and so on.
  • the solution is to provide consumers with their own network-capable personal device (PD) 1110 for interacting with a TV 1100.
  • the PD renders the auxiliary information associated with the video.
  • the PD may or may not have network connectivity, and may or may not be able to retrieve the information directly.
  • the system begins with a TV, set-top box, or other video receiver
  • the receiver 1200 transmits the auxiliary information to the PD 1202.
  • the auxiliary information may need to be decoded.
  • the auxiliary information can be decoded in the receiver 1200 before transmission, or not.
  • a watermark decoder may be located in the PD.
  • the PD may be equipped with a microphone that captures the audio signal emitted from the speaker of the television.
  • the PD digitizes the audio signal and extracts the auxiliary information from the watermark.
  • the PD may also include a camera, and perform similar actions on watermarks in the video frames.
  • This embodiment can be broken into two parts: 1) a transmitting device like the receiver 1200 shown in Fig. 11, and 2) a receiving PD 1202, such as a personal digital assistant (PDA) with a wireless connection to the web.
  • the transmitter could be a stand-alone device, part of a set-top box that already exists for your TV, or part of the TV.
  • the stand-alone device can be a small transmitter that attaches to coaxial cable and transmits the auxiliary information, such as a video object identifier and its location during the TV show. If this stand-alone device is unaware of the current channel, such as a device connected to the cable at the TV's input, it can transmit the auxiliary information for all channels, and the receiving PD can be used to choose the channel you are watching.
  • the receiving PD can transmit an identifier of the channel you are watching to the transmitting device, so it, in turn, only transmits the information for the desired channel, as shown with the optional (dotted lines) transmit channel connection between receiver 1200 and PD 1202.
  • a less complex stand-alone solution is to add this stand-alone device after the channel has been chosen, possibly between your VCR or set-top box and your TV, and have it transmit information for the channel you are watching.
  • this stand-alone device can be OEM hardware that is added inside the TV by the manufacturer or as a post buying solution (i.e. retro-fit).
  • the set-top box solution may use a Web, Cable or Digital TV set-top box, especially if the existing box is already interactive. Otherwise, OEM hardware could be provided for the set-top box manufacturer.
  • the transmission scheme can use any method, such as IR or radio waves (e.g., Bluetooth wireless communication), to transmit this minimal amount of information.
  • IR ports are advantageous because most laptops and PDAs already have IR ports. If the set-top box already has a transmission protocol, the transmission scheme should use that scheme. If this scheme is not applicable with an existing receiving PD, a special attachment can be developed and fed into the receiving PD via existing input devices, such as IR, serial, parallel, USB, or IEEE firewire inputs.
  • the receiving PD may be a laptop computer, Palm pilot, digital cell phone, or an Internet appliance (such as a combined PDA/Cell Phone/Audio/Video device).
  • the PD informs the user how to enable the interactivity. For example, if the video had one enabled link, the PD could display an icon linking to information for the user. If the video includes interactive game shows, the PD could display a place to enter your guess and current scores. Alternatively, if several video objects were linked, the PD could display the links in their relative screen location on an image matching the TV screen's aspect ratio.
  • the PD sends information about the selection to the network and the desired information is returned (1208).
  • direct web link or web search information is transmitted to the PD.
  • an identifier is transmitted, and the network (probably the Internet) server transforms the identifier into a web link or desired web information.
  • the links are dynamic and the data required to describe a link is minimal. This allows the watermarking and transmitting process to be easier. Most importantly, fewer bits need to be transmitted since only an ID and not the complete link is required. For example, if selecting an identifier with the goal of displaying an appropriate web site, the selected link is sent via the Internet to a web server that converts the information into a web page URL, and returns the corresponding web page to the PD.
  • the desired information is rendered on PD 1202.
  • a user interface application may be running on the PD that renders the selected web page on the PD's display.
  • this embodiment is similar to the above embodiment, except that a PD 1302 does not have network access, such as an Internet connection.
  • the PD 1302 relies on the video receiver 1300, such as a set-top box and TV, for both the auxiliary information and the network connectivity, as shown in box 1308.
  • the interactivity does not need to detract from the video display, and can be solely or mostly displayed on the PD 1302.
  • the receiver 1300 may want to display an icon or some unobtrusive object to let the user know the video is interactive.
  • the PD could be as simple as a text or video enabled remote control.
  • the PD transmits the request to the receiver 1300 and receives the network information from the receiver 1300.
  • a PD 1402 can receive the auxiliary information directly. Therefore, it does not need to communicate with a video receiver 1400.
  • the PD may have a TV tuner included. The user can select the proper channel and the interactivity begins. The channel selection can even be automated using intelligent audio or video synchronization, assuming the PD has a microphone or camera.
  • the PD may have a special receiver for receiving auxiliary information, such as out-of-band data, linked to the current video. In this case, high data rates can be obtained, and the interactivity could be highly controlled. For example, rather than the auxiliary information containing links to web pages, it could include the web page, thus reducing server load and increasing scalability.
  • if the PD is connected to the Internet, new and hot information can automatically be pushed to the receiving PD, rather than requiring the user to click on the link. For example, if you are watching a basketball game, the current stats of the player with the ball can be pushed. Or, if you are watching a concert, the tour dates can be presented. This push feature can be always-on or controlled by the user. In addition, if there is a transmission link between the PD and video receiver, the links do not need to be displayed on the PD. The user can use the PD to highlight and select links on the video receiver via key, mouse or other inputs.
  • the first usage involves obtaining sport statistics.
  • An exemplar process is as follows. While the user is watching TV, players that are on the field can be selected by clicking on their representative link that is displayed on the PD.
  • the representative links can be displayed on the PD by displaying an image of the TV with correct aspect ratio, and displaying numbered links inside that image related to the position of the player on the TV.
  • a web page with game relevant information about that player appears on the PD including links to that player's stats and personal web page.
  • the user may browse the Internet or return to the image of the TV to re-synchronize to the game on TV.
  • the PD displays the current game page, like Buffalo versus Green Bay on NFL.com. From that page you can browse further into the Internet, or re-select the NFL logo to re-synchronize with the game.
  • the second usage involves playing along with a TV game show. While watching the TV game show, auxiliary information is transmitted which allows the PD to display fields and buttons for entering guesses and the user's current ranking among other online contestants.
  • the auxiliary information could be HTML code allowing interactivity to occur on the PD with minimal server interaction, or links, direct (i.e. a URL) or dynamic (an identifier resolved by a server), to web sites that are prepared for the game show interactivity.
  • the third usage involves auxiliary entertainment. For example, while watching an
  • the auxiliary information may connect you to a web site that can stream the musician's most recent song, or the song being discussed.
  • the PD will play the song when the auxiliary information is enabled, and can connect you to a web site to purchase the song and find out more about the musician and his/her music.
  • the final usage involves interactive retail.
  • the auxiliary information includes a link to a web page about the car with options to instantaneously purchase the car.
  • the car could be any item, such as perfume, a computer, a jacket, and so on.
  • the item may be displayed during a regular TV show, as opposed to during an advertisement. With something like pizza, the auxiliary information could allow you to order the pizza for delivery from your PD.
Concluding Remarks
  • the methods, processes, and systems described above may be implemented in hardware, software or a combination of hardware and software.
  • the watermark encoding processes may be incorporated into a watermark or media signal encoding system (e.g., video or audio compression codec) implemented in a computer or computer network.
  • watermark decoding including watermark detecting and reading a watermark payload, may be implemented in software, firmware, hardware, or combinations of software, firmware and hardware.
  • the methods and processes described above may be implemented in programs executed from a system's memory (a computer readable medium, such as an electronic, optical or magnetic storage device).
  • watermark enabled content encoded with watermarks as described above may be distributed on packaged media, such as optical disks, flash memory cards, magnetic storage devices, or distributed in an electronic file format. In both cases, the watermark enabled content may be read and the watermarks embedded in the content decoded from machine readable media, including electronic, optical, and magnetic storage media.
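The user-preference filter mentioned in the list above (alerts default to off, and the decoding system alerts the user only to watermark enabled features that match stated preferences) can be illustrated with a minimal Python sketch. The class name `AlertSettings`, the function `should_alert`, and the category strings are hypothetical and are not taken from the application; treating an empty preference set as "no filter" is likewise an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class AlertSettings:
    """Hypothetical per-user settings; all names here are illustrative."""
    alerts_on: bool = False  # default to "alert off" until the user opts in
    categories: set = field(default_factory=set)  # e.g. {"sports", "news"}


def should_alert(settings: AlertSettings, feature_category: str) -> bool:
    """Decide whether to alert the user to a watermark enabled feature."""
    if not settings.alerts_on:
        return False
    # Assumption: an empty preference set means "alert on everything".
    if not settings.categories:
        return True
    return feature_category in settings.categories


prefs = AlertSettings(alerts_on=True, categories={"sports", "weather"})
alert_for_sports = should_alert(prefs, "sports")
alert_for_ads = should_alert(prefs, "advertisements")
```

A real decoder would also need a way to learn the category of each watermark enabled feature, for example from the watermark payload or from a server lookup; that step is outside this sketch.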

Abstract

Watermarks in video signals or the accompanying audio track are used to associate video objects in a video sequence with object specific actions or information (604). A video object refers to a spatial and temporal portion of a video signal that depicts a recognizable object, such as a character, prop, graphic, etc. Each frame of a video signal may have one or more video objects (604). The auxiliary information is embedded in video or audio signals using 'steganographic' methods, such as digital watermarks (612). By encoding object specific information into video or an accompanying audio track, the watermarks transform video objects into 'watermark enabled' video objects that provide information, actions or links to additional information or actions during playback of a video or audio-visual program. A similar concept may be applied to audio objects, i.e. portions of audio that are attributable to a particular speaker, character, instrument, artist, etc.

Description

Interactive Video and Watermark Enabled Video Objects
Related Application Data
The subject matter of the present application is related to that disclosed in US Patent 5,862,260, and in co-pending U.S. Patent Applications:
09/503,881, filed February 14, 2000;
60/082,228, filed April 16, 1998;
09/292,569, filed April 15, 1999;
60/134,782, filed May 19, 1999;
60/141,538, filed June 28, 1999;
09/343,104, filed June 29, 1999;
60/141,763, filed June 30, 1999;
09/562,517, filed May 1, 2000;
09/531,076, filed March 18, 2000; and
09/571,422, filed May 15, 2000;
which are hereby incorporated by reference.
Technical Field
The invention relates to multimedia signal processing, and in particular relates to encoding information into and decoding information from video objects.
Background and Summary
"Steganography" refers to methods of hiding auxiliary information in other information. Audio and video watermarking are examples of steganography. Digital watermarking is a process for modifying media content to embed a machine-readable code into the data content. A media signal, such as an image or audio signal, is modified such that the embedded code is imperceptible or nearly imperceptible to the user, yet may be detected through an automated detection process. Most commonly, digital watermarking is applied to media such as images, audio signals, and video signals. However, it may also be applied to other types of data, including documents (e.g., through line, word or character shifting), software, multi-dimensional graphics models, and surface textures of objects.
Digital watermarking systems have two primary components: an embedding component that embeds the watermark in the media content, and a reading component that detects and reads the embedded watermark. The embedding component embeds a watermark by altering data samples of the media content. The reading component analyzes content to detect whether a watermark is present. In applications where the watermark encodes information, the reader extracts this information from the detected watermark.
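As a rough illustration of the embedder/reader split described above, the following Python sketch hides payload bits in the least significant bit of integer samples and reads them back. Least-significant-bit coding is a stand-in chosen for brevity and is not the method of this application; it is fragile and, unlike a practical reader, does not detect whether a watermark is present at all. The function names `embed_bits` and `read_bits` and the sample values are hypothetical.

```python
def embed_bits(samples, bits):
    """Embedder: write each payload bit into the LSB of one host sample."""
    if len(bits) > len(samples):
        raise ValueError("payload longer than host signal")
    marked = list(samples)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit.
        marked[i] = (marked[i] & ~1) | bit
    return marked


def read_bits(samples, n_bits):
    """Reader: recover the first n_bits payload bits from the sample LSBs."""
    return [s & 1 for s in samples[:n_bits]]


host = [200, 13, 77, 154, 91, 32, 250, 8]  # e.g. 8-bit pixel or audio values
payload = [1, 0, 1, 1]
marked = embed_bits(host, payload)
recovered = read_bits(marked, len(payload))
largest_change = max(abs(a - b) for a, b in zip(host, marked))
```

Each sample changes by at most one level, which is why the modification is hard to perceive; surviving compression, broadcast, or video capture requires far more robust techniques than this toy.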
The invention provides methods and systems for associating video objects in a video sequence with object specific actions or information using auxiliary information embedded in video frames or audio tracks. A video object refers to a spatial and temporal portion of a video signal that depicts a recognizable object, such as a character, prop, graphic, etc. Each frame of a video signal may have one or more video objects. The auxiliary information is embedded in video or audio signals using "steganographic" methods, such as digital watermarks. By encoding object specific information into video or an accompanying audio track, the watermarks transform video objects into "watermark enabled" video objects that provide information, actions or links to additional information or actions during playback of a video or audio-visual program. A similar concept may be applied to audio objects, i.e. portions of audio that are attributable to a particular speaker, character, instrument, artist, etc.
One aspect of the invention is a method for encoding substantially imperceptible auxiliary information about a video object into a video signal that includes at least one video object. The method steganographically encodes object specific information about the video object into the video signal. Some examples of this information include identifiers and screen locations of corresponding video objects. The method associates the object specific information with an action. This action is performed automatically or in response to user selection of the video object through a user interface while the video signal is playing.
Another aspect of the invention is a method for encoding substantially imperceptible auxiliary information into physical objects so that the information survives the video capture process and links the video to an action. This method steganographically encodes auxiliary information in a physical object in a manner that enables the auxiliary information to be decoded from a video signal captured of the physical object. One example is to place a watermarked image on the surface of the object. The method associates the auxiliary information with an action so that the video signal captured of the physical object is linked to the action. One example of an action is retrieving and displaying information about the object. For example, the watermark may act as a dynamic link to a web site that provides information about the object.
Another aspect of the invention is a method for using a watermark that has been encoded into a video signal or in an audio track accompanying the video signal. The watermark conveys information about a video object in the video signal. The method decodes the information from the watermark, receives a user selection of the video object, and executes an action associated with the information about the video object. One example of an action is to retrieve a web site associated with the video object via the watermark. The watermark may include a direct (e.g., URL or network address) or indirect link (e.g., object identifier) to the web site. In the latter case, the object identifier may be used to look up a corresponding action, such as issuing a request to a web server at a desired URL. Object information returned to the user (e.g., web page) may be rendered and superimposed on the same display as the one displaying the video signal, or a separate user interface.
Another aspect of the invention is a system for creating watermark enabled video objects. The system includes an encoder for encoding a watermark in a video sequence or accompanying audio track corresponding to a video object or objects in the video sequence. It also includes a database system for associating the watermark with an action or information such that the watermark is operable to link the video object or objects to a related action or information during playback of the video sequence.
Another aspect of the invention is a system for processing a watermark enabled video object in a video signal. The system comprises a watermark decoder and rendering system. The watermark decoder decodes a watermark carrying object specific information from the video signal and links the object specific information to an action or information. The rendering system renders the action or information.
Another aspect of the invention is a method for encoding substantially imperceptible auxiliary information into an audio track of a video signal including at least one video object. This method steganographically encodes object specific information about the video object into the audio track. It also associates the object specific information with an action, where the action is performed in response to user selection of the video object through a user interface while the video signal is playing. Alternatively, the action can be performed automatically as the video is played. Further features will become apparent with reference to the following detailed description and accompanying drawings.
Brief Description of the Drawings
Fig. 1A is a flow diagram depicting a process for encoding and decoding watermarks in content to convey auxiliary information 100 about video objects in the content.
Fig. 1B illustrates a framework outlining several alternative implementations of linking video objects with actions or information.
Fig. 2 is a flow diagram depicting a video creation process in which physical objects are pre-watermarked in a manner that survives video capture and transmission.
Fig. 3 is a flow diagram of a video creation process that composites watermarked video objects with a video stream to create a watermarked video sequence.
Fig. 4 illustrates an embedding process for encoding auxiliary information about video objects in a video stream.
Fig. 5 is a diagram depicting yet another process for encoding auxiliary information about video objects in a video stream.
Fig. 6 depicts an example watermark encoding process.
Fig. 7 is a diagram depicting decoding processes for extracting watermark information from video content and using it to retrieve and render external information or actions.
Fig. 8 illustrates an example configuration of a decoding process for linking video objects to auxiliary information or actions.
Fig. 9 illustrates another example configuration of a decoding process for linking video objects to auxiliary information or actions.
Fig. 10 is an overview of a personalized interactive video system.
Fig. 11 is an overview of a network-enabled embodiment.
Fig. 12 is an overview of a dumb terminal embodiment.
Fig. 13 is an overview of an intelligent PD embodiment.
Detailed Description
The following sections detail ways to encode and decode information, actions and links into video objects in a video sequence. A video object refers to a video signal depicting an object of a scene in a video sequence.
To a viewer, the video object is recognizable and distinguishable from other imagery in the scene. The video object exists in a video sequence for some duration, such as a contiguous set of video frames. A single image instance in a frame corresponding to the object is a video object layer. The video object may comprise a sequence of natural images that occupy a portion of each frame in a video sequence, such as a nearly static talking head or a moving athlete. Alternatively, the video object may be a computer generated rendering of a graphical object that is layered with other renderings or natural images to form each frame in a video sequence. In some cases, the video object may encompass an entire frame.
In the systems described below, watermarks are encoded and decoded from video or audio tracks for the purpose of conveying information related to the video objects. A watermark encoding process embeds a watermark into an audio or video signal, or in some cases, the physical object that later becomes a video object through video capture. At playback, a decoding process extracts the watermark.
Fig. 1A is a flow diagram depicting a process for encoding and decoding watermarks in content to convey auxiliary information 100 about video objects in the content. An embedding process 102 encodes the auxiliary information into a watermark embedded in the video content. A transmitter 104 then distributes the content to viewers, via broadcast, electronic file download over a network, streaming delivery over a network, etc. A receiver 106 captures the video content and places it in a format from which a watermark decoder 108 extracts the auxiliary information. A display 110 displays the video to a viewer. As the video is being displayed, a user interface 114 executes and provides visual, audio, or audio-visual information to the user indicating that the video is embedded with auxiliary information or actions. This user interface may be implemented by superimposing graphical information over the video on the display 110. Alternatively, the decoder can pass auxiliary object information to a separate device, which in turn, executes a user interface. In either case, the user interface receives input from the user, selecting a video object. In response, it performs an action associated with the selected object using the auxiliary object information decoded from the watermark.
The watermark may carry information or programmatic action. It may also link to external information or an action, such as retrieval and output of information stored elsewhere in a database, website, etc. Watermark linking enables the action associated with the watermark to be dynamic. In particular, the link embedded in the content may remain the same, but the action or information it corresponds to may be changed. Watermark linking of video objects allows a video object in a video frame to trigger retrieval of information or other action in response to selection by a user.
Watermark embedding may be performed at numerous and varied points of the video generation process. For 3D animation, the watermark can be embedded immediately into a video object layer after a graphical model is rendered to the video object layer, allowing a robust and persistent watermark to travel from the encoding device or computer to any form of playback of video containing the video object.
For special effects, an actor filmed against a green screen can be embedded directly after the film is transferred to digital format for effects generation, preventing the need to later extract the actor from the background to embed only his image. For network or cable broadcast news, the ubiquitous pop-up screen that appears next to the news anchor's head can be embedded before the newscast allowing the viewer to click on that image to take them to extra information from a website.
Watermarks may be embedded in broadcast video objects in real time. An example is watermarking NBA basketball players as a game is broadcast allowing the viewer to click on players and receive more information about them.
Wherever the video is distributed, a decoding process may be inserted to decode information about the video object from a watermark embedded in the video signal. This information may then be used to trigger an action, such as fetching graphics and displaying them to the user. For example, the watermark information may be forwarded to a database, which associates an action with the watermark information. One form of such a database is detailed in co-pending application 09/571,422, which is hereby incorporated by reference. This database looks up an action associated with watermark information extracted from content. One action is to issue a query to a web server, which in turn, returns a web page to the user via the Internet, or some other communication link or network.
Fig. 1B illustrates a system architecture outlining several alternative implementations of linking video objects with actions or information. This diagram divides the system into a creation side, where content is created and encoded, and an end user side, where content and watermark enabled information or actions are rendered. On the creation side, the diagram shows examples of three watermark types and two watermark protocols. In type one, the watermark is embedded in a physical object before it is recorded in a video signal. In type two, the watermark is encoded in a video object after it is recorded but before it is broadcast, possibly during a video editing process. For example, this type of watermark may be encoded in a video object of an actor captured in front of a greenscreen as he moves through a scene. In type three, the watermark is added as the video is being captured for a live event, such as watermarking a video object depicting the jersey of a basketball player as a video stream is being captured of a game.
In the first protocol, the watermark is encoded in the video frame area of the desired object, such as where the jersey of the basketball player appears on the video display screen. In the second protocol, the watermark is encoded throughout a video frame or corresponding segment of an audio track, and includes information about the object and its location. For example, during the basketball game, the watermark is embedded in the audio track and includes location, size and identification for player 1, then player 2, then player 3, and back to player 1 if he is still in the scene or onto player 2 or 3, etc.
On the end user side, there are two places for network connectivity, rendering of linked information, and user interaction. Internet connectivity can be included in the video display device or associated set-top box, or in a portable display device, such as a personal laptop. The rendering of the linked information can occur on the video display, possibly using picture-in-picture technology so others can still see the original video, or in the portable display device, such as a laptop, since Internet browsing can be a personal experience. User interaction with the system, such as selecting the object to find linked information, can happen with the video display, such as pointing with a remote, or with a portable display device, such as using a mouse on a laptop. Specific implementations can include a variety of combinations of these components.
Embedding Processes
The embedding process encodes one or more watermarks into frames of a video sequence, or in some cases, an audio track that accompanies the video sequence. These watermarks carry information about at least one video object in the sequence, and also create an association between a video object and an action or external information. The association may be formed using a variety of methods. One method is to encode an object identifier in a watermark. On the decoding side, this identifier is used as a key or index to an action or information about a video object. The identifier may be a direct link to information or actions (e.g., an address of the information or action), or be linked to the information or actions via a server database.
Another method is to encode one object identifier per frame, either in the frame or corresponding audio track segment. Then, the system sends a screen location selected by a user and the identifier to the server. The object identifier plays a similar role as the previous method, namely, it identifies the object. The location information may be used along with the object identifier to form an index into a database to look up a database entry corresponding to a video object. Alternatively, the watermark may contain several identifiers and corresponding locations defining the screen location of a related video object. The screen location selected by the user determines which identifier is sent to the server for linked information or actions. In other words, a process at the end-user side maps the location of the user selection to an identifier based on the locations encoded along with the identifiers in the content. For example, a segment of the audio track that is intended to be played with a corresponding video frame or frame sequence may include a watermark or watermarks that carry one or more pairs of identifier and locations. These watermarks may be repeated in audio segments synchronized with video frames that include corresponding linked video objects. Then, in the decoding process, the identifier closest to the location of the user interaction is used. A modification includes providing bounding locations in the watermark and determining whether the user's selection is within this area, as opposed to using the closest watermark location to the user's selection.
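The two ways of mapping a user selection to an identifier described above (closest encoded location, or a bounding area test) can be sketched as follows. This is a minimal illustration, not the implementation of this document: the `ObjectEntry` record, its field names, and the rectangle layout are hypothetical, and the entries are assumed to have already been decoded from the watermark.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class ObjectEntry:
    """Hypothetical decoded (identifier, location) pair from one watermark."""
    object_id: int
    x: float  # encoded screen location of the object
    y: float
    # Optional bounding area as (left, top, right, bottom), if the payload has one.
    bounds: Optional[Tuple[float, float, float, float]] = None


def closest_identifier(entries: Sequence[ObjectEntry],
                       click: Tuple[float, float]) -> Optional[int]:
    """Return the identifier whose encoded location is nearest the selection."""
    if not entries:
        return None
    cx, cy = click
    return min(entries, key=lambda e: hypot(e.x - cx, e.y - cy)).object_id


def identifier_in_bounds(entries: Sequence[ObjectEntry],
                         click: Tuple[float, float]) -> Optional[int]:
    """Return the identifier whose bounding area contains the selection, if any."""
    cx, cy = click
    for e in entries:
        if e.bounds is None:
            continue
        left, top, right, bottom = e.bounds
        if left <= cx <= right and top <= cy <= bottom:
            return e.object_id
    return None
```

The closest-location rule always returns some identifier when any object is encoded, whereas the bounding test can report that the user selected empty screen area.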
Other context information available at decoding time may be used to create an association between a video object in a frame and a corresponding action or information in a database. For example, the frame number, screen coordinates of a user selection, time or date may be used in conjunction with information extracted from the watermark to look up a database entry corresponding to a video object in a video sequence.
The manner in which the embedded data is used to create an association between video objects and related information or actions impacts how that data is embedded into each frame. For example, if the watermark includes location information, an object identifier can be embedded throughout the frame in which the corresponding object resides, rather than being located in a portion of the frame that the object overlaps. If the frame includes two or more linked video objects, the watermark conveys an object identifier and location for each of the video objects.
Additional decoding side issues impacting the encoding process include: 1) enabling the user to select video objects during playback; and 2) mapping a user's input selecting a video object to the selected video object. The user can select a video object in various ways. For example, gestural input devices, such as a mouse, touch screen, etc. enable the user to select a video object by selecting a screen location occupied by that object. The selected location can then be mapped to information extracted from a watermark, such as an object identifier. The object identifier of a video object that overlaps the selected location can be looked up based on location codes embedded in the watermark or by looking up the object identifier extracted from a watermark in a video object layer at the selected location. If a user interface on the decoding side provides additional information about watermarked video objects, like graphical icons, menus, etc., then the user can select a video object by selecting a graphic, menu item, or some other user interface element associated with that object. There are many ways to select graphics or menu items, including gestural input devices, keyboards, speech recognition, etc. This approach creates an additional requirement that the decoding side extract watermark information and use it to construct a graphical icon or menu option to the user. The decoding process may derive the information needed for this user interface from the video content, from a watermark in the content, or from out-of-band auxiliary data. In the latter two cases, the embedding process encodes information into the content necessary to generate the user interface on the decoding side. An example will help illustrate an encoding process to facilitate user selection of video objects on the decoding side. Consider an example where a watermark encoder encodes a short title (or number) and location of marked video objects into the video stream containing these objects. 
The decoding process can extract the title and location information, and display titles at the locations of the corresponding video objects. To make the display less obtrusive to the playback of the video, the display of this auxiliary information can be implemented using small icons or numbers superimposed on the video during playback, or it can be transmitted to a separate device from the device displaying the video. For example, the video receiver can decode the information from the video stream and send it via wireless transmission to an individual user's hand held computer, which in turn, displays the information and receives the user's selection.
There are a number of different embedding scenarios for encoding information into a video stream to link video objects with information or actions. Figs. 2-5 illustrate some examples. In Fig. 2, physical objects 200 are pre-watermarked in a manner that survives the video capture process 202. For an example of a watermarking process that survives digital to analog conversion (e.g., printing a digital image on a physical object), and then analog to digital conversion (e.g., capture via a video camera), see US Patent 5,862,260 and co-pending patent application 09/503,881, filed February 14, 2000. These approaches are particularly conducive but not limited to applications where the objects are largely flat and stationary, such as billboards, signs, etc. The video capture process records the image on the surface of these objects, which is encoded with a watermark. The resulting video is then transmitted or broadcast 204.
In the process of Fig. 3, a video creation process composites watermarked video objects 300 with a video stream 302 to create a watermarked video sequence. The watermark may be encoded into video object layers. Examples of watermark encoding and decoding technology are described in US Patent 5,862,260, and in co-pending applications 09/503,881, filed February 14, 2000, and WO 99/10837.
A compositing operation 304 overlays each of the video objects onto the video stream in depth order. To facilitate automated compositing of the video object layers, each of the objects has depth and transparency information (e.g., sometimes referred to as translucency, opacity or alpha). The depth information indicates the relative depth ordering of the layers from a viewpoint of the scene (e.g., the camera position) to the background. The transparency indicates the extent to which pixel elements in a video object layer allow a layer with greater depth to be visible. The video generated from the compositing operation sequence is broadcast or transmitted to viewers. The video objects may be encoded with watermarks as part of a compression process.
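A depth-ordered compositing operation of this kind can be sketched as follows. The sketch assumes single-channel frames stored as nested lists and a per-pixel `alpha` value interpreted as opacity (1.0 hides deeper layers, 0.0 lets them show through); the dictionary keys are hypothetical and chosen only for illustration.

```python
def composite(background, layers):
    """Overlay video object layers onto a background in depth order.

    background: 2D list of pixel values.
    layers: list of dicts with hypothetical keys
        'depth' (larger means farther from the viewpoint),
        'color' (2D list, same size as background),
        'alpha' (2D list of opacities in [0, 1]).
    """
    out = [row[:] for row in background]
    # Paint the deepest layer first so nearer layers end up on top.
    for layer in sorted(layers, key=lambda l: l["depth"], reverse=True):
        for y, row in enumerate(layer["color"]):
            for x, c in enumerate(row):
                a = layer["alpha"][y][x]
                out[y][x] = a * c + (1.0 - a) * out[y][x]
    return out
```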
For example, the MPEG 4 video coding standard specifies a video compression codec in which video object layers are compressed independently. In this case, the video object layers need not be composited before they are transmitted to a viewer. At the time of viewing, an MPEG 4 decoder decompresses the video object layers and composites them to reconstruct the video sequence. The watermark may be encoded into compressed video object layers by modulating DCT coefficients of intra or interframe macroblocks. This watermark can be extracted from the DCT coefficients before the video objects are fully decompressed and composited.
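The text does not specify how the DCT coefficients are modulated. As a deliberately simple stand-in, the sketch below forces the parity of one quantized coefficient to carry one bit, which shows why the bit can be read back from the coefficients without full decompression. Parity modulation is an assumption made here for illustration, is not robust, and is not presented as the method of this document; the function names and the choice of coefficient index are hypothetical.

```python
def embed_bit(coeffs, index, bit):
    """Return a copy of a block's quantized DCT coefficients carrying `bit`.

    coeffs: list of integer coefficients for one block.
    index:  position of the coefficient to modulate (an assumption; a real
            encoder would choose coefficients with visibility in mind).
    """
    out = list(coeffs)
    c = out[index]
    if (c & 1) != bit:
        # Move one step away from zero so the parity matches the bit.
        out[index] = c + 1 if c >= 0 else c - 1
    return out


def extract_bit(coeffs, index):
    """Read the bit back directly from the quantized coefficient."""
    return coeffs[index] & 1
```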
Fig. 4 illustrates another embedding process for encoding auxiliary information about video objects in a video stream 400. In this embedding process, a user designates a video object and the auxiliary information to be encoded in the video object via a video editing tool 402. A watermark encoding process 404 encodes the auxiliary information into the content. A transmitter 406 then transmits or broadcasts the watermarked content to a viewer.
The watermark encoder may encode auxiliary information throughout the entire video frame in which at least one marked video object resides. For example, the user may specify via the editing tool the location of two or more video objects by drawing a boundary around the desired video objects in a video sequence. The encoding process records the screen location information for each object in the relevant frames and associates it with the auxiliary information provided by the user, such as an object identifier. The encoder then creates a watermark message for each frame, including the screen location of an object for that frame and its object identifier. Next, it encodes the watermark message repeatedly throughout the frame.
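One possible layout for such a per-frame watermark message is sketched below. The field widths, byte order, and the leading count byte are assumptions made for illustration; the text does not define a message format.

```python
import struct

# Hypothetical entry layout: object id, left, top, width, height (16 bits each).
ENTRY = struct.Struct(">HHHHH")


def build_message(entries):
    """Pack (object_id, left, top, width, height) tuples for one frame."""
    body = b"".join(ENTRY.pack(*e) for e in entries)
    return struct.pack(">B", len(entries)) + body


def parse_message(message):
    """Recover the list of entries from a packed frame message."""
    (count,) = struct.unpack_from(">B", message, 0)
    return [ENTRY.unpack_from(message, 1 + i * ENTRY.size) for i in range(count)]
```

The encoder would then embed the same packed message repeatedly throughout the frame, so that a decoder can recover it from any sufficiently large region.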
An alternative approach is to encode auxiliary information for an object in the screen location of each frame where a video object layer for that object resides (described fully in figure 6 below).
Fig. 5 is a diagram depicting yet another process for embedding auxiliary information about video objects in a video stream. This process is similar to the one shown in Fig. 4, except that the position of video objects is derived from transmitters 500-504 attached to the real world objects depicted in the video scene and attached to video cameras. The transmitters emit a radio signal, including an object identifier. Radio receivers 506 at fixed positions capture the radio signal and provide information to a pre-processor 508 that triangulates the position of each transmitter, including the one on the active camera, and calculates the screen location of each transmitter in the video stream captured by the active camera. The active camera refers to the camera that is currently generating the video stream 510 to be broadcast or transmitted live (or recorded for later distribution). In a typical application, there may be several cameras, yet only one is selected to provide the video stream 510 at a given time.
Next, an encoding process 512 selects video objects for which auxiliary information is to be embedded in the video stream. The selection process may be fully or partially automated. In a fully automated implementation, a programmed computer selects objects whose screen location falls within a predetermined distance of the 2D screen extents of a video frame, and whose location does not conflict with the location of other objects in the video frame. A conflict may be defined as one where two or more objects are within a predetermined distance of each other in screen space in a video frame. Conflicts are resolved by assigning a priority to each object identifier that controls which video object will be watermark enabled in the case of a screen location conflict.
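The fully automated selection can be sketched as follows. The sketch reads "within a predetermined distance of the 2D screen extents" as a margin around the frame rectangle, which is one interpretation; the parameter names and the convention that a higher priority number wins are assumptions.

```python
from math import hypot


def select_objects(candidates, frame_w, frame_h, margin, min_sep, priority):
    """Choose which objects to watermark-enable in one frame.

    candidates: dict of object_id -> (x, y) screen location.
    margin:     allowed distance outside the frame extents (assumed reading).
    min_sep:    objects closer than this in screen space are in conflict.
    priority:   dict of object_id -> int; higher values win conflicts.
    """
    visible = {
        oid: pos for oid, pos in candidates.items()
        if -margin <= pos[0] <= frame_w + margin
        and -margin <= pos[1] <= frame_h + margin
    }
    selected = []
    # Visit objects from highest to lowest priority; keep one only if it
    # does not conflict with an object that has already been kept.
    for oid in sorted(visible, key=lambda i: priority.get(i, 0), reverse=True):
        x, y = visible[oid]
        if all(hypot(x - visible[s][0], y - visible[s][1]) >= min_sep
               for s in selected):
            selected.append(oid)
    return selected
```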
In a partially automated implementation, the user may select one or more video objects in frames of the video stream to be associated with embedded watermark information via a video editing system 514. The video editing system may be implemented in computer software that buffers video frame data and associated screen location information, displays this information to the user, and enables the user to edit the screen location information associated with video objects and select video objects for watermark encoding.
After calculating video object locations and selecting them for watermark encoding, a watermark encoding process 516 proceeds to encode an object identifier for each selected object. The watermark may be encoded in screen locations and frames occupied by a corresponding video object. Alternatively, object identifiers and corresponding screen location information may be encoded throughout the video frames (or in the audio track of an audio visual work).
After watermark encoding, a transmitter 518 transmits or broadcasts the video stream to viewers. The video stream may also be stored, or compressed and stored for later distribution, transmission or broadcast. The watermarks carrying object identifiers, and other object information, such as screen location information, may be encoded in uncompressed video or audio, or in compressed video or audio.
Fig. 6 depicts an example watermark encoding process that may be used in some of the systems described in this document. Depending on the implementation, some of the processing is optional or performed at different times. The watermark encoding process operates on a video stream 600. In some cases the stream is compressed, segmented into video object layers, or both compressed and segmented into video objects as in some video content in MPEG 4 format. The encoder buffers frames of video, or segmented video objects (602).
In this particular example, the encoder embeds a different watermark payload into different portions of video frames corresponding to the screen location of the corresponding video objects. For example, in frames containing video object 1 and video object 2, the encoder embeds a watermark payload with an object identifier for object 1 in portions of the frames associated with object 1 and a watermark payload with object identifier for object 2 in portions of the frames associated with object 2. To simplify decoder design, the watermark protocol, including the size of the payload, control bits, error correction coding, and orientation/synchronization signal coding can be the same throughout the frame. The only difference in the payloads in this case is the object specific data.
A variation of this method may be used to encode a single watermark payload, including identifiers and screen locations for each watermark enabled object, throughout each frame. While this approach increases the payload size, there is potentially more screen area available to embed the payload, at least in contrast to methods that embed different payloads in different portions of a frame.
Next, the encoder optionally segments selected video object instances from the frames in which the corresponding objects reside. An input to this process includes the screen locations 606 of the objects. As noted above, the screen locations may be provided by a user via a video editing tool, or may be calculated based on screen location coordinates derived from transmitters on real world objects. The screen extents may be in a coarse form, meaning that they do not provide a detailed, pixel by pixel definition of the location of a video object instance. The screen extents may be as coarse as a bounding rectangle or a polygonal shape entered by drawing a boundary around an object via a video editing tool.
Automated segmentation may be used to provide a refined shape, such as a binary mask. Several video object segmentation methods have been published, particularly in connection with object based video compression. The implementer may select a suitable method from among the literature that satisfies the demands of the application. Since the watermark encoding method may operate on blocks of pixels and does not need to be precise to the pixel level due to human interaction, the segmentation method need not generate a mask with stringent, pixel level accuracy.
In some implementations, video objects are provided in a segmented form. Some examples of these implementations are video captured of a physical object (e.g., actor, set, etc.) against a green screen, where the green color of the screen helps distinguish and define the object shape (e.g., a binary mask where a given green color at a spatial sample in a frame indicates no object, otherwise, the object is present).
Next, the encoder computes a bounding region for each object (608), if not already available. The bounding region of a video object instance refers to a bounding rectangle that encompasses the vertical and horizontal screen extents of the instance in a frame. The encoder expands the extents to an integer multiple of a watermark block size (610). The watermark block size refers to a two dimensional screen space in which the watermark corresponding to a video object, or set of objects, is embedded in a frame at a given encoding resolution.
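Expanding the extents to an integer multiple of the watermark block size is simple integer arithmetic. The sketch below assumes pixel coordinates with exclusive right and bottom edges and a block grid anchored at the frame origin; both conventions are assumptions, and clamping to the frame size is omitted.

```python
def expand_to_blocks(left, top, right, bottom, block):
    """Snap a bounding rectangle outward to the watermark block grid.

    The result has a width and height that are integer multiples of `block`.
    """
    new_left = (left // block) * block        # round down
    new_top = (top // block) * block
    new_right = -(-right // block) * block    # round up (ceiling division)
    new_bottom = -(-bottom // block) * block
    return new_left, new_top, new_right, new_bottom
```

For example, with a 32 pixel block, a region spanning x = 70..130 and y = 33..90 expands to x = 64..160 and y = 32..96, that is, three blocks wide and two blocks high.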
The watermark encoder then proceeds to embed a watermark in non-transparent blocks of the bounding region. A non-transparent block is a block within the bounding region that is overlapped, at least in part, by the video object instance corresponding to the region. The watermark for each block includes an object specific payload, such as an object identifier, as well as additional information for error correction and detection, and signal synchronization and orientation. The synchronization and orientation information can include message start and end codes in the watermark payload as well as a watermark orientation signal used to synchronize the detector and compensate for changes in scaling, translation, aspect ratio, and other geometric distortions.
There are many possible variations to this method. For example, an object specific watermark may be encoded throughout a bounding rectangle of the object. This approach simplifies encoding to some extent because it obviates the need for more complex segmentation and screen location calculations. However, it reduces the specificity with which the screen location of the watermark corresponds to the screen location of the video object that it is associated with. Another alternative that gives fine screen location detail, yet simplifies watermark encoding is to embed a single payload with object identifiers and detailed location information for each object. This payload may be embedded repeatedly in blocks that span the entire frame, or even in a separate audio track.
In some watermark encoding methods, the watermark signal may create visible artifacts if it remains the same through a sequence of frames. One way to combat this is to make the watermark signal vary from one frame to the next using a frame dependent watermark key to generate the watermark signal for each block. Image adaptive gain control may also be used to reduce visibility.
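A frame dependent watermark key can be sketched by deriving a per-frame, per-block seed from a base key and using it to generate a pseudo-random pattern. The hash-based derivation, the +1/-1 pattern, and all names below are assumptions for illustration; the text does not specify how the key or the signal is generated.

```python
import hashlib
import random


def block_pattern(base_key: bytes, frame_number: int, block_index: int,
                  size: int = 8):
    """Generate a +1/-1 pattern that changes from frame to frame.

    A decoder holding the same base key and knowing the frame number can
    regenerate the identical pattern.
    """
    digest = hashlib.sha256(
        base_key
        + frame_number.to_bytes(8, "big")
        + block_index.to_bytes(4, "big")
    ).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return [[rng.choice((-1, 1)) for _ in range(size)] for _ in range(size)]
```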
Decoding Processes
There are a variety of system configurations enabling users to access watermark enabled features in video objects. Before giving some examples, we start by defining decoder processes. The examples then illustrate specific system configurations to implement these processes.
As depicted in Fig. 7, there are four principal decoding processes: 1) decoding auxiliary information embedded in a watermark in the video content (700, 702); 2) user selection of watermark enabled information or actions (704); 3) determining information or actions associated with a video object (706); and 4) rendering watermark enabled information or actions to the user (708). Rendering may include generating visual, audio or audio-visual output to present information and options for selecting more information or actions to the user, executing a program or machine function, or performing some other action in response to the watermark data. The first process extracts auxiliary information, such as object identifiers and screen locations, from the video stream or an accompanying audio track. The next process implements a user interface to indicate to the user that the video has watermark enabled objects and to process user input selecting watermark enabled information or actions. The third process determines the information or action associated with a selected video object. Finally, the fourth renders watermark enabled information or actions to the user.
Each of these decoding processes need not be implemented in all applications. A decoder may operate continuously or in response to a control signal to read auxiliary information from a watermark, look up related information or actions, and display it to the user. Continuous decoding tends to be less efficient because it may require a watermark decoder to operate on each frame of video or continuously screen an audio track. A more efficient approach is to implement a watermark screen that invokes a watermark decoder only when watermark data is likely to be present. A control signal sent in or with the video content can be used to invoke a watermark decoder. The control signal may be an in-band signal embedded in the video content, such as a video or audio watermark. For example, a watermark detector may look for the presence of a watermark, and when detected, initiate a process of decoding a watermark payload, accessing information or actions linked via an object identifier in the payload, and displaying the linked information or actions to the user. The control signal may be one or more control bits in a watermark payload decoded from a watermark signal. The control signal may also be an out-of-band signal, such as a tag in a video file header, or a control signal conveyed in a sub-carrier of a broadcast signal.
The control signal can be used to reduce the overhead of watermark decoding operations to instances where watermark enabled objects are present. The decoder need only attempt a complete decoding of a complete watermark payload when the control signal indicates that at least one video object (e.g., perhaps the entire frame) is watermark enabled.
The control signal may trigger the presentation of an icon or some other visual or audio indicator alerting the user that watermark enabled objects are present. For example, it may trigger the display of a small logo superimposed over the display of the video. The viewer may then select the icon to initiate watermark decoding. In response, the watermark decoder proceeds to detect watermarks in the video stream and decode watermark payloads of detected watermarks. Additionally, when watermark payloads for one or more objects are detected, the user interface can present object specific indicators alerting the user about which objects are enabled. The user can then select an indicator to initiate the processes of determining related information or actions and presenting the related information or actions to the user.
Another way to reduce watermark decoding overhead is to invoke watermark decoding on selected portions of the content in response to user selection. For example, the decoder may be invoked on portions of frames, a series of frames, or a portion of audio content in temporal or spatial proximity to user input. For example, the decoding process may focus a watermark decoding operation on a spatial region around a screen location of a video display selected by the user. Alternatively, the user might issue a command to look for enabled content, and the decoding process would initiate a watermark detector on frames of video or audio content in temporal proximity to the time of the user's request. The decoding process may buffer frames of the most recently received or played audio or video for the purpose of watermark screening in response to such requests.
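Buffering recent frames and screening only those near the time of a user request can be sketched as follows. The `detect` and `decode` callables are hypothetical placeholders for a watermark detector and decoder, and the buffer capacity and time window are illustrative parameters.

```python
from collections import deque


class FrameBuffer:
    """Keep only the most recently received or played frames."""

    def __init__(self, capacity):
        self._frames = deque(maxlen=capacity)  # oldest frames drop off

    def push(self, timestamp, frame):
        self._frames.append((timestamp, frame))

    def near(self, t, window):
        """Return buffered frames within `window` seconds of time `t`."""
        return [f for ts, f in self._frames if abs(ts - t) <= window]


def decode_on_request(buffer, request_time, window, detect, decode):
    """Run the detector only on frames close in time to the user's request,
    and the full decoder only where the detector reports a watermark."""
    payloads = []
    for frame in buffer.near(request_time, window):
        if detect(frame):
            payloads.append(decode(frame))
    return payloads
```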
Example Configurations
One configuration is a video player with an interactive user interface that displays video content and implements watermark enabled features. In this configuration, the player decodes the watermark, displays video content, and enables the user to select video objects via its interactive user interface. The player may have a local database for looking up the related information or action of an identifier extracted from a video object.
Fig. 8 illustrates an example configuration of a decoding process for linking video objects to auxiliary information or actions. In this configuration, there are three primary systems involved in the decoding process: 1) A local processing system (e.g., PC, set-top box, stand-alone device) 800 responsible for receiving video content, playing it on a display, and decoding watermarks from the content. 2) A router 802 that communicates with the local processing system via a network 803 such as the Internet; and 3) a web server 804 that communicates with the local processing system and the router via the network. The local processing system may be implemented in a variety of consumer electronic devices such as a personal computer (PC), set-top box, wireless telephone handset, television, etc. The router and web server may similarly be implemented in a variety of systems. In typical Internet applications, the router and web server are implemented in server computers. For these applications, communication of data among the local processing system, router and server may be performed using network protocols, such as TCP/IP, and other application level protocols such as XML, HTTP, and HTML.
The local processing system 800 receives a video stream 806 via a receiver 808. The type of receiver depends on the nature of the video transmission, such as Internet download or streaming delivery, satellite broadcast, cable television broadcast, television broadcast, playback from portable storage device such as VHS tape, DVD, etc. In each case, an appropriate device, such as network adapter, satellite dish, tuner, DVD driver, etc. receives the content and converts it to a video signal. This process may also include decompressing a compressed video file. However, as noted above, the watermark may be encoded and decoded from compressed video or audio, such as MPEG 4 video objects or audio. The local processing system renders the video content 810. In a PC, the rendering process includes converting the video signal to a format compatible with the video controller in the computer and writing the video to video memory in the video controller 812. The video controller 812 then displays the video signal on a display device 814.
As the video is being rendered, the local processing system buffers frames (816) of audio or video for watermark detecting and decoding. In a PC, the buffering may be integrated with rendering the video to video memory or may be implemented as a separate process (e.g., allocating separate video buffers in main memory or video memory). Also, depending on the nature of the video signal and encoding process, the buffer may store frames of compressed video content or decompressed video content from which watermarks are detected and decoded.
A watermark detector screens the buffered content for the presence of a watermark (818). If a watermark is present, it sends a message to a user interface application 820, which in turn, generates a graphical logo or other visual or audio signal that indicates the presence of watermark enabled video objects. A watermark decoder 822 reads one or more watermark payloads from the content. As noted above, the decoder may be triggered by one or more of the following events: 1) the detector finding the presence of a watermark; 2) an out-of-band control signal instructing the decoder to detect and decode a watermark; 3) user selection of the graphical logo, etc.
In addition to displaying an indicator of watermark enabled objects, the user interface 820 also manages input from the user for selecting video objects and for controlling the display of information associated with selected video objects. In a PC environment, the user interface can be implemented as an interactive display with graphics that respond to input from a gestural input device, such as a mouse or other cursor control device, touch screen, etc. This interactive display is superimposed on the display of the video stream. In this environment, the user selects a video object by placing a cursor over the video object on the display and entering input, such as clicking on a mouse.
The specific response to this input depends on the implementation of the watermark decoder and how the content has been watermarked. In one class of implementations, the watermark payload contains information for each watermark enabled object in the video content, along with location codes specifying screen locations of the objects. In this type of implementation, the decoder preferably decodes the watermark payload in response to detecting presence of a watermark and stores the payload for the most recently displayed video content. In response to user input selecting a video object, the decoder receives the coordinates of the user selection and finds the corresponding location code in the watermark payload information that defines a screen area including those coordinates. The location code is specified at a reference frame resolution, and the user selection coordinates are normalized to this reference resolution.
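Normalizing the user selection to the reference resolution before matching it against the location codes can be sketched as follows. The rectangle representation of a location code, the half-open interval test, and the function names are assumptions for illustration.

```python
def normalize(click, display_size, reference_size):
    """Scale display coordinates to the reference frame resolution."""
    (x, y), (dw, dh), (rw, rh) = click, display_size, reference_size
    return x * rw / dw, y * rh / dh


def lookup_object(location_codes, click, display_size, reference_size):
    """Find the object whose location code covers the normalized selection.

    location_codes: dict of object_id -> (left, top, right, bottom),
    expressed at the reference resolution (hypothetical layout).
    """
    rx, ry = normalize(click, display_size, reference_size)
    for object_id, (left, top, right, bottom) in location_codes.items():
        if left <= rx < right and top <= ry < bottom:
            return object_id
    return None
```

Because the location codes are stored once at the reference resolution, the same payload serves displays of any size; only the selection coordinates are rescaled.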
In another class of implementations, video frames contain one or more watermarks, and the payloads in those watermarks are specific to the video objects in which they are embedded. There are a couple of alternative ways of mapping the location of a user selection to a corresponding watermark payload. One approach to decoding the video frame is to decode watermark payloads for each watermark detected in the frame, and then store screen location data indicating the location of the watermark containing that payload. The screen coordinates of a user's selection can then be mapped to a payload, and specifically to the object identifier in the payload, based on the screen location data of the watermark.
Another approach to decoding is to execute a decode operation on a specific temporal and spatial region in proximity to the temporal and spatial coordinates of a user selection. The temporal coordinates correspond to a frame or set of frames, while the spatial coordinates correspond to a two-dimensional region in the frame or set of frames. If the decoder can decode a watermark payload from the region, then it proceeds to extract the object identifier and possibly other information from the payload. If the decoder is unsuccessful in decoding a payload from the region, it may signal the user interface, which in turn, provides visual feedback to the user that the attempt to access a watermark enabled feature has failed, or it may search frames more distant in time from the user's selection for a watermark before notifying the user of a failure.
The watermark decoder can enhance the user's chances of selecting a watermark enabled object by providing graphical feedback in response to user selection of the video frame or object within the frame. For example, the decoder can give the user interface the screen coordinates of areas where a watermark has been detected. Screen areas that correspond to different watermark payloads or different object locations as specified within a watermark payload can be highlighted in a different color or with some other graphical indicator that distinguishes watermark enabled objects from unmarked objects and from each other.
The decoder forwards an object identifier (824) for the video object at the selected location to the server 802 via a network interface 826. The decoder may also provide additional information from the watermark or context information from the local processing system. For Internet applications, the decoder sends a message including this information to the server in XML format using HTTP. Before forwarding the message, the user interface may be designed to prompt the user with a dialog box requesting the user to confirm that he or she does want additional information. The network interface 826 forwards the message to the server 802 over the network.
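As a sketch only, such a message might be assembled with a standard XML library as shown below. The element names (watermarkSelection, objectId, frameId, context, item) and the choice of an HTTP POST are assumptions made for illustration; no message schema is defined above.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Mapping, Optional


def build_message(object_id: int,
                  frame_id: Optional[int] = None,
                  context: Optional[Mapping[str, str]] = None) -> bytes:
    """Serialize the object identifier and optional context as XML.

    All element names here are hypothetical.
    """
    root = ET.Element("watermarkSelection")
    ET.SubElement(root, "objectId").text = str(object_id)
    if frame_id is not None:
        ET.SubElement(root, "frameId").text = str(frame_id)
    ctx = ET.SubElement(root, "context")
    for key, value in (context or {}).items():
        ET.SubElement(ctx, "item", name=key).text = str(value)
    return ET.tostring(root, encoding="utf-8")


def build_request(server_url: str, body: bytes) -> urllib.request.Request:
    """Wrap the XML body in an HTTP POST request (constructed, not sent)."""
    return urllib.request.Request(
        server_url,
        data=body,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
```

The request object is only constructed here; the user interface could prompt for confirmation before it is actually sent.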
While this example is particularly directed to computer networks like the Internet, similar systems may be built for other types of networks, like satellite broadcast networks, wireless phone networks, etc. In these types of networks, the network interface corresponds to the device and accompanying programming that sends and receives data over a communication link. In the case of a wireless device, the network interface may be a cellular telephone transceiver. In the case of the satellite broadcast network, the network interface may be a satellite dish. Note that combinations of technologies may be used for transmitting and receiving functions, such as sending data via telephone network using a modem or network adapter, and receiving data via a satellite dish.
The server, in response to receiving the message (828), parses it and extracts an index used to look up a corresponding action in a database (830) that associates many such indices to corresponding actions. The index may include the object identifier and possibly other information, such as time or date, a frame identifier of the selected object, its screen location, user information (geographic location, type of device, and demographic information), etc. Several different actions may be assigned to an index. Different actions can be mapped to an object identifier based on context information, such as the time, date, location, user, etc. This enables the server to provide actions that change with changing circumstances of the viewer, content provider, advertiser, etc. Some examples include returning information and hyperlinks to the user interface 820 (e.g., a web page), forming and forwarding a message to another server (e.g., re-directing an HTTP request to a web server), recording a transaction event with information about the selected object and user in a transaction log, downloading to the local processing system other media such as still image, video or audio content for playback, etc.
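A minimal sketch of such a context-dependent lookup follows. The in-memory table, the Rule and Context types, the action strings and the example.com addresses are all hypothetical; an actual server would presumably use a database as described.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Context:
    """Hypothetical context information accompanying an object identifier."""
    local_time: time
    region: str


@dataclass(frozen=True)
class Rule:
    matches: Callable[[Context], bool]
    action: str  # e.g. a redirect target or a page to return


# Illustrative table: several actions assigned to one index, chosen by context.
ACTIONS: Dict[int, List[Rule]] = {
    1001: [
        Rule(lambda c: c.region == "US" and c.local_time >= time(18, 0),
             "redirect:https://example.com/evening-offer"),
        Rule(lambda c: True, "return:https://example.com/player-stats"),
    ],
}


def look_up_action(object_id: int, ctx: Context) -> Optional[str]:
    """Return the first action whose rule matches the context, if any."""
    for rule in ACTIONS.get(object_id, []):
        if rule.matches(ctx):
            return rule.action
    return None
```

Ordering the rules from most to least specific lets a catch-all rule serve as the default action for an object.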
Another action that may be linked to the video object is connecting the user to a transaction server. The transaction server may enable the user to purchase a physical object depicted in the video object via an electronic transaction. It may also enable the user to enter into a contract electronically to obtain usage rights in the video content or related content.
In the example configuration depicted in Fig. 8, the server 802 looks up the address of a web server associated with the index (830). It then forwards an HTTP request (832) to the web server 804 at this address and provides the IP address of the local processing system 800. In addition, it may also include in the HTTP request information that the web server may use to tailor a response to the local processing system, such as the object identifier, frame identifier, user demographics, etc.
The web server receives the request (834) and returns information to the local processing system (836). This information may include hyperlinks to other information and actions, programs that execute on the local processing system, multimedia content (e.g., music, video, graphics, images), etc. One way to deliver the information is in the form of an HTML document, but other formats may be used as well.
The local processing system receives the information from the server 804 through the network and the network interface 826. The decoder operates in conjunction with the user interface application such that the information is addressed to the user interface. For Internet applications, a TCP/IP connection is established between the user interface application and the network. The server forwards the information to the IP address of the user interface application. The user interface then formats the information for display and superimposes it onto the video display. For example, when the information is returned in the form of HTML, the user interface application parses the HTML and formats it for display on display device 814. The rendered HTML is layered onto the video frames in the video memory. The video controller 812 then displays a composite of the HTML and the video data. In the event that the HTML includes hyperlinks, the user interface processes inputs to these links in a similar fashion as an Internet browser program.
Just as the servers may map a watermark payload to different actions for different circumstances, the user interface may also implement a set of rules that govern how it presents content returned from the network based on context information. For example, the user interface may keep track of information that a user has seen before and change it or tailor it based on user information or user preferences entered by the user. For example, the user can configure the user interface to display information about certain topics (news categories like sports, business, world affairs, local affairs, entertainment, etc.) or actions (e.g., links to certain categories of electronic buying transactions, video or music downloads, etc.). Then, when the user interface receives information and links to actions, it filters the information and links based on user preference and provides only information and links in the user's preference.
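The preference-based filtering described above might look like the following sketch. The LinkedItem type and its category labels are hypothetical, and suppressing items the user has already seen is only one assumed way of tailoring previously seen information.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List


@dataclass(frozen=True)
class LinkedItem:
    """Hypothetical unit of returned information or a link to an action."""
    title: str
    category: str  # e.g. "sports", "business", "music-download"
    url: str


def filter_by_preference(items: Iterable[LinkedItem],
                         preferred: AbstractSet[str],
                         already_seen: AbstractSet[str] = frozenset()) -> List[LinkedItem]:
    """Keep items in the user's preferred categories that were not seen before."""
    return [item for item in items
            if item.category in preferred and item.url not in already_seen]
```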
One potential drawback of the above configuration is that it may create conflicts among viewers. People often watch TV in a shared environment, whereas they work on the Internet in a personal environment. This environment creates a conflict when one viewer selects an object to get information that interferes with another viewer's enjoyment of the video program.
One solution is to provide consumers with their own personal and portable Internet personal device (PD) as shown in Fig. 9. The system may be configured to have the decoding process in a TV, set-top box, or other receiver 900 of a video stream. The decoder may then transmit watermark IDs, locations, and potentially other context information to the PD 902. As another alternative, the decoder may be located in the PD. For example, the PD may be equipped with a microphone that captures the audio signal emitted from the speaker of the television. The PD digitizes the audio signal and extracts watermarks from it, which include object information used to link video objects to information or actions. For example, the object information may include object identifiers and location codes for video objects in the video program. The PD may also include a camera, and perform similar actions on watermarks in the video frames. Two parts of this configuration are: 1) a transmitting device like the television 900 shown in Fig. 9, set-top box, etc., and 2) a receiving PD 902 such as a personal digital assistant (PDA) with a wireless connection to the Internet, or a remote control. The receiving PD can perform the functions of enabling the user to select a video object, retrieving the linked information or actions for the selected object, and rendering them on its user interface. One example of such a device is a PD with a communication link (e.g., infrared, radio, etc.) to the transmitting device for receiving object information and a communication link with a network, database, server, etc. for retrieving the linked information or actions for the selected object. As another alternative, the receiving PD acts solely as a user control device of the transmitting device that enables the user to select an object and communicates the selection back to the transmitting device. 
The transmitting device, in response to the user selection, retrieves linked information or actions for the selected object and renders them. One example of such a device is a remote control with a user interface (e.g., display and cursor control device for selecting objects) and a two-way communication link with the transmitting device (e.g., infrared, radio, etc.).
Transmitting Device
The transmitter could be a stand-alone device or part of a set-top box that already exists for your TV. The stand-alone device can be a small transmitter that attaches to coaxial cable and transmits a video object identifier and its location during the TV show. If this stand-alone device is connected before the channel has been chosen, it can transmit the IDs and locations for all channels, and the receiving PD can be used to choose the channel you are watching.
Alternatively, the receiving PD can transmit an identifier of the channel you are watching to the transmitting device, so it, in turn, only transmits the information for the desired channel.
A less complex stand-alone solution, thus less expensive to manufacture and sell, is to add this stand-alone device after the channel has been chosen, possibly between your VCR or set-top box and your TV, and have it transmit information for the channel you are watching. Finally, this stand-alone device can be OEM hardware that is added inside the TV by the manufacturer or as a post buying solution (i.e. retro-fit).
The set-top box solution may use a Web, Cable or Digital TV set-top box, especially if the existing box is already interactive. Otherwise, OEM hardware could be provided for the set-top box manufacturer.
The transmission scheme can use any method, such as IR or radio waves (e.g., Bluetooth wireless communication), to transmit this minimal amount of information. IR ports are advantageous because most laptops and PDAs already have IR ports. If the set-top box already has a transmission protocol, the transmission scheme should use that scheme. If this scheme is not applicable with an existing receiving PD, a special attachment can be developed and fed into the receiving PD via existing input devices, such as IR, serial, parallel, USB, or IEEE firewire inputs.
Receiving PD
The receiving PD may be a laptop computer, Palm pilot, digital cell phone, or an Internet appliance (such as a combined PDA/Cell Phone/Audio/Video device). This PD would display the links in their relative location on a screen matching the TV screen's aspect ratio. Then, using the PD you can select the desired link, possibly by clicking on the link, pressing the appropriate number key relating to the link number, or saying the link number and using speech recognition (906). Next, the PD sends information about the selected link to a database (e.g., a web server that converts the information into a web page URL and directs the server at this URL to return the corresponding web page to the PD) (908). A user interface application running in the PD then renders the web page (910) on its display. Using this approach, the links are dynamic and the data required to describe a link is minimal. This allows the watermarking and transmitting process to be easier. Most importantly, fewer bits need to be transmitted since only an ID and not the complete link are required.
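Displaying links in their relative location on the PD can be sketched as an aspect-preserving fit of the TV frame into the PD display. The centering (letterbox) policy and the function names below are assumptions for illustration only.

```python
from typing import Tuple


def fit_viewport(tv_w: float, tv_h: float,
                 pd_w: float, pd_h: float) -> Tuple[float, float, float]:
    """Return (scale, x offset, y offset) for the largest centered area
    on the PD display that keeps the TV screen's aspect ratio."""
    scale = min(pd_w / tv_w, pd_h / tv_h)
    view_w, view_h = tv_w * scale, tv_h * scale
    return scale, (pd_w - view_w) / 2, (pd_h - view_h) / 2


def place_link(x: float, y: float,
               tv_w: float, tv_h: float,
               pd_w: float, pd_h: float) -> Tuple[float, float]:
    """Map a link location in TV frame coordinates to PD display coordinates."""
    scale, off_x, off_y = fit_viewport(tv_w, tv_h, pd_w, pd_h)
    return off_x + x * scale, off_y + y * scale
```

For a 640 by 480 frame shown on a 160 by 160 PD display, the frame is scaled by one quarter and centered vertically, so a link at the center of the TV frame appears at the center of the PD display.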
Alternatively, if the receiving PD is connected to the Internet, new and hot information can automatically be pushed to the receiving PD, rather than requiring the user to click on the link. For example, if you are watching a basketball game, the current stats of the player with the ball can be pushed. Or, if you are watching a concert, the location on the tour can be presented. This push feature can be always-on or controlled by the user.
The configuration shown in Fig. 9 differs from the one shown in Fig. 8 in that decoding of a watermark payload and user selection of a link associated with that payload are performed on separate devices. The functions of receiving and rendering video content, decoding a watermark from the content, and linking to information and actions based on the watermark payload can be performed on separate devices. Many of the features and applications detailed in connection with Fig. 8 also apply to the configuration shown in Fig. 9.
The following sections illustrate several different application scenarios and related watermarking systems and methods that demonstrate the diversity of the technology described above.
Previously Segmented Video
Segmented video streams, such as those supported in MPEG-4, allow the film or video editor to extract a video scene element from the background and embed the isolated video object. The watermark encoder marks a video object layer corresponding to the object in some or all frames in which the object is visible. When the scene element is not large enough to be encoded with at least one watermark block, the editor keys in that frame, defines a new element again and begins a batch embedding along each frame of the time sequence.
The viewer will watch the movie on DVD, VHS, or some other video signal format and be able to link directly to the Internet or other database online or offline by selecting a watermark enabled video object.
Video Objects Captured Through Greenscreens
The embedding process may embed a live character that has been shot against a greenscreen. This enables a video editor to embed the actor without first extracting him from the background. This video object will later be composited with computer graphics or other live action shot at another time. Watermark embedding technology described above can be integrated with commercially available video compositing software from Discreet Logic, Adobe or Puffin Designs.
Rendered 3D Object Layers
Watermarks may also be embedded in two dimensional image renderings of still or animated 3D graphical objects. The embedded object can be composited with a video stream to form a video program, such as a movie or television programming. This embedded object stays in the video content when converted to other formats such as DVD or VHS without an additional watermark embedding. Conversely, graphical objects that link to information or electronic commerce transactions can be added to a video product, such as a movie, when it is converted from one format to another. For example, the video content can be watermark enabled when it is placed on a DVD or VHS for mass distribution.
Physical Objects Captured in Video
Another application is to embed video objects that are static like the basketball backboard or the sportscaster's table or the Jumbotron. This entails masking out the static video object layer in each frame to isolate it from the background in the video sequence. This may be accomplished by creating two separate video feeds from the same camera using one to create the mask for each "frame" and using the other for the actual broadcast signal. The masked area is marked and the two signals are combined and broadcast.
The sportscaster's table could also have a watermark on the actual artwork that scrolls in front of it. This persistent watermark would need no additional masking.
Real Time Object Embedding
Another application is to embed video objects such as the players of a game. Using video object segmentation, this application extracts video objects from the background and embeds them in the video stream before broadcast or other distribution.
Another method is to generate different video streams, each potentially including a different watermark or watermark payload linking video objects in the corresponding video stream to actions or information. In this case, a watermark is embedded in the video captured from a camera that focuses on a particular character, player, or object. In a video production process, a technician selects the video feed from this camera from among feeds from one or more other cameras to be part of the final video program. For example, a camera following a particular player is encoded with an object identifier associated with that player. The technician selects the video feed from this camera (e.g., the Kobe Kamera isolated on the Lakers' Kobe Bryant) at intervals during a game, and the feed carries the watermark, enabling the user to click the frame and access a page of a web site like NBA.com, Lakers.com, etc. that provides information about that player. Also, a transparent frame could be overlaid on this camera feed that the viewer could not see, but the detector could. Just enough pixels would be sent to detect the image.
Yet another method is to compute video objects dynamically at video capture by deriving video object position and screen extents (bounding box, binary mask, shape, etc.) from the real world objects being captured.
Games
Watermarks may be inserted into graphical objects in 3D animation used in video games to link characters and other objects to information or actions. Dreamcast, Playstation 2, and PC CD-ROM games all have Internet access. Images that are rendered on the fly can be embedded with the watermark. Canned animation and cut scenes are rendered previously with the watermark in them. These can activate special website interaction, or for playing online, this could allow extra interaction between players.
Embedding Graphic Overlays
The score area on the bottom of the screen is an excellent place to mark before transmission of the video broadcast.
Real Time embedding is ready for delivery. Every NFL and NBA broadcast now has sophisticated graphics that are keyed on screen.
In addition, another opportunity to mark is when a player's statistics are shown during an NFL game between plays or during a timeout. The screen cuts from the live broadcast to canned animation that includes a composite of the player's picture and his stats. This is an excellent opportunity for watermark embedding.
In addition to the real time embedding examples above, one method is to embed a watermark or watermarks in relatively static portions of the background (e.g., watermarking portions of video frames depicting the turf of a playing field). This method would work well since it is stationary and usually fills a large part of the TV screen.
News Broadcasts
Graphics used in news broadcasts can be linked to information and actions via watermarks. CNN, ABC, NBC, CBS, etc. have used keyed images over the anchor's shoulder for years. They are canned graphics that are composited during the broadcast. These canned graphics can be embedded with watermarks as described above.
Virtual Billboards
Virtual billboards display advertising from the typical broadcast advertiser. These images can be watermarked to link the virtual billboards to information or actions, like electronic buying opportunities.
Feature Films
Feature films that were not embedded in the original post-production can be embedded afterwards on their way to video, DVD, or other format for electronic or packaged media distribution.
Logos and other Graphic Overlays
Many channels now keep a logo at the bottom right corner of their screen. The History Channel, MTV, VH1, TLC, TNN, all have logos that advertise the channel. These logos are sometimes shown throughout the program hour. These logos can be linked to external actions or information by embedding a watermark in either the video signal or the accompanying audio track.
Watermarked Signs
Watermarks may be embedded in the images on large physical objects, such as outdoor signs. These outdoor signs could conceivably be marked and detected onscreen. A typical example would be billboards inside a baseball park or football stadium. When video is captured of these physical objects, the watermarked images on these objects are recorded in the video signal. The watermark is later decoded from the video signal and used to link the video signal to an action or information.
Watermark Enabled Advertising
Video objects representing advertising or promotions may be watermark enabled. For example, an advertiser such as Ford would produce a watermark enabled ad that would pop up specifically for users to click. The promo could be "NFL on ESPN...Brought to You By FORD" and while that logo or graphic spins there for twenty seconds Ford is offering a promotional discount or freebie for all the people that click on it to visit their site during that time. The video programmer could run the video objects many times so people who miss it could get another chance.
User Alerts and Preferences
The watermark decoding system may employ a user interface to enable the user to control activation of watermark enabled features. For example, the decoding process may default to an "alert off" status, where the watermark decoder does not alert the user to watermark enabled features unless he or she turns it on. By querying the screen every few seconds, a watermark detector or decoder may alert the user that there are watermark enabled objects present on screen if he/she so chooses. The decoding system may be programmed to allow the user to determine whether or not he/she is alerted to watermark enabled features, and how often.
In addition, the decoding system may enable the user to set preferences for certain types of information, like sports, news, weather, advertisements, promotions, electronic transactions. The decoding system then sets up a filter based on preferences entered by the user, and only alerts the user to watermark enabled features when those features relate to the user's preferences.
Watermark Enabled Commerce
Watermark enabled video objects may be linked to electronic commerce and advertising available on the Internet or from some other information server.
For example, video objects may be linked to opportunities to rent or buy the content currently being viewed or related content. At the beginning or end of the film, a watermark enabled logo may be overlaid on a video signal (e.g., from a DVD or other video source) to allow the user to access a website to review the movie, purchase the movie (rent to own), rent/buy the sequel, alert the web site that the rented movie has been viewed to help manage inventory, etc.
Introducing Interactivity into Video Programming
By incorporating watermark enabled video into a television program, the program may be transformed into an interactive experience. For example, a sitcom program could include watermark enabled video objects at selected points in the broadcast or at the opener that alerted the viewer to get online.
Interactive Shopping
Video advertising of products, such as clothing, may be watermark enabled to link video objects representing a product or service to additional information or actions, such as electronic buying transactions. For example, a clothing manufacturer could enable all their broadcast ads. Each piece of clothing on the actor may be watermark enabled and linked to the page on the web site to buy the article.
Real Time Derivation of Video Object Spatial and Temporal Extents
The technology shown in Fig. 5 allows watermark tracking by placing locator devices in physical objects. One example is to place these locators inside the shoes and on the uniforms of professional athletes during games. These locator chips emit a signal that is received and triangulated by detectors on courtside. Each chip has an ID unique to the player. The signal is passed through a computer system integrated into the production room switcher that embeds watermarks into the video stream captured of the player. The players wear at least two transmitters to give location information relative to the camera position. Using this information, a preprocessor derives the screen location of the corresponding video objects. If transmitters get too close to distinguish a video object, the preprocessor prioritizes each video object based on the producer's prior decision.
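As a rough sketch of how a preprocessor might derive a screen location from two or more transmitter positions, the following assumes an idealized pinhole camera with the transmitter positions already expressed in camera coordinates (z pointing away from the camera). The camera model, the padding value and all names are assumptions; no particular projection method is specified above.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]  # (x, y, z) in camera coordinates


@dataclass(frozen=True)
class Camera:
    """Idealized pinhole camera (an assumption of this sketch)."""
    focal_px: float  # focal length in pixels
    cx: float        # principal point, x
    cy: float        # principal point, y


def project(cam: Camera, point: Point3D) -> Tuple[float, float]:
    """Project a transmitter position onto the image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return cam.cx + cam.focal_px * x / z, cam.cy + cam.focal_px * y / z


def bounding_box(cam: Camera, transmitters: Sequence[Point3D],
                 pad_px: float = 10.0) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) enclosing the projected transmitters."""
    points = [project(cam, p) for p in transmitters]
    us = [u for u, _ in points]
    vs = [v for _, v in points]
    return min(us) - pad_px, min(vs) - pad_px, max(us) + pad_px, max(vs) + pad_px
```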
Alternatively, the player's jersey could be watermarked, and used like a pre-marked static object.
Linking Audio Objects with Watermarks
Just as audio or video watermarks can be used to link video objects to information or actions, so can they link audio objects to related information or actions. In an audio signal, portions of the signal are distinguishable and recognizable as representing a particular audio source, such as a person's voice or vocal component of a song, an instrument, an artist, composer, songwriter, etc. Each of these distinguishable components represents an audio object. Watermarks in the audio or accompanying video track can be used to link audio objects to information or actions pertaining to the audio object.
To access linked information or actions, the user selects a portion of the audio signal that includes a watermark enabled audio object, such as by pressing a button when an audio object of interest is currently playing. Using the temporal location of the user selection in the audio signal, a watermark linking process maps the user selection to a corresponding audio object. The systems and processes described above may be used to retrieve and render information or actions linked to the selected audio object.
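The mapping from the temporal location of a selection to an audio object can be sketched as an interval lookup. The AudioSegment type and the assumption that segments are sorted by start time and do not overlap are made only for this illustration.

```python
import bisect
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AudioSegment:
    """Hypothetical time span during which an audio object is playing."""
    start: float    # seconds from the start of the signal
    end: float      # exclusive end time in seconds
    object_id: int


def audio_object_at(segments: Sequence[AudioSegment], t: float) -> Optional[int]:
    """Return the object playing at time t.

    Assumes segments are sorted by start time and do not overlap.
    """
    starts = [segment.start for segment in segments]
    i = bisect.bisect_right(starts, t) - 1
    if i >= 0 and t < segments[i].end:
        return segments[i].object_id
    return None  # no watermark enabled audio object at this time
```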
Personal Device For Interactive Video
There have been and will be many schemes for interactive video, including TV broadcasts, webcasts, and pre-recorded VHS or DVD. However, none of the schemes solve the problem of the physical and psychological differences between a TV and an interactive display, such as an Internet terminal. Most importantly, people watch TV from 10-yards away in a shared environment, whereas they interact with a display, such as working on the Internet, in a 10-foot personal environment. Specifically, it is hard enough to get three people to agree on a TV show, let alone when to view a web page and which one. Additionally, a TV is currently a bad display for text.
The solution is to provide consumers with their own personal, network-capable device (PD) for rendering the auxiliary information and interacting with it. One implementation is to have the TV, set-top box, or proprietary device transmit the auxiliary information to the PD for rendering. This auxiliary information may consist of identifiers that link the user to information via a network server, web links (i.e. URLs) for direct web access, web searches, and so on. The PD handles the network access. It has a network interface, which may include a combination of hardware, firmware and software, to establish a connection with a remote device and to transfer information to and from the PD. The network interface may include, for example, a computer network interface, a wireless telephone transceiver, a cable modem, satellite dish, etc.
Another implementation is to have the device receiving the video stream (e.g., a TV tuner) handle the network access and use the PD only for rendering interactive content associated with the video stream. Yet another implementation is to have the PD receive the auxiliary information directly, thus handling all of the interactivity. There are many usages for this solution, such as finding statistics during sporting events, playing along with a game-show, linking auxiliary entertainment to a video, and buying advertised items.
One potential drawback of the prior interactive video schemes is that they may create conflicts among viewers. People often watch TV in a shared environment, whereas they interact, such as working on the Internet, in a personal environment. This environment creates a conflict when one viewer selects an object to get information that interferes with another viewer's enjoyment of the video program.
Before describing the solution, here are some definitions. Interactive video includes any type of entertainment that is watched, including broadcast TV, webcasts, and pre-recorded video such as video programs recorded on VHS and DVDs. In addition, although described in terms of video, this solution works for audio broadcasts, such as radio. Auxiliary information refers to the information in the video (or audio) that is designed for interactivity. The auxiliary information may be embedded in the video, possibly as a watermark or within VBI frames, or as side band or out-of-band information. The auxiliary information may be identifiers for network server lookups, web page links (i.e. URLs) for direct web access, web searches, raw code such as XML or HTML, and so on.
The solution, as shown in Fig. 10, is to provide consumers with their own network-capable personal device (PD) 1110 for interacting with a TV 1100. The PD renders the auxiliary information associated with the video. The PD may or may not have network connectivity, and may or may not be able to retrieve the information directly.
Network-enabled embodiment
As shown in Fig. 11, the system begins with a TV, set-top box, or other video receiver 1200 of a video stream. The receiver 1200 then transmits the auxiliary information to the PD 1202.
The auxiliary information may need to be decoded. In such a case, the auxiliary information can be decoded in the receiver 1200 before transmission, or not. For example, when using a watermark, a watermark decoder may be located in the PD. Specifically, the PD may be equipped with a microphone that captures the audio signal emitted from the speaker of the television. The PD digitizes the audio signal and extracts the auxiliary information from the watermark. The PD may also include a camera, and perform similar actions on watermarks in the video frames.
This embodiment can be broken into two parts: 1) a transmitting device like the receiver 1200 shown in Fig. 11, and 2) a receiving PD 1202, such as personal digital assistant (PDA) with a wireless connection to the web.
Transmitting Device
The transmitter could be a stand-alone device, part of a set-top box that already exists for your TV, or part of the TV. The stand-alone device can be a small transmitter that attaches to coaxial cable and transmits the auxiliary information, such as a video object identifier and its location during the TV show. If this stand-alone device is unaware of the current channel, such as a device connected to the cable at the TV's input, it can transmit the auxiliary information for all channels, and the receiving PD can be used to choose the channel you are watching. Alternatively, the receiving PD can transmit an identifier of the channel you are watching to the transmitting device, so it, in turn, only transmits the information for the desired channel, as shown with the optional (dotted lines) transmit channel connection between receiver 1200 and PD 1202.
A less complex stand-alone solution, thus less expensive to manufacture and sell, is to add this stand-alone device after the channel has been chosen, possibly between your VCR or set-top box and your TV, and have it transmit information for the channel you are watching. Finally, this stand-alone device can be OEM hardware that is added inside the TV by the manufacturer or as a post buying solution (i.e. retro-fit).
The set-top box solution may use a Web, Cable or Digital TV set-top box, especially if the existing box is already interactive. Otherwise, OEM hardware could be provided for the set-top box manufacturer.
The transmission scheme can use any method, such as IR or radio waves (e.g., Bluetooth wireless communication), to transmit this minimal amount of information. IR ports are advantageous because most laptops and PDAs already have IR ports. If the set-top box already has a transmission protocol, the transmission scheme should use that scheme. If this scheme is not applicable with an existing receiving PD, a special attachment can be developed and fed into the receiving PD via existing input devices, such as IR, serial, parallel, USB, or IEEE firewire inputs.
Receiving PD
The receiving PD may be a laptop computer, Palm Pilot, digital cell phone, or an Internet appliance (such as a combined PDA/Cell Phone/Audio/Video device). First, the PD informs the user how to enable the interactivity. For example, if the video had one enabled link, the PD could display an icon linking to information for the user. If the video includes interactive game shows, the PD could display a place to enter your guess and current scores. Alternatively, if several video objects were linked, the PD could display the links in their relative screen location on an image matching the TV screen's aspect ratio.
Then, using the PD, you can enable the desired interactivity, possibly by clicking on a link, pressing the appropriate number key relating to a link number, saying a link number and using speech recognition, typing a response or selecting an enable button (1204). Next, the PD sends information about the selection to the network and the desired information is returned (1208). In some cases, direct web link or web search information is transmitted to the PD. In other cases, an identifier is transmitted, and the network (probably the Internet) server transforms the identifier into a web link or desired web information. Using this latter approach, the links are dynamic and the data required to describe a link is minimal. This makes the watermarking and transmitting process easier. Most importantly, fewer bits need to be transmitted since only an ID, and not the complete link, is required. For example, if selecting an identifier with the goal of displaying an appropriate web site, the selected link is sent via the Internet to a web server that converts the information into a web page URL, and returns the corresponding web page to the PD.
Finally, the desired information is rendered on PD 1202. For example, a user interface application may be running on the PD that renders the selected web page on the PD's display.
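The identifier-based (dynamic) linking described above can be sketched as a simple server-side lookup. This is only an illustration under stated assumptions: the table contents, the `example.com` URLs, and the function name are hypothetical, and a real server would use a database rather than an in-memory dictionary.

```python
# Hypothetical server-side table mapping decoded identifiers to web links.
LINK_TABLE = {
    1001: "https://example.com/players/1001",
}


def resolve_identifier(object_id, table=LINK_TABLE):
    """Return the URL currently associated with an identifier, or None.

    Only the short identifier travels in the auxiliary information; the
    full link is looked up on the server when the user makes a selection.
    """
    return table.get(object_id)
```

Because the link lives in the table rather than in the transmitted data, the server operator can change `table[1001]` at any time and every later selection of identifier 1001 resolves to the new page, without re-encoding or re-transmitting the video.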
Dumb Terminal Embodiment
As shown in Fig. 12, this embodiment is similar to the above embodiment, except that a PD 1302 does not have network access, such as an Internet connection. Thus, the PD 1302 relies on the video receiver 1300, such as a set-top box and TV, for both the auxiliary information and the network connectivity, as shown in box 1308. Importantly, the interactivity does not need to detract from the video display, and can be solely or mostly displayed on the PD 1302. Of course, the receiver 1300 may display an icon or some unobtrusive object to let the user know the video is interactive. In this embodiment, the PD could be as simple as a text or video enabled remote control.
The specific details of the process and apparatus are so similar to the above that they are not repeated, so as not to unduly lengthen this application. The difference is that when network interaction is desired, the PD transmits the request to the receiver 1300 and receives the network information from the receiver 1300. In summary, there are more transmissions between the PD 1302 and receiver 1300.
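The extra hop in this embodiment can be sketched as follows. The class and method names are hypothetical, and the `fetch` callable merely stands in for whatever network connection the receiver has; the sketch only shows that the PD never contacts the network itself.

```python
class Receiver:
    """Stand-in for a video receiver that owns the network connection."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable: object_id -> network information
        self.requests = 0     # number of PD requests relayed so far

    def handle_pd_request(self, object_id):
        """Relay a PD selection to the network and return the response."""
        self.requests += 1
        return self._fetch(object_id)


class DumbTerminalPD:
    """Stand-in for a PD without network access."""

    def __init__(self, receiver):
        self._receiver = receiver

    def select(self, object_id):
        """Send the user's selection to the receiver and return what it relays back."""
        return self._receiver.handle_pd_request(object_id)
```

Each user selection therefore costs two PD-to-receiver transmissions (the request and the relayed response) in addition to the auxiliary information itself.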
Intelligent PD Embodiment
As shown in Fig. 13, in this final embodiment, a PD 1402 can receive the auxiliary information directly. Therefore, it does not need to communicate with a video receiver 1400. For example, the PD may have a TV tuner included. The user can select the proper channel and the interactivity begins. The channel selection can even be automated using intelligent audio or video synchronization, assuming the PD has a microphone or camera. The PD may have a special receiver for receiving auxiliary information, such as out-of-band data, linked to the current video. In this case, high data rates can be obtained, and the interactivity could be highly controlled. For example, rather than the auxiliary information containing links to web pages, it could include the web page, thus reducing server load and increasing scalability. The details of the process and apparatus are so similar to the above network-enabled embodiment that they are not repeated, so as not to unduly lengthen this application. The difference is that no transmission is required between the PD 1402 and receiver 1400, since the PD 1402 receives the auxiliary information directly. The user interaction steps, 1204 and 1208, are identical to those of the network-enabled embodiment.
Alternative Configurations
If the PD is connected to the Internet, new and hot information can automatically be pushed to the receiving PD, rather than requiring the user to click on the link. For example, if you are watching a basketball game, the current stats of the player with the ball can be pushed. Or, if you are watching a concert, the tour dates can be presented. This push feature can be always-on or controlled by the user. In addition, if there is a transmission link between the PD and video receiver, the links do not need to be displayed on the PD. The user can use the PD to highlight and select links on the video receiver via key, mouse or other inputs. Although this interferes a little with others watching the video, it still reduces the obtrusive nature of interactive TV in shared environments by displaying the majority of the interactive information on the PD. This configuration is especially useful for bringing the price of the dumb terminal closer to that of a standard remote.
Example Usages
A few illustrative example usages are described below. The first usage involves obtaining sports statistics. An exemplary process is as follows. While the user is watching TV, players that are on the field can be selected by clicking on their representative link that is displayed on the PD. The representative links can be displayed on the PD by displaying an image of the TV with the correct aspect ratio, and displaying numbered links inside that image at positions corresponding to the players' positions on the TV. Upon the user selecting the desired link, a web page with game-relevant information about that player appears on the PD, including links to that player's stats and personal web page. At this point the user may browse the Internet or return to the image of the TV to re-synchronize to the game on TV.
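The placement of numbered links inside an image of the TV, as described above, can be sketched in a few lines of Python. The function names are hypothetical, object locations are assumed to arrive as fractions of the TV screen width and height, and the 4:3 default aspect ratio is likewise an assumption.

```python
def fit_image(pd_width, pd_height, tv_aspect=4 / 3):
    """Return (width, height) of the largest image with the TV's aspect
    ratio that fits inside a PD display of the given pixel size."""
    if pd_width / pd_height > tv_aspect:
        # PD display is wider than the TV picture: height is the limit.
        height = pd_height
        width = round(height * tv_aspect)
    else:
        # PD display is narrower (or equal): width is the limit.
        width = pd_width
        height = round(width / tv_aspect)
    return width, height


def place_links(locations, image_width, image_height):
    """Map fractional (x, y) object locations to numbered pixel positions.

    Returns a list of (link_number, pixel_x, pixel_y), numbered from 1 in
    the order the locations were received.
    """
    return [
        (number, round(x * image_width), round(y * image_height))
        for number, (x, y) in enumerate(locations, start=1)
    ]
```

For a 320 by 320 pixel PD display, `fit_image(320, 320)` yields a 320 by 240 image, and a player located a quarter of the way across and halfway down the TV screen is drawn as link 1 at pixel (80, 120).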
Alternatively, only the current game link (instead of links for each player) could be included in the auxiliary information. In this case, when the user selects the NFL logo on the PD, the PD displays the current game page, like Buffalo versus Green Bay on NFL.com. From that page you can browse further into the Internet, or re-select the NFL logo to re-synchronize with the game.
Finally, you may choose to have interesting statistics, i.e. hot information, displayed automatically on the PD. For example, for the game above, the user may always see the score, quarter, play time, current down, yards to first down, and number of time outs left.
The second usage involves playing along with a TV game show. While the user watches the TV game show, auxiliary information is transmitted that allows the PD to display fields and buttons for entering guesses, along with the user's current ranking among other online contestants. The auxiliary information could be HTML code allowing interactivity to occur on the PD with minimal server interaction, or links, direct (i.e. a URL) or dynamic (an identifier resolved by a server), to web sites that are prepared for the game show interactivity.
The third usage involves auxiliary entertainment. For example, while watching an interview with a musician, the auxiliary information may connect you to a web site that can stream the musician's most recent song, or the song being discussed. The PD will play the song when the auxiliary information is enabled, and can connect you to a web site to purchase the song and find out more about the musician and his/her music.
The final usage involves interactive retail. For example, during an advertisement for a new car, the auxiliary information includes a link to a web page about the car with options to instantaneously purchase the car. Of course, the car could be any item, such as perfume, a computer, a jacket, and so on. In addition, the item may be displayed during a regular TV show, as opposed to during an advertisement. With something like pizza, the auxiliary information could allow you to order the pizza for delivery from your PD.
Concluding Remarks
Having described and illustrated the principles of the technology with reference to specific implementations, it will be recognized that the technology can be implemented in many other, different, forms. To provide a comprehensive disclosure without unduly lengthening the specification, applicants incorporate by reference the patents and patent applications referenced above. These patents and patent applications provide additional implementation details. They describe ways to implement processes and components of the systems described above. Processes and components described in these applications may be used in various combinations, and in some cases, interchangeably with processes and components described above.
The methods, processes, and systems described above may be implemented in hardware, software or a combination of hardware and software. For example, the watermark encoding processes may be incorporated into a watermark or media signal encoding system (e.g., video or audio compression codec) implemented in a computer or computer network. Similarly, watermark decoding, including watermark detecting and reading a watermark payload, may be implemented in software, firmware, hardware, or combinations of software, firmware and hardware. The methods and processes described above may be implemented in programs executed from a system's memory (a computer readable medium, such as an electronic, optical or magnetic storage device). Additionally, watermark enabled content encoded with watermarks as described above may be distributed on packaged media, such as optical disks, flash memory cards, magnetic storage devices, or distributed in an electronic file format. In both cases, the watermark enabled content may be read and the watermarks embedded in the content decoded from machine readable media, including electronic, optical, and magnetic storage media. The particular combinations of elements and features in the above-detailed embodiments are exemplary only; the interchanging and substitution of these teachings with other teachings in this and the incorporated-by-reference patents/applications are also contemplated.

Claims

We claim:
1. A method for encoding substantially imperceptible auxiliary information into a video signal including at least one video object, the method comprising: steganographically encoding object specific information about the video object into the video signal; and associating the object specific information with an action, where the action is performed in response to user selection of the video object through a user interface while the video signal is playing.
2. The method of claim 1 wherein the action comprises retrieving information about the video object.
3. The method of claim 2 wherein the action comprises retrieving information about the video object from a remote device.
4. The method of claim 1 wherein the action comprises executing a program.
5. The method of claim 1 wherein the video signal is steganographically encoded with at least two identifiers, each identifier corresponding to distinct video objects in frames of the video signal, and each identifier being associated with actions relating to the corresponding video objects.
6. The method of claim 1 wherein the object specific information is encoded in a watermark signal that covers a portion of screen area of frames in the video signal where the video object is located.
7. The method of claim 1 wherein object specific information for at least two different video objects in the video signal is steganographically encoded in different portions of frames of the video signals where the corresponding video objects are located.
8. The method of claim 1 wherein the object specific information includes a screen location information indicating where the video object is located in the video signal.
9. The method of claim 8 wherein object specific information is encoded for at least two different video objects in the video signal, and the object specific information includes location information indicating where the video objects are located in the video signal.
10. The method of claim 1 wherein the object specific information is encoded in a pre-recorded video object, which forms part of the video signal.
11. The method of claim 10 wherein the pre-recorded video object is composited with video frames to form the video signal.
12. The method of claim 10 wherein the pre-recorded video object is composited with at least one other video object to form the video signal, where the video objects are each steganographically encoded with object specific information.
13. The method of claim 1 wherein the video object is encoded with the object specific information as part of a process of capturing the video signal of physical objects, and the object specific information pertains to the physical objects captured in the video signal.
14. The method of claim 13 wherein the object specific information is encoded as part of a process of capturing the video signal during a live broadcast or transmission of the video signal.
15. The method of claim 13 wherein object specific information is encoded for at least two different video objects depicted in frames of the video signal.
16. The method of claim 1 wherein the object specific information is encoded in a graphical image that is composited with the video signal.
17. The method of claim 1 wherein object specific information is encoded for at least two different video objects such that the object specific information is synchronized with corresponding video objects depicted in the video signal during playback.
18. A method for encoding substantially imperceptible auxiliary information into a video signal including at least one video object, the method comprising: steganographically encoding auxiliary information in a physical object in a manner that enables the auxiliary information to be decoded from a video signal captured of the physical object; associating the auxiliary information with an action so that the video signal captured of the physical object is linked to the action.
19. The method of claim 18 wherein the action includes retrieving information about the physical object.
20. The method of claim 19 including: establishing an automated service that is responsive to auxiliary information extracted from the video signal to link the physical object to the action.
21. A method for using a watermark that has been encoded into a video signal or in an audio track accompanying the video signal, where the watermark conveys information about a video object in the video signal, the method comprising: decoding the information from the watermark; receiving a user selection of the video object; and executing an action associated with the information about the video object.
22. The method of claim 21 wherein the video signal includes watermark information for at least two different video objects in the video signal, and the watermark information associates the video objects with object specific actions or information.
23. The method of claim 21 wherein the audio track includes watermark information for at least two different video objects appearing in the same frames of the video signal, and the watermark information associates the video objects with object specific actions or information.
24. The method of claim 21 wherein the action includes using the information about the video object to retrieve additional information related to the video object, and further including: rendering the retrieved information during playback of the video signal.
25. The method of claim 21 including: sending the information about the video object to a database; using the information about the video object to look up additional information related to the video object.
26. The method of claim 25 including: using the additional information to form a request to a remote system, which in response to the request, returns information to a user that made the user selection of the video object.
27. The method of claim 26 wherein the remote system is a web server and the web server returns information about the video object to the user.
28. The method of claim 26 wherein the information returned from the remote system includes one or more programs.
29. A machine readable media on which is stored a program for performing the method of claim 21.
30. A system for creating watermark enabled video objects comprising: an encoder for encoding a watermark in a video sequence or accompanying audio track corresponding to a video object or objects in the video sequence; and a database system for associating the watermark with an action or information such that the watermark is operable to link the video object or objects to a related action or information during playback of the video sequence.
31. The system of claim 30 wherein the watermark is operable to link a corresponding video object to an action or information when a user selects the video object during playback of the video sequence.
32. The system of claim 30 wherein the encoder encodes a video object with a watermark and composites the encoded video object with another video signal to create the video sequence.
33. A system for processing a watermark enabled video object in a video signal comprising: a watermark decoder for decoding a watermark carrying object specific information from the video signal and linking object specific information to an action or information; and a rendering system for rendering the action or information.
34. The system of claim 33 including a user interface for enabling a user to select a watermark enabled video object during playback of the video signal.
35. The system of claim 34 wherein the user interface includes the rendering system for rendering the action or information of the selected video object.
36. The system of claim 34 wherein the user interface is in a separate device from the watermark decoder.
37. The system of claim 33 including a network interface for communicating information decoded from a watermark to a remote device, which in response to the information, links the information to an action or additional information about a video object.
38. A method for encoding substantially imperceptible auxiliary information into an audio track of a video signal including at least one video object, the method comprising: steganographically encoding object specific information about the video object into the audio track; and associating the object specific information with an action, where the action is performed in response to user selection of the video object through a user interface while the video signal is playing.
39. The method of claim 38 wherein the object specific information includes an identifier and screen location of the video object.
40. The method of claim 38 wherein the object specific information includes information for at least two different video objects.
41. A personal device enabling a user to interact with a video program being rendered on a separate display device, the personal device comprising: a receiver for receiving auxiliary information associated with the video program; and a user interface for using the auxiliary information to display information about the video program and to solicit user input directed to a feature of the video program; wherein the user interface provides a display that is distinct from the separate display on which the video program is rendered.
42. The personal device of claim 41 wherein the separate display device displays enabled areas of the video program indicating that features of the video program are interactive.
43. The personal device of claim 42 including a network interface for retrieving information that is linked to the video program via the auxiliary information from a remote device on a network.
44. The personal device of claim 41 including a network interface for retrieving information that is linked to the video program via the auxiliary information from a remote device on a network.
45. The personal device of claim 44 in which the personal device receives the decoded auxiliary information transmitted from the video receiver.
46. The personal device of claim 44 including a decoder for decoding the auxiliary information.
47. The personal device of claim 41 in which the receiver receives the auxiliary information after the auxiliary information has been decoded.
48. The personal device of claim 41 wherein the personal device includes a network interface that enables the personal device to send a request for information linked to the video program via the auxiliary information such that rendering of the linked information on the personal device or distinct display device is synchronized with the video program.
49. An interactive video method comprising: receiving a video program and auxiliary information associated with the video program that facilitates user interactivity with the video program; rendering the video program on a display device; rendering an interactive user interface that is associated with the video program via the auxiliary information on a personal device that is distinct from the display device.
50. The method of claim 49 including: transmitting the auxiliary information from a video receiver to the personal device.
51. The method of claim 50 wherein the auxiliary information includes an index to or address of information that is stored on a remote device.
52. The method of claim 51 wherein the personal device transmits information about a user selection of a feature in the video program to a network connected device, which in turn, fetches external information from a network relating to the feature.
53. A method of making an electronic purchase comprising: receiving auxiliary information associated with a video program from a video receiver; displaying an interactive user interface enabling a user to select a feature in the video program; in response to a user selection of a feature, accessing an electronic transaction server linked to that feature via the auxiliary information.
54. The method of claim 53 wherein the electronic purchase involves a product currently being displayed in the video program.
55. The method of claim 53 wherein the electronic purchase involves a product currently being displayed as part of a non-advertised TV show.
56. The method of claim 53 wherein the electronic purchase involves ordering food during a TV program.
57. A device enabling a user to interact with a video program being rendered on a display device, the device comprising: a receiver for receiving auxiliary information associated with the video program; and a user interface for using the auxiliary information to render interactive user interface elements linked to information about the video program.
58. The device of claim 57 wherein the video program is a sporting event and the linked information includes statistics related to the sporting event.
59. The device of claim 57 wherein at least one of the user interface elements links to an electronic transaction server that enables a user to purchase a product or service related to the video program.
60. The device of claim 58 wherein the product or service is advertised in the video program.
61. The device of claim 58 wherein the video program is a non-advertised TV show and the product or service is currently being displayed as part of the non-advertised TV show.
62. The device of claim 58 wherein a user interface element links a user to a service for ordering food during a broadcast of the video program.
PCT/US2001/019254 2000-06-20 2001-06-15 Interactive video and watermark enabled video objects WO2001099325A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001266949A AU2001266949A1 (en) 2000-06-20 2001-06-15 Interactive video and watermark enabled video objects

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US09/597,209 US6411725B1 (en) 1995-07-27 2000-06-20 Watermark enabled video objects
US09/597,209 2000-06-20
US66075600A 2000-09-13 2000-09-13
US09/660,756 2000-09-13

Publications (2)

Publication Number Publication Date
WO2001099325A2 WO2001099325A2 (en) 2001-12-27
WO2001099325A3 WO2001099325A3 (en) 2003-11-06

Family

ID=27082765

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/019254 WO2001099325A2 (en) 2000-06-20 2001-06-15 Interactive video and watermark enabled video objects

Country Status (2)

Country Link
AU (1) AU2001266949A1 (en)
WO (1) WO2001099325A2 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10218946A1 (en) * 2002-04-22 2003-11-06 Deutsche Telekom Ag Applying watermark to image transmitted via data line, e.g. video film or sequence with soundtrack, by replacing image or sound object of less relevance with watermark
EP1514269A1 (en) * 2002-06-18 2005-03-16 Lg Electronics Inc. System and method for playing content information using an interactive disc player
EP1998565A1 (en) * 2006-03-01 2008-12-03 L.A.B. Inc. Video-linked controlled object external device controller and video recording medium used for same
EP2183715A2 (en) * 2007-07-30 2010-05-12 Yahoo! Inc. Textual and visual interactive advertisements in videos
US8433306B2 (en) 2009-02-05 2013-04-30 Digimarc Corporation Second screens and widgets
WO2017007224A1 (en) * 2015-07-06 2017-01-12 엘지전자 주식회사 Broadcasting signal transmission device, broadcasting signal reception device, broadcasting signal transmission method, and broadcasting signal reception method
US9788043B2 (en) 2008-11-07 2017-10-10 Digimarc Corporation Content interaction methods and systems employing portable devices
US10555051B2 (en) 2016-07-21 2020-02-04 At&T Mobility Ii Llc Internet enabled video media content stream
US10657380B2 (en) 2017-12-01 2020-05-19 At&T Mobility Ii Llc Addressable image object

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5745569A (en) * 1996-01-17 1998-04-28 The Dice Company Method for stega-cipher protection of computer code
US5822432A (en) * 1996-01-17 1998-10-13 The Dice Company Method for human-assisted random key generation and application for digital watermark system
US5841978A (en) * 1993-11-18 1998-11-24 Digimarc Corporation Network linking method using steganographically embedded data objects
US5896454A (en) * 1996-03-08 1999-04-20 Time Warner Entertainment Co., L.P. System and method for controlling copying and playing of digital programs
US6031815A (en) * 1996-06-27 2000-02-29 U.S. Philips Corporation Information carrier containing auxiliary information, reading device and method of manufacturing such an information carrier
US6177931B1 (en) * 1996-12-19 2001-01-23 Index Systems, Inc. Systems and methods for displaying and recording control interface with television programs, video, advertising information and program scheduling information
US6269394B1 (en) * 1995-06-07 2001-07-31 Brian Kenner System and method for delivery of video data over a computer network

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5841978A (en) * 1993-11-18 1998-11-24 Digimarc Corporation Network linking method using steganographically embedded data objects
US6269394B1 (en) * 1995-06-07 2001-07-31 Brian Kenner System and method for delivery of video data over a computer network
US5745569A (en) * 1996-01-17 1998-04-28 The Dice Company Method for stega-cipher protection of computer code
US5822432A (en) * 1996-01-17 1998-10-13 The Dice Company Method for human-assisted random key generation and application for digital watermark system
US5896454A (en) * 1996-03-08 1999-04-20 Time Warner Entertainment Co., L.P. System and method for controlling copying and playing of digital programs
US6031815A (en) * 1996-06-27 2000-02-29 U.S. Philips Corporation Information carrier containing auxiliary information, reading device and method of manufacturing such an information carrier
US6177931B1 (en) * 1996-12-19 2001-01-23 Index Systems, Inc. Systems and methods for displaying and recording control interface with television programs, video, advertising information and program scheduling information

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10218946A1 (en) * 2002-04-22 2003-11-06 Deutsche Telekom Ag Applying watermark to image transmitted via data line, e.g. video film or sequence with soundtrack, by replacing image or sound object of less relevance with watermark
EP1514269A1 (en) * 2002-06-18 2005-03-16 Lg Electronics Inc. System and method for playing content information using an interactive disc player
EP1514269A4 (en) * 2002-06-18 2009-05-20 Lg Electronics Inc System and method for playing content information using an interactive disc player
EP1998565A1 (en) * 2006-03-01 2008-12-03 L.A.B. Inc. Video-linked controlled object external device controller and video recording medium used for same
EP1998565A4 (en) * 2006-03-01 2011-06-08 L A B Inc Video-linked controlled object external device controller and video recording medium used for same
EP2183715A2 (en) * 2007-07-30 2010-05-12 Yahoo! Inc. Textual and visual interactive advertisements in videos
EP2183715A4 (en) * 2007-07-30 2012-06-06 Yahoo Inc Textual and visual interactive advertisements in videos
US9788043B2 (en) 2008-11-07 2017-10-10 Digimarc Corporation Content interaction methods and systems employing portable devices
US8433306B2 (en) 2009-02-05 2013-04-30 Digimarc Corporation Second screens and widgets
US11070854B2 (en) 2015-07-06 2021-07-20 Lg Electronics Inc. Broadcasting signal transmission device, broadcasting signal reception device, broadcasting signal transmission method, and broadcasting signal reception method
US10721502B2 (en) 2015-07-06 2020-07-21 Lg Electronics Inc. Broadcasting signal transmission device, broadcasting signal reception device, broadcasting signal transmission method, and broadcasting signal reception method
US10939145B2 (en) 2015-07-06 2021-03-02 Lg Electronics Inc. Broadcasting signal transmission device, broadcasting signal reception device, broadcasting signal transmission method, and broadcasting signal reception method
WO2017007224A1 (en) * 2015-07-06 2017-01-12 엘지전자 주식회사 Broadcasting signal transmission device, broadcasting signal reception device, broadcasting signal transmission method, and broadcasting signal reception method
US11483599B2 (en) 2015-07-06 2022-10-25 Lg Electronics Inc. Broadcasting signal transmission device, broadcasting signal reception device, broadcasting signal transmission method, and broadcasting signal reception method
US10555051B2 (en) 2016-07-21 2020-02-04 At&T Mobility Ii Llc Internet enabled video media content stream
US10979779B2 (en) 2016-07-21 2021-04-13 At&T Mobility Ii Llc Internet enabled video media content stream
US11564016B2 (en) 2016-07-21 2023-01-24 At&T Mobility Ii Llc Internet enabled video media content stream
US10657380B2 (en) 2017-12-01 2020-05-19 At&T Mobility Ii Llc Addressable image object
US11216668B2 (en) 2017-12-01 2022-01-04 At&T Mobility Ii Llc Addressable image object
US11663825B2 (en) 2017-12-01 2023-05-30 At&T Mobility Ii Llc Addressable image object

Also Published As

Publication number Publication date
WO2001099325A3 (en) 2003-11-06
AU2001266949A1 (en) 2002-01-02

Similar Documents

Publication Publication Date Title
US8442264B2 (en) Control signals in streaming audio or video indicating a watermark
JP5145032B2 (en) Synchronize broadcast content with corresponding network content
CN1264349C (en) Interactive TV using remote control with built-in phone
US8436891B2 (en) Hyperlinked 3D video inserts for interactive television
US7536706B1 (en) Information enhanced audio video encoding system
JP5204285B2 (en) Annotation data receiving system linked by hyperlink, broadcast system, and method of using broadcast information including annotation data
US20150296263A1 (en) System And Method In A Television Controller For Providing User-Selection Of Objects In A Television Program
EP0982947A2 (en) Audio video encoding system with enhanced functionality
US20150264451A1 (en) Method, apparatus and system for providing access to product data
US20050229227A1 (en) Aggregation of retailers for televised media programming product placement
US20030079224A1 (en) System and method to provide additional information associated with selectable display areas
JP5576667B2 (en) Information transmission display system
US20080133604A1 (en) Apparatus and method for linking basic device and extended devices
EP2494541A1 (en) Multiple-screen interactive screen architecture
JP2004507989A (en) Method and apparatus for hyperlinks in television broadcasting
CN101606171A (en) The apparatus and method of access and first media data correlation combiner information
CN101151673A (en) Method and device for providing multiple video pictures
WO2008074597A1 (en) Automatically embedding information concerning items appearing in interactive video by using rfid tags
WO2010077918A2 (en) Embedded video advertising method and system
WO2001099325A2 (en) Interactive video and watermark enabled video objects
JP2009529828A (en) System and method for mapping media content to a website
US20150106200A1 (en) Enhancing a user's experience by providing related content
Srivastava Broadcasting in the new millennium: A prediction

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase in:

Ref country code: JP