WO2000064148A1 - Method and apparatus for efficient video processing - Google Patents

Method and apparatus for efficient video processing

Info

Publication number
WO2000064148A1
Authority
WO
WIPO (PCT)
Prior art keywords
encoder, decoder, information, frame, segment
Prior art date
Application number
PCT/US2000/010451
Other languages
French (fr)
Other versions
WO2000064148A9 (en)
Inventor
Adityo Prakash
Eniko F. Prakash
Original Assignee
Pulsent Corporation
Priority date
Filing date
Publication date
Application filed by Pulsent Corporation filed Critical Pulsent Corporation
Priority to AU44685/00A
Publication of WO2000064148A1
Publication of WO2000064148A9

Classifications

    • H04N — Pictorial communication, e.g. television
    • H04N7/12 — Systems in which the television signal is transmitted via one channel or a plurality of parallel channels, the bandwidth of each channel being less than the bandwidth of the television signal
    • H04N19/59 — Predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H04N19/107 — Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H04N19/12 — Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
    • H04N19/17 — Adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N19/196 — Adaptive coding specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
    • H04N19/46 — Embedding additional information in the video signal during the compression process
    • H04N19/507 — Temporal prediction using conditional replenishment
    • H04N19/52 — Processing of motion vectors by predictive encoding
    • H04N19/61 — Transform coding in combination with predictive coding
    • H04N19/80 — Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation


Abstract

A video compression method and apparatus is disclosed. The present invention includes a 'smart' or active decoder (Fig. 3) that assumes much of the transmission and instruction burden that would otherwise be required of the encoder, thus greatly reducing the overhead and resulting in a much smaller encoded bitstream. Thus, the corresponding (i.e., compatible) encoder of the present invention can produce an encoded bitstream with greatly reduced overhead. This is achieved by encoding a reference frame (Fig. 3, element 7) based on the structural information inherent to the image (e.g., image segmentation, geometry, color, and/or brightness), and then predicting other frames relative to that structural information. Typically, the description of a predicted frame would include kinetic information (Fig. 3, element 6), e.g., segment motion data, inexact matches, the appearance of new information, and the portion of segment evolution that is captured by motion per se. Because the decoder is capable of independently determining the structural information (and the relationships among its elements) underlying the predicted frame, such information need not be explicitly transmitted to the decoder. Rather, the encoder need only send information that the encoder knows the decoder cannot determine on its own.

Description

METHOD AND APPARATUS FOR EFFICIENT VIDEO PROCESSING
1. Brief Introduction
The present invention relates to the compression of motion video data, and more
particularly to a synchronized encoder and smart decoder system for the efficient transmittal and storage of motion video data. As consumers desire more motion video
intensive modes of communications, the limited bandwidth of current transmission
modes, such as broadcast, cable, telephone lines, etc. becomes prohibitive. The
introduction of the Internet, and the subsequent popularity of the world wide web, video
conferencing, and digital and interactive television, requires more efficient ways of utilizing
existing bandwidth. Further, motion video intensive applications require immense
storage capacity. The advent of multi-media capabilities on most computer systems has
taxed traditional storage devices, such as hard drives, to the limit.
Compression, as used in this patent, is the means by which digital motion video
can be represented efficiently and cheaply. The ultimate goal of video compression is to
reduce the bitstream, or video information flow, of the motion video sequences as much
as possible, while retaining enough information so that the decoder or receiver can
reconstruct the video image sequences in a manner adequate for the specific application,
such as television, videoconferencing, etc. The benefit of compression is that it allows
more information to be transmitted in a given amount of time, or stored in a given storage
medium.
Most digital signals contain a substantial amount of redundant, superfluous,
information. For example, a stationary video scene produces nearly identical images in
each frame. Compression attempts to remove the superfluous information so that the related image frames can be represented in terms of previous frames, thus eliminating the
need to transmit the entire scene for each video frame.
2. Previous attempts
There have been numerous attempts at adequately compressing video imagery. These methods generally fall into one of the following two categories: 1) Spatial
redundancy reduction, and 2) Temporal redundancy reduction.
2.1 Spatial Redundancy Removal
The first type of video compression focuses on the reduction of spatial redundancy. Spatial redundancy refers to taking advantage of the correlation among neighboring pixels in order to derive a more efficient representation of the important information in an image frame. These methods are more appropriately termed still image compression routines, as they do not attempt to address the issue of temporal, or frame to frame, redundancy, as explained in section 2.2. They work reasonably well on individual video image frames. However, a critical element in video compression is reducing
temporal redundancy, in other words, not having to retransmit, store, or otherwise fully
represent information seen in previous frames. Common still image compression
schemes include JPEG, wavelets, and fractals.
2.1.1 JPEG/DCT based image compression
One of the first commonly used methods of image compression was the DCT, or
discrete cosine transformation, compression system, which is at the heart of JPEG.
DCT operates by representing each digital image frame as a series of cosine waves, or frequencies. Afterwards, the coefficients of the cosine series are quantized. The higher frequency coefficients are quantized more harshly than those of the lower
frequencies. The result of the quantization is a large number of zero coefficients, which can be encoded very efficiently. However, JPEG and similar compression schemes do
not address the crucial issue of temporal redundancy.
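To make the scheme concrete, the following minimal Python sketch (illustrative only; real JPEG uses standardized 8x8 quantization tables, zig-zag scanning, and entropy coding, all omitted here) transforms an 8x8 block with a 2-D DCT and applies a quantization step size that grows with frequency, producing the run of zero coefficients described above.

    import numpy as np

    def dct_matrix(n=8):
        # Orthonormal DCT-II basis: row k holds the k-th cosine frequency.
        m = np.arange(n)
        c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m[None, :] + 1) * m[:, None] / (2 * n))
        c[0, :] = np.sqrt(1.0 / n)
        return c

    # An 8x8 block of pixel values (a smooth gradient, typical of natural images).
    block = 4.0 * np.add.outer(np.arange(8.0), np.arange(8.0))

    C = dct_matrix()
    coeffs = C @ block @ C.T                       # forward 2-D DCT
    step = 1.0 + 4.0 * np.add.outer(np.arange(8.0), np.arange(8.0))
    quantized = np.round(coeffs / step)            # harsher steps at higher frequencies
    print(np.count_nonzero(quantized == 0), "of 64 coefficients quantized to zero")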
2.1.2 Wavelets
As a slight improvement to the DCT compression scheme, the wavelet transformation compression scheme was devised. This system is similar to the DCT. The only substantial difference is that the image frame is represented as a series of wavelets, or windowed oscillations, instead of as a series of cosine waves.
2.1.3 Fractals
The goal of fractal compression is to take an image and determine the single
function, or set of functions, that fully describes the image frame. A fractal is an object that is self-similar at different scales, or resolutions, i.e. no matter what resolution it is viewed at, the object remains the same. Theoretically, fantastic compression ratios could occur as simple equations describe complex images.
Fractal compression is not, however, a viable method of general compression. The high
compression ratios only work on specially constructed images, and only with considerable help from a person guiding the compression process. Fractal compression
is a computationally intensive process.
2.2 Temporal and Spatial Redundancy Removal
Adequate motion video compression requires reduction of both temporal and
spatial redundancies within the sequence of frames that comprise video. Temporal redundancy removal is concerned with the removal from the bitstream of information that has already been coded in previous image frames. Block matching is the basis for most currently used effective means of temporal redundancy removal.
2.2.1 Block Based Motion Estimation
Block matching is the process by which the image is subdivided into uniform-size blocks and each block is tracked from one frame to another and represented by a motion vector, instead of having the block re-coded and placed into the bitstream for a second time. Examples of compression routines that use block matching include MPEG and all its variants.
MPEG operates by performing a still image compression on the first frame and transmitting it. It then divides the same frame into 16 pixel by 16 pixel square blocks and attempts to find each block within the next frame. For each block that still exists in the subsequent frame, MPEG need only transmit the motion vector, or movement, of the
block along with sufficient identifying information. As the block moves from frame to
frame, it may not remain the same. The difference is known as the residue. Additionally,
as blocks move, previously hidden areas may become visible for the first time. This is also known as the residue. Collectively, the remaining information after the block motion is sent is known as the residue frame, which is coded using JPEG and sent to the receiver to complete the image frame.
Next, the encoder divides the second image frame into blocks and the routine
continues until a new keyframe is inserted. A keyframe is an image frame which is
completely self-contained, not described in relation to any other image frame.
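A hedged Python sketch of the block-matching step follows: an exhaustive full search over a small window, keeping the displacement with the smallest sum of absolute differences (SAD). The frame data and search window are illustrative; production encoders use faster hierarchical or diamond searches.

    import numpy as np

    def find_motion_vector(block, reference, top, left, search=8):
        # Slide the block over the reference frame and keep the displacement
        # with the smallest sum of absolute differences.
        h, w = block.shape
        best, best_sad = (0, 0), np.inf
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = top + dy, left + dx
                if y < 0 or x < 0 or y + h > reference.shape[0] or x + w > reference.shape[1]:
                    continue
                sad = np.abs(reference[y:y + h, x:x + w] - block).sum()
                if sad < best_sad:
                    best_sad, best = sad, (dy, dx)
        return best, best_sad

    rng = np.random.default_rng(0)
    frame1 = rng.random((64, 64))
    frame2 = np.roll(frame1, shift=(2, 3), axis=(0, 1))   # scene shifted by (2, 3)
    block = frame2[16:32, 16:32]                          # a 16x16 block of frame 2
    print(find_motion_vector(block, frame1, 16, 16))      # -> ((-2, -3), 0.0)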
Although state of the art, block matching is highly inefficient and fails to
take advantage of the known general physical characteristics of images. For example, the
block method is inherently crude, as the blocks do not have any relationship with real objects in the image. A given block may comprise a part of an object, a whole object, or even multiple dissimilar objects with unrelated motion. In addition, neighboring objects will often have similar motion. However, since blocks do not correspond to real objects, block based systems cannot use this information to further reduce the bitstream.
Another major limitation of block based matching is the residue frame coding. The residue frame created after block based matching will generally be noisy and patchy and does not lend itself to good compression via standard image compression schemes such as DCT, wavelets, or fractals.
2.3 Alternatives
It is well recognized that the current state of the art needs improvement; specifically, the block based method is extremely inefficient and does not produce an optimally compressed bitstream for motion video information. To that end, the latest compression schemes, such as MPEG-4, allow for the inclusion of the structural information, if available, of selected items within the frames instead of merely using arbitrary sized blocks. While some compression gains are achieved, the overhead information is substantially increased because, in addition to the motion and residue information, these schemes require that the structural or shape information for each item be sent to the
receiver. This is because all current compression schemes use a dumb receiver, one
which is incapable of making determinations for itself.
Additionally, as mentioned above, the current compression methods code the residue
frame as merely another image frame to be compressed by JPEG, without attempting to
determine if more efficient methods are possible.
3. Novel Approaches
This invention represents a novel approach to the problem of video compression. As described above, the goal of video compression is to represent accurately a sequence
of video frames with the smallest bitstream, or video information flow. As previously stated, the spatial redundancy reduction methods described above are, by themselves, inadequate for motion video compression. Further, the current temporal and spatial redundancy reduction methods, such as MPEG-2, waste precious bitstream space by having to transmit a great deal of overhead information. This invention solves that problem by using a smart decoder. This smart decoder determines much of the overhead information, thus obviating the necessity of transmitting such information, and therefore reducing the bitstream accordingly.
The smart decoder also makes the same predictions about the subsequent images in the related sequence of images as the encoder. Thus, the encoder can simply send the difference between the prediction and the actual values, further reducing the bitstream.
DETAILED DESCRIPTION
Introduction/Summary
Compression of digital motion video is the process by which superfluous or redundant information, both spatial and temporal, contained within a sequence of related video frames (frames) is removed. Video compression allows the sequence of frames to
be represented by a reduced bitstream, or data flow, while retaining its capacity to be reconstructed in a visually sufficient manner.
Traditional methods of video compression place most of the compression burden,
i.e. computational and transmittal, on the encoder, while minimally using the decoder. A
traditional video encoder/decoder system requires that the encoder make all the
calculations, inform the decoder of its decisions, and then transmit the video data to the decoder along with instructions for the reconstruction of each image.
This invention is novel in that it uses a smart decoder to take much of the
transmission and instructional burden from the encoder, which results in a much smaller bitstream. Specifically, absent from the bitstream is the structural information inherent within the image frame, such as geometry, color, and brightness, which, in a complex frame, is a significant amount of video information. Further, absent from the bitstream is information regarding any decision made by the encoder, such as segment ordering, segment association and disassociation, etc.
Fig. 1 is an overview drawing of the encoder for use with a compatible decoder as will be described later with respect to Fig. 2. The encoder works as follows:
1. The encoder obtains a reference image frame;
2. The encoder encodes the image frame from step 1;
3. The encoded image from step 2 is reconstructed by the encoder, in the same manner as the decoder will;
4. The encoder segments the reconstructed image from step 3; alternatively, the encoder segments the original reference image frame from step 1;
5. The segments determined in step 4 are ordered by the encoder, in the same manner as the decoder will;
6. The encoder obtains a new image frame;
7. The motion or kinetic information of each segment, determined in step 4, from the reconstructed, or original image in step 3, to the new image frame in
step 6 is determined by motion matching;
8. The encoder encodes the kinetic information;
9. Based on the motion information from step 8, previously hidden regions, also known as the background residue, in the first frame may be exposed in the second frame;
10. The encoder orders the background residues, in the same manner as the decoder will;
11. The encoder attempts to fill each of the background residues from steps 9 and 10.
12. The encoder determines the difference between the predicted fill and the actual fill for each of the background residue areas.
13. The encoder determines the local residue areas in the second image frame, from the segment motion information;
14. The encoder orders the local residues from step 13, in the same manner as the decoder will;
15. The encoder encodes the local residues from step 13.
16. The encoder determines any special instructions associated with the segment information
17. If the image can be reasonably reconstructed primarily from the kinetic information, with assistance from the background residue and the local segment residues, the encoder transmits the following information, and reconstructs the second frame, and continues at step 6:
a. Flag denoting that the second frame is not a keyframe;
b. The kinetic information for the segments;
c. The special instructions for the segments;
d. The background residue information along with flags denoting coding;
e. The local residue information along with flags denoting coding;
18. If the image cannot be reconstructed in relation to the reference frame, the image is encoded as a keyframe, a flag is transmitted to inform the decoder, and the encoder continues at step 2.
Fig 2 is an overview drawing of the decoder system with a compatible encoder as described in Fig 1. The decoder system works as follows:
1. The decoder receives a first encoded image frame from step 3 of the encoder description;
2. The encoded image frame from step 1 is reconstructed by the decoder in the same manner as the encoder;
3. The reconstructed image frame from step 2 is segmented by the decoder. Alternatively, the reconstructed image frame is not segmented by the decoder;
4. The decoder receives a flag from the encoder stating whether the second frame from step 19 and 20 of the encoder description is a keyframe, i.e. not represented in relation to any other frame. If so, then the decoder returns to step 1.
5. The decoder receives motion information regarding the segments determined
in step 3 from the encoder;
6. The decoder begins to reconstruct a subsequent image frame using the
segments obtained in step 3 and motion information obtained in step 4;
7. Based on the motion information from step 4 regarding the segments determined in step 3, the decoder determines where areas, previously hidden, are now revealed, also known as the background residue;
8. The background residue locations from step 6 are ordered in the same manner as in the encoder;
9. The decoder attempts to fill the background residue locations from step 6;
10. The decoder receives additional background residue information plus flags denoting the coding method for the additional background residue information from step 8 from the encoder;
11. The decoder decodes the additional background residue information;
12. The computed background residue information and the added background residue information is added to the second image frame.
13. Based on the motion information from step 4 regarding the segments determined in step 3, the decoder determines the location of the local segment residues.
14. The local segment residue locations are ordered in the same manner as the
encoder does;
15. The decoder receives coded local segment residue information plus flags denoting the coding method for each local segment residue location;
16. The decoder decodes the local segment residue information;
17. The decoded local segment residue information is added to the second frame.
18. The decoder receives the special instructions, if any, for each segment;
19. Reconstruction of the second frame is complete;
20. If there are more frames, the routine continues at step 4
Fig 3 is an overview drawing of the encoder/smart decoder system. The
encoder/smart decoder system works as follows:
1. The encoder obtains, encodes and transmits the reference frame;
2. The reference frame from step 1 is reconstructed by both encoder and decoder;
3. Identical segments in the reference frame are determined by both encoder and decoder;
4. The segments from step 3, are ordered in the same way by both the encoder and decoder;
5. The encoder obtains a new image frame;
6. The encoder determines the motion of the segments from step 3 by means of motion matching against the new frame from step 5;
7. The encoder encodes motion information;
8. Based on motion information from step 7, the encoder determines previously hidden areas, also known as background residue, which are now exposed in the second frame.
9. The encoder attempts to mathematically predict the image at the background residue regions.
10. The encoder determines if the mathematical prediction was good, based upon the difference between the prediction and the actual values. The encoder computes additional background residue if necessary.
11. Based on segment information in step 3, and the motion information from step 7, the encoder determines structural information for the local segment residues;
12. The structural information for the local residues from step 11 is ordered by the encoder, in the same manner as the decoder will.
13. Based on the structural information from step 12 regarding the local residues, the encoder encodes the local segment residues.
14. The encoder determines, based upon the kinetic information of the segments, whether the second frame should be coded in reference to the first frame. If not, it is coded as a keyframe and the routine begins again at step 1.
15. The decoder receives the segment kinetic information from the encoder in step 7.
16. The decoder determines and orders the same background residue as the encoder did in step 8.
17. The decoder makes the identical guess as to the structure of the background
residue as the encoder did in step 9.
18. The decoder determines and orders the same local segment residues as determined in steps 11 and 12.
19. The decoder receives the local segment residues information from the encoder and flags denoting the coding scheme.
20. The decoder receives the additional background residue information from the
encoder.
21. The decoder receives the special information, if any, regarding each segment.
22. Based upon the kinetic information, the local segment residues, and the
background residues, both the encoder and decoder identically reconstruct the
second frame.
23. The second frame is now the reference frame and the process continues at step 5. (A condensed sketch of what the encoder transmits in this loop follows below.)
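To make the division of labor concrete, the following sketch (hypothetical field names, not taken from the patent) lists what the encoder actually places in the bitstream for a predicted frame under the scheme above. Note what is absent: the segmentation, the segment ordering, and the residue locations, all of which the synchronized decoder re-derives on its own.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class PredictedFrameMessage:
        # Everything the encoder transmits for a non-keyframe (cf. step 17 above).
        is_keyframe: bool                            # keyframe flag
        kinetic_info: List[Tuple[int, int]]          # one motion vector or offset per segment/group
        special_instructions: List[bytes]            # embedded per-segment commands, if any
        background_residue: List[Tuple[int, bytes]]  # (coding-method flag, correction data)
        local_residue: List[Tuple[int, bytes]]       # (coding-method flag, coded residue)
        # Deliberately absent: the segmentation, the segment ordering, and the
        # residue locations -- the decoder, having reconstructed and segmented
        # the same reference frame, re-derives all of these itself.

    msg = PredictedFrameMessage(False, [(2, 3)], [], [(0, b"")], [(1, b"\x07")])
    print(msg.is_keyframe, msg.kinetic_info)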
ENCODER WRITE-UP
2. Reference Frame Transmission
Referring to Fig 4, the encoder receives the reference frame, in this case, a picture of an automobile moving left to right with a mountain in the background. The reference frame generally refers to the frame in relation to which any other frame is described.
Fig. 5 is the part of the flow diagram illustrating the procedure by which the encoder initially processes the reference frame. Step 110 begins the process; specifically, the encoder receives the picture described in Fig. 4. At step 120, the encoder encodes Fig. 4 into a video format and transmits it to the receiver at step 130. The encoder reconstructs the encoded frame at step 140.
3. Segmentation
Segmentation is the process by which a digital image is subdivided into its component parts, i.e. segments, where each segment represents an area bounded by a radical or sharp change in values within the image.
Persons well versed in the art of computer vision will be aware that segmentation can be done in a plurality of ways. One such way is the watershed method where each pixel is connected to every other pixel in the image frame. As seen in Fig 6, the watershed method segments the image by disconnecting pixels based upon a variety of algorithms. The remaining connected pixels belong to the same segment.
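As an illustration only, the following Python sketch uses a crude region-growing rule as a stand-in for the watershed idea (it is not the patent's algorithm): neighboring pixels remain connected unless their values change sharply, and each maximal connected set of pixels becomes one segment. The threshold and test image are arbitrary.

    import numpy as np
    from collections import deque

    def segment(img, threshold=10):
        # Flood-fill each unlabeled pixel: neighbors stay "connected" unless
        # their values differ sharply; connected pixels share one segment.
        labels = np.full(img.shape, -1, dtype=int)
        current = 0
        for seed in zip(*np.nonzero(labels < 0)):
            if labels[seed] >= 0:
                continue
            labels[seed] = current
            queue = deque([seed])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < img.shape[0] and 0 <= nx < img.shape[1]
                            and labels[ny, nx] < 0
                            and abs(int(img[ny, nx]) - int(img[y, x])) < threshold):
                        labels[ny, nx] = current
                        queue.append((ny, nx))
            current += 1
        return labels, current

    img = np.zeros((8, 8), dtype=np.uint8)
    img[2:6, 2:6] = 200                      # a bright "object" on a dark background
    labels, n = segment(img)
    print(n, "segments")                     # -> 2 segments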
Referring to Fig. 6, at step 210, the encoder segments the reconstructed reference frame to determine the inherent structural features of the image. Alternatively, at step 210, the encoder segments the original image frame for the same purpose. The encoder determines that the segments of Fig. 4 are the car, the wheels, the windows, the street, the sun, and the background. At step 220, the encoder orders the segments based upon predetermined criteria and marks them Segments 1 through 8, respectively, as seen in Fig 7. Segmentation permits the encoder to perform efficient motion matching, motion prediction, and efficient residue coding, as explained further in this description.
4. Kinetic Information
Once segmentation has been accomplished, the encoder encodes the kinetic or motion information regarding the movement of each segment.
The kinetic information is determined through a process known as motion matching. Motion matching is the procedure of matching similar regions, often segments, from the first frame to the second frame. At each pixel within a digital image frame, the image is represented by a numerical value. Matching occurs when a region in the first frame has identical or near identical pixel values to a region in the second frame.
Generally speaking, a segment is matched with a segment in another frame when the absolute value of the difference in pixel values between the segments is below a predetermined threshold. While the absolute value of the pixel difference is often used because it is simple and accounts for negative numbers, any number of functions would suffice.
In Fig 7a, we see an example of motion matching of a soccer ball between frames
1 and 2. In frame 1, we have a soccer ball, with black and white squares. In frame 2, we
have a brownish orange basketball next to the soccer ball. Subtraction of the pixel
values contained within the basketball in frame 2 from those of the soccer ball in frame 1 yields a
relatively arbitrary set of non-zero differences. Thus the soccer ball and basketball will
not be matched. However, subtraction of the soccer ball in frame 2 from the soccer ball in frame 1 yields a set of mostly zero and close to zero values. Thus the two soccer balls would be considered matched. The kinetic infoπnation transmitted to the decoder can be reduced if related segments can be considered as single groups so that the encoder only needs to transmit one main representative motion vector to the decoder along with motion vector offsets to represent the individual motion of each segment within the group. Grouping is possible if there is previous kinetic information about the segments or if there is multi-scale information about the segments. Multi-scaling will be explained in section 4.2 of the encoder discussion.
Referring to Fig. 8, at step 310, the encoder determines if the first frame is a keyframe, i.e. one not described in relation to other frames. If the first frame is a keyframe, then there is no previous kinetic information, and grouping is only possible if there is multi-scale information regarding the image frame. However, if the first frame is not a keyframe, then there will be some previous kinetic information with which to group segments. Therefore, if the first frame is not a keyframe, step 320 will execute the motion grouping routine, described here as section 4.1.
However, if the first frame is a keyframe, then step 310 goes to step 330, where the encoder determines if there is any multi-scale information available to it. If there is, then step 340 executes the multi-scaling routine of section 4.2; otherwise, at step 350, the encoder decides not to group any segments.
If the first frame is a keyframe, and thus previous kinetic information is not available, and there is no multi-scale information available either, then at step 350 the encoder determines that it cannot group any segments together.
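The decision logic of Fig. 8 can be summarized in a short sketch; the string labels are illustrative assumptions, not part of the disclosure.

```python
def choose_grouping(is_keyframe, has_multiscale_info):
    """Select the segment-grouping strategy per Fig. 8."""
    if not is_keyframe:
        return "motion_grouping"      # step 320: use previous kinetic information (section 4.1)
    if has_multiscale_info:
        return "multiscale_grouping"  # step 340: use multi-scale information (section 4.2)
    return "no_grouping"              # step 350: segments cannot be grouped
```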
4.1 Motion Vector Grouping
Motion vector grouping only occurs when there is previous motion information, so that the encoder can determine which segments to associate. Motion vector grouping begins at step 510 in Fig. 10, where the previous motion vector of each segment is considered. Segments which exhibit similar motion vectors are grouped together at step 520. At step 530, the motion vector for the group is determined by combining the motion vectors within the group. Thus, for each segment within the group, only the motion vector difference, i.e. the difference between the segment's motion vector and the characteristic motion vector, will eventually be transmitted (see step 540). One example of a characteristic motion vector would be an average motion vector.
At step 550, the encoder orders the groups. However, before the motion information can be transmitted, further reduction might occur through motion prediction at step 555, described here in section 4.1.1. Once the motion information is determined, it is stored at step 560.
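A minimal sketch of steps 510-520 follows, assuming a simple distance test on the previous motion vectors; the tolerance value and function name are illustrative.

```python
import numpy as np

def group_by_previous_motion(prev_vectors, tol=1.5):
    """Group segment indices whose previous motion vectors are similar."""
    groups = []
    for idx, vec in enumerate(prev_vectors):
        for group in groups:
            anchor = prev_vectors[group[0]]
            if np.linalg.norm(np.subtract(vec, anchor)) < tol:
                group.append(idx)    # step 520: similar motion, same group
                break
        else:
            groups.append([idx])     # start a new group for this segment
    return groups
```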
4.1.1 Motion Prediction
Referring to Fig. 11, at step 610, the encoder considers a segment. At step 620, the encoder determines if there is previous motion information for the segment so that its motion can be predicted. If there is no previous motion information, the encoder chooses the next segment and continues.
If there is previous motion information, the encoder predicts the motion of the segment at step 630 and compares its prediction to the actual motion of the segment at step 640. The motion vector offset is initially computed at step 650 as a function of the actual and predicted motion vectors; one example of such a calculation would be the difference between the actual and predicted motion vectors. At step 660, the encoder makes the final calculation for the motion vector offset; an example of the final calculation could be the difference between the initial motion vector offset and the characteristic motion vector.
At step 670, the encoder determines if there are any more segments. If so, then at step 680, the encoder considers the next segment and continues at step 620. Otherwise the prediction routine ends.
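Under the example calculations given above, steps 650 and 660 might look like the following sketch; treating the previous motion as the prediction is an assumption for illustration only.

```python
def motion_vector_offset(previous_mv, actual_mv, characteristic_mv):
    """Compute the offset transmitted for one segment (illustrative)."""
    predicted = previous_mv                            # step 630: predict from previous motion
    initial = (actual_mv[0] - predicted[0],
               actual_mv[1] - predicted[1])            # step 650: actual minus predicted
    return (initial[0] - characteristic_mv[0],
            initial[1] - characteristic_mv[1])         # step 660: subtract the group's characteristic vector
```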
4.2 Multi-Scale Grouping
Multi-scale grouping is an alternative to grouping segments by previous motion. Moreover, multi-scaling may be used in conjunction with motion grouping. Multi-scaling is the process of creating lower resolution versions of an image. An example of creating multiple scales is through the repeated application of a smoothing function. The result of creating lower resolution images is that as the resolution decreases, only larger, more dominant features remain visible. Thus, for example, the stitching on a football may become invisible at lower resolutions, yet the football itself remains discernible.
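A minimal sketch of building scales by repeated smoothing, assuming SciPy's Gaussian filter; the number of scales and the sigma parameter are illustrative choices.

```python
from scipy import ndimage as ndi

def build_scales(image, num_scales=4, sigma=2.0):
    """Create successively lower-resolution versions of an image.

    Each application of the smoothing function suppresses finer detail,
    so only larger, more dominant features remain at the coarsest scale."""
    scales = [image]
    for _ in range(num_scales - 1):
        scales.append(ndi.gaussian_filter(scales[-1], sigma=sigma))
    return scales
```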
An example of the multi-scale process is as follows: referring to Fig. 9, at step 410, the encoder considers the coarsest image scale (i.e. lowest resolution) for the frame, and at step 420 determines which segments have remained visible. The coarsest image scale is used because at that point only the largest, most dominant features, usually corresponding to the outlines of major objects, remain visible, while smaller, less dominant segments are no longer discernible at the lower resolutions. At step 430, invisible segments which are wholly contained within a given visible segment are associated with that segment and considered one group. This is because the smaller, now invisible segments often share a relationship with the larger object and will likely have similar kinetic information. A decision is made at step 440. If there are more visible segments, at step 450, the encoder considers the next segment and continues at step 430. Otherwise the multi-scale grouping process ceases.
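Step 430 might be sketched as follows, where `contains` is an assumed predicate testing whether one segment's pixels all lie inside another; the names are hypothetical.

```python
def group_by_containment(visible_segments, invisible_segments, contains):
    """Associate invisible fine-scale segments with the visible coarse-scale
    segment that wholly contains them (step 430)."""
    groups = {seg: [seg] for seg in visible_segments}
    for small in invisible_segments:
        for big in visible_segments:
            if contains(big, small):      # small lies wholly within big
                groups[big].append(small)
                break
    return groups
```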
5. Residue Coding
Referring to Figs. 12-15, the residue is the portion of the image left over after the structural information has been moved. Residue falls under two classifications: new information and local residues.
5.1 New information
As shown in Fig. 12, as a segment moves, previously hidden or obstructed areas may become visible for the first time. In Fig. 12, three regions become visible as the car moves: the area behind the back of the car and the two areas behind the wheels. These are marked Regions 1 through 3, respectively. Referring to Fig. 13, at step 710, the encoder determines where the previously obstructed image regions occur. At step 720, the encoder orders the regions using a predetermined ordering system. Using the information surrounding the regions, the encoder makes a mathematical guess as to the structure of each region. Yet the encoder also knows precisely what images were revealed at these regions. Thus, at step 730, the encoder considers a region and, at step 740, determines if the mathematical prediction was sufficient by comparing the guess with the actual image. If the prediction was not close, at step 770, the encoder will encode the region or the difference and store the encoded information with a flag denoting the coding mechanism. Otherwise, if the guess was close enough, the encoder stores a flag denoting that fact at step 745. At step 750, the encoder determines if there are any more newly unobstructed regions. If so, the next region is considered and the routine continues at step 730; otherwise the routine ceases at step 799.
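A sketch of the step 740/770 decision, assuming a mean-absolute-error test and a caller-supplied encode function; the tolerance, flag strings, and names are assumptions.

```python
import numpy as np

def code_revealed_region(predicted, actual, tol=3.0, encode=lambda data: data):
    """Flag a good prediction, or encode the difference when it is not close."""
    error = np.abs(actual.astype(float) - predicted.astype(float)).mean()
    if error < tol:
        return ("prediction_ok", None)                 # step 745: flag only
    residual = actual.astype(np.int16) - predicted.astype(np.int16)
    return ("coded", encode(residual.tobytes()))       # step 770: flag plus coded data
```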
5.2 Local residues
Referring to Fig. 14, the local residue is the portion of the image in the neighborhood of a segment, left over after the segments have been moved, i.e. the car and mountain appear smaller in the subsequent frame. The structure of the residue will depend on how different the new segments are from the previous segments. It may be a well-defined region, or set of regions, or it may be patchy. Different types of coding methods are ideal for different types of local residue. Since the decoder knows the segment motion, it knows where most of the local residues will be located.
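One illustrative heuristic for choosing a coding method based on the residue's structure follows; the labels and the density threshold are assumptions, not the disclosed selection rule.

```python
import numpy as np

def choose_residue_coder(residue):
    """Pick a coding method suited to the local residue's structure."""
    density = np.count_nonzero(residue) / residue.size
    if density < 0.05:
        return "sparse"      # patchy residue: code isolated nonzero values
    return "transform"       # well-defined regions: block/transform coding
```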
Referring to Fig. 15, at step 810, the encoder determines the locations of the local residues and, at step 820, orders the regions where the local residues occur using a predetermined ordering scheme. At step 830, the encoder considers the first local residue, makes a decision as to the most efficient method of coding it, and encodes it at step 840. The encoder stores a flag denoting the coding mechanism, as well as the coded residue, at step 850. If there are more local residue locations, step 860 will consider the next local residue location and continue at step 840; otherwise, at step 870, the encoder executes the keyframe routine of Fig. 15a, step 880.
Referring to Fig. 15a, at step 880, the encoder determines if the second frame should be coded as a keyframe. If so, then at step 885, the encoder discards the kinetic information, the background residue, and the local segment residues and continues at step 120. Otherwise, the routine transmits the kinetic information, the background residue, and the local segment residues to the decoder at step 890.
6. Special Commands
The encoder embeds commands and instructions regarding each segment into the bitstream as necessary. Examples of these commands include, but are not limited to, getting static web pages, obtaining another video bitstream, waiting for text, etc.
The encoder can embed these commands at any point within the bitstream subsequent to the decoder ordering the segments. Fig. 14a is an example of one point where the commands may be embedded within the data stream.
Referring to Fig. 14a, at step 1610, the encoder considers the first segment. At step 1620, it transmits a special instruction flag. At step 1630, the encoder determines if there are any special instructions for the segment. If so, then at step 1640, the instructions are transmitted to the decoder, and at step 1650 the encoder determines if there are any more segments. If there are no special instructions associated with the segment, the encoder proceeds directly to step 1650. If there are more segments, at step 1660, the encoder considers the next segment and continues at step 1620; otherwise the routine ends at step 1699.
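A sketch of the Fig. 14a loop; the stream representation and the command container are hypothetical conveniences, not the disclosed bitstream syntax.

```python
def embed_commands(stream, segments, commands):
    """Append a special-instruction flag (and any commands) per segment."""
    for seg in segments:                          # steps 1610/1660: walk the ordered segments
        cmds = commands.get(seg, [])
        stream.append(("flag", seg, bool(cmds)))  # step 1620: special instruction flag
        for cmd in cmds:
            stream.append(("cmd", seg, cmd))      # step 1640: transmit the instruction
    return stream
```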
DECODER DESCRIPTION
2. Reference Frame Reception
Referring to Fig. 16, the decoder receives the encoded reference frame of a picture of an automobile moving left to right with a mountain in the background (see Fig. 4). The reference frame generally refers to the frame in relation to which other, subsequent frames are described. Fig. 16 illustrates the flow diagram of the above process. Step 910 begins the process, where the decoder receives an encoded image frame. At step 920, the decoder reconstructs the encoded image frame.
At step 930, the decoder receives a keyframe flag. This flag denotes whether the second frame is a keyframe or whether it can be reconstructed from the kinetic and residue information. If the second frame is a keyframe, then the decoder returns to step 910, where it receives the keyframe as a first frame; otherwise the routine continues.
3. Segmentation
As previously described, segmentation is the process by which a digital image is subdivided into its component parts, i.e. segments, where each segment represents an area bounded by a radical or sharp change in values within the image.
Referring to Fig. 17, at step 1010, the decoder segments the reconstructed reference frame to determine the inherent structural features of the image. The decoder determines that the segments in Fig. 4 are the car, the wheels, the doors, the windows, the street, the mountain, and the background. At step 1020, the decoder will order the segments based upon the same predetermined criteria as the encoder and mark the segments as 1 through 10, as seen in Fig. 7.
4. Kinetic Information
Once segmentation has been accomplished, the decoder receives a keyframe flag from the encoder. This flag tells the decoder if the first frame is a keyframe. The decoder then receives the kinetic information regarding the movement of each segment. The kinetic information tells the decoder the position of the segment in the new frame relative to its position in the previous frame. The kinetic information is reduced if segments with related motion can be grouped together and represented by one motion vector. The kinetic information received by the decoder depends on several factors, to wit: 1) whether the reference frame is a keyframe, and 2) if so, whether multi-scale information is available.
Referring to Fig. 18, at step 1110, the decoder determines if the reference frame is a keyframe, i.e. a frame not defined in relation to any other frame. If so, then there is no previous motion information for potential grouping of segments, and therefore the decoder attempts to use multi-scale information for segment grouping, if available. At step 1120, the decoder determines if there is multi-scale information available. If the first frame is a keyframe and there is multi-scale information available to the decoder, the decoder will initially group related segments together using the multi-scale routine executed at step 1130 and described in section 4.1 of the description. Conversely, if there is no multi-scale information available for the first frame, then at step 1150, the motion vectors are transmitted by the encoder and received by the decoder.
However, at step 1110, if the decoder determines that the first frame is not a keyframe, then it executes the motion grouping routine at step 1140, described in section 4.2. Alternatively, it may use the multi-scale grouping described in section 4.1.
4.1 Multi-Scale Grouping
Multi-scale grouping only occurs when the first frame is a keyframe and there is multi-scale information available to the decoder.
Referring to Fig. 19, at step 1210, the decoder considers the coarsest image scale for the frame and at step 1220 determines which segments have remained visible. At step 1230, invisible segments which are wholly contained within a given visible segment are associated with that segment. A decision is made at step 1240. If there are more visible segments, at step 1260, the decoder considers the next segment and continues at step 1230. Otherwise the multi-scale grouping process receives the motion vectors and motion vector offsets for the segments and then ceases.
4.2 Motion Vector Grouping
Referring to Fig. 20, at step 1310, the decoder considers a segment. At step 1320, the decoder determines if there is previous motion information for the segment so that its motion can be predicted. If there is no previous motion information, the decoder chooses the next segment and continues.
If there is previous motion information, the decoder predicts the motion of the segment at step 1330 and receives the motion vector prediction correction at step 1340.
At step 1350, the decoder determines if there are any more segments. If so, then at step 1360, the decoder considers the next segment and continues at step 1320.
Otherwise the prediction routine ends.
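The decoder-side counterpart of the encoder's prediction is a simple sum, sketched below; the names are hypothetical.

```python
def reconstruct_motion(predicted_mv, correction):
    """Recover a segment's actual motion from the decoder's own prediction
    plus the correction received from the encoder (steps 1330-1340)."""
    return (predicted_mv[0] + correction[0],
            predicted_mv[1] + correction[1])
```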
5. Residue Coding
The residue is the portion of the image left over after the structural information has been moved. Residue falls under two classifications: background and local residues.
5.1 Background residue
As shown in Fig. 12, as the car moves, previously hidden or obstructed areas may become visible for the first time. The decoder knows where these areas are and orders them using a predetermined ordering scheme. In Fig. 12, three regions become unobstructed, specifically, the region behind the car and the regions behind the two wheels. These regions are marked Regions 1 through 3, as seen in Fig. 12.
Referring to Fig. 21, at step 1410, the decoder considers the background residue regions and orders the regions at step 1420. At step 1430, it makes a mathematical prediction of the structure of the first background residue location. At step 1440, the decoder receives a flag denoting how good the prediction was and whether correction is needed. A decision is made at step 1450: if the prediction is sufficient, the routine continues at step 1470; otherwise, at step 1460, the decoder receives the encoded region and the flag denoting the coding scheme and reconstructs as necessary. At step 1470, if there are more background residue locations, the decoder, at step 1480, considers the next region and continues at step 1430. Otherwise the decoder goes to step 1490, where reconstruction continues and the process ceases.
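A sketch of the decoder side of this exchange, mirroring the flag convention used in the encoder sketch above; `decode` is an assumed inverse of whatever coding scheme the flag denotes.

```python
def reconstruct_background_region(predicted, flag, payload=None, decode=None):
    """Keep the decoder's own guess, or reconstruct from coded data."""
    if flag == "prediction_ok":
        return predicted           # step 1450: the prediction was sufficient
    return decode(payload)         # step 1460: decode the transmitted region
```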
5.2 Local residues
Referring to Fig. 14, as previously explained, the local segment residue is the portion of the image, in the neighborhood of the segment, left over after the segment has been moved, i.e. the car and the mountain appear smaller in the subsequent frame. Also, as explained before, the structure of the local residue may be varied. The decoder knows that most of the local residues will appear around the segments.
Referring to Fig. 23, at step 1510, the decoder considers the first segment. At step 1520, the decoder receives a flag denoting the coding method and receives the encoded local residue for that segment. Step 1530 determines if there are any more segments; if not, the routine ends at step 1590, where reconstruction concludes. Otherwise, at step 1540, the decoder considers the next segment and continues at step 1520.
6. Special Instructions
In addition to structural information regarding the image frame, the decoder is capable of receiving and executing commands embedded within the bitstream and associated with the various segments. As before, because the encoder and decoder are synchronized and are working with the same reference frame, the encoder is not required to transmit the structural information associated with the commands. The embedded commands are held in abeyance until a user-driven event, e.g. a mouse click, occurs. Fig. 24 is an example of one potential way to embed the commands.
Referring to Fig. 24, at step 1710, the decoder considers the first segment, and at step 1720 it receives a special instruction flag. The decoder determines, at step 1730, if there are special instructions or commands associated with the segment. If so, the decoder receives the commands at step 1740. At step 1750, the decoder determines if there are any more segments. If there are no special instructions or commands, the decoder goes to step 1750 directly.
If there are more segments, the decoder, at step 1760, considers the next segment and continues at step 1720; otherwise the routine ends at step 1799.
Referring to Fig. 25, at step 1810, the decoder determines if the user-driven event has occurred. If it has, the decoder determines which segment the user-driven event refers to at step 1820. At step 1830, the associated command is executed, and the decoder proceeds to step 1840. If the user-driven event has not occurred, the routine proceeds directly to step 1840. At step 1840, if the termination command has been sent, the routine exits at step 1899; otherwise the routine continues at step 1810.
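A sketch of the Fig. 25 loop; the event source and dispatch helpers are hypothetical stand-ins for whatever the decoder's host environment provides.

```python
def command_loop(poll_event, segment_at, commands, terminated):
    """Hold embedded commands in abeyance until a user-driven event occurs."""
    while not terminated():                    # step 1840: check for the termination command
        event = poll_event()                   # step 1810: has a user-driven event occurred?
        if event is not None:
            seg = segment_at(event)            # step 1820: which segment does it refer to?
            for cmd in commands.get(seg, []):
                cmd()                          # step 1830: execute the associated command
```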
7. Reconstruction
The second frame is reconstructed into a video format based upon the kinetic motion of the segments, the local segment residues, and the background residues.
Video Format
The description in the previous sections, titled Encoder Description and Decoder Description, defines a specific new video format.

Claims

WHAT IS CLAIMED IS:
1. A method of transmitting video information comprising:
(a) obtaining a first video frame containing image data;
(b) obtaining structural information inherent in said image data;
(c) obtaining a second video frame to be encoded relative to said first video frame;
(d) computing kinetic information for describing said second video frame in terms of said structural information of said first video frame; and
(e) transmitting said kinetic information to a decoder for use in reconstructing said second video frame based on said decoder's generation of said structural information of said first video frame.