WO2010025646A1 - Method, system and device of the subtitle matching process - Google Patents

Method, system and device of the subtitle matching process

Info

Publication number
WO2010025646A1
Authority
WO
WIPO (PCT)
Prior art keywords
subtitle
user terminal
video
information
transcoded
Application number
PCT/CN2009/073240
Other languages
French (fr)
Chinese (zh)
Inventor
吴治国
李智斌
赵雷
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2010025646A1


Classifications

    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N 5/00 Details of television systems
            • H04N 5/222 Studio circuitry; Studio devices; Studio equipment
              • H04N 5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
                • H04N 5/278 Subtitling
          • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
            • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
              • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
                • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
                  • H04N 21/2343 involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
                • H04N 21/236 Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
                  • H04N 21/23614 Multiplexing of additional data and video streams
              • H04N 21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
                • H04N 21/258 Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
                  • H04N 21/25808 Management of client data
                    • H04N 21/25825 involving client display capabilities, e.g. screen resolution of a mobile phone
                    • H04N 21/25833 involving client hardware characteristics, e.g. manufacturer, processing or storage capabilities
                • H04N 21/266 Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
                  • H04N 21/2662 Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
            • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
              • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                • H04N 21/434 Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
                  • H04N 21/4348 Demultiplexing of additional data and video streams
              • H04N 21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
                • H04N 21/4508 Management of client data or end-user data
                  • H04N 21/4516 involving client characteristics, e.g. Set-Top-Box type, software version or amount of memory available
              • H04N 21/47 End-user applications
                • H04N 21/488 Data services, e.g. news ticker
                  • H04N 21/4884 for displaying subtitles
            • H04N 21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
              • H04N 21/65 Transmission of management data between client and server
                • H04N 21/658 Transmission by the client directed to the server
                  • H04N 21/6582 Data stored in the client, e.g. viewing habits, hardware capabilities, credit card number

Definitions

  • The present invention relates to the field of video communications, and in particular to a method, system and apparatus for subtitle matching processing.
  • With the development of mobile communication technology, and especially the arrival of the 3G (3rd Generation) era, mobile video services led by mobile TV have gradually begun to be commercialized, and users can enjoy high-quality video services on mobile terminals such as mobile phones.
  • However, when video content is played on the display device of a mobile terminal, the limited processing capability and screen size of the terminal mean that the video content offered by current mobile video services is mostly obtained by transcoding conventional-resolution video files in formats such as MPEG (Moving Pictures Experts Group)-1, MPEG-2 and MPEG-4 down to a lower resolution. Most of these video files have subtitles corresponding to their content embedded in the picture, so that the user can readily understand the content.
  • In the course of implementing the present invention, the inventors found that, because the video content provided to mobile terminals has been transcoded and converted to a lower resolution, the subtitles embedded in the video content are also greatly reduced in size and become blurred. Users of current mobile terminals can hardly recognize the subtitle text clearly and correctly when watching a video, which prevents them from fully viewing and understanding the video content and thus degrades the user experience.
  • Summary of the invention
  • To overcome the defect in the prior art that subtitles in the video become blurred, embodiments of the present invention provide a method, system and device for subtitle matching processing.
  • A method for subtitle matching processing provided by an embodiment of the present invention includes: receiving a play request from a user terminal; acquiring device capability information of the user terminal according to the play request; and providing a transcoded subtitle bitmap group to the user terminal according to the device capability information.
  • An embodiment of the present invention further provides a method for creating a subtitle bitmap group, which includes: acquiring a subtitle area from a video image, together with the time information and position information corresponding to the subtitle area; generating an original subtitle bitmap group according to the acquired subtitle area and its time and position information; and transcoding the original subtitle bitmap group according to resolution to form transcoded subtitle bitmap groups.
  • An embodiment of the present invention further provides an apparatus for creating subtitles, the apparatus comprising: a subtitle area obtaining module, configured to acquire a subtitle area in a video image together with the position information and time information of the subtitle area, and to generate an original subtitle bitmap group according to the subtitle area and its position and time information; and a transcoding module, configured to transcode the original subtitle bitmap group according to resolution to form transcoded subtitle bitmap groups.
  • An embodiment of the present invention further provides an apparatus for subtitle matching processing, the apparatus comprising: a receiving module, configured to receive a play request from a user terminal, where the play request carries either the device capability information of the user terminal or the address information of a user agent profile database that stores the device capability information of the user terminal; and a sending module, configured to provide a transcoded subtitle bitmap group to the user terminal according to the device capability information of the user terminal.
  • An embodiment of the present invention further provides a system for subtitle matching processing, including: a subtitle creation device, configured to create transcoded subtitle bitmap groups according to resolution; and a subtitle matching processing device, configured to receive a play request from a user terminal, where the play request carries the device capability information of the user terminal, and to provide a transcoded subtitle bitmap group to the terminal according to that device capability information.
  • An embodiment of the present invention further provides a user terminal, including: a play request module, configured to generate a play request that contains the device capability information of the user terminal or the address information of a user agent profile database; a video play module, configured to play the video and the transcoded subtitle bitmap group; and a communication module, configured to send the play request and to receive the video and the transcoded subtitle bitmap group.
  • As can be seen from the above embodiments, the present invention can deliver, according to the display device capabilities of different terminals, a subtitle bitmap group adapted to the screen resolution, so that the user can clearly recognize the subtitle text while watching the video.
  • FIG. 1 is a structural diagram of a subtitle matching processing system according to an embodiment of the present invention;
  • FIG. 2 is a structural diagram of a subtitle matching processing apparatus according to an embodiment of the present invention;
  • FIG. 3 is a structural diagram of a subtitle creation apparatus according to an embodiment of the present invention;
  • FIG. 4 is a structural diagram of a user terminal according to an embodiment of the present invention;
  • FIG. 5 is a flowchart of a method for creating a subtitle bitmap group according to an embodiment of the present invention;
  • FIG. 6 is a flowchart of a subtitle matching processing method according to an embodiment of the present invention;
  • FIG. 7 is a schematic diagram of a method for sending the video and the transcoded subtitle bitmap group according to an embodiment of the present invention;
  • FIG. 8 is a flowchart of another subtitle matching processing method according to an embodiment of the present invention.
  • FIG. 1 shows a subtitle matching processing system according to an embodiment of the present invention.
  • The system includes a subtitle matching processing device 101, a subtitle creation device 102, and a user agent profile database 103, where:
  • The subtitle matching processing device 101 is configured to receive a play request from a user terminal, where the play request carries the device capability information of the user terminal, and to provide a transcoded subtitle bitmap group to the terminal according to that device capability information.
  • The device is further configured to receive, from the user terminal, a play request carrying the address information of a user agent profile database, to obtain the device capability information of the user terminal according to that address information, and then to provide a transcoded subtitle bitmap group to the terminal according to the device capability information.
  • The subtitle creation device 102 is configured to create the subtitle bitmap groups according to resolution; by processing the original video it obtains a series of transcoded subtitle bitmap groups of different resolutions corresponding to the video content.
  • To provide a suitable subtitle bitmap group to user terminals of different types, display devices and screen sizes, the subtitle creation device 102 first determines the rough position of the subtitle area in the video image, which may be expressed as coordinate information in the video image, pixel index information, and the like.
  • The subtitle area is then segmented from the video image sequence, and the time information and position information corresponding to the subtitle area are acquired at the same time, for example the playback start time point and end time point corresponding to the subtitle area, or the start frame number and end frame number of the video images to which the subtitle area corresponds; the position of the subtitle area on the screen may be expressed, for example, as the coordinates of the center point of the subtitle area.
  • A subtitle bitmap group is generated from the subtitle area acquired as described above, together with the time information and position information corresponding to the subtitle area.
  • Finally, the original subtitle bitmap group is transcoded to different resolutions according to the physical sizes and resolutions of the screens of common types of user terminal, yielding a series of transcoded subtitle bitmap groups of different resolutions.
  • The subtitle creation device may further transcode the original video to different resolutions, obtaining a series of transcoded videos of different resolutions.
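As a purely illustrative sketch of this resolution transcoding step (not the patent's implementation): the snippet below scales an original subtitle bitmap group to several target resolutions with Pillow. The target resolution table and the scaling policy are assumptions made only for this example.

```python
# Illustrative sketch: scale each original subtitle bitmap for several common
# terminal screen resolutions. Pillow, the resolution list and the scaling
# policy are assumptions of this sketch, not taken from the patent.
from PIL import Image

TARGET_RESOLUTIONS = [(240, 320), (320, 240), (480, 320), (640, 480)]  # hypothetical

def transcode_subtitle_bitmaps(original_bitmaps, original_size):
    """original_bitmaps: list of PIL.Image crops; original_size: (w, h) of source frames.
    Returns a dict mapping each target resolution to its scaled bitmap group."""
    groups = {}
    src_w, src_h = original_size
    for dst_w, dst_h in TARGET_RESOLUTIONS:
        scale_x, scale_y = dst_w / src_w, dst_h / src_h
        scaled = [
            bmp.resize((max(1, int(bmp.width * scale_x)),
                        max(1, int(bmp.height * scale_y))),
                       Image.LANCZOS)
            for bmp in original_bitmaps
        ]
        groups[(dst_w, dst_h)] = scaled
    return groups
```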
  • The subtitle matching processing system may further include a user agent profile database 103.
  • The user agent profile database stores device information for various user terminals; the device information may include the device name, type, screen size, screen resolution, and so on.
  • In another embodiment of the present invention, the subtitle matching processing apparatus 101 may further include a receiving module 1011 and a sending module 1012, as shown in FIG. 2, where:
  • The receiving module 1011 is configured to receive the play request sent by the user terminal and to parse from it the device capability information of the user terminal, which may be the model, category, screen size and screen resolution of the device.
  • In one embodiment of the present invention, this module may instead parse from the play request the address information of the user agent profile database that stores the user terminal's device information, such as the database URL (Uniform Resource Locator), which may be expressed as an IP (Internet Protocol) address or a domain name. The module then further obtains the device capability information of the user terminal, such as the terminal model, from that user agent profile database.
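A minimal sketch of the receiving module's two paths, assuming an invented request layout and an in-memory stand-in for the user agent profile database (field names, URL and lookup interface are all hypothetical, not defined by the patent):

```python
# Illustrative only: read capability fields directly from the play request, or,
# when only a user agent profile database address is carried, look the terminal
# up in that database. The dict layout and example URL are invented.
def resolve_device_capability(play_request: dict, uaprof_db: dict) -> dict:
    if "device_capability" in play_request:
        # e.g. {"model": "Nokia-N61", "screen_inches": 2.8, "resolution": (240, 320)}
        return play_request["device_capability"]
    url = play_request["uaprof_url"]     # address of the user agent profile DB entry
    return uaprof_db[url]                # stands in for an HTTP query to the real DB

db = {"http://profilerepository.example.org/Nokia/N61":
          {"model": "Nokia-N61", "screen_inches": 2.8, "resolution": (240, 320)}}
print(resolve_device_capability(
    {"uaprof_url": "http://profilerepository.example.org/Nokia/N61"}, db))
```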
  • The sending module 1012 is configured to deliver to the user terminal, according to the device capability information obtained by the receiving module 1011, the transcoded video adapted to that device and the transcoded subtitle bitmap group adapted to that device.
  • The transcoded video and subtitle bitmap group may be sent as files, as a media stream, or in another form.
  • In one embodiment of the present invention, the subtitle creation device 102 may further include the following modules, as shown in FIG. 3: a subtitle area obtaining module 1021 and a transcoding module 1022, where:
  • The subtitle area obtaining module 1021 is configured to detect the subtitle area in the video image, together with the position information and time information of the subtitle area, and to generate a subtitle bitmap group from the subtitle area and its position and time information.
  • In another embodiment of the present invention, the subtitle area obtaining module 1021 may further include the following units, as shown in FIG. 3: a subtitle area coarse detecting unit 1021a, a subtitle area confirming unit 1021b, a subtitle area locating unit 1021c, a subtitle area tracking unit 1021d, and a subtitle bitmap group forming unit 1021e, where:
  • Subtitle area coarse detecting unit 1021a: this unit performs coarse detection of the subtitle area in the video image and determines the rough position and extent of the subtitle area on the screen. Various detection methods may be used; preferably, image texture information is used as the basis for detection.
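The patent only states that texture information is the preferred detection basis and does not fix an algorithm. The sketch below uses per-block gradient energy as a simple stand-in for a texture measure; the block size and threshold are arbitrary choices for illustration.

```python
# Rough illustration of texture-based coarse detection: mark 16x16 blocks whose
# gradient energy exceeds a threshold as candidate subtitle blocks.
import numpy as np

def coarse_subtitle_blocks(gray_frame: np.ndarray, block: int = 16, thresh: float = 1500.0):
    """gray_frame: 2-D uint8 luminance array. Returns a boolean block map."""
    gy, gx = np.gradient(gray_frame.astype(np.float32))
    energy = gx * gx + gy * gy                       # crude texture measure
    h, w = energy.shape
    hb, wb = h // block, w // block
    block_energy = energy[:hb * block, :wb * block] \
        .reshape(hb, block, wb, block).mean(axis=(1, 3))
    return block_energy > thresh                      # True = candidate subtitle block
```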
  • Subtitle area confirming unit 1021b: this unit confirms the candidate subtitle areas found by the coarse detection; a confirmed subtitle area is a subtitle area that can be acquired. Various confirmation methods may be used; preferably the confirmation is performed with a method based on region texture constraints.
  • Subtitle area locating unit 1021c: this unit locates the exact position of the confirmed subtitle area. The localization can be implemented in various ways, for example by deriving the position from projection profiles of the edge-point density or pixel gray values in the horizontal and vertical directions in the pixel domain, or by using block texture intensity projection in the compressed domain. After the subtitle area is located, the system obtains the size information and position information of the subtitle area relative to the original video image, such as the length and width of the subtitle area and the coordinates of its center.
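As a simplified illustration of the projection-profile variant mentioned above (one of several methods the text allows), the sketch below projects a candidate mask onto the two axes and returns the bounding box of the high-density rows and columns; the threshold ratio is an assumption.

```python
# Sketch of projection-profile localization over a boolean candidate map
# (e.g. the block map from the coarse detection sketch, or an edge map).
import numpy as np

def locate_subtitle_box(mask: np.ndarray, min_ratio: float = 0.3):
    """Return (top, bottom, left, right) of the dense band, or None if empty."""
    rows = mask.mean(axis=1)                     # horizontal projection profile
    cols = mask.mean(axis=0)                     # vertical projection profile
    row_idx = np.where(rows > min_ratio * rows.max())[0]
    col_idx = np.where(cols > min_ratio * cols.max())[0]
    if row_idx.size == 0 or col_idx.size == 0:
        return None
    return row_idx.min(), row_idx.max(), col_idx.min(), col_idx.max()
```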
  • Subtitle area tracking unit 1021d: this unit acquires the time information of the subtitle area, such as the start time and end time at which the subtitle area is displayed, a start time and duration, the start frame number and end frame number corresponding to the subtitle area, or a start frame number and a number of frames. The unit can be implemented in various ways, for example with a subtitle tracking method based on projection profiles, or by using motion vectors in the compressed domain for subtitle tracking.
  • Subtitle bitmap group forming unit 1021e: using the start frame and end frame obtained by the locating and tracking steps, or their corresponding background frames, this unit segments the subtitle area out of the original video images. The segmentation can use a multi-frame Max or Min based method, a histogram-based subtitle segmentation method, or other methods. After the subtitle areas have been segmented from the corresponding frames of the original video, the original subtitle bitmap group is formed.
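A minimal sketch of the multi-frame Min idea named above, under the assumption of bright subtitle text over a changing background: the per-pixel minimum across the frames in which the caption is shown keeps the static text while darkening the moving background, after which a fixed threshold extracts the text mask. Threshold and polarity are assumptions of this sketch.

```python
# Simplified multi-frame Min segmentation of a located subtitle box.
import numpy as np

def segment_subtitle(frames_region: list, thresh: int = 180):
    """frames_region: grayscale crops (same shape) of the located subtitle box."""
    stack = np.stack(frames_region).astype(np.uint8)
    fused = stack.min(axis=0)               # multi-frame Min fusion
    text_mask = fused >= thresh             # bright pixels surviving fusion = text
    bitmap = np.where(text_mask, fused, 0)  # subtitle bitmap on a black background
    return bitmap, text_mask
```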
  • Transcoding module 1022: this module transcodes the original subtitle bitmap group formed by the subtitle area obtaining module to different resolutions, forming transcoded subtitle bitmap groups of different resolutions. It can also remain compatible with existing video transcoding and generate a series of videos of different resolutions from the original video. In another embodiment of the present invention, the transcoding module 1022 may further include a subtitle bitmap transcoding unit 1022a and a video transcoding unit 1022b, as shown in FIG. 3, where:
  • Subtitle bitmap transcoding unit 1022a: according to the different display devices of common types of user terminal, such as the physical size of the screen and the screen resolution, this unit transcodes the original subtitle bitmap group to different resolutions, obtaining a series of transcoded subtitle bitmap groups of different resolutions to be provided to user terminals with those display devices.
  • Video transcoding unit 1022b: according to the different display devices of common types of user terminal, such as the physical size of the screen and the screen resolution, this unit transcodes the original video to different resolutions, obtaining a series of transcoded videos of different resolutions to be provided to user terminals with those display devices.
  • In another embodiment of the present invention, the subtitle matching processing system may further include a user agent profile database 103, which stores the device information of various user terminals; the device information may include the device name, type, screen size, screen resolution, and so on.
  • In a system with a user agent profile database 103, the play request sent by the user terminal to the video service module may carry the address information of the user agent profile database 103, such as the URL address of the database, which may be expressed as an IP address or a domain name.
  • According to that address information, the video service module obtains the device information of the user terminal from the user agent profile database 103, and then delivers to the user terminal the transcoded video adapted to its device and the transcoded subtitle bitmap group adapted to its device.
  • In actual applications, the subtitle matching processing system, as well as the subtitle creation device and the subtitle matching processing device within it, may be integrated into an existing streaming media server, or may provide services to the user terminal independently.
  • The user terminal is configured to send a play request to the video server, where the play request carries the terminal's device capability information; this information may be the screen size of the terminal, or its model, category, and the like.
  • In another embodiment of the present invention, the play request sent by the user terminal may carry, instead of the device capability information, the address information of the user agent profile database that stores the device capability information, such as the URL address of that database, expressed as an IP address or a domain name.
  • The user terminal is further configured to receive and play the transcoded video and the transcoded subtitle bitmap group delivered by the video service module.
  • The user terminal may be a conventional display terminal such as a television set or a PC display, or a mobile terminal with a display device, such as a mobile TV receiver or a mobile phone with a video playing function.
  • The user terminal may further include the following modules, as shown in FIG. 4: a play request module 201, a video play module 202, and a communication module 203.
  • The play request module 201 generates a play request according to the user's needs, such as the video content the user wants to watch (which may be identified by the name or number of the video), and according to the capabilities of the terminal's own device, which may be the size of the display screen, the model of the device, the processing capability of the processor, and so on.
  • The video play module 202 plays the transcoded video delivered by the video service module together with the transcoded subtitle bitmap group. While the video plays, each subtitle bitmap in the subtitle bitmap group appears on the screen according to its time information, simultaneously with the corresponding video content; the position of the subtitle bitmap on the screen is determined by the position information carried with the bitmap, which may be coordinate information, center position information, or the like.
  • Communication module 203: used for sending the play request, and for receiving the transcoded video adapted to the device and the transcoded subtitle bitmap group adapted to the device.
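A minimal compositing sketch of how the video play module could overlay each subtitle bitmap according to its time and position information, assuming RGBA bitmaps, Pillow, and the dict layout shown; none of these details are prescribed by the patent.

```python
# Illustrative client-side overlay: paste every subtitle bitmap whose frame
# window covers the current frame number at its stored center position.
from PIL import Image

def compose_frame(video_frame: Image.Image, subtitle_units: list, frame_no: int) -> Image.Image:
    out = video_frame.copy()
    for unit in subtitle_units:
        if unit["start_frame"] <= frame_no <= unit["end_frame"]:
            bmp = unit["bitmap"]
            left = unit["center_x"] - bmp.width // 2    # position info: bitmap center
            top = unit["center_y"] - bmp.height // 2
            mask = bmp if bmp.mode == "RGBA" else None  # honour transparency if present
            out.paste(bmp, (left, top), mask)
    return out
```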
  • FIG. 5 shows the specific steps of segmenting the subtitle area from the original video and producing a series of transcoded subtitle bitmap groups of different resolutions, which include:
  • Step 501: first determine the video that needs to be processed.
  • Step 502: detect the extent of the subtitle area and obtain the rough region of the subtitle area in the video image. A preferred detection method is a subtitle area detection method based on DCT (Discrete Cosine Transform) coefficients; other methods capable of detecting the extent of the subtitle area are also possible.
  • Step 503: confirm the detected subtitle area to obtain a subtitle area that can be acquired. Morphological filtering can be used to bridge the gaps between characters and to eliminate noise. The confirmation of the subtitle area is preferably performed with a method based on region texture constraints; other methods for confirming the subtitle area are also possible.
  • Step 504: locate the confirmed subtitle area and obtain its position information, i.e. the size and position of the subtitle area relative to the original video image. A preferred localization method uses block texture intensity projection in the compressed domain; other methods capable of locating the subtitle area are also possible.
  • Step 505: track the located subtitle area and obtain the time information corresponding to it, such as the start time and end time in the video at which the subtitle is displayed, or the start frame number and end frame number of the video to which the subtitle corresponds. The tracking method is preferably one that uses motion vectors in the compressed domain; other methods capable of tracking the subtitle area and determining the playback time information corresponding to it are also possible.
  • Step 506: separate the subtitle area confirmed in step 503 from the original video image. A preferred segmentation method is the pre-fusion and background method; other methods for segmenting the subtitle area are also possible.
  • Step 507: each subtitle area separated from the original video image, together with its associated time information and position information, forms a single subtitle unit, and all the subtitle units together constitute the original subtitle bitmap group of the video.
  • The information contained in a single subtitle unit may include: the subtitle bitmap data; the position information of the subtitle bitmap, for example the coordinates of the center of the bitmap; and the play time information of the subtitle bitmap, for example the play start time and end time corresponding to the bitmap, or the start frame number and end frame number corresponding to the bitmap.
  • This embodiment provides a recommended storage format for the subtitle unit, as shown in Table 1. It should be noted that this storage format is not the only possible one; other ways of storing the information are also possible.
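Table 1 itself is not reproduced in this text, so the record below is only a plausible layout assembled from the fields the surrounding paragraphs name (bitmap data, center position, and play time as start/end times or frame numbers); the field names and types are assumptions.

```python
# Hypothetical subtitle-unit record; an original subtitle bitmap group is then
# simply the ordered list of all such records extracted from one video.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SubtitleUnit:
    bitmap: bytes                          # encoded subtitle bitmap data (e.g. PNG bytes)
    center_x: int                          # position information: bitmap center
    center_y: int
    start_time_ms: Optional[int] = None    # play time as start/end times ...
    end_time_ms: Optional[int] = None
    start_frame: Optional[int] = None      # ... or as start/end frame numbers
    end_frame: Optional[int] = None
```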
  • Step 508: transcode the original video to different resolutions to form a set of transcoded videos of different resolutions.
  • Step 509: transcode the generated original subtitle bitmap group to different resolutions to form a series of transcoded subtitle bitmap groups of different resolutions. Steps 508 and 509 have no required ordering between them.
  • The set of transcoded videos of different resolutions formed in step 508 and the transcoded subtitle bitmap groups of different resolutions generated in step 509 are provided to user terminals with different hardware device capabilities.
  • An embodiment of the invention provides a method for subtitle matching processing, which includes the following steps, as shown in FIG. 6:
  • Step 601: acting as the user terminal, the mobile terminal initiates a request to the streaming media server to play a streaming video; the request contains the video name or video number and the device capability information of the mobile terminal, such as the physical size of the screen and the supported resolution.
  • The request may be sent through a Request message of the HTTP (Hypertext Transfer Protocol) protocol; the specific format of the message is as follows:
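The message text that followed here in the original filing is not included in this extraction. Purely as a hypothetical illustration (the path, host and capability header names are invented, not the patent's format), such an HTTP request could look like the one built below.

```python
# Illustrative play request carrying the terminal's screen capabilities directly.
video_name = "news_2009"   # hypothetical video name
request = (
    f"GET /streaming/{video_name} HTTP/1.1\r\n"
    "Host: streaming.example.com\r\n"
    "x-screen-size: 240x320\r\n"      # supported resolution of the terminal
    "x-screen-inches: 2.8\r\n"        # physical size of the terminal screen
    "\r\n"
)
print(request)
```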
  • Step 602: after receiving the play request of the mobile terminal, the streaming media server selects, from the transcoded video file groups and subtitle bitmap groups and according to the physical screen size and supported resolution carried in the play request, the transcoded video file and transcoded subtitle bitmap group best suited to the mobile terminal's screen and resolution, and sends them to the mobile terminal (in this embodiment, the video and subtitle bitmap group supporting a screen with a physical size of 2.8 inches and a resolution of 240 x 320). They can be sent in the form of a TS (Transport Stream) or an RTP (Real-time Transport Protocol) stream.
  • The streaming media server can send the transcoded video and the transcoded subtitle bitmap data directly in a TS stream in the following two preferred ways.
  • Method 1, as shown in FIG. 7: the transcoded video data, the audio data and the transcoded subtitle bitmap group are packetized separately to generate three packetized elementary streams, VPES (Video Packetized Elementary Stream), APES (Audio Packetized Elementary Stream) and CPES (Caption Packetized Elementary Stream); a PID (Packet IDentifier) is assigned to each of the three PES streams in the PMT (Program Map Table); and the VPES, APES and CPES are multiplexed into a TS stream, which the streaming media server then sends to the mobile terminal.
  • Method 2: since the data of the transcoded subtitle bitmap group is smaller than the video data and the audio data, this data is placed in the adaptation field of the TS packets carrying the video data, with the adaptation field control bits set to "11".
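A very simplified sketch of this second method: one 188-byte TS packet whose adaptation_field_control bits are "11" (adaptation field followed by payload) and whose adaptation field carries a chunk of subtitle data as transport private data, while the payload still carries video bytes. Real multiplexing (continuity counters, PSI/PMT, splitting across packets) is omitted, and the PID value is arbitrary.

```python
def ts_packet_with_subtitle(pid: int, subtitle_chunk: bytes, video_payload: bytes) -> bytes:
    header = bytes([
        0x47,                      # sync byte
        (pid >> 8) & 0x1F,         # PID high 5 bits (TEI/PUSI/priority cleared)
        pid & 0xFF,                # PID low 8 bits
        (0b11 << 4) | 0x00,        # adaptation_field_control = '11', continuity counter = 0
    ])
    # adaptation field body: flags byte with transport_private_data_flag (0x02) set,
    # then private_data_length + subtitle bytes; 0xFF stuffing pads the packet.
    body = bytes([0x02, len(subtitle_chunk)]) + subtitle_chunk
    stuffing = 188 - len(header) - 1 - len(body) - len(video_payload)
    assert stuffing >= 0, "subtitle chunk plus video payload too large for one packet"
    adaptation = bytes([len(body) + stuffing]) + body + b"\xff" * stuffing
    packet = header + adaptation + video_payload
    assert len(packet) == 188
    return packet
```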
  • Alternatively, the streaming media server sends the transcoded video and the transcoded subtitle bitmap group directly in an RTP stream: the data is first packaged into a TS stream as described above, then packaged into RTP packets according to the RFC 2038 "RTP Payload Format for MPEG1/MPEG2 Video" standard, and sent to the mobile terminal through the RTP/RTSP (Real Time Streaming Protocol) protocols.
  • Step 801: a user terminal initiates a request to a video streaming server to play a streaming media video, where the request contains the address of the user agent profile database that stores the capability information of the user terminal's screen device.
  • The screen device capability information may be the physical size of the screen, the supported resolution, and so on; the request also contains the video name or video number required by the user.
  • In an embodiment of the present invention, one way of sending the request is provided: the request to play a streaming media video is sent to the video streaming server through an HTTP Request message that contains the address of the user agent profile database storing the capability information of the user terminal's screen device. The specific format of the message is as follows:
  • Profile-rep: http://profilerepository.oma.org/Nokia/N61 is the address information of the user agent profile database, at which the device information of the mobile terminal (type Nokia-N61) is stored.
  • Step 802: according to the user agent profile database address in the request, the video streaming media server obtains the user terminal's device capability information by sending an HTTP Request message to the user agent profile database, querying the screen device capabilities of the mobile terminal. The message format is as follows:
  • Step 803: after receiving the request from the video streaming server, the user agent profile database feeds the device capability information of the current user terminal back to the video streaming media server, that is, it sends an HTTP response message to the video streaming media server. The format of the message is as follows:
  • Step 804: according to the information fed back by the user agent profile server, the video streaming server selects, from the series of transcoded videos and transcoded subtitle bitmap groups, a transcoded video and a transcoded subtitle bitmap group suitable for playback on the user terminal. Corresponding to the capability information given above, in this embodiment a transcoded video of 240 x 320 resolution and the transcoded subtitle bitmap group corresponding to a 2.8-inch, 240 x 320 screen should be selected for transmission. The video streaming server then sends the selected video and subtitle bitmap group to the user terminal in the same manner as described in step 703.
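A minimal sketch of this selection step, assuming an invented catalogue of available variants and a "largest resolution that still fits the screen" policy; the patent only requires that an adapted video and subtitle bitmap group be chosen, not this particular rule.

```python
# Illustrative selection of the transcoded variant matching the reported screen.
AVAILABLE = [(176, 144), (240, 320), (320, 240), (480, 320), (640, 480)]  # hypothetical

def select_variant(screen_w: int, screen_h: int):
    fitting = [(w, h) for (w, h) in AVAILABLE if w <= screen_w and h <= screen_h]
    if not fitting:
        return min(AVAILABLE, key=lambda r: r[0] * r[1])   # fall back to the smallest
    return max(fitting, key=lambda r: r[0] * r[1])          # largest that still fits

print(select_variant(240, 320))   # -> (240, 320) for the 2.8-inch, 240x320 example
```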
  • The present invention can be implemented by means of software plus a necessary general-purpose hardware platform; it can of course also be implemented entirely in hardware, but in many cases the former is the better implementation.
  • The technical solutions of the embodiments of the present invention can, in essence, be embodied in the form of a software product. The software product is stored in a storage medium and includes a number of instructions for causing a mobile device (which may be a mobile phone, a personal computer, a media player, etc.) to perform the methods described in the various embodiments of the present invention.
  • The storage medium referred to here is, for example, a ROM/RAM, a magnetic disk, an optical disc, or the like.

Abstract

A method, system and device for subtitle matching processing are disclosed in the embodiments of the present invention. The method includes the following steps: receiving a play request from a user terminal; obtaining the device capability information of the user terminal according to the play request; and providing a transcoded subtitle bitmap group to the user terminal according to the device capability information. The embodiments of the present invention also disclose a method for creating a subtitle bitmap group, a device for creating subtitles, a device for subtitle matching processing and a system for subtitle matching processing. The present invention can send the subtitle bitmap group suited to the screen resolution of the display device according to the display device capabilities of various user terminals, so that the user can clearly recognize the subtitle text when watching the video.

Description

Method, system and device for subtitle matching processing (一种字幕匹配处理的方法、系统和装置)

This application claims priority to Chinese Patent Application No. 200810141753.9, filed on September 2, 2008 and entitled "Method and System for Subtitle Matching Processing", which is incorporated herein by reference in its entirety.
 Say
背景技术 Background technique
随着移动通信技术的发展, 特别是 3G(3rd Generation, 第三代数字通信)时代的来临, 以手机电视为首的移动视频业务逐渐开始商用。 在高速移动通信技术的支持下, 用户可以 在以手机为代表的移动终端上享受到高质量的视频服书务。 但通过移动终端的显示设备播放 视频内容, 由于移动终端自身处理能力以及屏幕大小的限制, 当前移动视频业务向用户提 供的视频内容多由 MPEG (Moving Pictures Experts Group, 动态图像专家组) -1, MPEG— 2 和 MPEG— 4等格式常规分辨率的视频文件经过普通的视频转码并且降低分辨率后得到,在 上述视频文件中大多都嵌入了和其内容所对应的字幕, 以方便用户对视频内容的理解。  With the development of mobile communication technology, especially the era of 3G (3rd Generation, third generation digital communication), the mobile video service led by mobile TV has gradually begun to be commercialized. With the support of high-speed mobile communication technology, users can enjoy high-quality video service books on mobile terminals represented by mobile phones. However, the video content is played by the display device of the mobile terminal. Due to the processing capability of the mobile terminal and the limitation of the screen size, the video content provided by the current mobile video service to the user is mostly by MPEG (Moving Pictures Experts Group)-1. The video files of conventional resolutions such as MPEG-2 and MPEG-4 are obtained by ordinary video transcoding and reduced resolution. Most of the video files are embedded with subtitles corresponding to their contents, so that the user can directly Understanding of the content.
发明人在实现本发明的过程中发现, 对于向移动终端提供的视频内容来说, 由于其经 过了转码以及降分辨率转换处理, 所以嵌入在视频内容中的字幕尺寸也同时会大幅减小而 变得模糊不清, 当前移动终端的用户在观看视频时几乎无法清楚正确地辨识字幕信息, 这 将影响用户完整观赏和理解视频内容, 从而也影响了用户的体验。 发明内容  In the process of implementing the present invention, the inventor has found that for the video content provided to the mobile terminal, since the transcoding and the resolution conversion processing are performed, the size of the subtitle embedded in the video content is also greatly reduced. However, it becomes blurred. The user of the current mobile terminal can hardly and clearly recognize the subtitle information when watching the video, which will affect the user's complete viewing and understanding of the video content, thereby affecting the user experience. Summary of the invention
本发明实施例为了解决现有技术中视频中字幕变得模糊不清的缺陷, 提供了一种字幕 匹配处理的方法、 系统和装置。  In order to solve the defect that the subtitles in the video in the prior art become blurred, the present invention provides a method, system and device for subtitle matching processing.
本发明实施例提供的一种字幕匹配处理的方法, 具体包括: 接收来自用户终端的播放 请求; 根据所述播放请求获取所述用户终端的设备能力信息; 根据所述设备能力信息, 为 所述用户终端提供转码字幕比特图组。  A method for processing a subtitle matching process according to an embodiment of the present invention includes: receiving a play request from a user terminal; acquiring device capability information of the user terminal according to the play request; The user terminal provides a transcoded caption bitmap group.
此外, 本发明实施例还提供一种字幕比特图组的制作方法, 具体包括: 从视频图象中 获取字幕区域, 以及获取所述字幕区域对应的时间信息和位置信息; 根据所述获取的字幕 区域以及所述时间信息和位置信息, 生成原始字幕比特图组; 根据分辨率对所述原始字幕 比特图组进行转码处理形成转码字幕比特图组。 此外, 本发明实施例还提供一种字幕制作的装置, 该装置包括: 字幕区域获取模块: 用于获取视频图象中的字幕区域以及所述字幕区域的位置信息和时间信息, 根据所述字幕 区域, 以及所述字幕区域的位置信息和时间信息, 生成原始字幕比特图组; 转码模块: 用 于根据分辩率对所述原始字幕比特图组进行转码处理形成转码字幕比特图组。 In addition, the embodiment of the present invention further provides a method for creating a subtitle bitmap group, which specifically includes: acquiring a subtitle area from a video image, and acquiring time information and location information corresponding to the subtitle area; The area and the time information and the location information are generated, and the original subtitle bitmap group is generated; and the original subtitle bitmap group is transcoded according to the resolution to form a transcoded subtitle bitmap group. In addition, an embodiment of the present invention further provides an apparatus for creating a subtitle, the apparatus comprising: a subtitle area obtaining module: configured to acquire a subtitle area in a video image and location information and time information of the subtitle area, according to the subtitle a region, and location information and time information of the subtitle region, generating an original subtitle bitmap group; and a transcoding module: configured to perform transcoding processing on the original subtitle bitmap group according to a resolution to form a transcoded subtitle bitmap group.
此外, 本发明实施例还提供一种字幕匹配处理的装置, 该装置包括: 接收模块: 接收 来自用户终端的播放请求, 所述播放请求携带所述用户终端的设备能力信息或保存有所述 用户终端的设备能力信息的用户代理档案数据库地址信息; 发送模块: 用于根据所述用户 终端的设备能力信息, 为所述用户终端提供转码字幕比特图组。  In addition, an embodiment of the present invention further provides an apparatus for performing subtitle matching processing, where the apparatus includes: a receiving module: receiving a play request from a user terminal, where the play request carries device capability information of the user terminal or saves the user The user agent profile database address information of the device capability information of the terminal; the sending module: configured to provide the user terminal with a transcoded caption bitmap group according to the device capability information of the user terminal.
此外, 本发明实施例还提供一种字幕匹配处理的系统, 包括: 字幕制作装置, 用于根 据分辨率制作转码字幕比特图组; 字幕匹配处理装置, 用于接收来自用户终端的播放请求, 所述播放请求中携带所述用户终端的设备能力信息, 并根据所述终端的设备能力信息为终 端提供转码字幕比特图组。  In addition, the embodiment of the present invention further provides a system for subtitle matching processing, including: a subtitle creation device, configured to generate a transcoded subtitle bitmap group according to a resolution; and a subtitle matching processing device, configured to receive a play request from the user terminal, The play request carries device capability information of the user terminal, and provides a transcoded caption bitmap group to the terminal according to the device capability information of the terminal.
此外, 本发明实施例还提供一种用户终端, 包括: 播放请求模块: 生成播放请求, 所 述播放请求中包含所述用户终端的设备能力信息或用户代理档案数据库的地址信息; 视频 播放模块: 播放视频, 以及转码字幕比特图组; 通信模块: 用于发送播放请求; 接收视频, 以及转码字幕比特图组。  In addition, the embodiment of the present invention further provides a user terminal, including: a play requesting module: generating a play request, where the play request includes device capability information of the user terminal or address information of a user agent file database; the video play module: Playing video, and transcoding subtitle bitmap group; communication module: for transmitting a play request; receiving video, and transcoding subtitle bitmap group.
由以上实施例可以看出, 本发明能根据不同终端的显示设备能力, 发送与其屏幕分辨 率相适应的字幕比特图组, 从而使用户在观看视频时清楚识别字幕文字。 附图说明  As can be seen from the above embodiments, the present invention can transmit a subtitle bitmap group adapted to the resolution of the screen according to the display device capabilities of different terminals, thereby enabling the user to clearly recognize the subtitle text while watching the video. DRAWINGS
图 1是本发明实施例提供的字幕匹配处理系统结构图;  1 is a structural diagram of a caption matching processing system according to an embodiment of the present invention;
图 2是本发明实施例提供的字幕匹配处理装置结构图;  2 is a structural diagram of a subtitle matching processing apparatus according to an embodiment of the present invention;
图 3是本发明实施例提供的字幕制作装置结构图;  3 is a structural diagram of a caption making apparatus according to an embodiment of the present invention;
图 4是本发明实施例提供的所述用户终端的结构图;  4 is a structural diagram of the user terminal according to an embodiment of the present invention;
图 5是本发明实施例提供的所述字幕比特图制作方法流程图;  FIG. 5 is a flowchart of a method for creating a caption bitmap according to an embodiment of the present invention;
图 6是本发明实施例提供的所述字幕匹配处理方法流程图;  FIG. 6 is a flowchart of the method for processing subtitle matching according to an embodiment of the present invention;
图 7是本发明实施例提供的所述视频以及转码字幕比特图组的一种发送方法的示意图; 图 8是本发明实施例提供的所述字幕匹配处理方法流程图。 具体实施方式  FIG. 7 is a schematic diagram of a method for transmitting a video and a transcoded subtitle bitmap group according to an embodiment of the present invention; FIG. 8 is a flowchart of the method for processing a subtitle matching according to an embodiment of the present invention. detailed description
为了使本技术领域的人员更好地理解本发明, 下面结合附图对本发明作进一步的详细 说明。 In order to make the present invention better understood by those skilled in the art, the present invention will be further described in detail below with reference to the accompanying drawings. Description.
图 1 为本发明实施例提供的一种字幕匹配处理的系统, 该系统包括: 字幕匹配处理装 置 101, 字幕制作装置 102, 用户代理档案数据库 103。 其中:  FIG. 1 is a system for subtitle matching processing according to an embodiment of the present invention. The system includes: a subtitle matching processing device 101, a subtitle creation device 102, and a user agent archive database 103. among them:
字幕匹配处理装置 101, 用于接收来自用户终端的播放请求, 所述播放请求中携带所述 用户终端的设备能力信息, 并根据所述终端的设备能力信息为终端提供转码字幕比特图组, 该装置还用于接收来自用户终端的携带用户代理档案数据库地址信息的播放请求, 并根据 所述用户代理档案数据库地址信息获取所述用户终端的设备能力信息, 进一步根据所述终 端的设备能力信息为终端提供转码字幕比特图组。  a caption matching processing device 101, configured to receive a play request from a user terminal, where the play request carries device capability information of the user terminal, and provide a transcoded caption bitmap group for the terminal according to device capability information of the terminal, The device is further configured to receive a play request for carrying the user agent file database address information from the user terminal, and obtain device capability information of the user terminal according to the user agent file database address information, and further according to the device capability information of the terminal. Provide a transcoded caption bitmap group for the terminal.
字幕制作装置 102: 用于根据分辨率制作所述字幕比特图组, 该装置通过对原始视频的 处理得到一系列与视频内容相对应的不同分辨率的转码字幕比特图组。 为了提供合适的字 幕比特图组给不同类型, 不同显示设备以及屏幕大小的用户终端, 字幕制作装置 102 首先 要确定字幕区域在视频图象中的粗略位置, 该位置可以是视频图象中的坐标信息, 象素点 编号信息等。 此后, 从视频图象序列中分割出字幕区域, 并同时获取和该字幕区域相对应 的时间信息和位置信息, 例如该字幕区域对应的播放起始时间点和结束时间点, 或该字幕 区域对应的视频图象起始帧号和结束帧号, 该字幕区域在屏幕中对应的位置, 可以是该字 幕区域中心点在屏幕的坐标值等。 根据上述获取的字幕区域, 以及与所述字幕区域相对应 的时间信息和位置信息, 生成字幕比特图组。 最后根据常见类型用户终端的不同屏幕的物 理尺寸和分辨率对原始字幕比特图组进行分辨率转码处理, 得到一系列不同分辨率的转码 字幕比特图组。 该字幕制作装置还可以进一步根据不同分辨率对原始视频进行转码处理, 得到一系列不同分辨率的转码视频。  Subtitle creation apparatus 102: configured to create the subtitle bitmap group according to a resolution, and the apparatus obtains a series of transcoding subtitle bitmap groups of different resolutions corresponding to the video content by processing the original video. In order to provide a suitable subtitle bitmap group for different types, different display devices and screen size user terminals, the subtitle creation device 102 first determines the coarse position of the subtitle region in the video image, which may be the coordinates in the video image. Information, pixel point number information, etc. Thereafter, the subtitle area is segmented from the video image sequence, and time information and location information corresponding to the subtitle area are simultaneously acquired, for example, a playback start time point and an end time point corresponding to the subtitle area, or the subtitle area corresponds to The video frame start frame number and end frame number, the corresponding position of the subtitle area in the screen, may be the coordinate value of the center point of the subtitle area on the screen, and the like. A caption bitmap group is generated based on the caption region acquired as described above, and time information and position information corresponding to the caption region. Finally, the original subtitle bitmap group is subjected to resolution transcoding according to the physical size and resolution of different screens of common types of user terminals, and a series of transcoding subtitle bitmap groups of different resolutions are obtained. The captioning device may further perform transcoding processing on the original video according to different resolutions to obtain a series of transcoded videos of different resolutions.
字幕匹配处理系统中还可以包括用户代理档案数据库 103,用户代理档案数据库中存储 有各种用户终端的设备信息, 所述设备信息可以包括: 设备的名称, 类型, 屏幕的大小, 屏幕的分辨率等。  The subtitle matching processing system may further include a user agent file database 103. The user agent file database stores device information of various user terminals, and the device information may include: a device name, a type, a screen size, and a screen resolution. Wait.
在本发明的另一个实施例中, 本发明字幕匹配处理装置 101 可以进一步包括: 接收模 块 1011, 发送模块 1012, 如图 2所示, 其中:  In another embodiment of the present invention, the caption matching processing apparatus 101 of the present invention may further include: a receiving module 1011, a transmitting module 1012, as shown in FIG. 2, wherein:
接收模块 1011 : 用于接收用户终端发来的播放请求, 还可以从所述播放请求中分析出 用户终端自身的设备能力信息, 可以是设备的型号, 类别, 屏幕的尺寸大小, 屏幕的分辨 率等; 在本发明的一个实施例中, 该模块也可以解析出播放请求中存储用户终端设备信息 的用户代理档案数据库地址信息, 如上述数据库 URL (Uniform Resource Locator, 统一资 源定位器) 地址, 所述 URL地址可以用 IP (Internet Protocol, 网络之间互连的协议) 地址 或域名等形式表示。 并进一步从所述用户代理档案数据库获取所述用户终端的设备能力信 息。 该地址信息中携带用户终端的设备能力信息如终端型号。 The receiving module 1011 is configured to receive a play request sent by the user terminal, and further analyze, by using the play request, the device capability information of the user terminal, which may be a model, a category, a size of the screen, and a resolution of the screen. In an embodiment of the present invention, the module may also parse the user agent file database address information for storing the user terminal device information in the play request, such as the database URL (Uniform Resource Locator) address. The URL address can be expressed in the form of an IP (Internet Protocol) protocol or domain name. And further acquiring the device capability letter of the user terminal from the user agent archive database Interest. The address information carries device capability information of the user terminal, such as a terminal model.
发送模块 1012: 用于根据由接收模块 1011得到的用户终端的设备能力信息, 向所述用 户终端下发与其设备相适应的转码视频, 以及与其设备相适应的转码字幕比特图组。 其发 送的转码视频和字幕比特图组的形式可以是文件形式, 或媒体流形式等。  The sending module 1012 is configured to send, according to device capability information of the user terminal obtained by the receiving module 1011, a transcoded video adapted to the device and a transcoded caption bitmap group adapted to the device. The form of the transcoded video and subtitle bitmap group that it sends may be in the form of a file, or a form of a media stream.
在本发明的一个实施例中, 字幕制作装置 102可以进一步包括以下模块, 如图 3所示, 其中包括: 字幕区域获取模块 1021, 转码模块 1022, 其中:  In one embodiment of the present invention, the captioning device 102 may further include the following modules, as shown in FIG. 3, including: a caption region obtaining module 1021, a transcoding module 1022, where:
字幕区域获取模块 1021 : 用于检测出视频图象中的字幕区域, 以及字幕区域的位置信 息和时间信息, 根据所述字幕区域, 以及字幕区域的位置信息和时间信息, 生成字幕比特 图组。 在本发明的另一个实施例中, 字幕区域获取模块 1021可以进一步包括以下单元: 字 幕区域粗检测单元 1021a, 字幕区域确认单元 1021b, 字幕区域定位单元 1021c, 字幕区域 跟踪单元 1021d, 字幕比特图组形成单元 1021e, 如图 3所示, 其中:  The subtitle area obtaining module 1021 is configured to detect a subtitle area in the video image, and position information and time information of the subtitle area, and generate a subtitle bitmap group according to the subtitle area, and the position information and the time information of the subtitle area. In another embodiment of the present invention, the subtitle area obtaining module 1021 may further include the following units: a subtitle area coarse detecting unit 1021a, a subtitle area confirming unit 1021b, a subtitle area positioning unit 1021c, a subtitle area tracking unit 1021d, and a subtitle bitmap group. Forming a unit 1021e, as shown in FIG. 3, wherein:
字幕区域粗检测单元 1021a: 该单元用于对视频图象中的字幕区域做粗检测, 确定字幕 区域在屏幕中的粗略位置和范围, 其检测方法有多种, 较佳的为采用图象纹理信息做为检 测的依据。  Subtitle area coarse detecting unit 1021a: This unit is used for coarse detection of the subtitle area in the video image, and determines the rough position and range of the subtitle area in the screen. There are various detection methods, and preferably image texture is adopted. Information is used as the basis for testing.
字幕区域确认单元 1021b: 该单元用于对经粗检检测出的可能的字幕区域进行确认, 确 认后的字幕区域即为可获取的字幕区域。 其确认的方法有多种, 较佳的为采用基于区域纹 理约束的方法进行字幕区域的确认。  Subtitle area confirming unit 1021b: This unit is used to confirm the possible subtitle area detected by the coarse detection, and the confirmed subtitle area is the subtitle area that can be acquired. There are various methods for confirmation, and it is preferable to perform the confirmation of the subtitle area by the method based on the region texture constraint.
字幕区域定位单元 1021c: 该单元用于对经确认的字幕区域的准确位置进行定位, 字幕 区域定位可以通过多种方法实现, 例如在像素域中水平和垂直方向的边缘点密度或者像素 灰度值的投影轮廓上获得定位信息, 或利用压缩域中块纹理强度投影的方法定位字幕区域。 经字幕区域定位后系统将得到字幕区域相对于原视频图像的尺寸信息和位置信息, 例如字 幕区域的长度和宽度, 中心的坐标等。  Subtitle area locating unit 1021c: The unit is used for locating the exact position of the confirmed subtitle area, and the subtitle area positioning can be implemented by various methods, such as edge point density or pixel gradation value in the horizontal and vertical directions in the pixel domain. The positioning information is obtained on the projection contour, or the subtitle region is located by using the block texture intensity projection in the compressed domain. After the subtitle area is located, the system will obtain the size information and position information of the subtitle area relative to the original video image, such as the length and width of the subtitle area, the coordinates of the center, and the like.
Subtitle region tracking unit 1021d: this unit acquires the time information of the subtitle region, such as the start time and end time at which the subtitle region is displayed, or the start time and duration, or the start frame number and end frame number corresponding to the subtitle region, or the start frame number and the number of frames over which the subtitle persists. This function can be implemented in various ways, for example by a projection-profile-based subtitle tracking method, or by tracking the subtitle using motion vectors in the compressed domain.
Subtitle bitmap group forming unit 1021e: the start frame or end frame obtained from positioning and tracking, or alternatively the corresponding background frame, is subjected to subtitle region segmentation, so that the subtitle region is separated from the original video image. The segmentation of the subtitle region may use a multi-frame Max or Min method, a histogram-based subtitle segmentation, or other methods. After the corresponding frames of the original video have been segmented, the original subtitle bitmap group is formed.
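For orientation only, the cooperation of units 1021a to 1021e can be outlined as the following Python skeleton; the stage functions are stubs, since the concrete algorithms (texture analysis, projection profiles, motion-vector tracking, multi-frame Max/Min segmentation) are left open by this embodiment:

# Skeleton only: each stage is a stub standing in for the algorithms named above.

def coarse_detect(frame):
    """Return candidate subtitle rectangles from texture information (stub)."""
    raise NotImplementedError

def confirm(frame, candidates):
    """Keep only candidates satisfying region-texture constraints (stub)."""
    raise NotImplementedError

def locate(frame, region):
    """Refine a confirmed region to exact size and centre coordinates (stub)."""
    raise NotImplementedError

def track(frames, region, start_index):
    """Follow the region over time; return (start_frame, end_frame) (stub)."""
    raise NotImplementedError

def segment(frame, region):
    """Cut the subtitle pixels out of the frame and return a bitmap (stub)."""
    raise NotImplementedError

def build_original_bitmap_group(frames):
    # Note: merging of detections across the frames of one tracked caption is omitted here.
    units = []
    for i, frame in enumerate(frames):
        for region in confirm(frame, coarse_detect(frame)):
            box = locate(frame, region)
            start, end = track(frames, box, i)
            units.append({
                "bitmap": segment(frame, box),
                "position": box,          # e.g. centre coordinates, width, height
                "time": (start, end),     # frame numbers or timestamps
            })
    return units                          # the original subtitle bitmap group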
Transcoding module 1022: performs transcoding processing at different resolutions on the original subtitle bitmap group formed by the subtitle acquisition module, forming transcoded subtitle bitmap groups of different resolutions. It may also be compatible with existing video transcoding functions and generate from the original video a series of videos of different resolutions. In another embodiment of the present invention, the transcoding module 1022 may further include: a subtitle bitmap transcoding unit 1022a and a video transcoding unit 1022b, as shown in FIG. 3, where:
Subtitle bitmap transcoding unit 1022a: according to the different display devices of common types of user terminals, such as the physical size of the screen and the screen resolution, performs resolution transcoding of the original subtitle bitmap group to obtain a series of transcoded subtitle bitmap groups of different resolutions, which are provided to user terminals equipped with those display devices.
Video transcoding unit 1022b: according to the different display devices of common types of user terminals, such as the physical size of the screen and the screen resolution, performs resolution transcoding of the original video to obtain a series of transcoded videos of different resolutions, which are provided to user terminals equipped with those display devices.
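A minimal sketch of the resolution transcoding performed by unit 1022a is given below, assuming each subtitle unit carries its bitmap as a Pillow image, its position as centre coordinates, and its time span unchanged; the use of Pillow and the field names are assumptions of the example, not requirements of this embodiment:

# Sketch only: Pillow is an assumed dependency; field names are illustrative.
from PIL import Image

def transcode_unit(bitmap, centre, src_size, dst_size):
    """Scale one subtitle bitmap and its centre position to a target resolution."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    scaled = bitmap.resize(
        (max(1, round(bitmap.width * sx)), max(1, round(bitmap.height * sy))),
        Image.LANCZOS,
    )
    return scaled, (round(centre[0] * sx), round(centre[1] * sy))

def transcode_group(units, src_size, target_sizes):
    """Produce one transcoded subtitle bitmap group per target resolution."""
    groups = {}
    for dst in target_sizes:                       # e.g. [(176, 144), (240, 320)]
        group = []
        for u in units:
            bitmap, centre = transcode_unit(u["bitmap"], u["position"], src_size, dst)
            group.append({"bitmap": bitmap, "position": centre, "time": u["time"]})
        groups[dst] = group
    return groups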
In another embodiment of the present invention, the subtitle matching processing system may further include a user agent profile database 103. The user agent profile database 103 stores the device information of various user terminals; the device information may include the device name, type, screen size, screen resolution, and so on. In a system with the user agent profile database 103, the play request sent by the user terminal to the video service module may carry the address information of the user agent profile database 103, such as the URL address of the database, which may be expressed as an IP address, a domain name, or the like. According to the address information of the user agent profile database 103, the video service module acquires the device information of the user terminal from the user agent profile database 103, and then delivers to the user terminal a transcoded video adapted to its device and a transcoded subtitle bitmap group adapted to its device.
The subtitle matching processing system described in the above embodiments, together with the subtitle production apparatus and the subtitle matching processing apparatus in the system, may in practical applications be integrated into an existing streaming media server, or may exist separately and independently provide services to user terminals. The user terminal is configured to send a play request to the video server; the play request carries terminal device capability information, which may be the screen size of the terminal, or the model, category, and so on of the terminal. In another embodiment of the present invention, the play request sent by the user terminal may also not carry its own device capability information, but instead carry the address information of a user agent profile database in which its device capability information is stored, for example the URL address of the user agent profile database, which may be expressed in the form of an IP address or a domain name. The user terminal is further configured to receive and play the transcoded video and the transcoded subtitle bitmap group sent by the video service module. The user terminal may be a conventional display terminal, such as a television set or a PC monitor, or a mobile terminal with a display device, such as a mobile TV, a mobile phone with a video playing function, or mobile phone television. In another embodiment of the present invention, the user terminal may further include the following units, as shown in FIG. 4: a play request module 201, a video playing module 202, and a communication module 203.
Play request module 201: generates a play request according to the user's requirements, such as the video content the user wants to watch, which may be the name of the video or the number of the video, and according to the capability of the user's own device, which may be the size of the display screen, the device model, the processing capability of the processor, and so on. Video playing module 202: plays the transcoded video delivered by the video service module, together with the transcoded subtitle bitmap group. When the video is played, the individual subtitle bitmaps in the subtitle bitmap group appear on the screen in sequence according to the time information they carry and are displayed simultaneously with the corresponding video content; the position of a subtitle bitmap on the screen is determined by the position information in the subtitle bitmap, which may be coordinate information, center position information, and so on.
Communication module 203: used to send the play request, and to receive the transcoded video adapted to the device and the transcoded subtitle bitmap group adapted to the device.
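A simplified sketch of the playback behaviour of the video playing module 202 described above follows; it assumes each subtitle unit is a mapping with a bitmap, a position, and a (start, end) frame range, and the paste_at callback stands in for whatever blitting the real player performs:

# Simplified, illustrative playback-side logic.

def active_captions(units, frame_no):
    """Subtitle units whose (start, end) frame range covers frame_no."""
    return [u for u in units if u["time"][0] <= frame_no <= u["time"][1]]

def render_frame(frame, units, frame_no, paste_at):
    for u in active_captions(units, frame_no):
        paste_at(frame, u["bitmap"], u["position"])   # position: e.g. centre coordinates
    return frame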
In one embodiment of the present invention, the specific steps of separating the subtitle region from the original video and producing, at different resolutions, a series of transcoded subtitle bitmap groups are shown in FIG. 5 and include:
Step 501: first, determine the video that needs to be processed;
Step 502: detect the extent of the subtitle region and obtain the rough area occupied by the subtitle region in the video image; the preferred detection method is a subtitle region detection method based on DCT (Discrete Cosine Transform) coefficients, although other methods capable of detecting the extent of the subtitle region are also possible;
Step 503: confirm the detected subtitle region to obtain a subtitle region that can be acquired. In this step, morphological filtering may be used to bridge the gaps between characters and to remove noise; the preferred confirmation method is based on regional texture constraints, although other methods capable of confirming the subtitle region are also possible. Step 504: position the confirmed subtitle region, acquire the position information of the subtitle region, and obtain the size and position of the subtitle region relative to the original video image. The preferred positioning method uses block texture intensity projection in the compressed domain; other methods capable of positioning the subtitle region are also possible;
Step 505: track the positioned subtitle region and acquire the time information corresponding to the subtitle region, for example the start time and end time in the video at which the subtitle is displayed, or the start frame number and end frame number in the video corresponding to the subtitle. The preferred tracking method uses motion vectors in the compressed domain; other methods that can track the subtitle region and determine the corresponding playing time information are also possible;
Step 506: separate the subtitle region obtained in step 503 from the original video image. The preferred segmentation method fuses the foreground and the background; other methods capable of segmenting the subtitle region are also possible;
Step 507: each subtitle region separated from the original video image, together with its associated time information and position information, first forms a single subtitle unit; all the subtitle units together constitute the original subtitle bitmap group of the video. The information contained in a single subtitle unit may include: the subtitle bitmap data; the position information of the subtitle bitmap, for example the coordinates of the center of the subtitle bitmap; and the playing time information of the subtitle bitmap, for example the playing start time and end time corresponding to the subtitle bitmap, or the start frame number and end frame number corresponding to the subtitle bitmap. This embodiment provides a suggested storage format for a subtitle unit, as shown in Table 1; it should be noted that this storage format is not the only possible one, and other ways of storing the information are also possible.

Table 1 (suggested storage format for a subtitle unit; provided in the source only as image imgf000009_0001)
Step 508: transcode the original video at different resolutions to form a set of transcoded videos of different resolutions.
Step 509: transcode the generated original subtitle bitmap group at different resolutions to form a series of transcoded subtitle bitmap groups of different resolutions. There is no required ordering between steps 509 and 508.
The set of transcoded videos of different resolutions formed in step 508 above, and the set of transcoded subtitle bitmap groups of different resolutions generated in step 509 above, are provided to user terminals with different hardware device capabilities.
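Because Table 1 is available in the source only as an image, the following Python sketch shows one plausible in-memory layout of the subtitle unit of step 507, limited to the fields named in the text; the field names are assumptions and do not reproduce Table 1:

# Field names are assumptions; they cover only the information listed for step 507.
from dataclasses import dataclass

@dataclass
class SubtitleUnit:
    bitmap: bytes      # encoded subtitle bitmap data
    centre_x: int      # horizontal centre of the bitmap in the original frame
    centre_y: int      # vertical centre of the bitmap in the original frame
    width: int         # bitmap width in pixels
    height: int        # bitmap height in pixels
    start: int         # start time (or start frame number)
    end: int           # end time (or end frame number)

# The original subtitle bitmap group is then simply a collection of such units.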
An embodiment of the present invention provides a subtitle matching processing method which, as shown in FIG. 6, includes the following steps:
Step 601: a mobile terminal, acting as the user terminal, initiates to the streaming media server a request to play a streaming video; the request contains the video name or video number, as well as the device capability information of the mobile terminal, such as the physical size of the screen and the supported resolution. The request may be sent as a Request message of the HTTP (Hypertext Transfer Protocol) protocol, with the following specific format:
GET /pub/mobile/discovery.ts HTTP/1.1
Host: stream.ifeng.com
Accept: */*
Profile-dev: "physical size = 2.8, resolution = 240_320"
In the above, Profile-dev: "physical size = 2.8, resolution = 240_320" is a specific piece of device capability information, indicating that the physical size of the terminal screen is 2.8 and the supported resolution is 240 x 320.
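For illustration, a server could parse the Profile-dev header of this example as follows; the header name and value syntax are those shown above, while the parsing code itself is only a sketch:

# Illustrative parser for the Profile-dev header of this example, e.g.
#   Profile-dev: "physical size = 2.8, resolution = 240_320"
def parse_profile_dev(value):
    fields = {}
    for part in value.strip('"').split(","):
        key, _, val = part.partition("=")
        fields[key.strip()] = val.strip()
    size = float(fields["physical size"])
    width, height = (int(x) for x in fields["resolution"].split("_"))
    return size, (width, height)

print(parse_profile_dev('"physical size = 2.8, resolution = 240_320"'))
# -> (2.8, (240, 320))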
Step 602: after receiving the play request from the mobile terminal, the streaming media server selects, according to the physical screen size and supported resolution of the mobile terminal in the play request, the transcoded video file and the transcoded subtitle bitmap group best suited to the screen of the mobile terminal or its resolution from the transcoded video file group and the subtitle bitmap groups respectively, and sends them to the mobile terminal (in this embodiment, the transcoded video and subtitle bitmap group suited to a physical screen size of 2.8 and a supported resolution of 240 x 320 should be sent). They may be sent as a TS (Transport Stream) stream, an RTP (Real-time Transport Protocol) stream, or the like. The streaming media server can deliver the transcoded video and transcoded subtitle bitmap group data directly in a TS stream by either of the following two preferred methods. Method 1, as shown in FIG. 7: the transcoded video data, the audio data, and the transcoded subtitle bitmap group are each packetized to produce three packetized elementary streams, a VPES (Video Packetized Elementary Stream), an APES (Audio Packetized Elementary Stream), and a CPES (Caption Packetized Elementary Stream); a PID (Package IDentity, packet identifier) is assigned to each of the three PES streams in the PMT (Program Map Table); the VPES, APES, and CPES are multiplexed to produce a TS stream, which the streaming media server delivers to the mobile terminal. Method 2: because the data of the transcoded subtitle bitmap group is small compared with the video data and the audio data, this data is placed in the adaptation field of the TS packets carrying the video data, and the adaptation field control bits are set to "11".
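A heavily reduced sketch of Method 1 is given below; it only shows the idea of assigning one PID per packetized elementary stream and interleaving their payloads, while real TS packetization (sync bytes, continuity counters, PAT/PMT encoding, adaptation fields) is omitted and the PID values are arbitrary examples:

# Reduced sketch of Method 1: one PID per packetized elementary stream, payloads
# interleaved round-robin. Real TS packetization is omitted; PIDs are examples.
PIDS = {"video": 0x100, "audio": 0x101, "captions": 0x102}   # advertised via the PMT

def slices(data, size=184):            # 184 bytes = TS payload after a 4-byte header
    for i in range(0, len(data), size):
        yield data[i:i + size]

def multiplex(vpes, apes, cpes):
    """Yield (pid, payload) pairs for the video, audio and caption PES data."""
    streams = [("video", slices(vpes)), ("audio", slices(apes)), ("captions", slices(cpes))]
    while streams:
        for name, it in list(streams):
            payload = next(it, None)
            if payload is None:
                streams.remove((name, it))
            else:
                yield PIDS[name], payload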
If the streaming media server delivers the transcoded video and transcoded subtitle bitmap group directly as an RTP stream, it first packages them into a TS stream as described above, then packages the TS stream into RTP packets as specified by the RFC 2038 "RTP Payload Format for MPEG1/MPEG2 Video" standard, and delivers the packets to the mobile terminal over the RTP/RTSP (Real Time Streaming Protocol) protocols.
Another subtitle matching processing method provided by an embodiment of the present invention is shown in FIG. 8 and includes the following steps. Step 801: the user terminal initiates to the video streaming media server a request to play a streaming video; the request contains the address of the user agent profile database in which the capability information of the screen device of the user terminal is stored. The screen device capability information may be the physical size of the screen, the supported resolution, and so on; the request also contains the video name or video number required by the user, and so on. This embodiment provides one way of sending the request, in which a Request message of the HTTP protocol initiates the request to play the streaming video to the video streaming media server, the request containing the address of the user agent profile database in which the capability information of the screen device of the user terminal is stored; the specific format of the message is as follows:
GET /pub/mobile/discovery.ts HTTP/1.1
Host: stream.ifeng.com
Accept: */*
Profile-rep: http://profilerepository.oma.org/Nokia/N61
In the above, Profile-rep: http://profilerepository.oma.org/Nokia/N61 is the address information of the user agent profile database; the device capability information of the mobile terminal (model Nokia-N61) is stored at the location identified by this address information.
Step 802: the video streaming media server acquires the device capability information of the user terminal from the user agent profile database according to the user agent profile database address in the request; that is, it sends an HTTP-based Request message to the user agent profile database requesting a query of the screen device capability of the mobile terminal, in the following message format:
POST /Nokia/N61 HTTP/1.1
Host: profilerepository.oma.org
Query Type: "Screen_Capacity"
Step 803: after receiving the request from the video streaming media server, the user agent profile database feeds the device capability information of the current user terminal back to the video streaming media server, that is, it sends the information to the video streaming media server in an HTTP Response message in the following format:
HTTP/1.1 200 OK
QueryResult: "Screen_Capacity: physical size = 2.8, resolution = 240_320"
Here, QueryResult: "Screen_Capacity: physical size = 2.8, resolution = 240_320" returns the device capability information of the mobile terminal, indicating that its physical screen size is 2.8 and its supported resolution is 240 x 320.
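Steps 802 and 803 can be sketched from the server side as follows, reusing the message formats shown above; the use of urllib and the exact header handling are assumptions of the example, not requirements of this embodiment:

# Sketch of steps 802-803 as seen by the video streaming media server; urllib is
# only one possible HTTP client, and header handling follows the example messages.
import urllib.request

def fetch_screen_capability(profile_url):
    req = urllib.request.Request(profile_url, method="POST",
                                 headers={"Query Type": "Screen_Capacity"})
    with urllib.request.urlopen(req) as resp:
        result = resp.headers.get("QueryResult", "")
    # e.g. "Screen_Capacity: physical size = 2.8, resolution = 240_320"
    _, _, value = result.partition(":")
    fields = {}
    for part in value.split(","):
        key, _, val = part.partition("=")
        fields[key.strip()] = val.strip()
    width, height = (int(x) for x in fields["resolution"].split("_"))
    return float(fields["physical size"]), (width, height)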
Step 804: according to the information fed back by the user agent profile server, the video streaming media server selects, from the series of transcoded videos and transcoded subtitle bitmap groups, the transcoded video and the transcoded subtitle bitmap group suited to playback on the user terminal. Corresponding to the format information given above, in this embodiment the transcoded video at 240*320 resolution and the transcoded subtitle bitmap group corresponding to 2.8 inches and 240*320 resolution should be selected for transmission. The video streaming media server transmits the selected video and subtitle bitmap group to the user terminal in the same transmission manner as in step 703 above.
Through the description of the above embodiments, those skilled in the art can clearly understand that the present invention may be implemented by means of software plus the necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product; the software product is stored in a storage medium and includes a number of instructions for causing a mobile device (which may be a mobile phone, a personal computer, a media player, or the like) to perform the methods described in the various embodiments of the present invention. The storage medium referred to here is, for example, a ROM/RAM, a magnetic disk, or an optical disc.
Obviously, those skilled in the art can make various modifications and variations to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them. The above descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims
1. A subtitle matching processing method, characterized in that the method comprises:
receiving a play request from a user terminal;
acquiring device capability information of the user terminal according to the play request;
providing a transcoded subtitle bitmap group for the user terminal according to the device capability information.
2. The method according to claim 1, characterized in that: the play request carries the device capability information of the user terminal, and acquiring the device capability information of the user terminal according to the play request specifically comprises: directly acquiring the device capability information from the play request from the user terminal; or
the play request carries address information of a user agent profile database, and acquiring the device capability information of the user terminal according to the play request specifically comprises: acquiring the device capability information of the user terminal from the user agent profile database according to the address information of the user agent profile database.
3. The method according to claim 1 or 2, characterized in that the method further comprises: providing a transcoded video for the user terminal according to the device capability information.
4. The method according to claim 3, characterized in that providing the transcoded video for the user terminal and providing the transcoded subtitle bitmap group for the terminal specifically comprises:
packaging the video data, the audio data, and the subtitle bitmap group respectively to generate three elementary streams, and assigning packet identifiers to the three elementary streams in a program map table;
delivering a TS stream, generated by multiplexing the three elementary streams, to the user terminal.
5. A method for producing a subtitle bitmap group, characterized in that the method comprises:
acquiring a subtitle region from a video image, and acquiring time information and position information corresponding to the subtitle region; generating an original subtitle bitmap group according to the acquired subtitle region and the time information and position information; and performing transcoding processing on the original subtitle bitmap group according to resolution to form a transcoded subtitle bitmap group.
6. The method according to claim 5, characterized in that acquiring the subtitle region from the video image, and the time information and position information corresponding to the subtitle region, specifically comprises:
detecting the extent of the subtitle region and confirming the detected subtitle region; positioning the confirmed subtitle region to acquire the position information corresponding to the subtitle region;
tracking the positioned subtitle region to acquire the time information corresponding to the subtitle region.
7. A subtitle production apparatus, characterized in that the apparatus comprises:
a subtitle region acquisition module, configured to acquire a subtitle region in a video image and position information and time information of the subtitle region, and to generate an original subtitle bitmap group according to the subtitle region and the position information and time information of the subtitle region;
a transcoding module, configured to perform transcoding processing on the original subtitle bitmap group according to resolution to form a transcoded subtitle bitmap group.
8. The subtitle production apparatus according to claim 7, characterized in that the subtitle region acquisition module further comprises:
a subtitle region coarse detection unit, configured to determine the extent of the subtitle region;
a subtitle region confirmation unit, configured to confirm the subtitle region;
a subtitle region positioning unit, configured to position the confirmed subtitle region and acquire the position information of the subtitle region;
a subtitle region tracking unit, configured to acquire the time information of the subtitle region;
a subtitle bitmap group forming unit, configured to generate an original subtitle bitmap group according to the obtained subtitle region and the time information and position information of the subtitle region;
and the transcoding module further comprises:
a video transcoding unit, configured to perform transcoding processing on the original video according to resolution to form a transcoded video;
a subtitle bitmap transcoding unit, configured to perform transcoding processing on the original subtitle bitmap group according to resolution to form a transcoded subtitle bitmap group.
9. A subtitle matching processing apparatus, characterized in that the apparatus comprises:
a receiving module, configured to receive a play request from a user terminal, the play request carrying device capability information of the user terminal or address information of a user agent profile database in which the device capability information of the user terminal is stored;
a sending module, configured to provide a transcoded subtitle bitmap group for the user terminal according to the device capability information of the user terminal.
10. The apparatus according to claim 9, characterized in that: the sending module is further configured to provide a transcoded video for the terminal according to the device capability information of the user terminal.
11. A subtitle matching processing system, characterized in that the system comprises:
a subtitle production apparatus, configured to produce a transcoded subtitle bitmap group according to resolution;
a subtitle matching processing apparatus, configured to receive a play request from a user terminal, the play request carrying device capability information of the user terminal, and to provide a transcoded subtitle bitmap group for the user terminal according to the device capability information of the user terminal.
12. The system according to claim 11, characterized in that the system further comprises:
a user agent profile database, configured to store device capability information of user terminals;
and the subtitle matching processing apparatus is further configured to receive, from the user terminal, a play request carrying address information of the user agent profile database, to acquire the device capability information of the user terminal according to the address information of the user agent profile database, and further to provide a transcoded subtitle bitmap group for the user terminal according to the device capability information of the user terminal.
13. The system according to claim 11 or 12, characterized in that:
the subtitle production apparatus is further configured to produce a transcoded video according to resolution;
the subtitle matching processing apparatus is further configured to provide a transcoded video for the user terminal according to the device capability information of the terminal.
14. A user terminal, characterized in that the user terminal comprises:
a play request module, configured to generate a play request, the play request containing device capability information of the user terminal or address information of a user agent profile database;
a video playing module, configured to play a video and a transcoded subtitle bitmap group;
a communication module, configured to send the play request, and to receive the video and the transcoded subtitle bitmap group.
PCT/CN2009/073240 2008-09-02 2009-08-13 Method, system and device of the subtitle matching process WO2010025646A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200810141753A CN101668132A (en) 2008-09-02 2008-09-02 Method and system for matching and processing captions
CN200810141753.9 2008-09-02

Publications (1)

Publication Number Publication Date
WO2010025646A1 true WO2010025646A1 (en) 2010-03-11

Family

ID=41796737

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2009/073240 WO2010025646A1 (en) 2008-09-02 2009-08-13 Method, system and device of the subtitle matching process

Country Status (2)

Country Link
CN (1) CN101668132A (en)
WO (1) WO2010025646A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102594805A (en) * 2012-01-30 2012-07-18 中兴通讯股份有限公司 Method and system for providing multiple media services through single node
CN103294683B (en) * 2012-02-24 2016-09-14 腾讯科技(深圳)有限公司 A kind of video file captions automatic patching system and method
CN102625052B (en) * 2012-03-28 2014-08-20 广东威创视讯科技股份有限公司 Method, device and system for processing caption data
CN103379363B (en) * 2012-04-19 2018-09-11 腾讯科技(深圳)有限公司 Method for processing video frequency and device, mobile terminal and system
CN102663988B (en) * 2012-04-28 2015-06-24 广东威创视讯科技股份有限公司 Method, device and system for broadcasting subtitles
CN102984546A (en) * 2012-11-01 2013-03-20 上海文广互动电视有限公司 Transcoding service system for distributed video transcoding
US9131111B2 (en) * 2012-11-02 2015-09-08 OpenExchange, Inc. Methods and apparatus for video communications
CN105791367A (en) * 2014-12-25 2016-07-20 中国移动通信集团公司 Method, system and related equipment for sharing auxiliary media information in screen sharing
CN105957014A (en) * 2016-06-13 2016-09-21 天脉聚源(北京)传媒科技有限公司 Picture adaptive display method and apparatus
CN108156480B (en) * 2017-12-27 2022-01-04 腾讯科技(深圳)有限公司 Video subtitle generation method, related device and system
CN111131351B (en) * 2018-10-31 2022-09-27 中国移动通信集团广东有限公司 Method and device for confirming model of Internet of things equipment
CN110177295B (en) * 2019-06-06 2021-06-22 北京字节跳动网络技术有限公司 Subtitle out-of-range processing method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5805153A (en) * 1995-11-28 1998-09-08 Sun Microsystems, Inc. Method and system for resizing the subtitles of a video
KR20020097417A (en) * 2001-06-21 2002-12-31 엘지전자 주식회사 Processing apparatus for closed caption in set-top box
CN1432255A (en) * 2000-06-02 2003-07-23 汤姆森特许公司 Auxiliary information processing system with bitmapped on-screen display using limited computing resources
CN1642234A (en) * 2004-01-12 2005-07-20 松下电器产业株式会社 Caption treating device
CN1778111A (en) * 2003-04-22 2006-05-24 松下电器产业株式会社 Reproduction device and program


Also Published As

Publication number Publication date
CN101668132A (en) 2010-03-10

Similar Documents

Publication Publication Date Title
WO2010025646A1 (en) Method, system and device of the subtitle matching process
US10582201B2 (en) Most-interested region in an image
RU2598800C2 (en) Device orientation capability exchange signaling and server adaptation of multimedia content in response to device orientation
JP6462566B2 (en) Transmitting apparatus, transmitting method, receiving apparatus, and receiving method
EP3313083B1 (en) Spatially-segmented content delivery
WO2016138844A1 (en) Multimedia file live broadcast method, system and server
EP3466084A1 (en) Advanced signaling of a most-interested region in an image
AU2018299899B2 (en) Enhanced high-level signaling for fisheye virtual reality video in dash
WO2008061416A1 (en) A method and a system for supporting media data of various coding formats
KR102247404B1 (en) Enhanced high-level signaling for fisheye virtual reality video
WO2014193996A2 (en) Network video streaming with trick play based on separate trick play files
CN110622516B (en) Advanced signaling for fisheye video data
JP5734699B2 (en) Super-resolution device for distribution video
KR101426579B1 (en) Apparatus and method for providing images in wireless communication system and portable display apparatus and method for displaying images
US20180227065A1 (en) Reception apparatus, transmission apparatus, and data processing method
KR101690153B1 (en) Live streaming system using http-based non-buffering video transmission method
WO2022213034A1 (en) Transporting heif-formatted images over real-time transport protocol including overlay images
CN117099375A (en) Transmitting HEIF formatted images via real-time transport protocol

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09811021

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09811021

Country of ref document: EP

Kind code of ref document: A1