WO2010025646A1 - Method, system and device of the subtitle matching process - Google Patents

Method, system and device of the subtitle matching process

Info

Publication number
WO2010025646A1
Authority
WO
WIPO (PCT)
Prior art keywords
subtitle
user terminal
video
information
transcoded
Application number
PCT/CN2009/073240
Other languages
French (fr)
Chinese (zh)
Inventor
吴治国
李智斌
赵雷
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2010025646A1


Classifications

    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N 5/00 Details of television systems
            • H04N 5/222 Studio circuitry; Studio devices; Studio equipment
              • H04N 5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
                • H04N 5/278 Subtitling
          • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
            • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
              • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
                • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
                  • H04N 21/2343 involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
                • H04N 21/236 Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
                  • H04N 21/23614 Multiplexing of additional data and video streams
              • H04N 21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
                • H04N 21/258 Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
                  • H04N 21/25808 Management of client data
                    • H04N 21/25825 involving client display capabilities, e.g. screen resolution of a mobile phone
                    • H04N 21/25833 involving client hardware characteristics, e.g. manufacturer, processing or storage capabilities
                • H04N 21/266 Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
                  • H04N 21/2662 Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
            • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
              • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                • H04N 21/434 Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
                  • H04N 21/4348 Demultiplexing of additional data and video streams
              • H04N 21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
                • H04N 21/4508 Management of client data or end-user data
                  • H04N 21/4516 involving client characteristics, e.g. Set-Top-Box type, software version or amount of memory available
              • H04N 21/47 End-user applications
                • H04N 21/488 Data services, e.g. news ticker
                  • H04N 21/4884 for displaying subtitles
            • H04N 21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
              • H04N 21/65 Transmission of management data between client and server
                • H04N 21/658 Transmission by the client directed to the server
                  • H04N 21/6582 Data stored in the client, e.g. viewing habits, hardware capabilities, credit card number

Definitions

  • The present invention relates to the field of video communications, and in particular to a method, system and apparatus for subtitle matching processing.
  • With the development of mobile communication technology, and especially the arrival of the 3G (3rd Generation) era, mobile video services led by mobile TV have gradually begun to be commercialized, and users can enjoy high-quality video services on mobile terminals such as mobile phones.
  • However, when video content is played on the display device of a mobile terminal, the limited processing capability and screen size of the terminal mean that the video content offered by current mobile video services is mostly obtained by transcoding conventional-resolution video files in formats such as MPEG (Moving Pictures Experts Group)-1, MPEG-2 and MPEG-4 down to a lower resolution. Most of these video files have subtitles corresponding to their content embedded in the picture, so that the user can readily understand the content.
  • In the course of implementing the present invention, the inventors found that, because the video content provided to mobile terminals has been transcoded and converted to a lower resolution, the subtitles embedded in the video content are also greatly reduced in size and become blurred. Users of current mobile terminals can hardly recognize the subtitle text clearly and correctly when watching a video, which prevents them from fully viewing and understanding the video content and thus degrades the user experience.
  • Summary of the invention
  • To overcome the defect in the prior art that subtitles in the video become blurred, embodiments of the present invention provide a method, system and device for subtitle matching processing.
  • A method for subtitle matching processing provided by an embodiment of the present invention includes: receiving a play request from a user terminal; acquiring device capability information of the user terminal according to the play request; and providing a transcoded subtitle bitmap group to the user terminal according to the device capability information.
  • An embodiment of the present invention further provides a method for creating a subtitle bitmap group, which includes: acquiring a subtitle area from a video image, together with the time information and position information corresponding to the subtitle area; generating an original subtitle bitmap group according to the acquired subtitle area and its time and position information; and transcoding the original subtitle bitmap group according to resolution to form transcoded subtitle bitmap groups.
  • An embodiment of the present invention further provides an apparatus for creating subtitles, the apparatus comprising: a subtitle area obtaining module, configured to acquire a subtitle area in a video image together with the position information and time information of the subtitle area, and to generate an original subtitle bitmap group according to the subtitle area and its position and time information; and a transcoding module, configured to transcode the original subtitle bitmap group according to resolution to form transcoded subtitle bitmap groups.
  • An embodiment of the present invention further provides an apparatus for subtitle matching processing, the apparatus comprising: a receiving module, configured to receive a play request from a user terminal, where the play request carries either the device capability information of the user terminal or the address information of a user agent profile database that stores the device capability information of the user terminal; and a sending module, configured to provide a transcoded subtitle bitmap group to the user terminal according to the device capability information of the user terminal.
  • An embodiment of the present invention further provides a system for subtitle matching processing, including: a subtitle creation device, configured to create transcoded subtitle bitmap groups according to resolution; and a subtitle matching processing device, configured to receive a play request from a user terminal, where the play request carries the device capability information of the user terminal, and to provide a transcoded subtitle bitmap group to the terminal according to that device capability information.
  • An embodiment of the present invention further provides a user terminal, including: a play request module, configured to generate a play request that contains the device capability information of the user terminal or the address information of a user agent profile database; a video play module, configured to play the video and the transcoded subtitle bitmap group; and a communication module, configured to send the play request and to receive the video and the transcoded subtitle bitmap group.
  • As can be seen from the above embodiments, the present invention can deliver, according to the display device capabilities of different terminals, a subtitle bitmap group adapted to the screen resolution, so that the user can clearly recognize the subtitle text while watching the video.
  • FIG. 1 is a structural diagram of a subtitle matching processing system according to an embodiment of the present invention;
  • FIG. 2 is a structural diagram of a subtitle matching processing apparatus according to an embodiment of the present invention;
  • FIG. 3 is a structural diagram of a subtitle creation apparatus according to an embodiment of the present invention;
  • FIG. 4 is a structural diagram of a user terminal according to an embodiment of the present invention;
  • FIG. 5 is a flowchart of a method for creating a subtitle bitmap group according to an embodiment of the present invention;
  • FIG. 6 is a flowchart of a subtitle matching processing method according to an embodiment of the present invention;
  • FIG. 7 is a schematic diagram of a method for sending the video and the transcoded subtitle bitmap group according to an embodiment of the present invention;
  • FIG. 8 is a flowchart of another subtitle matching processing method according to an embodiment of the present invention.
  • FIG. 1 shows a subtitle matching processing system according to an embodiment of the present invention.
  • The system includes a subtitle matching processing device 101, a subtitle creation device 102, and a user agent profile database 103, where:
  • The subtitle matching processing device 101 is configured to receive a play request from a user terminal, where the play request carries the device capability information of the user terminal, and to provide a transcoded subtitle bitmap group to the terminal according to that device capability information.
  • The device is further configured to receive, from the user terminal, a play request carrying the address information of a user agent profile database, to obtain the device capability information of the user terminal according to that address information, and then to provide a transcoded subtitle bitmap group to the terminal according to the device capability information.
  • The subtitle creation device 102 is configured to create the subtitle bitmap groups according to resolution; by processing the original video it obtains a series of transcoded subtitle bitmap groups of different resolutions corresponding to the video content.
  • To provide a suitable subtitle bitmap group to user terminals of different types, display devices and screen sizes, the subtitle creation device 102 first determines the rough position of the subtitle area in the video image, which may be expressed as coordinate information in the video image, pixel index information, and the like.
  • The subtitle area is then segmented from the video image sequence, and the time information and position information corresponding to the subtitle area are acquired at the same time, for example the playback start time point and end time point corresponding to the subtitle area, or the start frame number and end frame number of the video images to which the subtitle area corresponds; the position of the subtitle area on the screen may be expressed, for example, as the coordinates of the center point of the subtitle area.
  • A subtitle bitmap group is generated from the subtitle area acquired as described above, together with the time information and position information corresponding to the subtitle area.
  • Finally, the original subtitle bitmap group is transcoded to different resolutions according to the physical sizes and resolutions of the screens of common types of user terminal, yielding a series of transcoded subtitle bitmap groups of different resolutions.
  • The subtitle creation device may further transcode the original video to different resolutions, obtaining a series of transcoded videos of different resolutions.
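As a purely illustrative sketch of this resolution transcoding step (not the patent's implementation): the snippet below scales an original subtitle bitmap group to several target resolutions with Pillow. The target resolution table and the scaling policy are assumptions made only for this example.

```python
# Illustrative sketch: scale each original subtitle bitmap for several common
# terminal screen resolutions. Pillow, the resolution list and the scaling
# policy are assumptions of this sketch, not taken from the patent.
from PIL import Image

TARGET_RESOLUTIONS = [(240, 320), (320, 240), (480, 320), (640, 480)]  # hypothetical

def transcode_subtitle_bitmaps(original_bitmaps, original_size):
    """original_bitmaps: list of PIL.Image crops; original_size: (w, h) of source frames.
    Returns a dict mapping each target resolution to its scaled bitmap group."""
    groups = {}
    src_w, src_h = original_size
    for dst_w, dst_h in TARGET_RESOLUTIONS:
        scale_x, scale_y = dst_w / src_w, dst_h / src_h
        scaled = [
            bmp.resize((max(1, int(bmp.width * scale_x)),
                        max(1, int(bmp.height * scale_y))),
                       Image.LANCZOS)
            for bmp in original_bitmaps
        ]
        groups[(dst_w, dst_h)] = scaled
    return groups
```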
  • The subtitle matching processing system may further include a user agent profile database 103.
  • The user agent profile database stores device information for various user terminals; the device information may include the device name, type, screen size, screen resolution, and so on.
  • In another embodiment of the present invention, the subtitle matching processing apparatus 101 may further include a receiving module 1011 and a sending module 1012, as shown in FIG. 2, where:
  • The receiving module 1011 is configured to receive the play request sent by the user terminal and to parse from it the device capability information of the user terminal, which may be the model, category, screen size and screen resolution of the device.
  • In one embodiment of the present invention, this module may instead parse from the play request the address information of the user agent profile database that stores the user terminal's device information, such as the database URL (Uniform Resource Locator), which may be expressed as an IP (Internet Protocol) address or a domain name. The module then further obtains the device capability information of the user terminal, such as the terminal model, from that user agent profile database.
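A minimal sketch of the receiving module's two paths, assuming an invented request layout and an in-memory stand-in for the user agent profile database (field names, URL and lookup interface are all hypothetical, not defined by the patent):

```python
# Illustrative only: read capability fields directly from the play request, or,
# when only a user agent profile database address is carried, look the terminal
# up in that database. The dict layout and example URL are invented.
def resolve_device_capability(play_request: dict, uaprof_db: dict) -> dict:
    if "device_capability" in play_request:
        # e.g. {"model": "Nokia-N61", "screen_inches": 2.8, "resolution": (240, 320)}
        return play_request["device_capability"]
    url = play_request["uaprof_url"]     # address of the user agent profile DB entry
    return uaprof_db[url]                # stands in for an HTTP query to the real DB

db = {"http://profilerepository.example.org/Nokia/N61":
          {"model": "Nokia-N61", "screen_inches": 2.8, "resolution": (240, 320)}}
print(resolve_device_capability(
    {"uaprof_url": "http://profilerepository.example.org/Nokia/N61"}, db))
```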
  • The sending module 1012 is configured to deliver to the user terminal, according to the device capability information obtained by the receiving module 1011, the transcoded video adapted to that device and the transcoded subtitle bitmap group adapted to that device.
  • The transcoded video and subtitle bitmap group may be sent as files, as a media stream, or in another form.
  • In one embodiment of the present invention, the subtitle creation device 102 may further include the following modules, as shown in FIG. 3: a subtitle area obtaining module 1021 and a transcoding module 1022, where:
  • The subtitle area obtaining module 1021 is configured to detect the subtitle area in the video image, together with the position information and time information of the subtitle area, and to generate a subtitle bitmap group from the subtitle area and its position and time information.
  • In another embodiment of the present invention, the subtitle area obtaining module 1021 may further include the following units, as shown in FIG. 3: a subtitle area coarse detecting unit 1021a, a subtitle area confirming unit 1021b, a subtitle area locating unit 1021c, a subtitle area tracking unit 1021d, and a subtitle bitmap group forming unit 1021e, where:
  • Subtitle area coarse detecting unit 1021a: this unit performs coarse detection of the subtitle area in the video image and determines the rough position and extent of the subtitle area on the screen. Various detection methods may be used; preferably, image texture information is used as the basis for detection.
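The patent only states that texture information is the preferred detection basis and does not fix an algorithm. The sketch below uses per-block gradient energy as a simple stand-in for a texture measure; the block size and threshold are arbitrary choices for illustration.

```python
# Rough illustration of texture-based coarse detection: mark 16x16 blocks whose
# gradient energy exceeds a threshold as candidate subtitle blocks.
import numpy as np

def coarse_subtitle_blocks(gray_frame: np.ndarray, block: int = 16, thresh: float = 1500.0):
    """gray_frame: 2-D uint8 luminance array. Returns a boolean block map."""
    gy, gx = np.gradient(gray_frame.astype(np.float32))
    energy = gx * gx + gy * gy                       # crude texture measure
    h, w = energy.shape
    hb, wb = h // block, w // block
    block_energy = energy[:hb * block, :wb * block] \
        .reshape(hb, block, wb, block).mean(axis=(1, 3))
    return block_energy > thresh                      # True = candidate subtitle block
```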
  • Subtitle area confirming unit 1021b: this unit confirms the candidate subtitle areas found by the coarse detection; a confirmed subtitle area is a subtitle area that can be acquired. Various confirmation methods may be used; preferably the confirmation is performed with a method based on region texture constraints.
  • Subtitle area locating unit 1021c: this unit locates the exact position of the confirmed subtitle area. The localization can be implemented in various ways, for example by deriving the position from projection profiles of the edge-point density or pixel gray values in the horizontal and vertical directions in the pixel domain, or by using block texture intensity projection in the compressed domain. After the subtitle area is located, the system obtains the size information and position information of the subtitle area relative to the original video image, such as the length and width of the subtitle area and the coordinates of its center.
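As a simplified illustration of the projection-profile variant mentioned above (one of several methods the text allows), the sketch below projects a candidate mask onto the two axes and returns the bounding box of the high-density rows and columns; the threshold ratio is an assumption.

```python
# Sketch of projection-profile localization over a boolean candidate map
# (e.g. the block map from the coarse detection sketch, or an edge map).
import numpy as np

def locate_subtitle_box(mask: np.ndarray, min_ratio: float = 0.3):
    """Return (top, bottom, left, right) of the dense band, or None if empty."""
    rows = mask.mean(axis=1)                     # horizontal projection profile
    cols = mask.mean(axis=0)                     # vertical projection profile
    row_idx = np.where(rows > min_ratio * rows.max())[0]
    col_idx = np.where(cols > min_ratio * cols.max())[0]
    if row_idx.size == 0 or col_idx.size == 0:
        return None
    return row_idx.min(), row_idx.max(), col_idx.min(), col_idx.max()
```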
  • Subtitle area tracking unit 1021d: this unit acquires the time information of the subtitle area, such as the start time and end time at which the subtitle area is displayed, a start time and duration, the start frame number and end frame number corresponding to the subtitle area, or a start frame number and a number of frames. The unit can be implemented in various ways, for example with a subtitle tracking method based on projection profiles, or by using motion vectors in the compressed domain for subtitle tracking.
  • Subtitle bitmap group forming unit 1021e: using the start frame and end frame obtained by the locating and tracking steps, or their corresponding background frames, this unit segments the subtitle area out of the original video images. The segmentation can use a multi-frame Max or Min based method, a histogram-based subtitle segmentation method, or other methods. After the subtitle areas have been segmented from the corresponding frames of the original video, the original subtitle bitmap group is formed.
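A minimal sketch of the multi-frame Min idea named above, under the assumption of bright subtitle text over a changing background: the per-pixel minimum across the frames in which the caption is shown keeps the static text while darkening the moving background, after which a fixed threshold extracts the text mask. Threshold and polarity are assumptions of this sketch.

```python
# Simplified multi-frame Min segmentation of a located subtitle box.
import numpy as np

def segment_subtitle(frames_region: list, thresh: int = 180):
    """frames_region: grayscale crops (same shape) of the located subtitle box."""
    stack = np.stack(frames_region).astype(np.uint8)
    fused = stack.min(axis=0)               # multi-frame Min fusion
    text_mask = fused >= thresh             # bright pixels surviving fusion = text
    bitmap = np.where(text_mask, fused, 0)  # subtitle bitmap on a black background
    return bitmap, text_mask
```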
  • Transcoding module 1022: this module transcodes the original subtitle bitmap group formed by the subtitle area obtaining module to different resolutions, forming transcoded subtitle bitmap groups of different resolutions. It can also remain compatible with existing video transcoding and generate a series of videos of different resolutions from the original video. In another embodiment of the present invention, the transcoding module 1022 may further include a subtitle bitmap transcoding unit 1022a and a video transcoding unit 1022b, as shown in FIG. 3, where:
  • Subtitle bitmap transcoding unit 1022a: according to the different display devices of common types of user terminal, such as the physical size of the screen and the screen resolution, this unit transcodes the original subtitle bitmap group to different resolutions, obtaining a series of transcoded subtitle bitmap groups of different resolutions to be provided to user terminals with those display devices.
  • Video transcoding unit 1022b: according to the different display devices of common types of user terminal, such as the physical size of the screen and the screen resolution, this unit transcodes the original video to different resolutions, obtaining a series of transcoded videos of different resolutions to be provided to user terminals with those display devices.
  • In another embodiment of the present invention, the subtitle matching processing system may further include a user agent profile database 103, which stores the device information of various user terminals; the device information may include the device name, type, screen size, screen resolution, and so on.
  • In a system with a user agent profile database 103, the play request sent by the user terminal to the video service module may carry the address information of the user agent profile database 103, such as the URL address of the database, which may be expressed as an IP address or a domain name.
  • According to that address information, the video service module obtains the device information of the user terminal from the user agent profile database 103, and then delivers to the user terminal the transcoded video adapted to its device and the transcoded subtitle bitmap group adapted to its device.
  • In actual applications, the subtitle matching processing system, as well as the subtitle creation device and the subtitle matching processing device within it, may be integrated into an existing streaming media server, or may provide services to the user terminal independently.
  • The user terminal is configured to send a play request to the video server, where the play request carries the terminal's device capability information; this information may be the screen size of the terminal, or its model, category, and the like.
  • In another embodiment of the present invention, the play request sent by the user terminal may carry, instead of the device capability information, the address information of the user agent profile database that stores the device capability information, such as the URL address of that database, expressed as an IP address or a domain name.
  • The user terminal is further configured to receive and play the transcoded video and the transcoded subtitle bitmap group delivered by the video service module.
  • The user terminal may be a conventional display terminal such as a television set or a PC display, or a mobile terminal with a display device, such as a mobile TV receiver or a mobile phone with a video playing function.
  • The user terminal may further include the following modules, as shown in FIG. 4: a play request module 201, a video play module 202, and a communication module 203.
  • The play request module 201 generates a play request according to the user's needs, such as the video content the user wants to watch (which may be identified by the name or number of the video), and according to the capabilities of the terminal's own device, which may be the size of the display screen, the model of the device, the processing capability of the processor, and so on.
  • The video play module 202 plays the transcoded video delivered by the video service module together with the transcoded subtitle bitmap group. While the video plays, each subtitle bitmap in the subtitle bitmap group appears on the screen according to its time information, simultaneously with the corresponding video content; the position of the subtitle bitmap on the screen is determined by the position information carried with the bitmap, which may be coordinate information, center position information, or the like.
  • Communication module 203: used for sending the play request, and for receiving the transcoded video adapted to the device and the transcoded subtitle bitmap group adapted to the device.
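A minimal compositing sketch of how the video play module could overlay each subtitle bitmap according to its time and position information, assuming RGBA bitmaps, Pillow, and the dict layout shown; none of these details are prescribed by the patent.

```python
# Illustrative client-side overlay: paste every subtitle bitmap whose frame
# window covers the current frame number at its stored center position.
from PIL import Image

def compose_frame(video_frame: Image.Image, subtitle_units: list, frame_no: int) -> Image.Image:
    out = video_frame.copy()
    for unit in subtitle_units:
        if unit["start_frame"] <= frame_no <= unit["end_frame"]:
            bmp = unit["bitmap"]
            left = unit["center_x"] - bmp.width // 2    # position info: bitmap center
            top = unit["center_y"] - bmp.height // 2
            mask = bmp if bmp.mode == "RGBA" else None  # honour transparency if present
            out.paste(bmp, (left, top), mask)
    return out
```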
  • FIG. 5 shows the specific steps of segmenting the subtitle area from the original video and producing a series of transcoded subtitle bitmap groups of different resolutions, which include:
  • Step 501: first determine the video that needs to be processed.
  • Step 502: detect the extent of the subtitle area and obtain the rough region of the subtitle area in the video image. A preferred detection method is a subtitle area detection method based on DCT (Discrete Cosine Transform) coefficients; other methods capable of detecting the extent of the subtitle area are also possible.
  • Step 503: confirm the detected subtitle area to obtain a subtitle area that can be acquired. Morphological filtering can be used to bridge the gaps between characters and to eliminate noise. The confirmation of the subtitle area is preferably performed with a method based on region texture constraints; other methods for confirming the subtitle area are also possible.
  • Step 504: locate the confirmed subtitle area and obtain its position information, i.e. the size and position of the subtitle area relative to the original video image. A preferred localization method uses block texture intensity projection in the compressed domain; other methods capable of locating the subtitle area are also possible.
  • Step 505: track the located subtitle area and obtain the time information corresponding to it, such as the start time and end time in the video at which the subtitle is displayed, or the start frame number and end frame number of the video to which the subtitle corresponds. The tracking method is preferably one that uses motion vectors in the compressed domain; other methods capable of tracking the subtitle area and determining the playback time information corresponding to it are also possible.
  • Step 506: separate the subtitle area confirmed in step 503 from the original video image. A preferred segmentation method is the pre-fusion and background method; other methods for segmenting the subtitle area are also possible.
  • Step 507: each subtitle area separated from the original video image, together with its associated time information and position information, forms a single subtitle unit, and all the subtitle units together constitute the original subtitle bitmap group of the video.
  • The information contained in a single subtitle unit may include: the subtitle bitmap data; the position information of the subtitle bitmap, for example the coordinates of the center of the bitmap; and the play time information of the subtitle bitmap, for example the play start time and end time corresponding to the bitmap, or the start frame number and end frame number corresponding to the bitmap.
  • This embodiment provides a recommended storage format for the subtitle unit, as shown in Table 1. It should be noted that this storage format is not the only possible one; other ways of storing the information are also possible.
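Table 1 itself is not reproduced in this text, so the record below is only a plausible layout assembled from the fields the surrounding paragraphs name (bitmap data, center position, and play time as start/end times or frame numbers); the field names and types are assumptions.

```python
# Hypothetical subtitle-unit record; an original subtitle bitmap group is then
# simply the ordered list of all such records extracted from one video.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SubtitleUnit:
    bitmap: bytes                          # encoded subtitle bitmap data (e.g. PNG bytes)
    center_x: int                          # position information: bitmap center
    center_y: int
    start_time_ms: Optional[int] = None    # play time as start/end times ...
    end_time_ms: Optional[int] = None
    start_frame: Optional[int] = None      # ... or as start/end frame numbers
    end_frame: Optional[int] = None
```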
  • Step 508: transcode the original video to different resolutions to form a set of transcoded videos of different resolutions.
  • Step 509: transcode the generated original subtitle bitmap group to different resolutions to form a series of transcoded subtitle bitmap groups of different resolutions. Steps 508 and 509 have no required ordering between them.
  • The set of transcoded videos of different resolutions formed in step 508 and the transcoded subtitle bitmap groups of different resolutions generated in step 509 are provided to user terminals with different hardware device capabilities.
  • An embodiment of the invention provides a method for subtitle matching processing, which includes the following steps, as shown in FIG. 6:
  • Step 601: acting as the user terminal, the mobile terminal initiates a request to the streaming media server to play a streaming video; the request contains the video name or video number and the device capability information of the mobile terminal, such as the physical size of the screen and the supported resolution.
  • The request may be sent through a Request message of the HTTP (Hypertext Transfer Protocol) protocol; the specific format of the message is as follows:
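The message text that followed here in the original filing is not included in this extraction. Purely as a hypothetical illustration (the path, host and capability header names are invented, not the patent's format), such an HTTP request could look like the one built below.

```python
# Illustrative play request carrying the terminal's screen capabilities directly.
video_name = "news_2009"   # hypothetical video name
request = (
    f"GET /streaming/{video_name} HTTP/1.1\r\n"
    "Host: streaming.example.com\r\n"
    "x-screen-size: 240x320\r\n"      # supported resolution of the terminal
    "x-screen-inches: 2.8\r\n"        # physical size of the terminal screen
    "\r\n"
)
print(request)
```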
  • Step 602: after receiving the play request of the mobile terminal, the streaming media server selects, from the transcoded video file groups and subtitle bitmap groups and according to the physical screen size and supported resolution carried in the play request, the transcoded video file and transcoded subtitle bitmap group best suited to the mobile terminal's screen and resolution, and sends them to the mobile terminal (in this embodiment, the video and subtitle bitmap group supporting a screen with a physical size of 2.8 inches and a resolution of 240 x 320). They can be sent in the form of a TS (Transport Stream) or an RTP (Real-time Transport Protocol) stream.
  • The streaming media server can send the transcoded video and the transcoded subtitle bitmap data directly in a TS stream in the following two preferred ways.
  • Method 1, as shown in FIG. 7: the transcoded video data, the audio data and the transcoded subtitle bitmap group are packetized separately to generate three packetized elementary streams, VPES (Video Packetized Elementary Stream), APES (Audio Packetized Elementary Stream) and CPES (Caption Packetized Elementary Stream); a PID (Packet IDentifier) is assigned to each of the three PES streams in the PMT (Program Map Table); and the VPES, APES and CPES are multiplexed into a TS stream, which the streaming media server then sends to the mobile terminal.
  • Method 2: since the data of the transcoded subtitle bitmap group is smaller than the video data and the audio data, this data is placed in the adaptation field of the TS packets carrying the video data, with the adaptation field control bits set to "11".
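A very simplified sketch of this second method: one 188-byte TS packet whose adaptation_field_control bits are "11" (adaptation field followed by payload) and whose adaptation field carries a chunk of subtitle data as transport private data, while the payload still carries video bytes. Real multiplexing (continuity counters, PSI/PMT, splitting across packets) is omitted, and the PID value is arbitrary.

```python
def ts_packet_with_subtitle(pid: int, subtitle_chunk: bytes, video_payload: bytes) -> bytes:
    header = bytes([
        0x47,                      # sync byte
        (pid >> 8) & 0x1F,         # PID high 5 bits (TEI/PUSI/priority cleared)
        pid & 0xFF,                # PID low 8 bits
        (0b11 << 4) | 0x00,        # adaptation_field_control = '11', continuity counter = 0
    ])
    # adaptation field body: flags byte with transport_private_data_flag (0x02) set,
    # then private_data_length + subtitle bytes; 0xFF stuffing pads the packet.
    body = bytes([0x02, len(subtitle_chunk)]) + subtitle_chunk
    stuffing = 188 - len(header) - 1 - len(body) - len(video_payload)
    assert stuffing >= 0, "subtitle chunk plus video payload too large for one packet"
    adaptation = bytes([len(body) + stuffing]) + body + b"\xff" * stuffing
    packet = header + adaptation + video_payload
    assert len(packet) == 188
    return packet
```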
  • Alternatively, the streaming media server sends the transcoded video and the transcoded subtitle bitmap group directly in an RTP stream: the data is first packaged into a TS stream as described above, then packaged into RTP packets according to the RFC 2038 "RTP Payload Format for MPEG1/MPEG2 Video" standard, and sent to the mobile terminal through the RTP/RTSP (Real Time Streaming Protocol) protocols.
  • Step 801: a user terminal initiates a request to a video streaming server to play a streaming media video, where the request contains the address of the user agent profile database that stores the capability information of the user terminal's screen device.
  • The screen device capability information may be the physical size of the screen, the supported resolution, and so on; the request also contains the video name or video number required by the user.
  • In an embodiment of the present invention, one way of sending the request is provided: the request to play a streaming media video is sent to the video streaming server through an HTTP Request message that contains the address of the user agent profile database storing the capability information of the user terminal's screen device. The specific format of the message is as follows:
  • Profile-rep: http://profilerepository.oma.org/Nokia/N61 is the address information of the user agent profile database, at which the device information of the mobile terminal (type Nokia-N61) is stored.
  • Step 802: according to the user agent profile database address in the request, the video streaming media server obtains the user terminal's device capability information by sending an HTTP Request message to the user agent profile database, querying the screen device capabilities of the mobile terminal. The message format is as follows:
  • Step 803: after receiving the request from the video streaming server, the user agent profile database feeds the device capability information of the current user terminal back to the video streaming media server, that is, it sends an HTTP response message to the video streaming media server. The format of the message is as follows:
  • Step 804: according to the information fed back by the user agent profile server, the video streaming server selects, from the series of transcoded videos and transcoded subtitle bitmap groups, a transcoded video and a transcoded subtitle bitmap group suitable for playback on the user terminal. Corresponding to the capability information given above, in this embodiment a transcoded video of 240 x 320 resolution and the transcoded subtitle bitmap group corresponding to a 2.8-inch, 240 x 320 screen should be selected for transmission. The video streaming server then sends the selected video and subtitle bitmap group to the user terminal in the same manner as described in step 703.
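A minimal sketch of this selection step, assuming an invented catalogue of available variants and a "largest resolution that still fits the screen" policy; the patent only requires that an adapted video and subtitle bitmap group be chosen, not this particular rule.

```python
# Illustrative selection of the transcoded variant matching the reported screen.
AVAILABLE = [(176, 144), (240, 320), (320, 240), (480, 320), (640, 480)]  # hypothetical

def select_variant(screen_w: int, screen_h: int):
    fitting = [(w, h) for (w, h) in AVAILABLE if w <= screen_w and h <= screen_h]
    if not fitting:
        return min(AVAILABLE, key=lambda r: r[0] * r[1])   # fall back to the smallest
    return max(fitting, key=lambda r: r[0] * r[1])          # largest that still fits

print(select_variant(240, 320))   # -> (240, 320) for the 2.8-inch, 240x320 example
```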
  • The present invention can be implemented by means of software plus a necessary general-purpose hardware platform; it can of course also be implemented entirely in hardware, but in many cases the former is the better implementation.
  • The technical solutions of the embodiments of the present invention can, in essence, be embodied in the form of a software product. The software product is stored in a storage medium and includes a number of instructions for causing a mobile device (which may be a mobile phone, a personal computer, a media player, etc.) to perform the methods described in the various embodiments of the present invention.
  • The storage medium referred to here is, for example, a ROM/RAM, a magnetic disk, an optical disc, or the like.

Abstract

A method, system and device for subtitle matching processing are disclosed in the embodiments of the present invention. The method includes the following steps: receiving a play request from a user terminal; obtaining the device capability information of the user terminal according to the play request; and providing a transcoded subtitle bitmap group to the user terminal according to the device capability information. The embodiments of the present invention also disclose a method for creating a subtitle bitmap group, a device for creating subtitles, a device for subtitle matching processing and a system for subtitle matching processing. The present invention can send the subtitle bitmap group suited to the screen resolution of the display device according to the display device capabilities of various user terminals, so that the user can clearly recognize the subtitle text when watching the video.

Description

Method, system and device for subtitle matching processing (一种字幕匹配处理的方法、系统和装置)

This application claims priority to Chinese Patent Application No. 200810141753.9, filed on September 2, 2008 and entitled "Method and System for Subtitle Matching Processing", which is incorporated herein by reference in its entirety.
 Say
背景技术 Background technique
随着移动通信技术的发展, 特别是 3G(3rd Generation, 第三代数字通信)时代的来临, 以手机电视为首的移动视频业务逐渐开始商用。 在高速移动通信技术的支持下, 用户可以 在以手机为代表的移动终端上享受到高质量的视频服书务。 但通过移动终端的显示设备播放 视频内容, 由于移动终端自身处理能力以及屏幕大小的限制, 当前移动视频业务向用户提 供的视频内容多由 MPEG (Moving Pictures Experts Group, 动态图像专家组) -1, MPEG— 2 和 MPEG— 4等格式常规分辨率的视频文件经过普通的视频转码并且降低分辨率后得到,在 上述视频文件中大多都嵌入了和其内容所对应的字幕, 以方便用户对视频内容的理解。  With the development of mobile communication technology, especially the era of 3G (3rd Generation, third generation digital communication), the mobile video service led by mobile TV has gradually begun to be commercialized. With the support of high-speed mobile communication technology, users can enjoy high-quality video service books on mobile terminals represented by mobile phones. However, the video content is played by the display device of the mobile terminal. Due to the processing capability of the mobile terminal and the limitation of the screen size, the video content provided by the current mobile video service to the user is mostly by MPEG (Moving Pictures Experts Group)-1. The video files of conventional resolutions such as MPEG-2 and MPEG-4 are obtained by ordinary video transcoding and reduced resolution. Most of the video files are embedded with subtitles corresponding to their contents, so that the user can directly Understanding of the content.
发明人在实现本发明的过程中发现, 对于向移动终端提供的视频内容来说, 由于其经 过了转码以及降分辨率转换处理, 所以嵌入在视频内容中的字幕尺寸也同时会大幅减小而 变得模糊不清, 当前移动终端的用户在观看视频时几乎无法清楚正确地辨识字幕信息, 这 将影响用户完整观赏和理解视频内容, 从而也影响了用户的体验。 发明内容  In the process of implementing the present invention, the inventor has found that for the video content provided to the mobile terminal, since the transcoding and the resolution conversion processing are performed, the size of the subtitle embedded in the video content is also greatly reduced. However, it becomes blurred. The user of the current mobile terminal can hardly and clearly recognize the subtitle information when watching the video, which will affect the user's complete viewing and understanding of the video content, thereby affecting the user experience. Summary of the invention
本发明实施例为了解决现有技术中视频中字幕变得模糊不清的缺陷, 提供了一种字幕 匹配处理的方法、 系统和装置。  In order to solve the defect that the subtitles in the video in the prior art become blurred, the present invention provides a method, system and device for subtitle matching processing.
本发明实施例提供的一种字幕匹配处理的方法, 具体包括: 接收来自用户终端的播放 请求; 根据所述播放请求获取所述用户终端的设备能力信息; 根据所述设备能力信息, 为 所述用户终端提供转码字幕比特图组。  A method for processing a subtitle matching process according to an embodiment of the present invention includes: receiving a play request from a user terminal; acquiring device capability information of the user terminal according to the play request; The user terminal provides a transcoded caption bitmap group.
此外, 本发明实施例还提供一种字幕比特图组的制作方法, 具体包括: 从视频图象中 获取字幕区域, 以及获取所述字幕区域对应的时间信息和位置信息; 根据所述获取的字幕 区域以及所述时间信息和位置信息, 生成原始字幕比特图组; 根据分辨率对所述原始字幕 比特图组进行转码处理形成转码字幕比特图组。 此外, 本发明实施例还提供一种字幕制作的装置, 该装置包括: 字幕区域获取模块: 用于获取视频图象中的字幕区域以及所述字幕区域的位置信息和时间信息, 根据所述字幕 区域, 以及所述字幕区域的位置信息和时间信息, 生成原始字幕比特图组; 转码模块: 用 于根据分辩率对所述原始字幕比特图组进行转码处理形成转码字幕比特图组。 In addition, the embodiment of the present invention further provides a method for creating a subtitle bitmap group, which specifically includes: acquiring a subtitle area from a video image, and acquiring time information and location information corresponding to the subtitle area; The area and the time information and the location information are generated, and the original subtitle bitmap group is generated; and the original subtitle bitmap group is transcoded according to the resolution to form a transcoded subtitle bitmap group. In addition, an embodiment of the present invention further provides an apparatus for creating a subtitle, the apparatus comprising: a subtitle area obtaining module: configured to acquire a subtitle area in a video image and location information and time information of the subtitle area, according to the subtitle a region, and location information and time information of the subtitle region, generating an original subtitle bitmap group; and a transcoding module: configured to perform transcoding processing on the original subtitle bitmap group according to a resolution to form a transcoded subtitle bitmap group.
此外, 本发明实施例还提供一种字幕匹配处理的装置, 该装置包括: 接收模块: 接收 来自用户终端的播放请求, 所述播放请求携带所述用户终端的设备能力信息或保存有所述 用户终端的设备能力信息的用户代理档案数据库地址信息; 发送模块: 用于根据所述用户 终端的设备能力信息, 为所述用户终端提供转码字幕比特图组。  In addition, an embodiment of the present invention further provides an apparatus for performing subtitle matching processing, where the apparatus includes: a receiving module: receiving a play request from a user terminal, where the play request carries device capability information of the user terminal or saves the user The user agent profile database address information of the device capability information of the terminal; the sending module: configured to provide the user terminal with a transcoded caption bitmap group according to the device capability information of the user terminal.
此外, 本发明实施例还提供一种字幕匹配处理的系统, 包括: 字幕制作装置, 用于根 据分辨率制作转码字幕比特图组; 字幕匹配处理装置, 用于接收来自用户终端的播放请求, 所述播放请求中携带所述用户终端的设备能力信息, 并根据所述终端的设备能力信息为终 端提供转码字幕比特图组。  In addition, the embodiment of the present invention further provides a system for subtitle matching processing, including: a subtitle creation device, configured to generate a transcoded subtitle bitmap group according to a resolution; and a subtitle matching processing device, configured to receive a play request from the user terminal, The play request carries device capability information of the user terminal, and provides a transcoded caption bitmap group to the terminal according to the device capability information of the terminal.
此外, 本发明实施例还提供一种用户终端, 包括: 播放请求模块: 生成播放请求, 所 述播放请求中包含所述用户终端的设备能力信息或用户代理档案数据库的地址信息; 视频 播放模块: 播放视频, 以及转码字幕比特图组; 通信模块: 用于发送播放请求; 接收视频, 以及转码字幕比特图组。  In addition, the embodiment of the present invention further provides a user terminal, including: a play requesting module: generating a play request, where the play request includes device capability information of the user terminal or address information of a user agent file database; the video play module: Playing video, and transcoding subtitle bitmap group; communication module: for transmitting a play request; receiving video, and transcoding subtitle bitmap group.
由以上实施例可以看出, 本发明能根据不同终端的显示设备能力, 发送与其屏幕分辨 率相适应的字幕比特图组, 从而使用户在观看视频时清楚识别字幕文字。 附图说明  As can be seen from the above embodiments, the present invention can transmit a subtitle bitmap group adapted to the resolution of the screen according to the display device capabilities of different terminals, thereby enabling the user to clearly recognize the subtitle text while watching the video. DRAWINGS
图 1是本发明实施例提供的字幕匹配处理系统结构图;  1 is a structural diagram of a caption matching processing system according to an embodiment of the present invention;
图 2是本发明实施例提供的字幕匹配处理装置结构图;  2 is a structural diagram of a subtitle matching processing apparatus according to an embodiment of the present invention;
图 3是本发明实施例提供的字幕制作装置结构图;  3 is a structural diagram of a caption making apparatus according to an embodiment of the present invention;
图 4是本发明实施例提供的所述用户终端的结构图;  4 is a structural diagram of the user terminal according to an embodiment of the present invention;
图 5是本发明实施例提供的所述字幕比特图制作方法流程图;  FIG. 5 is a flowchart of a method for creating a caption bitmap according to an embodiment of the present invention;
图 6是本发明实施例提供的所述字幕匹配处理方法流程图;  FIG. 6 is a flowchart of the method for processing subtitle matching according to an embodiment of the present invention;
图 7是本发明实施例提供的所述视频以及转码字幕比特图组的一种发送方法的示意图; 图 8是本发明实施例提供的所述字幕匹配处理方法流程图。 具体实施方式  FIG. 7 is a schematic diagram of a method for transmitting a video and a transcoded subtitle bitmap group according to an embodiment of the present invention; FIG. 8 is a flowchart of the method for processing a subtitle matching according to an embodiment of the present invention. detailed description
为了使本技术领域的人员更好地理解本发明, 下面结合附图对本发明作进一步的详细 说明。 In order to make the present invention better understood by those skilled in the art, the present invention will be further described in detail below with reference to the accompanying drawings. Description.
图 1 为本发明实施例提供的一种字幕匹配处理的系统, 该系统包括: 字幕匹配处理装 置 101, 字幕制作装置 102, 用户代理档案数据库 103。 其中:  FIG. 1 is a system for subtitle matching processing according to an embodiment of the present invention. The system includes: a subtitle matching processing device 101, a subtitle creation device 102, and a user agent archive database 103. among them:
字幕匹配处理装置 101, 用于接收来自用户终端的播放请求, 所述播放请求中携带所述 用户终端的设备能力信息, 并根据所述终端的设备能力信息为终端提供转码字幕比特图组, 该装置还用于接收来自用户终端的携带用户代理档案数据库地址信息的播放请求, 并根据 所述用户代理档案数据库地址信息获取所述用户终端的设备能力信息, 进一步根据所述终 端的设备能力信息为终端提供转码字幕比特图组。  a caption matching processing device 101, configured to receive a play request from a user terminal, where the play request carries device capability information of the user terminal, and provide a transcoded caption bitmap group for the terminal according to device capability information of the terminal, The device is further configured to receive a play request for carrying the user agent file database address information from the user terminal, and obtain device capability information of the user terminal according to the user agent file database address information, and further according to the device capability information of the terminal. Provide a transcoded caption bitmap group for the terminal.
字幕制作装置 102: 用于根据分辨率制作所述字幕比特图组, 该装置通过对原始视频的 处理得到一系列与视频内容相对应的不同分辨率的转码字幕比特图组。 为了提供合适的字 幕比特图组给不同类型, 不同显示设备以及屏幕大小的用户终端, 字幕制作装置 102 首先 要确定字幕区域在视频图象中的粗略位置, 该位置可以是视频图象中的坐标信息, 象素点 编号信息等。 此后, 从视频图象序列中分割出字幕区域, 并同时获取和该字幕区域相对应 的时间信息和位置信息, 例如该字幕区域对应的播放起始时间点和结束时间点, 或该字幕 区域对应的视频图象起始帧号和结束帧号, 该字幕区域在屏幕中对应的位置, 可以是该字 幕区域中心点在屏幕的坐标值等。 根据上述获取的字幕区域, 以及与所述字幕区域相对应 的时间信息和位置信息, 生成字幕比特图组。 最后根据常见类型用户终端的不同屏幕的物 理尺寸和分辨率对原始字幕比特图组进行分辨率转码处理, 得到一系列不同分辨率的转码 字幕比特图组。 该字幕制作装置还可以进一步根据不同分辨率对原始视频进行转码处理, 得到一系列不同分辨率的转码视频。  Subtitle creation apparatus 102: configured to create the subtitle bitmap group according to a resolution, and the apparatus obtains a series of transcoding subtitle bitmap groups of different resolutions corresponding to the video content by processing the original video. In order to provide a suitable subtitle bitmap group for different types, different display devices and screen size user terminals, the subtitle creation device 102 first determines the coarse position of the subtitle region in the video image, which may be the coordinates in the video image. Information, pixel point number information, etc. Thereafter, the subtitle area is segmented from the video image sequence, and time information and location information corresponding to the subtitle area are simultaneously acquired, for example, a playback start time point and an end time point corresponding to the subtitle area, or the subtitle area corresponds to The video frame start frame number and end frame number, the corresponding position of the subtitle area in the screen, may be the coordinate value of the center point of the subtitle area on the screen, and the like. A caption bitmap group is generated based on the caption region acquired as described above, and time information and position information corresponding to the caption region. Finally, the original subtitle bitmap group is subjected to resolution transcoding according to the physical size and resolution of different screens of common types of user terminals, and a series of transcoding subtitle bitmap groups of different resolutions are obtained. The captioning device may further perform transcoding processing on the original video according to different resolutions to obtain a series of transcoded videos of different resolutions.
字幕匹配处理系统中还可以包括用户代理档案数据库 103,用户代理档案数据库中存储 有各种用户终端的设备信息, 所述设备信息可以包括: 设备的名称, 类型, 屏幕的大小, 屏幕的分辨率等。  The subtitle matching processing system may further include a user agent file database 103. The user agent file database stores device information of various user terminals, and the device information may include: a device name, a type, a screen size, and a screen resolution. Wait.
在本发明的另一个实施例中, 本发明字幕匹配处理装置 101 可以进一步包括: 接收模 块 1011, 发送模块 1012, 如图 2所示, 其中:  In another embodiment of the present invention, the caption matching processing apparatus 101 of the present invention may further include: a receiving module 1011, a transmitting module 1012, as shown in FIG. 2, wherein:
接收模块 1011 : 用于接收用户终端发来的播放请求, 还可以从所述播放请求中分析出 用户终端自身的设备能力信息, 可以是设备的型号, 类别, 屏幕的尺寸大小, 屏幕的分辨 率等; 在本发明的一个实施例中, 该模块也可以解析出播放请求中存储用户终端设备信息 的用户代理档案数据库地址信息, 如上述数据库 URL (Uniform Resource Locator, 统一资 源定位器) 地址, 所述 URL地址可以用 IP (Internet Protocol, 网络之间互连的协议) 地址 或域名等形式表示。 并进一步从所述用户代理档案数据库获取所述用户终端的设备能力信 息。 该地址信息中携带用户终端的设备能力信息如终端型号。 The receiving module 1011 is configured to receive a play request sent by the user terminal, and further analyze, by using the play request, the device capability information of the user terminal, which may be a model, a category, a size of the screen, and a resolution of the screen. In an embodiment of the present invention, the module may also parse the user agent file database address information for storing the user terminal device information in the play request, such as the database URL (Uniform Resource Locator) address. The URL address can be expressed in the form of an IP (Internet Protocol) protocol or domain name. And further acquiring the device capability letter of the user terminal from the user agent archive database Interest. The address information carries device capability information of the user terminal, such as a terminal model.
发送模块 1012: 用于根据由接收模块 1011得到的用户终端的设备能力信息, 向所述用 户终端下发与其设备相适应的转码视频, 以及与其设备相适应的转码字幕比特图组。 其发 送的转码视频和字幕比特图组的形式可以是文件形式, 或媒体流形式等。  The sending module 1012 is configured to send, according to device capability information of the user terminal obtained by the receiving module 1011, a transcoded video adapted to the device and a transcoded caption bitmap group adapted to the device. The form of the transcoded video and subtitle bitmap group that it sends may be in the form of a file, or a form of a media stream.
在本发明的一个实施例中, 字幕制作装置 102可以进一步包括以下模块, 如图 3所示, 其中包括: 字幕区域获取模块 1021, 转码模块 1022, 其中:  In one embodiment of the present invention, the captioning device 102 may further include the following modules, as shown in FIG. 3, including: a caption region obtaining module 1021, a transcoding module 1022, where:
字幕区域获取模块 1021 : 用于检测出视频图象中的字幕区域, 以及字幕区域的位置信 息和时间信息, 根据所述字幕区域, 以及字幕区域的位置信息和时间信息, 生成字幕比特 图组。 在本发明的另一个实施例中, 字幕区域获取模块 1021可以进一步包括以下单元: 字 幕区域粗检测单元 1021a, 字幕区域确认单元 1021b, 字幕区域定位单元 1021c, 字幕区域 跟踪单元 1021d, 字幕比特图组形成单元 1021e, 如图 3所示, 其中:  The subtitle area obtaining module 1021 is configured to detect a subtitle area in the video image, and position information and time information of the subtitle area, and generate a subtitle bitmap group according to the subtitle area, and the position information and the time information of the subtitle area. In another embodiment of the present invention, the subtitle area obtaining module 1021 may further include the following units: a subtitle area coarse detecting unit 1021a, a subtitle area confirming unit 1021b, a subtitle area positioning unit 1021c, a subtitle area tracking unit 1021d, and a subtitle bitmap group. Forming a unit 1021e, as shown in FIG. 3, wherein:
字幕区域粗检测单元 1021a: 该单元用于对视频图象中的字幕区域做粗检测, 确定字幕 区域在屏幕中的粗略位置和范围, 其检测方法有多种, 较佳的为采用图象纹理信息做为检 测的依据。  Subtitle area coarse detecting unit 1021a: This unit is used for coarse detection of the subtitle area in the video image, and determines the rough position and range of the subtitle area in the screen. There are various detection methods, and preferably image texture is adopted. Information is used as the basis for testing.
字幕区域确认单元 1021b: 该单元用于对经粗检检测出的可能的字幕区域进行确认, 确 认后的字幕区域即为可获取的字幕区域。 其确认的方法有多种, 较佳的为采用基于区域纹 理约束的方法进行字幕区域的确认。  Subtitle area confirming unit 1021b: This unit is used to confirm the possible subtitle area detected by the coarse detection, and the confirmed subtitle area is the subtitle area that can be acquired. There are various methods for confirmation, and it is preferable to perform the confirmation of the subtitle area by the method based on the region texture constraint.
字幕区域定位单元 1021c: 该单元用于对经确认的字幕区域的准确位置进行定位, 字幕 区域定位可以通过多种方法实现, 例如在像素域中水平和垂直方向的边缘点密度或者像素 灰度值的投影轮廓上获得定位信息, 或利用压缩域中块纹理强度投影的方法定位字幕区域。 经字幕区域定位后系统将得到字幕区域相对于原视频图像的尺寸信息和位置信息, 例如字 幕区域的长度和宽度, 中心的坐标等。  Subtitle area locating unit 1021c: The unit is used for locating the exact position of the confirmed subtitle area, and the subtitle area positioning can be implemented by various methods, such as edge point density or pixel gradation value in the horizontal and vertical directions in the pixel domain. The positioning information is obtained on the projection contour, or the subtitle region is located by using the block texture intensity projection in the compressed domain. After the subtitle area is located, the system will obtain the size information and position information of the subtitle area relative to the original video image, such as the length and width of the subtitle area, the coordinates of the center, and the like.
Subtitle region tracking unit 1021d: this unit acquires the time information of the subtitle region, such as the start time and end time at which the subtitle region is displayed, or the start time and duration, or the start frame number and end frame number corresponding to the subtitle region, or the start frame number and the number of frames over which the subtitle persists. This function can be implemented in various ways, for example by a projection-profile-based subtitle tracking method, or by tracking the subtitle using motion vectors in the compressed domain.
Subtitle bitmap group forming unit 1021e: the start frame or end frame obtained from positioning and tracking, or alternatively the corresponding background frame, is subjected to subtitle region segmentation, so that the subtitle region is separated from the original video image. The segmentation of the subtitle region may use a multi-frame Max or Min method, a histogram-based subtitle segmentation, or other methods. After the corresponding frames of the original video have been segmented, the original subtitle bitmap group is formed.
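For orientation only, the cooperation of units 1021a to 1021e can be outlined as the following Python skeleton; the stage functions are stubs, since the concrete algorithms (texture analysis, projection profiles, motion-vector tracking, multi-frame Max/Min segmentation) are left open by this embodiment:

# Skeleton only: each stage is a stub standing in for the algorithms named above.

def coarse_detect(frame):
    """Return candidate subtitle rectangles from texture information (stub)."""
    raise NotImplementedError

def confirm(frame, candidates):
    """Keep only candidates satisfying region-texture constraints (stub)."""
    raise NotImplementedError

def locate(frame, region):
    """Refine a confirmed region to exact size and centre coordinates (stub)."""
    raise NotImplementedError

def track(frames, region, start_index):
    """Follow the region over time; return (start_frame, end_frame) (stub)."""
    raise NotImplementedError

def segment(frame, region):
    """Cut the subtitle pixels out of the frame and return a bitmap (stub)."""
    raise NotImplementedError

def build_original_bitmap_group(frames):
    # Note: merging of detections across the frames of one tracked caption is omitted here.
    units = []
    for i, frame in enumerate(frames):
        for region in confirm(frame, coarse_detect(frame)):
            box = locate(frame, region)
            start, end = track(frames, box, i)
            units.append({
                "bitmap": segment(frame, box),
                "position": box,          # e.g. centre coordinates, width, height
                "time": (start, end),     # frame numbers or timestamps
            })
    return units                          # the original subtitle bitmap group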
Transcoding module 1022: performs transcoding processing at different resolutions on the original subtitle bitmap group formed by the subtitle acquisition module, forming transcoded subtitle bitmap groups of different resolutions. It may also be compatible with existing video transcoding functions and generate from the original video a series of videos of different resolutions. In another embodiment of the present invention, the transcoding module 1022 may further include: a subtitle bitmap transcoding unit 1022a and a video transcoding unit 1022b, as shown in FIG. 3, where:
Subtitle bitmap transcoding unit 1022a: according to the different display devices of common types of user terminals, such as the physical size of the screen and the screen resolution, performs resolution transcoding of the original subtitle bitmap group to obtain a series of transcoded subtitle bitmap groups of different resolutions, which are provided to user terminals equipped with those display devices.
Video transcoding unit 1022b: according to the different display devices of common types of user terminals, such as the physical size of the screen and the screen resolution, performs resolution transcoding of the original video to obtain a series of transcoded videos of different resolutions, which are provided to user terminals equipped with those display devices.
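A minimal sketch of the resolution transcoding performed by unit 1022a is given below, assuming each subtitle unit carries its bitmap as a Pillow image, its position as centre coordinates, and its time span unchanged; the use of Pillow and the field names are assumptions of the example, not requirements of this embodiment:

# Sketch only: Pillow is an assumed dependency; field names are illustrative.
from PIL import Image

def transcode_unit(bitmap, centre, src_size, dst_size):
    """Scale one subtitle bitmap and its centre position to a target resolution."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    scaled = bitmap.resize(
        (max(1, round(bitmap.width * sx)), max(1, round(bitmap.height * sy))),
        Image.LANCZOS,
    )
    return scaled, (round(centre[0] * sx), round(centre[1] * sy))

def transcode_group(units, src_size, target_sizes):
    """Produce one transcoded subtitle bitmap group per target resolution."""
    groups = {}
    for dst in target_sizes:                       # e.g. [(176, 144), (240, 320)]
        group = []
        for u in units:
            bitmap, centre = transcode_unit(u["bitmap"], u["position"], src_size, dst)
            group.append({"bitmap": bitmap, "position": centre, "time": u["time"]})
        groups[dst] = group
    return groups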
In another embodiment of the present invention, the subtitle matching processing system may further include a user agent profile database 103. The user agent profile database 103 stores the device information of various user terminals; the device information may include the device name, type, screen size, screen resolution, and so on. In a system with the user agent profile database 103, the play request sent by the user terminal to the video service module may carry the address information of the user agent profile database 103, such as the URL address of the database, which may be expressed as an IP address, a domain name, or the like. According to the address information of the user agent profile database 103, the video service module acquires the device information of the user terminal from the user agent profile database 103, and then delivers to the user terminal a transcoded video adapted to its device and a transcoded subtitle bitmap group adapted to its device.
The subtitle matching processing system described in the above embodiments, together with the subtitle production apparatus and the subtitle matching processing apparatus in the system, may in practical applications be integrated into an existing streaming media server, or may exist separately and independently provide services to user terminals. The user terminal is configured to send a play request to the video server; the play request carries terminal device capability information, which may be the screen size of the terminal, or the model, category, and so on of the terminal. In another embodiment of the present invention, the play request sent by the user terminal may also not carry its own device capability information, but instead carry the address information of a user agent profile database in which its device capability information is stored, for example the URL address of the user agent profile database, which may be expressed in the form of an IP address or a domain name. The user terminal is further configured to receive and play the transcoded video and the transcoded subtitle bitmap group sent by the video service module. The user terminal may be a conventional display terminal, such as a television set or a PC monitor, or a mobile terminal with a display device, such as a mobile TV, a mobile phone with a video playing function, or mobile phone television. In another embodiment of the present invention, the user terminal may further include the following units, as shown in FIG. 4: a play request module 201, a video playing module 202, and a communication module 203.
Play request module 201: generates a play request according to the user's requirements, such as the video content the user wants to watch, which may be the name of the video or the number of the video, and according to the capability of the user's own device, which may be the size of the display screen, the device model, the processing capability of the processor, and so on. Video playing module 202: plays the transcoded video delivered by the video service module, together with the transcoded subtitle bitmap group. When the video is played, the individual subtitle bitmaps in the subtitle bitmap group appear on the screen in sequence according to the time information they carry and are displayed simultaneously with the corresponding video content; the position of a subtitle bitmap on the screen is determined by the position information in the subtitle bitmap, which may be coordinate information, center position information, and so on.
Communication module 203: used to send the play request, and to receive the transcoded video adapted to the device and the transcoded subtitle bitmap group adapted to the device.
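A simplified sketch of the playback behaviour of the video playing module 202 described above follows; it assumes each subtitle unit is a mapping with a bitmap, a position, and a (start, end) frame range, and the paste_at callback stands in for whatever blitting the real player performs:

# Simplified, illustrative playback-side logic.

def active_captions(units, frame_no):
    """Subtitle units whose (start, end) frame range covers frame_no."""
    return [u for u in units if u["time"][0] <= frame_no <= u["time"][1]]

def render_frame(frame, units, frame_no, paste_at):
    for u in active_captions(units, frame_no):
        paste_at(frame, u["bitmap"], u["position"])   # position: e.g. centre coordinates
    return frame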
In one embodiment of the present invention, the specific steps of separating the subtitle region from the original video and producing, at different resolutions, a series of transcoded subtitle bitmap groups are shown in FIG. 5 and include:
Step 501: first, determine the video that needs to be processed;
Step 502: detect the extent of the subtitle region and obtain the rough area occupied by the subtitle region in the video image; the preferred detection method is a subtitle region detection method based on DCT (Discrete Cosine Transform) coefficients, although other methods capable of detecting the extent of the subtitle region are also possible;
Step 503: confirm the detected subtitle region to obtain a subtitle region that can be acquired. In this step, morphological filtering may be used to bridge the gaps between characters and to remove noise; the preferred confirmation method is based on regional texture constraints, although other methods capable of confirming the subtitle region are also possible. Step 504: position the confirmed subtitle region, acquire the position information of the subtitle region, and obtain the size and position of the subtitle region relative to the original video image. The preferred positioning method uses block texture intensity projection in the compressed domain; other methods capable of positioning the subtitle region are also possible;
Step 505: track the positioned subtitle region and acquire the time information corresponding to the subtitle region, for example the start time and end time in the video at which the subtitle is displayed, or the start frame number and end frame number in the video corresponding to the subtitle. The preferred tracking method uses motion vectors in the compressed domain; other methods that can track the subtitle region and determine the corresponding playing time information are also possible;
Step 506: separate the subtitle region obtained in step 503 from the original video image. The preferred segmentation method fuses the foreground and the background; other methods capable of segmenting the subtitle region are also possible;
Step 507: each subtitle region separated from the original video image, together with its associated time information and position information, first forms a single subtitle unit; all the subtitle units together constitute the original subtitle bitmap group of the video. The information contained in a single subtitle unit may include: the subtitle bitmap data; the position information of the subtitle bitmap, for example the coordinates of the center of the subtitle bitmap; and the playing time information of the subtitle bitmap, for example the playing start time and end time corresponding to the subtitle bitmap, or the start frame number and end frame number corresponding to the subtitle bitmap. This embodiment provides a suggested storage format for a subtitle unit, as shown in Table 1; it should be noted that this storage format is not the only possible one, and other ways of storing the information are also possible.

Table 1 (suggested storage format for a subtitle unit; provided in the source only as image imgf000009_0001)
Step 508: transcode the original video at different resolutions to form a set of transcoded videos of different resolutions.
Step 509: transcode the generated original subtitle bitmap group at different resolutions to form a series of transcoded subtitle bitmap groups of different resolutions. There is no required ordering between steps 509 and 508.
The set of transcoded videos of different resolutions formed in step 508 above, and the set of transcoded subtitle bitmap groups of different resolutions generated in step 509 above, are provided to user terminals with different hardware device capabilities.
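Because Table 1 is available in the source only as an image, the following Python sketch shows one plausible in-memory layout of the subtitle unit of step 507, limited to the fields named in the text; the field names are assumptions and do not reproduce Table 1:

# Field names are assumptions; they cover only the information listed for step 507.
from dataclasses import dataclass

@dataclass
class SubtitleUnit:
    bitmap: bytes      # encoded subtitle bitmap data
    centre_x: int      # horizontal centre of the bitmap in the original frame
    centre_y: int      # vertical centre of the bitmap in the original frame
    width: int         # bitmap width in pixels
    height: int        # bitmap height in pixels
    start: int         # start time (or start frame number)
    end: int           # end time (or end frame number)

# The original subtitle bitmap group is then simply a collection of such units.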
An embodiment of the present invention provides a subtitle matching processing method which, as shown in FIG. 6, includes the following steps:
Step 601: a mobile terminal, acting as the user terminal, initiates to the streaming media server a request to play a streaming video; the request contains the video name or video number, as well as the device capability information of the mobile terminal, such as the physical size of the screen and the supported resolution. The request may be sent as a Request message of the HTTP (Hypertext Transfer Protocol) protocol, with the following specific format:
GET /pub/mobile/discovery.ts HTTP/1.1
Host: stream.ifeng.com
Accept: */*
Profile-dev: "physical size = 2.8, resolution = 240_320"
In the above, Profile-dev: "physical size = 2.8, resolution = 240_320" is a specific piece of device capability information, indicating that the physical size of the terminal screen is 2.8 and the supported resolution is 240 x 320.
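For illustration, a server could parse the Profile-dev header of this example as follows; the header name and value syntax are those shown above, while the parsing code itself is only a sketch:

# Illustrative parser for the Profile-dev header of this example, e.g.
#   Profile-dev: "physical size = 2.8, resolution = 240_320"
def parse_profile_dev(value):
    fields = {}
    for part in value.strip('"').split(","):
        key, _, val = part.partition("=")
        fields[key.strip()] = val.strip()
    size = float(fields["physical size"])
    width, height = (int(x) for x in fields["resolution"].split("_"))
    return size, (width, height)

print(parse_profile_dev('"physical size = 2.8, resolution = 240_320"'))
# -> (2.8, (240, 320))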
Step 602: after receiving the play request from the mobile terminal, the streaming media server selects, according to the physical screen size and supported resolution of the mobile terminal in the play request, the transcoded video file and the transcoded subtitle bitmap group best suited to the screen of the mobile terminal or its resolution from the transcoded video file group and the subtitle bitmap groups respectively, and sends them to the mobile terminal (in this embodiment, the transcoded video and subtitle bitmap group suited to a physical screen size of 2.8 and a supported resolution of 240 x 320 should be sent). They may be sent as a TS (Transport Stream) stream, an RTP (Real-time Transport Protocol) stream, or the like. The streaming media server can deliver the transcoded video and transcoded subtitle bitmap group data directly in a TS stream by either of the following two preferred methods. Method 1, as shown in FIG. 7: the transcoded video data, the audio data, and the transcoded subtitle bitmap group are each packetized to produce three packetized elementary streams, a VPES (Video Packetized Elementary Stream), an APES (Audio Packetized Elementary Stream), and a CPES (Caption Packetized Elementary Stream); a PID (Package IDentity, packet identifier) is assigned to each of the three PES streams in the PMT (Program Map Table); the VPES, APES, and CPES are multiplexed to produce a TS stream, which the streaming media server delivers to the mobile terminal. Method 2: because the data of the transcoded subtitle bitmap group is small compared with the video data and the audio data, this data is placed in the adaptation field of the TS packets carrying the video data, and the adaptation field control bits are set to "11".
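A heavily reduced sketch of Method 1 is given below; it only shows the idea of assigning one PID per packetized elementary stream and interleaving their payloads, while real TS packetization (sync bytes, continuity counters, PAT/PMT encoding, adaptation fields) is omitted and the PID values are arbitrary examples:

# Reduced sketch of Method 1: one PID per packetized elementary stream, payloads
# interleaved round-robin. Real TS packetization is omitted; PIDs are examples.
PIDS = {"video": 0x100, "audio": 0x101, "captions": 0x102}   # advertised via the PMT

def slices(data, size=184):            # 184 bytes = TS payload after a 4-byte header
    for i in range(0, len(data), size):
        yield data[i:i + size]

def multiplex(vpes, apes, cpes):
    """Yield (pid, payload) pairs for the video, audio and caption PES data."""
    streams = [("video", slices(vpes)), ("audio", slices(apes)), ("captions", slices(cpes))]
    while streams:
        for name, it in list(streams):
            payload = next(it, None)
            if payload is None:
                streams.remove((name, it))
            else:
                yield PIDS[name], payload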
If the streaming media server delivers the transcoded video and transcoded subtitle bitmap group directly as an RTP stream, it first packages them into a TS stream as described above, then packages the TS stream into RTP packets as specified by the RFC 2038 "RTP Payload Format for MPEG1/MPEG2 Video" standard, and delivers the packets to the mobile terminal over the RTP/RTSP (Real Time Streaming Protocol) protocols.
Another subtitle matching processing method provided by an embodiment of the present invention is shown in FIG. 8 and includes the following steps. Step 801: the user terminal initiates to the video streaming media server a request to play a streaming video; the request contains the address of the user agent profile database in which the capability information of the screen device of the user terminal is stored. The screen device capability information may be the physical size of the screen, the supported resolution, and so on; the request also contains the video name or video number required by the user, and so on. This embodiment provides one way of sending the request, in which a Request message of the HTTP protocol initiates the request to play the streaming video to the video streaming media server, the request containing the address of the user agent profile database in which the capability information of the screen device of the user terminal is stored; the specific format of the message is as follows:
GET /pub/mobile/discovery.ts HTTP/1.1
Host: stream.ifeng.com
Accept: */*
Profile-rep: http://profilerepository.oma.org/Nokia/N61
In the above, Profile-rep: http://profilerepository.oma.org/Nokia/N61 is the address information of the user agent profile database; the device capability information of the mobile terminal (model Nokia-N61) is stored at the location identified by this address information.
Step 802: the video streaming media server acquires the device capability information of the user terminal from the user agent profile database according to the user agent profile database address in the request; that is, it sends an HTTP-based Request message to the user agent profile database requesting a query of the screen device capability of the mobile terminal, in the following message format:
POST /Nokia/N61 HTTP/1.1
Host: profilerepository.oma.org
Query Type: "Screen_Capacity"
Step 803: after receiving the request from the video streaming media server, the user agent profile database feeds the device capability information of the current user terminal back to the video streaming media server, that is, it sends the information to the video streaming media server in an HTTP Response message in the following format:
HTTP/1.1 200 OK
QueryResult: "Screen_Capacity: physical size = 2.8, resolution = 240_320"
Here, QueryResult: "Screen_Capacity: physical size = 2.8, resolution = 240_320" returns the device capability information of the mobile terminal, indicating that its physical screen size is 2.8 and its supported resolution is 240 x 320.
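Steps 802 and 803 can be sketched from the server side as follows, reusing the message formats shown above; the use of urllib and the exact header handling are assumptions of the example, not requirements of this embodiment:

# Sketch of steps 802-803 as seen by the video streaming media server; urllib is
# only one possible HTTP client, and header handling follows the example messages.
import urllib.request

def fetch_screen_capability(profile_url):
    req = urllib.request.Request(profile_url, method="POST",
                                 headers={"Query Type": "Screen_Capacity"})
    with urllib.request.urlopen(req) as resp:
        result = resp.headers.get("QueryResult", "")
    # e.g. "Screen_Capacity: physical size = 2.8, resolution = 240_320"
    _, _, value = result.partition(":")
    fields = {}
    for part in value.split(","):
        key, _, val = part.partition("=")
        fields[key.strip()] = val.strip()
    width, height = (int(x) for x in fields["resolution"].split("_"))
    return float(fields["physical size"]), (width, height)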
Step 804: according to the information fed back by the user agent profile server, the video streaming media server selects, from the series of transcoded videos and transcoded subtitle bitmap groups, the transcoded video and the transcoded subtitle bitmap group suited to playback on the user terminal. Corresponding to the format information given above, in this embodiment the transcoded video at 240*320 resolution and the transcoded subtitle bitmap group corresponding to 2.8 inches and 240*320 resolution should be selected for transmission. The video streaming media server transmits the selected video and subtitle bitmap group to the user terminal in the same transmission manner as in step 703 above.
Through the description of the above embodiments, those skilled in the art can clearly understand that the present invention may be implemented by means of software plus the necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product; the software product is stored in a storage medium and includes a number of instructions for causing a mobile device (which may be a mobile phone, a personal computer, a media player, or the like) to perform the methods described in the various embodiments of the present invention. The storage medium referred to here is, for example, a ROM/RAM, a magnetic disk, or an optical disc.
Obviously, those skilled in the art can make various modifications and variations to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them. The above descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims
1. A subtitle matching processing method, characterized in that the method comprises:
receiving a play request from a user terminal;
acquiring device capability information of the user terminal according to the play request;
providing a transcoded subtitle bitmap group for the user terminal according to the device capability information.
2. The method according to claim 1, characterized in that: the play request carries the device capability information of the user terminal, and acquiring the device capability information of the user terminal according to the play request specifically comprises: directly acquiring the device capability information from the play request from the user terminal; or
the play request carries address information of a user agent profile database, and acquiring the device capability information of the user terminal according to the play request specifically comprises: acquiring the device capability information of the user terminal from the user agent profile database according to the address information of the user agent profile database.
3. The method according to claim 1 or 2, characterized in that the method further comprises: providing a transcoded video for the user terminal according to the device capability information.
4. The method according to claim 3, characterized in that providing the transcoded video for the user terminal and providing the transcoded subtitle bitmap group for the terminal specifically comprises:
packaging the video data, the audio data, and the subtitle bitmap group respectively to generate three elementary streams, and assigning packet identifiers to the three elementary streams in a program map table;
delivering a TS stream, generated by multiplexing the three elementary streams, to the user terminal.
5. A method for producing a subtitle bitmap group, characterized in that the method comprises:
acquiring a subtitle region from a video image, and acquiring time information and position information corresponding to the subtitle region; generating an original subtitle bitmap group according to the acquired subtitle region and the time information and position information; and performing transcoding processing on the original subtitle bitmap group according to resolution to form a transcoded subtitle bitmap group.
6. The method according to claim 5, characterized in that acquiring the subtitle region from the video image, and the time information and position information corresponding to the subtitle region, specifically comprises:
detecting the extent of the subtitle region and confirming the detected subtitle region; positioning the confirmed subtitle region to acquire the position information corresponding to the subtitle region;
tracking the positioned subtitle region to acquire the time information corresponding to the subtitle region.
7. A subtitle production apparatus, characterized in that the apparatus comprises:
a subtitle region acquisition module, configured to acquire a subtitle region in a video image and position information and time information of the subtitle region, and to generate an original subtitle bitmap group according to the subtitle region and the position information and time information of the subtitle region;
a transcoding module, configured to perform transcoding processing on the original subtitle bitmap group according to resolution to form a transcoded subtitle bitmap group.
8. The subtitle production apparatus according to claim 7, characterized in that the subtitle region acquisition module further comprises:
a subtitle region coarse detection unit, configured to determine the extent of the subtitle region;
a subtitle region confirmation unit, configured to confirm the subtitle region;
a subtitle region positioning unit, configured to position the confirmed subtitle region and acquire the position information of the subtitle region;
a subtitle region tracking unit, configured to acquire the time information of the subtitle region;
a subtitle bitmap group forming unit, configured to generate an original subtitle bitmap group according to the obtained subtitle region and the time information and position information of the subtitle region;
and the transcoding module further comprises:
a video transcoding unit, configured to perform transcoding processing on the original video according to resolution to form a transcoded video;
a subtitle bitmap transcoding unit, configured to perform transcoding processing on the original subtitle bitmap group according to resolution to form a transcoded subtitle bitmap group.
9. A subtitle matching processing apparatus, characterized in that the apparatus comprises:
a receiving module, configured to receive a play request from a user terminal, the play request carrying device capability information of the user terminal or address information of a user agent profile database in which the device capability information of the user terminal is stored;
a sending module, configured to provide a transcoded subtitle bitmap group for the user terminal according to the device capability information of the user terminal.
10. The apparatus according to claim 9, characterized in that: the sending module is further configured to provide a transcoded video for the terminal according to the device capability information of the user terminal.
11. A subtitle matching processing system, characterized in that the system comprises:
a subtitle production apparatus, configured to produce a transcoded subtitle bitmap group according to resolution;
a subtitle matching processing apparatus, configured to receive a play request from a user terminal, the play request carrying device capability information of the user terminal, and to provide a transcoded subtitle bitmap group for the user terminal according to the device capability information of the user terminal.
12. The system according to claim 11, characterized in that the system further comprises:
a user agent profile database, configured to store device capability information of user terminals;
and the subtitle matching processing apparatus is further configured to receive, from the user terminal, a play request carrying address information of the user agent profile database, to acquire the device capability information of the user terminal according to the address information of the user agent profile database, and further to provide a transcoded subtitle bitmap group for the user terminal according to the device capability information of the user terminal.
13. The system according to claim 11 or 12, characterized in that:
the subtitle production apparatus is further configured to produce a transcoded video according to resolution;
the subtitle matching processing apparatus is further configured to provide a transcoded video for the user terminal according to the device capability information of the terminal.
14. A user terminal, characterized in that the user terminal comprises:
a play request module, configured to generate a play request, the play request containing device capability information of the user terminal or address information of a user agent profile database;
a video playing module, configured to play a video and a transcoded subtitle bitmap group;
a communication module, configured to send the play request, and to receive the video and the transcoded subtitle bitmap group.
PCT/CN2009/073240 2008-09-02 2009-08-13 Method, system and device of the subtitle matching process WO2010025646A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200810141753A CN101668132A (en) 2008-09-02 2008-09-02 Method and system for matching and processing captions
CN200810141753.9 2008-09-02

Publications (1)

Publication Number Publication Date
WO2010025646A1 true WO2010025646A1 (en) 2010-03-11

Family

ID=41796737

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2009/073240 WO2010025646A1 (en) 2008-09-02 2009-08-13 Method, system and device of the subtitle matching process

Country Status (2)

Country Link
CN (1) CN101668132A (en)
WO (1) WO2010025646A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102594805A (en) * 2012-01-30 2012-07-18 中兴通讯股份有限公司 Method and system for providing multiple media services through single node
CN103294683B (en) * 2012-02-24 2016-09-14 腾讯科技(深圳)有限公司 A kind of video file captions automatic patching system and method
CN102625052B (en) * 2012-03-28 2014-08-20 广东威创视讯科技股份有限公司 Method, device and system for processing caption data
CN103379363B (en) * 2012-04-19 2018-09-11 腾讯科技(深圳)有限公司 Method for processing video frequency and device, mobile terminal and system
CN102663988B (en) * 2012-04-28 2015-06-24 广东威创视讯科技股份有限公司 Method, device and system for broadcasting subtitles
CN102984546A (en) * 2012-11-01 2013-03-20 上海文广互动电视有限公司 Transcoding service system for distributed video transcoding
US9131111B2 (en) * 2012-11-02 2015-09-08 OpenExchange, Inc. Methods and apparatus for video communications
CN105791367A (en) * 2014-12-25 2016-07-20 中国移动通信集团公司 Method, system and related equipment for sharing auxiliary media information in screen sharing
CN105957014A (en) * 2016-06-13 2016-09-21 天脉聚源(北京)传媒科技有限公司 Picture adaptive display method and apparatus
CN108156480B (en) * 2017-12-27 2022-01-04 腾讯科技(深圳)有限公司 Video subtitle generation method, related device and system
CN111131351B (en) * 2018-10-31 2022-09-27 中国移动通信集团广东有限公司 Method and device for confirming model of Internet of things equipment
CN110177295B (en) * 2019-06-06 2021-06-22 北京字节跳动网络技术有限公司 Subtitle out-of-range processing method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5805153A (en) * 1995-11-28 1998-09-08 Sun Microsystems, Inc. Method and system for resizing the subtitles of a video
KR20020097417A (en) * 2001-06-21 2002-12-31 엘지전자 주식회사 Processing apparatus for closed caption in set-top box
CN1432255A (en) * 2000-06-02 2003-07-23 汤姆森特许公司 Auxiliary information processing system with bitmapped on-screen display using limited computing resources
CN1642234A (en) * 2004-01-12 2005-07-20 松下电器产业株式会社 Caption treating device
CN1778111A (en) * 2003-04-22 2006-05-24 松下电器产业株式会社 Reproduction device and program


Also Published As

Publication number Publication date
CN101668132A (en) 2010-03-10

Similar Documents

Publication Publication Date Title
WO2010025646A1 (en) Method, system and device of the subtitle matching process
US10582201B2 (en) Most-interested region in an image
RU2598800C2 (en) Device orientation capability exchange signaling and server adaptation of multimedia content in response to device orientation
JP6462566B2 (en) Transmitting apparatus, transmitting method, receiving apparatus, and receiving method
EP3313083B1 (en) Spatially-segmented content delivery
WO2016138844A1 (en) Multimedia file live broadcast method, system and server
EP3466084A1 (en) Advanced signaling of a most-interested region in an image
AU2018299899B2 (en) Enhanced high-level signaling for fisheye virtual reality video in dash
WO2008061416A1 (en) A method and a system for supporting media data of various coding formats
KR102247404B1 (en) Enhanced high-level signaling for fisheye virtual reality video
WO2014193996A2 (en) Network video streaming with trick play based on separate trick play files
CN110622516B (en) Advanced signaling for fisheye video data
JP5734699B2 (en) Super-resolution device for distribution video
KR101426579B1 (en) Apparatus and method for providing images in wireless communication system and portable display apparatus and method for displaying images
US20180227065A1 (en) Reception apparatus, transmission apparatus, and data processing method
KR101690153B1 (en) Live streaming system using http-based non-buffering video transmission method
WO2022213034A1 (en) Transporting heif-formatted images over real-time transport protocol including overlay images
CN117099375A (en) Transmitting HEIF formatted images via real-time transport protocol

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09811021

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09811021

Country of ref document: EP

Kind code of ref document: A1