WO2015143114A1 - Sign language translation apparatus with smart glasses as display featuring a camera and optionally a microphone - Google Patents

Sign language translation apparatus with smart glasses as display featuring a camera and optionally a microphone

Info

Publication number
WO2015143114A1
Authority
WO
WIPO (PCT)
Prior art keywords
sign language
smart glasses
information
sign
translation
Application number
PCT/US2015/021395
Other languages
French (fr)
Inventor
Stephane Haziza
Original Assignee
Thomson Licensing
Application filed by Thomson Licensing
Publication of WO2015143114A1

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00 Teaching, or communicating with, the blind, deaf or mute
    • G09B21/009 Teaching or communicating with deaf persons


Abstract

Methods and apparatus for a system providing translation of sign language symbols to a set of smart glasses without a translator are described. The system captures at least one image and processes it to determine whether it closely resembles one of the stored versions in a set of sign language symbols. If so, the system wirelessly transmits the translation of that sign language symbol to at least one set of smart glasses. In one exemplary embodiment, the set of smart glasses is a pair of eyeglasses. In another exemplary embodiment, the system operates in two directions, so that in addition to being able to display a translation of a sign language symbol for a non-signer, the system can recognize words from a non-signer and translate them into a sign language symbol or text for the hearing-impaired person to understand on another set of smart glasses, such as a pair of smart glasses.

Description

SIGN LANGUAGE TRANSLATION APPARATUS WITH SMART GLASSES AS DISPLAY FEATURING A CAMERA AND OPTIONALLY A MICROPHONE
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Application Serial No. 61/968,461, filed March 21, 2014, which is incorporated by reference herein in its entirety.
TECHNICAL FIELD
The present principles relate to apparatus and methods for communication with sign language users.
BACKGROUND OF THE INVENTION
Sign language communication exists to enable deaf people and those with speech disabilities to communicate with others. There is not one universal sign language that can be used in every country, although some simple form of international signing exists. It is estimated that there are about 137 different sign languages.
Sign language can be used without a translator when there is communication among two or more deaf people, or if the communication is between a deaf person and one who knows sign language. However, often a deaf or speech-disabled person needs to communicate with others who do not know sign language. In these circumstances, a translator is needed who knows sign language and can speak to the non-sign language people. The translator must know the particular sign language that the deaf/speech-impaired person is using, but a translator is not always available or convenient.
A need exists for individuals who do not know sign language to be able to communicate with signers without requiring a translator.
SUMMARY OF THE INVENTION
These and other drawbacks and disadvantages of the prior art are addressed by the present principles, which are directed to methods and apparatus for enabling communication between sign language users and non- signers. According to an aspect of the present principles, there are provided several methods and several apparatus for enabling communication between sign language users and non-signers.
A first method for enabling communication between sign language users and non-signers is comprised of steps for capturing at least one image comprising sign language information and processing the at least one image to determine whether data comprising the sign language information corresponds to stored data representing a sign language translation. The method further comprises packing at least one sign language translation into a wireless transmission format, wirelessly transmitting the at least one packed sign language translation to a set of smart glasses and displaying the at least one sign language translation on the set of smart glasses display.
According to another aspect of the present principles, there is provided a system for enabling communication between sign language users and non-signers. The system is comprised of a camera for capturing an image comprising sign language information, a processing device that determines whether data comprising the sign language information corresponds to stored data representative of a sign language translation, and a formatter that packs at least one sign language translation into a wireless transmission format. The system is further comprised of a wireless transmitter that transmits the at least one packed sign language translation, and a display to show the sign language translation on the set of smart glasses.
According to another aspect of the present principles, a second method is directed to enabling communication from a non-sign language user to a sign-language user. The method comprises capturing audio information corresponding to a voice from the non-sign language user, translating the audio information into corresponding visual information, and packing the visual information into a wireless format. The method further comprises transmitting the packed visual information to a set of smart glasses, and displaying the visual information on a display of the set of smart glasses.
According to another aspect of the present principles, an apparatus is directed to additionally enabling communication from a non-sign language user to a sign-language user. The apparatus comprises circuitry that captures audio information corresponding to a voice from a non-sign language user, a processor that translates the audio information into corresponding visual information, and a second formatter that packs the visual information into a wireless format. The apparatus further comprises a wireless transmitter that transmits the packed visual information to a set of smart glasses that displays the visual information on the set of smart glasses.
These and other aspects, features and advantages of the present principles will become apparent from the following detailed description of exemplary embodiments, which are to be read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows one embodiment of a system for communicating sign language information under the present principles.
Figure 2 shows another embodiment of a system for communicating sign language information without a translator under the present principles.
Figure 3 shows one embodiment of a method for communicating sign language information under the present principles.
Figure 4 shows another embodiment of a system for two-way communication of sign language information under the present principles.
Figures 5 and 5a show an embodiment of an apparatus for communicating sign language information from a sign language user to a non-sign language user, and additional apparatus for communicating verbal information from a non-sign language user to a sign language user, respectively.
DETAILED DESCRIPTION OF THE INVENTION
The present principles are directed to a method and apparatus for enabling communication between sign language users and non-signers without requiring a translator.
For example, a deaf or speech-disabled person may work among a group of people who do not understand sign language. This person has a need to communicate with his co-workers, but it is either too expensive or inconvenient to have a translator present at all times.
One embodiment of a system that solves the problem of communicating using sign language without a translator is shown in the conceptual diagram of Figure 1. Figure 1 shows a captured image of a hand making a signing gesture for the number "eight." This captured image is compared against a stored set of image data representative of a set of gestures. If one of the stored images has data similar to the captured image, the system will generate data corresponding to the sign language translation. The data is then packed into a wireless transmission format and transmitted. The transmitted data is received by a set of smart glasses and the word(s) corresponding to the signed gesture appear on a display at the set of smart glasses. In Figure 1, the set of smart glasses is shown as a pair of smart glasses.
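As a rough illustration of this comparison step, matching a captured image against a small library of stored gesture images could look like the following Python sketch; the file paths, the 0.8 threshold, and the two-entry dictionary are assumptions for illustration, not details taken from the patent:

```python
import cv2  # OpenCV: image conversion and template matching

# Hypothetical library built from stored gesture images (see the learning
# phase described below); keys are the words each stored sign translates to.
SIGN_TEMPLATES = {
    "eight": cv2.imread("templates/eight.png", cv2.IMREAD_GRAYSCALE),
    "hello": cv2.imread("templates/hello.png", cv2.IMREAD_GRAYSCALE),
}
MATCH_THRESHOLD = 0.8  # assumed similarity cutoff

def translate_sign(captured_bgr):
    """Return the translation whose stored template best matches the captured
    image, or None when no stored sign is similar enough."""
    gray = cv2.cvtColor(captured_bgr, cv2.COLOR_BGR2GRAY)
    best_word, best_score = None, MATCH_THRESHOLD
    for word, template in SIGN_TEMPLATES.items():
        # Normalized cross-correlation; the template must be no larger than
        # the captured image for matchTemplate to apply.
        scores = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF_NORMED)
        _, score, _, _ = cv2.minMaxLoc(scores)
        if score > best_score:
            best_word, best_score = word, score
    return best_word
```

A deployed recognizer would likely use learned features rather than raw templates, but the sketch captures the capture-compare-translate flow of Figure 1.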
One embodiment of this type of system that allows the deaf/speech-disabled person to "talk" to his co-workers is shown in Figure 2. The system 200 is comprised of a power processing device 210 and an embedded set of smart glasses 220. The power processing device can be implemented in a tablet, PC, phone, set-top box, or similar device. It may also be configured as a portable system carried by a hearing-impaired individual. Alternatively, the power processing device can be integrated with the set of smart glasses. A camera 215 can be located either in the power processing device 210, or in the set of smart glasses 220, for example, or can be a standalone unit with wireless connectivity. In Figure 2, the camera 215 is integrated with the power processing device 210, which also includes a wireless transmitter.
The power processing device 210 captures at least one image of a signing person (the signer) and interprets the symbol or sign that they have given. If the camera 215 is located at the set of smart glasses, the power processing device receives the image from camera 215. The power processing device then can wirelessly transmit information representing word(s) or letters for the particular given sign to embedded set of smart glasses 220. Embedded set of smart glasses 220 can comprise smart glasses that receive the wirelessly transmitted information and, after appropriate unpacking of the transmitted information and processing, display a word or phrase on the lenses of the smart glasses. In this way, the signer's words are communicated to the non-signer without a translator.
Power processing device 210 can be configured to interpret more than a single sign language, and/or configured to translate the signs into a multitude of languages. In one embodiment, the power processing device can be configured to interpret one of a plurality of different sign languages and then to translate the signs into a multitude of languages. Wireless information comprising translations of the signs in a multitude of languages can be simultaneously transmitted from the power processing device to one or more sets of smart glasses. Each particular set of smart glasses can be configured to receive a desired language that the user wants displayed on his smart glasses.
The format of the wireless information transmitted by the power processing device to the set of smart glasses can be one of several wireless transmission protocols, such as Bluetooth, Wi-Fi (based on the Institute of Electrical and Electronics Engineers' (IEEE) 802.11 standards), RF4CE, or another similar protocol, for example.
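The patent leaves the over-the-air payload unspecified; as a hedged sketch, a translation could be framed with a small header that carries a language code so that each set of glasses can select its stream. The magic byte, field widths, and two-letter language codes below are assumptions:

```python
import struct

FRAME_MAGIC = 0x5A  # assumed frame marker, not specified by the patent

def pack_translation(language_code: str, text: str) -> bytes:
    """Pack one translated word or phrase into a length-prefixed frame:
    1-byte magic, 2-byte language code, 2-byte payload length, UTF-8 text."""
    payload = text.encode("utf-8")
    lang = language_code.encode("ascii")[:2].ljust(2, b" ")
    return struct.pack("!B2sH", FRAME_MAGIC, lang, len(payload)) + payload

def unpack_translation(frame: bytes):
    """Inverse of pack_translation, as the smart glasses might run it."""
    magic, lang, length = struct.unpack("!B2sH", frame[:5])
    if magic != FRAME_MAGIC:
        raise ValueError("unexpected frame")
    return lang.decode("ascii").strip(), frame[5:5 + length].decode("utf-8")

# Example: pack_translation("en", "eight") yields a frame that could be
# carried over Bluetooth, Wi-Fi, or RF4CE transport.
```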
In order for power processing device 210 to be able to interpret the signs given by the signer, the power processing device learns the symbols that will be given by the signer. A learning phase can be employed in which the signer performs, in front of a camera, each of the symbols, signs, or letters that he or she wants the power processing device to be able to interpret. The camera captures each sign, and either stores the captured data or processes the data to store as a reference. The stored data can then be used during the normal operation of the system to determine a closest match with a given signed symbol. In an exemplary embodiment, a dictionary with a minimum list of words is created in the learning phase.
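One way such a learning phase could be realized is a simple enrollment loop that stores one reference image per word; the camera index, the space-bar trigger, and the templates/ directory are illustrative assumptions:

```python
import os
import cv2

def enroll_signs(words, out_dir="templates", camera_index=0):
    """Capture one grayscale reference image per word to build the sign
    dictionary used later for matching."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(camera_index)
    try:
        for word in words:
            print(f"Perform the sign for '{word}', then press SPACE.")
            while True:
                ok, frame = cap.read()
                if not ok:
                    raise RuntimeError("camera read failed")
                cv2.imshow("enrollment", frame)
                if cv2.waitKey(1) & 0xFF == ord(" "):  # space stores the sign
                    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                    cv2.imwrite(os.path.join(out_dir, f"{word}.png"), gray)
                    break
    finally:
        cap.release()
        cv2.destroyAllWindows()

# Example: build a minimum dictionary for the learning phase.
# enroll_signs(["eight", "hello", "thanks"])
```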
Alternately, the power processing device 210 can be configured with generic symbols, signs or letters, so that any signer's actions can be interpreted. This approach is more general and leads to wider use and quicker configuration of the power processing device.
A learning phase that uses one individual to train the power processing device leads to more accurate interpretations when the same individual that trained the power processing device is the person signing during normal operation of the system.
The power processing device can be configured to interpret static images or a series of images, corresponding to the movements of a gesture. When interpreting static images, the power processing device can be triggered to capture an image externally, or to capture images in sequence. In either case, processing is performed on the image(s) to determine if objects in the images, or sequence of images, match any of the stored symbols or gestures. If so, the corresponding word, phrase, or letters are wirelessly transmitted to one or more sets of smart glasses. Pattern or template matching can be used to determine whether an image is similar to a known, stored sign language symbol. When interpreting a gesture, a plurality of images is analyzed to determine whether each of the images comprising the gesture is similar to the sequence of images comprising a known gesture.
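For moving gestures, the single-image comparison can be extended to a sequence score; the fixed ten-frame sampling and simple frame-by-frame averaging below are illustrative assumptions rather than the patent's stated method:

```python
import cv2
import numpy as np

FRAMES_PER_GESTURE = 10  # assumed fixed-length sampling of each gesture

def frame_similarity(frame_gray, ref_gray):
    """Normalized correlation between one captured frame and one stored frame."""
    ref = cv2.resize(ref_gray, (frame_gray.shape[1], frame_gray.shape[0]))
    # Equal sizes make matchTemplate return a single correlation value.
    return float(cv2.matchTemplate(frame_gray, ref, cv2.TM_CCOEFF_NORMED).max())

def gesture_score(captured_frames, stored_frames):
    """Average similarity between a captured gesture and a stored gesture,
    with both sequences resampled to FRAMES_PER_GESTURE frames."""
    cap_idx = np.linspace(0, len(captured_frames) - 1, FRAMES_PER_GESTURE).astype(int)
    ref_idx = np.linspace(0, len(stored_frames) - 1, FRAMES_PER_GESTURE).astype(int)
    return float(np.mean([
        frame_similarity(captured_frames[i], stored_frames[j])
        for i, j in zip(cap_idx, ref_idx)
    ]))

# The stored gesture whose sequence yields the highest gesture_score above a
# chosen threshold would be taken as the closest match.
```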
In an alternate embodiment, a camera is located at the smart glasses so that the camera captures the sign gesture from the perspective of the non-signer.
One embodiment of a method 300 for communicating sign language information without a translator is shown in Figure 3. The method comprises a start block 301 which proceeds to block 310 for capturing at least one image comprising sign language information. Following block 310, control proceeds to block 320 for processing the at least one image to determine whether image data comprising the sign language information corresponds to stored data representative of a sign language translation. If so, control proceeds to block 330 for packing the sign language translation into a wireless transmission format. Following block 330, the method is further comprised of a block 340 for wirelessly transmitting the at least one packed sign language translation to a set of smart glasses. Following block 340, control proceeds to block 350 for displaying the sign language translation on the set of smart glasses display.
In one embodiment of the present system, the power processing device can translate the signs into several different languages. The languages are simultaneously translated and transmitted, such that a given pair of smart glasses is either pre-configured for a certain language, or the user of each pair of glasses can select the language of translation that they prefer to receive in a selection process that pairs a language stream to one or more sets of smart glasses.
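On the receiving side, pairing a language stream to a set of glasses can be as simple as filtering frames by their language code; the sketch below assumes the hypothetical 5-byte frame header from the packing example above, and the pairing step itself (e.g., over Bluetooth) is left out:

```python
import struct

class SmartGlassesReceiver:
    """Minimal sketch of glasses-side selection of one language stream."""

    def __init__(self, preferred_language="en"):
        self.preferred_language = preferred_language  # set in a pairing step

    def on_frame(self, frame: bytes):
        # Parse the assumed header: magic byte, language code, payload length.
        _, lang, length = struct.unpack("!B2sH", frame[:5])
        if lang.decode("ascii").strip() == self.preferred_language:
            self.display(frame[5:5 + length].decode("utf-8"))

    def display(self, text: str):
        print(f"[lens display] {text}")  # stand-in for the real display driver
```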
In another embodiment using the principles disclosed herein, the power processing device, instead of wirelessly transmitting the symbol translation to a set of smart glasses, can electronically speak the word or phrase corresponding to the translated sign language symbol.
Another embodiment of the described principles enables communication between a hearing/speech-impaired individual and a non-signer. In one embodiment of this system, the system works similarly to the already described communication from the signer to the non-signer. However, communication from the non-signer back to the signer, the hearing/speech-impaired individual, is translated from voice to sign language symbols, or from voice to text, for example. In this case, the signing person can have a set of smart glasses, or even a smartphone or a tablet, to display the symbols or text information from the non-signing person. In this way, communication is enabled between a signer and a non-signer without a translator. As in the case with the previously described one-way communication arrangements, processing power can be incorporated as part of the signer's set of smart glasses. For example, his smart glasses can perform wireless reception and translation to display text or a sign language symbol to the user.
One embodiment showing both of the aforementioned communication systems 400 under the present principles is shown in Figure 4. Figure 4 shows a conversation between a signing person, on the right, and a non-signing person, on the left. Each person is wearing a set of smart glasses equipped with an embedded camera. Each of the smart glasses can have some level of processing power integrated with the glasses and the camera as well. The non-signing person is wearing a pair of smart glasses 420 with an embedded camera. The signing person is also wearing a pair of smart glasses 425 with an embedded camera. In this embodiment, there is also a power processing device 410 that can be implemented, for example, using a tablet, smartphone, laptop, or set-top box. The division of processing tasks between each of the set of smart glasses and the power processing device can be assigned using various configurations while still maintaining the present principles.
In an exemplary embodiment, the power processing device has been taught a dictionary of symbols, either by the signing person or with generic signs. A camera in the non-signing person's glasses 420 captures a sign being given by the signing person. The glasses can transmit this captured image to the power processing device 410 for translation to text, for example, and subsequent packing into a wireless format. The packed text can be sent back to the non-signing person's glasses 420 for processing of the information corresponding to the given sign that will appear on the non-signing person's glasses 420.
Communication from the non-signing person can be translated into either sign language or text to the signing person. Voice recognition can be implemented in the power processing device 410, or in one of the sets of smart glasses 420 or 425 with an embedded microphone. When the non-signing person talks, their words are translated into corresponding text, or into corresponding sign language symbols, which are wirelessly transmitted to the lenses of the signing person's set of smart glasses 425. For example, the power processing device 410 can be configured with a microphone and have voice recognition capability. When it captures the non-signing person's words, it can translate them into sign language symbols or text and then wirelessly transmit the symbols or text to the signing person's glasses 425.
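The reverse, voice-to-visual path can be sketched the same way; speech_to_text below is a stub standing in for whatever recognition engine the power processing device uses, and the symbol lookup table is an assumption for illustration:

```python
def speech_to_text(audio_bytes: bytes) -> str:
    """Stub for the device's voice recognition engine, which the patent does
    not specify; a real recognizer would be plugged in here."""
    raise NotImplementedError

# Hypothetical mapping from recognized words to rendered sign-symbol images.
SYMBOL_IMAGES = {"eight": "symbols/eight.png", "hello": "symbols/hello.png"}

def translate_voice(audio_bytes: bytes, mode: str = "text"):
    """Turn captured audio into text, or into sign-language symbol references,
    for display on the signing person's glasses 425."""
    words = speech_to_text(audio_bytes).lower().split()
    if mode == "text":
        return " ".join(words)
    # mode == "symbols": fall back to the plain word when no symbol exists,
    # which the glasses could render as fingerspelled text.
    return [SYMBOL_IMAGES.get(w, w) for w in words]
```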
Other arrangements in which the resources for the capturing, processing and transmitting functionality are split between the two sets of smart glasses 420, 425 and the power processing device 410 are conceivable under the present principles.
Therefore, the power processing device 410 can also have wireless transmit and receive capability.
One embodiment of an apparatus 500 for communicating sign language information to at least one set of smart glasses is shown in Figure 5. The apparatus comprises a camera 510 that is in signal connectivity with a processing device 520. The camera captures at least one image that comprises sign language information. The processing device 520 determines whether the data of the captured image(s) resembles a stored symbol. If so, a corresponding translation of the stored symbol is sent to formatter 530, whose input is in signal connectivity with the output of processing device 520. Formatter 530 packs the translation information into a format suitable for wireless transmission. The output of formatter 530 is in signal connectivity with the input of wireless transmitter 540. Wireless transmitter 540 sends a wireless transmission comprising the translation information. Processing device 520, formatter 530, and wireless transmitter 540 can be implemented in one or more devices. The wireless transmission from wireless transmitter 540 is received by set of smart glasses 550. Set of smart glasses 550 incorporates functionality to unpack the information in the wireless transmission and display its contents.
Figure 5a shows an embodiment of an apparatus 501 for communication between a sign language user and a non-sign language user. Audio information is captured using a microphone input to audio circuit 560. The output of audio circuit 560 is in signal connectivity with the input of processor 570. Processor 570 can perform speech recognition to translate captured audio information from verbal information to another format suitable for a hearing/speech-disabled person. This format can be text or a rendered sign language symbol. The output of processor 570 is in signal connectivity with the input of formatter 580. Formatter 580 packs the text or sign language symbol information into a format suitable for wireless transmission. The output of formatter 580 is in signal connectivity with the input of wireless transmitter 590. Wireless transmitter 590 sends a wireless transmission comprising the text or sign language information. The wirelessly transmitted text or sign language transmission from wireless transmitter 590 is received by set of smart glasses 595. Set of smart glasses 595 incorporates functionality to unpack the information in the wireless transmission and display its contents.
Audio circuit 560, processor 570, formatter 580, and wireless transmitter 590 can be implemented in one or more devices. If this functionality is implemented with the processing device, formatter, and wireless transmitter of Figure 5 within a single device, it can be configured as a portable sign-language interface which, together with two sets of smart glasses, would be an interface for a hearing/speech-disabled person to carry for the purpose of conversing with others who do not know sign language.
One or more implementations having particular features and aspects of the presently preferred embodiments of the principles have been provided. However, features and aspects of described implementations can also be adapted for other implementations. For example, these implementations and features can be used in the context of other video devices or systems. The implementations and features need not be used in a standard.
Reference in the specification to "one embodiment" or "an embodiment" or "one implementation" or "an implementation" of the present principles, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrase "in one embodiment" or "in an embodiment" or "in one implementation" or "in an implementation", as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
The implementations described herein can be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed can also be implemented in other forms (for example, an apparatus or computer software program). An apparatus can be implemented in, for example, appropriate hardware, software, and firmware. The methods can be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
Implementations of the various processes and features described herein can be embodied in a variety of different equipment or applications. Examples of such equipment include a web server, a laptop, a personal computer, a cell phone, a PDA, and other communication devices. As should be clear, the equipment can be mobile and even installed in a mobile vehicle.
Additionally, the methods can be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) can be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact disc, a random access memory ("RAM"), or a read-only memory ("ROM"). The instructions can form an application program tangibly embodied on a processor-readable medium. Instructions can be, for example, in hardware, firmware, software, or a combination. Instructions can be found in, for example, an operating system, a separate application, or a combination of the two. A processor can be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium can store, in addition to or in lieu of instructions, data values produced by an implementation.
As will be evident to one of skill in the art, implementations can use all or part of the approaches described herein. The implementations can include, for example, instructions for performing a method, or data produced by one of the described embodiments. A number of implementations have been described. Nevertheless, it will be understood that various modifications can be made. For example, elements of different implementations can be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes can be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this disclosure and are within the scope of these principles.

Claims

1. A method for communicating sign language information to at least one set of smart glasses, comprising:
capturing at least one image comprising sign language information;
processing the at least one image to determine whether data comprising the sign language information corresponds to stored data representative of a sign language translation, and if so,
packing at least one sign language translation into a wireless transmission format;
wirelessly transmitting the at least one packed sign language translation to the at least one set of smart glasses;
displaying the at least one sign language translation on the at least one set of smart glasses.
2. The method of Claim 1, further comprising:
pairing the at least one set of smart glasses to at least one packed sign language translation,
wherein said displaying step is performed on the paired at least one set of smart glasses.
3. A method for communicating verbal information to at least one sign-language user, comprising:
capturing audio information corresponding to a voice from a non-sign language user;
translating the audio information into corresponding visual information;
packing the visual information into a wireless format;
transmitting the packed visual information to at least one sign language user with smart glasses for displaying the visual information.
4. An apparatus for communicating sign language information to at least one set of smart glasses, comprising:
a camera for capturing an image comprising sign language information;
a processing device that determines whether data comprising the sign language information corresponds to stored data representative of a sign language translation;
a formatter that packs at least one sign language translation into a wireless transmission format;
a wireless transmitter that transmits the at least one packed sign language translation to the at least one set of smart glasses for displaying the sign language translation.
5. The apparatus of Claim 4, further comprising:
circuitry to pair the at least one set of smart glasses to at least one packed sign language translation,
wherein said paired at least one set of smart glasses displays the sign language translation.
6. An apparatus for communicating verbal information to at least one sign-language user, comprising:
a circuit that captures audio information corresponding to a voice from a non-sign language user;
a processor that translates the audio information into corresponding visual information;
a formatter that packs the visual information into a wireless format;
a wireless transmitter that transmits the packed visual information to at least one sign language user with smart glasses for displaying the visual information.
PCT/US2015/021395 2014-03-21 2015-03-19 Sign language translation apparatus with smart glasses as display featuring a camera and optionally a microphone WO2015143114A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201461968461P 2014-03-21 2014-03-21
US61/968,461 2014-03-21

Publications (1)

Publication Number Publication Date
WO2015143114A1 (en) 2015-09-24

Family

ID=53002800

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/021395 WO2015143114A1 (en) 2014-03-21 2015-03-19 Sign language translation apparatus with smart glasses as display featuring a camera and optionally a microphone

Country Status (1)

Country Link
WO (1) WO2015143114A1 (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110301934A1 (en) * 2010-06-04 2011-12-08 Microsoft Corporation Machine based sign language interpreter
US20120078628A1 (en) * 2010-09-28 2012-03-29 Ghulman Mahmoud M Head-mounted text display system and method for the hearing impaired
US20130051614A1 (en) * 2011-08-23 2013-02-28 Hon Hai Precision Industry Co., Ltd. Sign language recognition system and method
US20130293577A1 (en) * 2012-05-04 2013-11-07 Kathryn Stone Perez Intelligent translations in personal see through display

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018052901A1 (en) * 2016-09-13 2018-03-22 Magic Leap, Inc. Sensory eyewear
US10580213B2 (en) 2016-09-13 2020-03-03 Magic Leap, Inc. Systems and methods for sign language recognition
US10769858B2 (en) 2016-09-13 2020-09-08 Magic Leap, Inc. Systems and methods for sign language recognition
US11410392B2 (en) 2016-09-13 2022-08-09 Magic Leap, Inc. Information display in augmented reality systems
US11747618B2 (en) 2016-09-13 2023-09-05 Magic Leap, Inc. Systems and methods for sign language recognition
US20190080630A1 (en) * 2017-09-08 2019-03-14 Alida R. Nattress Medical grade wearable eyeglasses with holographic voice and sign language recognition duo interpreters and response with microphone/speakers using programming software, optional customization via smartphone device or private webpage
CN109754677A (en) * 2019-02-26 2019-05-14 华南理工大学 A kind of double mode deaf-mute alternating current equipment
US11507758B2 (en) 2019-10-30 2022-11-22 Ford Global Technologies, Llc Vehicle-based sign language communication systems and methods
WO2022005851A1 (en) * 2020-06-29 2022-01-06 Innovega, Inc. Display eyewear with auditory enhancement
WO2024057342A1 (en) * 2022-09-17 2024-03-21 Datacrux Insights Private Limited Smart visual sign language translator, interpreter and generator

Similar Documents

Publication Publication Date Title
KR102069237B1 (en) Terminal and handsfree device for servicing handsfree automatic interpretation, and method thereof
WO2015143114A1 (en) Sign language translation apparatus with smart glasses as display featuring a camera and optionally a microphone
JP6790234B2 (en) Interpreters and methods (DEVICE AND METHOD OF TRANSLATING A LANGUAGE INTO ANOTHER LANGUAGE)
US20170041083A1 (en) Communication setting system and method for iot device using mobile communication terminal
US9363372B2 (en) Method for personalizing voice assistant
US20160042228A1 (en) Systems and methods for recognition and translation of gestures
WO2018107489A1 (en) Method and apparatus for assisting people who have hearing and speech impairments and electronic device
US20140171036A1 (en) Method of communication
US20170060850A1 (en) Personal translator
EP3709607A1 (en) Device and method for adaptively changing task-performing subjects
KR20180028341A (en) Terminal and method of controlling the same
CN106203235B (en) Living body identification method and apparatus
KR101835235B1 (en) Apparatus and method for supporting the blind
CN104462058B (en) Character string identification method and device
US11416703B2 (en) Network optimization method and apparatus, image processing method and apparatus, and storage medium
US20230276022A1 (en) User terminal, video call device, video call system, and control method for same
KR20160040281A (en) Communication method, client, and terminal
EP3528245A1 (en) User identification method and apparatus based on acoustic features
KR20210040424A (en) Voice control command generation method and terminal
CN111522524B (en) Presentation control method and device based on conference robot, storage medium and terminal
CN110555329A (en) Sign language translation method, terminal and storage medium
KR20140116642A (en) Apparatus and method for controlling function based on speech recognition
US9870197B2 (en) Input information support apparatus, method for supporting input information, and computer-readable recording medium
KR102178175B1 (en) User device and method of controlling thereof
KR102510958B1 (en) Mobile terminal and operation method thereof, mobile communication system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15718393

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15718393

Country of ref document: EP

Kind code of ref document: A1