WO2011025450A1 - Methods and systems for visual interaction - Google Patents


Info

Publication number
WO2011025450A1
Authority
WO
WIPO (PCT)
Prior art keywords
expert
image
user
camera
unit
Application number
PCT/SE2010/050916
Other languages
French (fr)
Inventor
Per Carleberg
Torbjörn GUSTAFSSON
Original Assignee
Xmreality Research Ab
Priority claimed from SE0950609A (SE537143C2)
Application filed by Xmreality Research Ab filed Critical Xmreality Research Ab
Publication of WO2011025450A1

Classifications

    • G PHYSICS > G02 OPTICS > G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS > G02B27/00 Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01 Head-up displays > G02B27/017 Head mounted
    • G02B27/01 Head-up displays > G02B27/0101 characterised by optical features > G02B2027/0138 comprising image capture systems, e.g. camera
    • G02B27/0093 Optical systems or apparatus with means for monitoring data relating to the user, e.g. head-tracking, eye-tracking

Abstract

The present document relates to systems and methods for remote visual interaction between at least one user and an expert. A method for remote visual interaction between an expert (200) and at least one user (100) is shown. The method is used in an interaction system comprising at least one workstation comprising a presentation device (101) and a user camera (102) for use by the user (100), wherein said workstation is operatively connected to a support station comprising an expert presentation device (201) and an expert camera unit (202) for use by the expert (200). The method comprises using the user camera (102) to capture the user's (100) field of vision in order to form a user image (505), presenting the user image (505) to the expert (200) by means of the expert presentation device (201), using the expert camera unit (202) to depict an object (502) at the support station as an expert object image (500), mixing together the expert object image (500) with the user image (505) to form a mixed image, and presenting the mixed image to the user (100) via the user presentation device (101). The expert camera unit (202) is used to depict the object (502) when this is at least partly located essentially between the expert camera unit and the expert presentation device. An image of the object is created, constituting a subset of the expert object image (500), and the mixed image is created based on the user image and the depiction of the object. Alternative equipment for the expert and the user is also shown.

Description

METHODS AND SYSTEMS FOR VISUAL INTERACTION
Technical field
The present document relates to systems and methods for remote visual interaction between, for example, a user and an expert. The user, who generally has less knowledge than the expert about a specific piece of equipment, carries on a visual and verbal discussion with the expert, who is located at a distance, as to how certain actions are to be performed.
Background
Within certain technical fields, certain "apparatus" has a tendency to become more and more complicated and complex. By the term apparatus, some form of equipment, system or product of a technical nature is usually meant here. A biological being is also encompassed by the term "apparatus", for example a human being (e.g., a patient in a medical consultation).
Apparatus most often requires, for its maintenance and repair, a person who is well trained and very familiar with how such apparatus functions and how it should be handled in various situations. The person should also have practical experience with the handling of the apparatus. Such a person, with the aforementioned skills, is designated here by the term "expert".
A person with more general knowledge about the apparatus is designated here by the term "user". The user can in many cases, using his general knowledge and experience, repair and maintain the apparatus. In some cases, however, the user's knowledge is not enough, and the user must then turn to the expert for guidance.
The apparatus can exist in several units and these can be geographically spread out. Most often, an apparatus is repaired and maintained by persons (users) who are situated at the actual location of the apparatus, or by persons (users) whom a firm or organization sends out to the actual location.
The expert, on the other hand, should be present in some way at a central location in a company or organization so that as many persons as possible can quickly obtain help from him or them. It can be both costly and impractical to send out an expert on different assignments in order to help a user. Through modern technology, however, it is possible to let the expert be virtually present at the user's location and to guide him visually and verbally.
There are several different variants of systems and methods, with various advantages and disadvantages, for enabling a visual and verbal interaction and communication between a user and an expert.
One example of such a known system and method, which has been shown at exhibitions and described in report form, consists of two items of headborne microdisplay equipment (presentation devices), each with an integrated video camera. The user wears one item of equipment and the expert the other. The user's and the expert's equipment are interconnected by means of computers and transmission channels. The expert's hand, arm, or some other object which the expert has placed on a work surface can be extracted from the image from the expert's video camera and superimposed on the image that is captured by the user's camera and shows what is in front of the user. The principle of the known system is shown in Fig. 1, where:
• 100 is a "user"
• 101 is a user's headborne microdisplay equipment
• 102 is a video camera that is integrated in a user's presentation device
• 103 is a computer for a user to use
• 200 is an "expert"
• 201 is an expert's headborne microdisplay equipment
• 202 is a video camera that is integrated in an expert's presentation device
• 203 is a computer for an expert to use
• 204 is a work surface
• 205 is an expert's hand and arm
• 300 is a transmission channel between a user and an expert.
User and expert also each have a unit for verbal communication, which includes a microphone and loudspeaker device. This unit is not shown in the figure.
The aforementioned system and solution has certain drawbacks, however:
1. Since both the user's and the expert's equipment is headborne, and thus moves in tandem with the head's movements, it can be difficult to spatially synchronize the user's and the expert's image in such a way that a visual interaction is practically possible. Thus, it can be difficult for the expert to point to a specific item of the apparatus in question, if the expert or the user or both of them are moving their head.
2. The transmission channel's delay can further make the visual interaction difficult, in the same way as with head movements.
3. The expert may be guiding several users at the same time, i.e., switching between different users, and is then forced to spend relatively long periods of time wearing the headborne equipment. The expert may then find his equipment inconvenient and fatiguing.
WO2009/036782 and EP1739642 A1 show several methods and techniques for remote interaction. However, there is a need for further improvements.
Summary
Consequently, one object of the present document is to provide systems and methods which facilitate remote visual interaction between, for example, an expert and at least one user in an interaction system.
One particular object is to provide a method and system for visual interaction where the expert and the user can interact in as natural a way as possible.
The invention is defined by the independent patent claims. Embodiments are found in the dependent patent claims, the drawings, and the following description.
According to a first aspect, a method is provided for remote visual interaction between an expert and at least one user in an interaction system comprising: at least one workstation comprising a presentation device and a user camera for use by the user, wherein said workstation is operatively connected to a support station comprising an expert presentation device and an expert camera unit for use by the expert. The method comprises using the user camera to capture the user's field of vision in order to form a user image, presenting the user image to the expert by means of the expert presentation device, using the expert camera unit to depict an object at the support station as an expert object image, mixing together the expert object image with the user image to form a mixed image, and presenting the mixed image to the user via the user presentation device. The method also comprises using the expert camera unit to depict the object when this is at least partly located essentially between the expert camera unit and the expert presentation device, creating an image of the object, constituting a subset of the expert object image, and creating the mixed image based on the user image and the depiction of the object.
By means of the method, the expert can give the user instructions in a natural and simple way by pointing directly in the image that the user is seeing.
The depicting of the object can be achieved by means of an object camera and a mask camera. The mask camera can be adapted to a wavelength range that differs from a wavelength range to which the object camera is adapted.
The mask camera can be a near IR camera.
According to another aspect, a system is provided for remote visual interaction between an expert and at least one user in an interaction system. The system comprises at least one workstation comprising a presentation device and a user camera for use by the user, wherein said workstation is operatively connected to a support station comprising an expert presentation device and an expert camera unit for use by the expert, wherein the user camera is configured to capture the user's field of vision, which is presented to the expert by means of the expert presentation device as a user image and the expert camera unit is configured to depict an object at the support station as an expert object image and mix together the expert object image with the user image and present these to the user via the user presentation device as a mixed image. The expert camera unit is turned toward the expert presentation device in order to depict the object when this is at least partly located in a field of vision between the expert camera unit and the expert presentation device. The expert camera unit is arranged to achieve a depiction of the object, constituting a subset of the expert object image. The expert camera unit is arranged to achieve the mixed image based on the user image and the depiction of the object.
According to a third aspect, a system is provided for remote visual interaction between an expert and at least one user in an interaction system. The system comprises at least one workstation comprising a presentation device and a user camera for use by the user, wherein said workstation is operatively connected to a support station comprising an expert presentation device and an expert camera unit for use by the expert, wherein the user camera is configured to capture the user's field of vision, which is presented to the expert by means of the expert presentation device as a user image and the expert camera unit is configured to depict an object at the support station as an expert object image and mix together the expert object image with the user image and present these to the user via the user presentation device as a mixed image.
This system is characterized in that images in remote visual interaction between an expert and at least one user are synchronized by the system comprising a user computer designed to define a fixed point in the user image, an expert computer designed to define a fixed point in the expert object image, and a synchronizing unit designed to synchronize the user image and the expert image by placing the user fixed point and the expert fixed point on top of each other in the mixed image in a predetermined way.
According to a fourth aspect, a method is provided in which images in remote visual interaction between an expert and at least one user are synchronized by the method comprising the steps of defining a fixed point in the user image, defining a fixed point in the expert object image, and synchronizing the user image and the expert object image by placing the user fixed point and the expert fixed point on top of each other in the mixed image in a predetermined way.
According to a fifth aspect, a mobile unit is provided, comprising a presentation device and a camera unit which is turned toward the presentation device in a working position in order to depict objects on or in front of the presentation device, a processing unit which is arranged to present a first image on the presentation device and to receive a second image from the camera unit, and a communication unit which is arranged to receive the first image from a remotely situated unit and send at least a portion of the second image to a remotely situated unit. The presentation device is arranged in a housing and the camera unit is moveable relative to the housing between the working position and a transport position.
By "mobile unit" is meant a computer of laptop type, an e-reader or the like.
The housing can be formed by the display screen of a laptop computer.
In this way, a mobile unit is provided that can function as a mobile and compact expert station in an interaction system.
The processing unit can be designed to mix together the first and second image in order to form a third image, wherein the third image is sent to the remotely situated unit.
The camera unit, in the transport position, can be drawn into or toward the housing. For example, the camera unit can be received in a recess in the housing or otherwise conform to the shape of the housing so that the mobile unit receives an advantageous form from the transport perspective.
In the working position, the camera unit can protrude from the housing.
The camera unit, in the transport position, can be drawn into or toward an intermediate housing connected to the housing. Such a housing can be, for example, a keyboard component of a laptop computer.
In the working position, the camera unit can protrude from the intermediate housing.
The camera unit can comprise an object camera and a mask camera, as described below.
The portable unit can comprise a regenerable power source, such as a rechargeable or replaceable battery. The portable unit can have a mass of less than around 20 kg, less than around 10 kg, less than around 5 kg or less than around 1 kg.
According to a sixth aspect, a station is provided for remote visual interaction with a second station, comprising a first presentation device, arranged to present a first image, a camera unit which is arranged to depict objects on or in front of the first presentation device as a second image, a second presentation device, arranged to present a third image, and a processing unit which is arranged to form the third image by mixing together the first image and the second image. The first and second presentation devices are placed adjacent to each other, and the second presentation device has a greater angle to a horizontal plane than does the first presentation device.
By "adjacent to" is meant that the presentation devices are arranged in immediate proximity to each other, either integrated with each other or in the form of separate units. For example, the presentation devices can be arranged in line with each other and with a definite position for the user.
The first presentation device can make an angle of less than around 45 degrees with the horizontal plane, less than 30 degrees with the horizontal plane, or less than around 15 degrees with the horizontal plane.
The second presentation device can make an angle of more than around 45 degrees with the horizontal plane, more than around 60 degrees with the horizontal plane, or more than around 75 degrees with the horizontal plane.
The camera unit can comprise an object camera and a mask camera. The station can be arranged to receive the first image from the second station. The station can be arranged to send the third image to the second station.
According to a seventh aspect, a mobile communication unit is provided, comprising a processing unit, a communication unit for wireless communication, a presentation device comprising a microdisplay and a magnifying optics, and a camera unit, which faces in the same direction as the presentation device. The processing unit is arranged to receive a first image from the camera unit and a second image via the communication unit, to mix together the first and second image in order to form a third image, and to present the third image to the user via the presentation device.
By "mobile communication unit" is meant a cell phone, "smart phone", communication radio or the like.
At least one of the processing unit and the communication unit can be arranged at least partly in a first housing, and the presentation unit can be arranged at least partly in a second housing, which can move relative to the first housing between a resting position and a working position.
The second housing can fold out, spread out, or be pulled out relative to the first housing.
The camera unit can be arranged at least partly in the second housing.
The camera unit can comprise an object camera and a mask camera.
According to an eighth aspect, a system is provided for remote visual interaction between an expert and at least one user, comprising: a support station, which consists of a portable unit as described above or a station as described above, and a workstation comprising a user presentation device and a user camera unit, which is arranged to depict at least one portion of a user's field of vision, wherein the user camera is arranged to generate the first image and the user presentation device is arranged to present the second image.
The workstation can comprise a mobile communication unit as described above.
The workstation can be arranged to define a fixed point in a user image from the workstation's camera unit and the support station can be set up to define a fixed point in an expert object image from the support station's camera unit; and a synchronizing unit is set up to synchronize the user image and the expert image by placing the user fixed point and the expert fixed point on top of each other in a mixed image in a predetermined way.
Brief description of the drawings
In the drawings, the same reference numbers indicate the same elements throughout the various figures and:
Fig. 1 shows an example of a previously known system for remote interaction;
Figs. 2a-2e show relevant parts of the user's equipment;
Figs. 3-9 show relevant parts of the expert's equipment in various configurations;
Fig. 10 shows relevant parts of the method for visual interaction;
Figs. 11-12 show relevant parts of the method for spatial synchronization;
Fig. 13 shows examples of image compression;
Fig. 14 is a flow chart showing the steps of the method;
Figs. 15a-15b are schematic images of an item of expert equipment.
Fig. 16 is a schematic image of another item of expert equipment.
Fig. 17 is a schematic image of an item of user equipment.
Description of embodiments
The document can roughly be divided into the following parts:
1. The user's equipment
2. The expert's equipment and method for extraction of an object
3. Method for visual interaction
4. Method for spatial synchronization
5. Method for compression of transmitted image data.
The various parts are described in the text below.
1. The user's equipment
The user makes use of:
• an item of headborne (helmetborne) or handheld equipment consisting of a visual presentation device, a camera unit and a fitting for attachment to the head or helmet, or handle. Note, however, that the user should be able to remove and set aside this equipment, yet secure it in some way in front of him so that the user can still receive a visual experience at the same time as the user is performing some particular work.
• a computer, or any equipment with computer-like calculating capacity
• a transmission channel of some kind
• an interactive tool
• a verbal communication unit.
The presentation unit can be, for example:
• A microdisplay system with magnifying optics, to be used close to the eye
• A projector
• A display (monitor screen) of the type found in computers, cell phones, etc.
The camera unit can be, for example:
1. A video camera sensitive in the visual range
2. A video camera sensitive in other ranges, such as near IR (infrared), thermal IR, or UV (ultraviolet)
3. Consisting of several different types of video cameras.
The interactive tool can be, for example:
1. A set of buttons (mechanical control by using a part of the body)
2. Voice control unit
3. Visual control unit
4. Gesture control unit
5. Consisting of a combination of different control variants
6. A physical unit, such as a block or plate, with patterns thereon which a camera can detect. Various types of virtual objects can be coupled into the pattern, such as 3D objects, pictures, text, etc. These virtual objects can be mixed into the image of the user and/or expert.
The verbal communication unit can be, for example:
1. Traditional telephone system
2. Cell phone system
3. Based on computer/Internet, such as "Voice-IP".
It is desirable to synchronize the image and sound transmission with each other, for example, by sending image and sound data in the same packet, or by providing the sound and image frames with a time stamp and/or serial number.
Synchronization can be accomplished, for example, in the same way as for video chat.
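By way of a hedged illustration of the time-stamp approach mentioned above, the following minimal Python sketch pairs image frames and sound frames by their time stamps. The Frame fields, the pair_streams function and the 40 ms skew limit are assumptions made for the example, not taken from the present disclosure.

from dataclasses import dataclass

@dataclass
class Frame:
    serial: int        # running serial number of the frame
    timestamp: float   # capture time in seconds
    payload: bytes     # encoded image or sound data

def pair_streams(image_frames, sound_frames, max_skew=0.04):
    """Match each image frame with the sound frame closest in time.

    Frames whose time stamps differ by more than max_skew seconds
    are presented without a partner. All names are illustrative.
    """
    pairs = []
    for img in image_frames:
        best = min(sound_frames,
                   key=lambda s: abs(s.timestamp - img.timestamp),
                   default=None)
        if best is not None and abs(best.timestamp - img.timestamp) <= max_skew:
            pairs.append((img, best))
        else:
            pairs.append((img, None))
    return pairs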
Relevant parts of the user's equipment, in various situations, are described in Fig. 2, where:
• 100 is a "user"
• 101 is a user's headborne presentation device (microdisplay system)
• 102 is a video camera that is integrated in a user's presentation device
• 104 is a fitting for attaching the presentation device to the head or a helmet
• 105 is a user's handheld presentation device (microdisplay system)
• 106 is a fitting for a handle
• 107 shows a user's handheld presentation device in the form of a monitor screen of the type found in computers, cell phones, etc.
• 108 shows a user's presentation device in the form of a headborne video projector
• 109 shows a user's presentation device in the form of a handheld video projector.
The user also has a unit for verbal communication, which includes a microphone and loudspeaker device. This unit is not shown in Fig. 2.
2. The expert's equipment and method for extraction of an object
The expert sits or stands at a device which can resemble a permanent installation, but it should also be able to be mobile. The difference between this equipment and the user's equipment is that, for the most part, the expert's equipment is not worn by the expert in the same obvious way as the user's equipment is worn by the user.
The expert's device consists of:
• a work surface on which an image is projected and on whose surface (or above it) the expert's hand/arm and/or objects can be placed or moved around
• a computer, or any equipment with computer-like calculating capacity
• a transmission channel of some kind
• an interactive tool, which can be, for example:
I. A set of buttons (mechanical control by using a part of the body)
II. Voice control unit
III. Visual control unit
IV. Gesture control unit
V. Consisting of a combination of different control variants
VI. A physical unit, such as a block or plate, with patterns thereon which a camera can detect. Various types of virtual objects can be coupled into the pattern, such as 3D objects, pictures, text, etc. These virtual objects can be mixed into the image which the expert sees, and so the expert can for example interact and point to these virtual objects as if they were real objects. One could say that the "mask camera" and "object camera" see these virtual objects virtually (see below).
• a verbal communication unit
• a video camera, "object camera", placed above the work surface and registering (filming) the objects which are located on/above the work surface, for example, a hand
• another video camera, "mask camera", whose task is to mask out the object which is located on the work surface from the image being projected onto the work surface. Note that the object camera can also function as the mask camera. The mask camera can be placed above or below the work surface.
• an image generating unit that "projects", actually or virtually, an image onto the work surface. This unit can be designed, for example, according to the following proposal:
I. A horizontal monitor screen (such as a flat TV or flat computer monitor screen of a relatively large kind), on which a transparent and durable glass or plastic sheet is placed. An optical filter can be placed between the monitor screen and the sheet. The filter can be transparent (for the most part) to visual radiation (it lets through the image on the screen so that the eye can perceive it), but opaque (for the most part) to radiation other than visual, such as near IR (infrared) radiation or UV radiation. The filter could also be of the type that blocks certain intervals within the visible wavelength band. When using a monitor screen of LED type, the filter is not needed, since a monitor screen of LED type gives off very little radiation in the near IR region.
II. A "back projection" surface (cloth + sheet (glass/plastic)) on which a projector projects an image from below. A filter can be placed in front of the projector's lens that only lets through visual radiation (for the most part). The filter could also be of the type that blocks certain intervals within the visible wavelength band.
III. A sheet (white or with certain patterns) above which a projector is placed ("front projection"). A filter can be placed in front of the projector's lens that only lets through visual radiation (for the most part). The filter could also be of the type that blocks certain intervals within the visible wavelength band.
IV. A flat monitor screen that is placed between the expert's head and the expert's work surface.
V. A microdisplay system which is placed between the expert's head and the expert's work surface.
Relevant parts of the expert's equipment, in various situations, are described in Figs. 3-9, where:
• 200 is an "expert"
• 204 is a work surface
• 206 is a horizontal monitor screen (such as a flat TV or flat computer monitor screen of a relatively large size)
• 207 is a video camera, "object camera"
• 208 is another video camera, "mask camera"
• 209 is an optical beam splitter, such as a "hot mirror" or a "cold mirror" (beam splitters with different reflecting and transmitting properties depending on wavelength interval)
• 210 is an optical filter, in front of the mask camera, which blocks certain wavelengths, such as the visual wavelengths
• 211 is a video projector
• 212 is an optical filter, in front of the video projector, which blocks certain wavelengths, such as near IR
• 213 is a beam splitter for visual wavelengths
• 214 is a cloth/sheet for "back" or "front projection"
• 215 is a cloth/sheet acting as an optical filter which blocks certain wavelengths, such as near IR
• 216 is a video screen (computer screen) , of flat or other configuration
• 217 is a microdisplay with magnifying optics
• 218 is a mechanical attachment device, such as a mechanical arm.
Note that alternative solutions are shown in the dashed circles in Figs. 3-6. The expert also has a unit for verbal communication that includes a microphone and loudspeaker device. This unit is not shown in Figs. 3-9. Also note that in Figs. 7-9 the expert sees an image similar to the one the user sees, i.e., his own object mixed into the image that the expert sees.
Each of Figs. 3-9 is described below.
Description of Fig. 3: the expert, 200, stands or sits in front of a work surface, 204, on which the expert can place objects, such as a hand or a tool. Under the work surface there is a flat monitor screen, 206, which shows the user's view, among other things. On top of the monitor screen is an optical filter, 215, which blocks near IR radiation. At a distance of 50-100 cm above the work surface are placed two cameras, the object camera 207 and the mask camera 208. In front of the mask camera is placed an optical filter, 210, which blocks visual radiation but lets through near IR radiation. In the left-hand dashed ring of the figure, object camera and mask camera are arranged so that they divide the optical axis by a beam splitter, a "hot mirror" which is transparent to visual radiation and reflective to near IR radiation. The right-hand dashed ring of the figure shows an alternative solution with object and mask camera, where these are placed alongside each other. When the expert places some object on the work surface, the mask camera extracts this object out against the known (in this case) background on the work surface. The mask camera is not disturbed by the variable image from the monitor screen, since the image does not reach the mask camera through the optical filter. On the surfaces in the image that the mask camera masks out are then placed corresponding surfaces from the image from the object camera, and in this way one gets a new image with a known and well-defined background and the object in question. This image is sent to the user, whose computer can extract out the object in question, and then mix this object into the user's image or view.
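By way of a hedged illustration of the extraction step just described for Fig. 3, the following Python sketch replaces everything outside the mask-camera mask with a known, well-defined background. The threshold value, the green background colour and the function name are assumptions made for the example, and the sense of the threshold may need to be inverted depending on how the object appears in the near IR image.

import numpy as np

def extract_object(object_img, mask_img, threshold=60, background=(0, 255, 0)):
    """object_img: HxWx3 visible-light image from the object camera.
    mask_img: HxW near IR image from the mask camera, assumed brighter
    where the object is present. Returns an HxWx3 image with the object
    on a known, well-defined background."""
    mask = mask_img > threshold       # True where the mask camera sees the object
    out = np.empty_like(object_img)
    out[:] = background               # known, well-defined background
    out[mask] = object_img[mask]      # copy only the object pixels from the object camera
    return out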
Description of Fig. 4: the expert, 200, stands or sits in front of a work surface, 204, on which the expert can place objects, such as a hand or a tool. Under the work surface there is a "film cloth" or "film sheet", 214, with "back projection" properties. A projector, 211, projects an image on the "film cloth". An optical filter, 212, which is placed in front of the projector's lens, blocks near IR radiation from the projector, if such happens to be present. At a distance of 50-100 cm above the work surface are placed two cameras, the object camera 207 and the mask camera 208. In front of the mask camera is placed an optical filter, 210, which blocks visual radiation but lets through near IR radiation. In the left-hand dashed ring of the figure, object camera and mask camera are arranged so that they divide the optical axis by a beam splitter, a "hot mirror" which is transparent to visual radiation and reflective to near IR radiation. The right-hand dashed ring of the figure shows an alternative solution with object and mask camera, where these are placed alongside each other. When the expert places some object on the work surface, the mask camera extracts this object out against the known (in this case) background on the work surface. The mask camera is not disturbed by the variable image from the monitor screen, since the image does not reach the mask camera through the optical filter. On the surfaces in the image that the mask camera masks out are then placed corresponding surfaces from the image from the object camera, and in this way one gets a new image with a known and well-defined background and the object in question. This image is sent to the user, whose computer can extract out the object in question, and then mix this object into the user's image or view.
Description of Fig. 5: the expert, 200, stands or sits in front of a work surface, 204, on which the expert can place objects, such as a hand or a tool. Integrated in the work surface there is a "film cloth" or "film sheet", 214, with "front projection" properties. A projector, 211, which is placed at a distance of 50-100 cm, projects an image on the "film cloth". An optical filter, 212, which is placed in front of the projector's lens, blocks near IR radiation from the projector, if such happens to be present. The object camera, 207, is placed near the projector. In the left-hand dashed ring of the figure, object camera and projector are arranged so that they divide the optical axis by a beam splitter, 213. The right-hand dashed ring of the figure shows an alternative solution with object camera and projector, where these are placed alongside each other.
The mask camera is placed under the work surface at a distance of 25-100 cm. In front of the mask camera is placed an optical filter, 210, which blocks visual radiation but lets through near IR radiation. When the expert places some object on the work surface, the mask camera extracts this object out against the known (in this case) background on the work surface. The mask camera is not disturbed by the variable image from the monitor screen, since the image does not reach the mask camera through the optical filters. On the surfaces in the image that the mask camera masks out are then placed corresponding surfaces from the image from the object camera, and in this way one gets a new image with a known and well-defined background and the object in question. This image is sent to the user, whose computer can extract out the object in question, and then mix this object into the user's image or view.
Description of Fig. 6: the expert, 200, stands or sits in front of a work surface, 204, on which the expert can place objects, such as a hand or a tool. Integrated in the work surface there is a "film cloth" or "film sheet", 214, with "front projection" properties. A projector, 211, which is placed at a distance of 50-100 cm, projects an image on the "film cloth". An optical filter, 212, which is placed in front of the projector's lens, blocks near IR radiation from the projector, if such happens to be present. The object camera, 207, and the mask camera, 208, are placed near the projector. In the left dashed ring of the figure, object camera and mask camera are arranged so that they divide the optical axis by a beam splitter, a "hot mirror" which is transparent to visual radiation and reflective to near IR radiation. The right dashed ring of the figure shows an alternative solution with object camera and mask camera, where these are placed alongside each other. In front of the mask camera is placed an optical filter, 210, which blocks visual radiation but lets through near IR radiation. When the expert places some object on the work surface, the mask camera extracts this object out against the known (in this case) background on the work surface. The mask camera is not disturbed by the variable image from the monitor screen, since the image does not reach the mask camera through the optical filters. On the surfaces in the image that the mask camera masks out are then placed corresponding surfaces from the image from the object camera, and in this way one gets a new image with a known and well-defined background and the object in question. This image is sent to the user, whose computer can extract out the object in question, and then mix this object into the user's image or view. Alternatively, the expert camera can first take an image of the background against which the expert is working and thereafter mask out future objects, such as the expert's hand or a marker.
Description of Fig. 7: the expert, 200, stands or sits in front of a work surface, 204, on which the expert can place objects, such as a hand or a tool. A monitor screen, 216, is placed between the expert's head and the work surface. Placed near or on this monitor screen is a camera which functions both as object camera and mask camera, 207/208. Monitor screen and camera are held in place by a mechanical device, 218. The work surface can have a color or pattern allowing an object to be masked out from the work surface. The work surface can be blue or green, for example, and the masking can then be performed by the "chroma-key" technique. In this way one gets a new image with a known and well-defined background and the object in question. This image is sent to the user, whose computer can extract out the object in question, and then mix this object into the user's image or view.
Description of Fig. 8: the expert, 200, stands or sits in front of a work surface, 204, on which the expert can place objects, such as a hand or a tool. A microdisplay system with magnifying optics, 217, is placed between the expert's head and the work surface. Placed near or on this monitor screen is a camera which functions both as object camera and mask camera, 207/208. Monitor screen and camera are held in place by a mechanical device, 218. The work surface can have a color or pattern allowing an object to be masked out from the work surface. The work surface can be blue or green, for example, and the masking can then be performed by the "chroma-key" technique. In this way one gets a new image with a known and well-defined background and the object in question. This image is sent to the user, whose computer can extract out the object in question, and then mix this object into the user's image or view.
Description of Fig. 9: the expert, 200, stands or sits in front of a work surface, 204, on which the expert can place objects, such as a hand or a tool. A display of the monitor screen type, 206/216, or projector-projected surface, 214, is placed to the side next to the expert. Placed above the work surface at a distance of 50-100 cm is a camera which functions both as object camera and mask camera, 207/208. The work surface can have a color or pattern allowing an object to be masked out from the work surface. The work surface can be blue or green, for example, and the masking can then be performed by the "chroma-key" technique. In this way one gets a new image with a known and well-defined background and the object in question. This image is sent to the user, whose computer can extract out the object in question, and then mix this object into the user's image or view.
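As a hedged illustration of the "chroma-key" masking mentioned for Figs. 7-9, a pixel can be classified as background when it lies close to the known work-surface colour; the key colour and tolerance below are illustrative assumptions, not values given in the text.

import numpy as np

def chroma_key_mask(image, key_color=(0, 200, 0), tolerance=80):
    """image: HxWx3 array. Returns a boolean HxW mask, True where the
    pixel belongs to the object, i.e. differs enough from the key colour."""
    diff = image.astype(np.int16) - np.array(key_color, dtype=np.int16)
    distance = np.linalg.norm(diff, axis=-1)   # per-pixel colour distance to the key colour
    return distance > tolerance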
3. Method for visual interaction
For the user and expert to feel that they are in the same "virtual room", a method for visual interaction must be used. The method is based on moving images being transmitted (and mixed) between user and expert, and vice versa. Such a method can function in the following way:
1. An image (image sequence) describing the user's field of vision, the "user image", is transmitted, on the one hand, to the user's computer (calculating unit) and, on the other hand, via a transmission channel to the expert's computer (calculating unit).
2. The "user image" is shown to the expert by the expert's presentation device.
3. The expert places an object, such as a hand, on his work surface, and this object is extracted out with the aid of the mask camera and the object camera, and this object is thus described as a subset in the "expert object image". Note that the expert in this case obtains a mixed visual experience of an actual object being placed on a presentation device which shows the "user image".
4. The "expert object image" is transmitted via a transmission channel to the user's computer where the "expert object image" is superposed on (mixed with) the "user image" (the instantaneous one) . The image that has been mixed is here called the "mixed image".
5. The "mixed image" is shown to the user by the user's presentation device. Note that the "expert object image" (the sequence) is slightly delayed in relation to the "user image" (the sequence) .
Relevant parts of the method for visual interaction are described in Fig. 10 where:
• 300 is a transmission channel between a user and an expert
• 301 is a one-way transmission channel between a user and an expert
• 302 is a one-way transmission channel between an expert and a user
• 120 is a user's computer (calculating unit)
• 220 is an expert's computer (calculating unit)
• 110 is a user's presentation device
• 121 is a user's equipment
• 221 is an expert's equipment
• 222 is an expert's presentation device/work surface
• 223 is an expert's mask camera/object camera
• 224 is an object that the expert provides
• 122 is a user's view/object.
4. Method for spatial synchronization
As has previously been mentioned, there can be a problem with the spatial synchronization between the user and the expert. The synchronization is disturbed, on the one hand, by the user's movements and, on the other hand, by the delay that the transmission channel gives rise to. A way of solving this synchronization problem is described below (see also the more detailed explanation of spatial synchronization given below after the itemized points):
1. The expert's presentation device is largely independent of how the expert moves his head. Thus, in the image that is presented in this presentation device, a fixed point, the "expert fixed point", can be obtained, for example at the center of the image. This can be predefined in terms of size (z position), position (x,y) and twist.
Alternatively, the expert's presentation device can be movable, for example in the form of a cell phone or other user-borne unit. In such a case, synchronization between the expert and the user can be achieved by defining the expert fixed point in the same way as is described below for the user fixed point.
2. The user in most cases moves his head, and the view recorded by the user's video camera is therefore changed. In the user's view, however, there can be image properties, for example an edge, a corner or a light point, which a system for "feature detection" can detect. If the conditions are such that the available image properties are too weak, the user can place his own "marker" describing a pattern that is to be detected. This marker can be secured, for example by magnet or tape.
3. By means of a command, the user and/or the expert can "freeze" an image from the user, from which a number of "features" are detected. The user's computer (system) and/or the expert's computer (system) can also automatically "freeze" and detect "features" according to a time plan.
4. These "features" are used as a basis for defining a fixed point, the "user fixed point", for example centrally in the image. The user fixed point is defined by its size (z position), position (x,y) and twist .
5. When the "expert object image" and the "user image" are to be spatially synchronized, the images are placed on top of each other such that "expert fixed point" and "user fixed point" coincide, provided of course that these fixed points are located in the instantaneous images.
6. For the expert, however, this synchronization can mean that parts of the user's image disappear. This could possibly be compensated by the expert's computer system "remembering" the environment of the instantaneous image from the user and thus filling out those parts that have disappeared with parts from the "memory", which of course are not updated in real time.
Spatial synchronization here means that, when the images are mixed together, the system takes into consideration how the views of the expert and of the user are synchronized in the (x,y) direction, in the z direction and in terms of twist. Regarding synchronization in the z direction, this means that an image has to be enlarged or reduced to permit complete spatial synchronization. Spatial synchronization also means that, when the images are mixed together, the system also takes into consideration twisting of the views and turns the images correctly, i.e. synchronizes them. The spatial synchronization can be defined with the aid of the "features" or "markers" that the system detects. These of course change in terms of size and mutual relationship when the user approaches or moves away from them or holds his head at an angle.
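As a hedged illustration of this spatial synchronization, the sketch below scales, rotates (twists) and translates the expert object image so that the expert fixed point lands on the user fixed point. The FixedPoint fields and the use of OpenCV's affine warp are assumptions made for the example, not a definitive implementation of the method.

from dataclasses import dataclass
import cv2

@dataclass
class FixedPoint:
    x: float      # position in the image
    y: float
    size: float   # apparent size, standing in for the z position
    twist: float  # rotation in degrees

def synchronize(expert_object_image, expert_fp, user_fp, output_shape):
    """Warp the expert object image so that expert_fp coincides with user_fp."""
    scale = user_fp.size / expert_fp.size
    angle = user_fp.twist - expert_fp.twist
    # Rotation and scaling about the expert fixed point...
    M = cv2.getRotationMatrix2D((expert_fp.x, expert_fp.y), angle, scale)
    # ...followed by a translation that places it on the user fixed point.
    M[0, 2] += user_fp.x - expert_fp.x
    M[1, 2] += user_fp.y - expert_fp.y
    h, w = output_shape[:2]
    return cv2.warpAffine(expert_object_image, M, (w, h))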
Relevant parts of the method for spatial synchronization are described in Figs. 11-12 where:
• 500 is the image that is recorded by the expert's object camera, the "expert object image"
• 501 is a centrally located fixed point, the "expert fixed point", in the "expert object image"
• 502 is an "object" in the "expert object image"
• 505 is the image the user sees, the "user image"
• 506 shows detectable "features" in the "user image"
• 507 is a fixed point, the "user fixed point", the position of which in the image is calculated with respect to relevant "features"
• 508 is a unit that symbolizes spatial synchronization of the "user image" and the "expert image"
• 509 is the result of synchronization in the point above
• 510 is the image the expert sees after the synchronization
• 511 is a field in the expert's image that does not have continuous updating from the user. However, the field can be filled with image information from the "memory" where image information of the field can be stored as it was previously in the image
• 512 is the image the user sees after the synchronization
• 513 is a field in the user's image that does not have continuous updating from the expert.
Description of Fig. 11: This figure shows how a synchronization proceeds when the user does not move. The expert fixed point, 501, and the user fixed point, 507, do not change position, size and twist. Therefore, the synchronization, 508, is simple and the two fixed points are placed correctly over each other, 509.
Description of Fig. 12: This figure shows how a synchronization proceeds when a user moves. The expert fixed point, 501, does not change position, size and twist. By contrast, the user fixed point, 507, which was defined according to Fig. 11, changes position, size and twist. Therefore, the synchronization, 508, is complicated, because the user image and the expert object image have to be adapted in such a way that the two fixed points are placed correctly over each other, 509.
The synchronization entails certain consequences as regards the images that the user and the expert have in their fields of vision. In the image the user sees, 512, certain surfaces from the expert, 513, are not continuously updated. In the image the expert sees, 510, certain surfaces from the user, 511, are not continuously updated.
5. Method for compression of transmitted image data
There are many methods for compression of image data and sound data. The field is well developed with methods such as MPEG4 and JPEG2000. Other examples of compression algorithms that can be used include VP8 for image data and GSM, MP2, SPEEX, TrueSpeech and the like for sound data.
The image quality and sound quality may need to be adjusted depending on the transmission capacity (bit/time unit) and delay (time to transmit one bit) of the transmission channel that is used. The image quality is dependent on which compression method is used and on the degree of compression. The relationship between transmission capacity and image quality should in principle be kept constant and at such a level that stable interactivity is achieved, both as regards image and also sound.
If the delay on the transmission channel is shorter in time than the time it takes to transmit an image, the image quality can be allowed to deteriorate in order to maintain interactivity. Compression methods for high image quality normally produce a greater delay than other methods, since it takes a longer time for compression and decompression. However, by using special hardware solutions, the time for compression and decompression can be reduced.
Instead of transmitting the expert's "object" as a whole image, it should be possible to transmit only relevant parts of the image. For example, each row (or column) in the image should contain a number of start and stop values for relevant image data, i.e. contain data from some "object".
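A hedged sketch of this row-based idea is given below: each image row is reduced to runs of (start, stop, pixel data) covering only the columns that belong to some object. The exact encoding format is not specified in the text, so the structure used here is an assumption.

import numpy as np

def encode_rows(image, object_mask):
    """Return a list of (row, start, stop, pixel_data) runs for the object.

    image: HxWx3 array, object_mask: boolean HxW array."""
    runs = []
    for row in range(image.shape[0]):
        cols = np.flatnonzero(object_mask[row])
        if cols.size == 0:
            continue
        # Split the row's object columns into contiguous runs.
        breaks = np.where(np.diff(cols) > 1)[0] + 1
        for segment in np.split(cols, breaks):
            start, stop = int(segment[0]), int(segment[-1]) + 1
            runs.append((row, start, stop, image[row, start:stop].copy()))
    return runs

def decode_rows(runs, shape, background=(0, 255, 0)):
    """Rebuild an image of the given (H, W, 3) shape from the runs."""
    out = np.empty(shape, dtype=np.uint8)
    out[:] = background
    for row, start, stop, pixels in runs:
        out[row, start:stop] = pixels
    return out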
Figure 13 is a diagram showing an example of a compression method in which:
• 600 is a color camera that generates a "Bayer pattern" signal (compressed color signal). In another embodiment, the camera could, for example, generate an RGB image or YUV image, in which case parts of the figure would be able to be omitted
• 601 is a module that converts the "Bayer pattern" image into three images, an R image (red), a G image (green) and a B image (blue)
• 602 shows three compression modules in which a respective image is compressed, e.g. with JPEG2000
• 603 is a transmission channel in which the compressed images are transmitted, for example one after another, i.e. sequentially,
• 604 shows three decompression modules in which the images are decompressed according to JPEG2000
• 605 is a module that combines the decompressed images into one color image
• 606 is a presentation device for the user/expert.
Note that the method described above can be used from expert to user, and vice versa.
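As a hedged illustration of modules 601 and 605, the sketch below splits an "RGGB" Bayer mosaic into R, G and B planes and recombines them. The actual Bayer layout of camera 600 is not stated, so the layout, the averaging of the two green samples per cell and the assumption of even image dimensions are all illustrative. Each plane could then be compressed and decompressed independently, for example with a JPEG 2000 codec, as in modules 602 and 604.

import numpy as np

def split_bayer_rggb(mosaic):
    """mosaic: HxW single-channel Bayer image with even H and W (RGGB layout assumed).
    Returns half-resolution R, G and B planes; the two green samples per cell are averaged."""
    r = mosaic[0::2, 0::2]
    g = (mosaic[0::2, 1::2].astype(np.uint16) +
         mosaic[1::2, 0::2].astype(np.uint16)) // 2
    b = mosaic[1::2, 1::2]
    return r, g.astype(mosaic.dtype), b

def combine_planes(r, g, b):
    """Module 605: stack the decompressed planes into one colour image."""
    return np.dstack([r, g, b])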
According to a general embodiment shown in Figure 14, a method is made available for synchronizing images in remote visual interaction between an expert and at least one user in an interaction system comprising at least one work station comprising a presentation device and a camera for use by the user, said work station being operatively connected to a support station comprising a presentation device and a camera for use by the expert, the user camera being configured to capture the user's field of vision, which is presented to the expert via the expert presentation device as a user image, and the expert camera being configured to depict an object at the support station as an expert object image and mix the expert object image together with the user image and present these to the user via the user presentation device as a mixed image. The method comprises the steps of:
• taking the user image with the user camera (step 700)
• defining a fixed point in the user image (step 701)
• presenting the user image to the expert via the expert presentation device (step 702)
• taking an expert object image with the expert camera and masking out the object (step 703)
• defining a fixed point in the expert object image (step 704)
• synchronizing the user image and the expert object image by placing the user fixed point and the expert fixed point on top of each other in a predetermined manner as regards position, size and twist, so as to form a mixed image (step 705)
• presenting the mixed image to the user (step 706)
Figs. 15a and 15b show a variant of an expert station. The expert station can comprise a first monitor screen 801, on which a user image is presented to the expert and to which the expert points with a hand 803 or pointer device. The expert station can also be provided with a lighting source 804, which generates visible and/or near IR light, and a camera unit 805, which can comprise a visible-light camera and a near IR camera and can be designed in accordance with what has been described with reference to Figs. 3-6. The camera unit produces an image which can be mixed with the image shown on the first monitor screen, and the mixed image is shown on a second monitor screen 802, which is placed in a manner that gives an ergonomically advantageous viewing angle. The monitor screens 801, 802 can be placed on a common stand 806. The camera unit's near IR camera generates a mask, which is used to determine the position and shape of the hand 803. With the aid of the mask, those parts of an image from the visible-light camera of the camera unit corresponding to the hand can be extracted to form an image that shows only the expert's hand. This image can be mixed and combined with the image that is shown on the first monitor screen.
As an alternative to a visual image of the expert's hand, a predefined "skin pattern" or a predefined image of a hand with the desired appearance can be placed on the masked area. In this case, the shape of the predefined hand can be adapted (twisted) to match the masked area.
Fig. 15b shows a marker 807 (a detail with predetermined shape or pattern) which the computer recognizes via the camera module. The computer "couples" the image or 3D object to the position and twist of the marker. The image and/or 3D object can be shown to the expert on the second monitor screen 802 and to the user in the latter's presentation system.
Fig. 16 shows an expert station in the form of a laptop computer, which has a keyboard component 901 and a monitor screen part 902, and a stand 905 supporting a camera unit 906 that can be designed in the manner described with reference to Figs. 3-6. In the same way as in Figs. 15a-15b, the expert can point to the monitor screen, his hand 903 being extracted and mixed with the image shown on the monitor screen and being sent to a user located at a distance.
The camera unit 906 can comprise a light source for visible and/or near IR light.
Fig. 17 shows a variant of a user unit in the form of a cell phone or similar device, which can have a processing unit, user interface (at least loudspeaker and set of buttons/touch screen) and communication unit, and which has a first housing 1001, which accommodates at least one of the processing unit, the user interface and the communication unit. The user unit also comprises a second housing 1002, which is movable relative to the first housing 1001 and which accommodates a microdisplay 1005 with magnifying optics and a camera unit 1007, which is arranged in the same direction 1004 as the microdisplay and the user's eye 1006. The camera unit and the microdisplay can be designed in the manner described with reference to Figs. 1-2.
The first and second housings can pivot or slide relative to each other, such that the second housing can be moved between a (collapsed/retracted) rest position and a working position, in which the microdisplay is visible to the user. The user unit can be designed such that a loudspeaker is placed to be audible from one ear of the user, while at the same time the microdisplay is visible to one eye of the user. The pivoting/sliding between the housings can be such as to permit a certain degree of individual adaptation of the user unit, such that loudspeaker and display are positioned in the most advantageous way possible for a respective user.
It will be appreciated that a system of the kind that has been described above can be obtained with the aid of two mobile or headborne units, for example of the type shown in Fig. 17, with the aid of two stationary portable units, for example of the type shown in Fig. 16, or with the aid of two stationary units, as is shown at a number of places herein.
Any desired combinations of mobile, headborne, portable and stationary units are of course also conceivable.

Claims

1. A method for remote visual interaction between an expert (200) and at least one user (100) in an interaction system comprising:
at least one workstation comprising a presentation device (101) and a user camera (102) for use by the user (100), wherein said workstation is operatively connected to
a support station comprising an expert presentation device (201) and an expert camera unit (202) for use by the expert (200),
wherein the method comprises:
using the user camera (102) to capture the user's (100) field of vision in order to form a user image (505) ,
presenting the user image (505) to the expert (200) by means of the expert presentation device (201),
using the expert camera unit (202) to depict an object (502) at the support station as an expert object image (500),
mixing together the expert object image (500) with the user image (505) to form a mixed image, and
presenting the mixed image to the user (100) via the user presentation device (101),
c h a r a c t e r i z e d in that
the expert camera unit (202) is used to depict the object (502) when this is at least partly located essentially between the expert camera unit and the expert presentation device,
an image of the object is created, constituting a subset of the expert object image (500), and
the mixed image is created based on the user image and the depiction of the object.
2. The method as claimed in claim 1, wherein the depicting of the object is achieved by means of an object camera and a mask camera.
3. The method as claimed in claim 2, wherein the mask camera is adapted to a wavelength range that differs from a wavelength range to which the object camera is adapted.
4. The method as claimed in claim 3, wherein the mask camera is a near IR camera.
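By way of illustration of claims 2-4, a mask camera operating in the near IR could yield the mask through simple thresholding, roughly as sketched below; the threshold value, the morphological clean-up and the use of OpenCV are assumptions made for the sake of the example, not features of the claims.

```python
import cv2
import numpy as np

def near_ir_mask(ir_frame: np.ndarray, threshold: int = 60) -> np.ndarray:
    """Form a binary mask from a single-channel near-IR frame.

    Skin reflects near-IR light strongly when illuminated by the camera
    unit's IR light source, so bright pixels are treated as the object
    (e.g. the expert's hand). The threshold value would need calibration
    in a real setup.
    """
    _, mask = cv2.threshold(ir_frame, threshold, 255, cv2.THRESH_BINARY)
    # A small morphological opening removes isolated noise pixels.
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return mask
```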
5. A system for remote visual interaction between an expert (200) and at least one user (100) in an interaction system comprising:
at least one workstation comprising a presentation device (101) and a user camera (102) for use by the user (100), wherein said workstation is operatively connected to
a support station comprising an expert presentation device (201) and an expert camera unit (202) for use by the expert (200),
wherein the user camera (102) is configured to capture the user's (100) field of vision, which is presented to the expert (200) by means of the expert presentation device (201) as a user image (505) and the expert camera unit (202) is configured to depict an object (502) at the support station as an expert object image (500) and mix together the expert object image (500) with the user image (505) and present these to the user (100) via the user presentation device (101) as a mixed image,
c h a r a c t e r i z e d in that
the expert camera unit (202) is turned toward the expert presentation device (201) in order to depict the object (502) when this is at least partly located in a field of vision between the expert camera unit and the expert presentation device,
the expert camera unit is arranged to achieve a depiction of the object, constituting a subset of the expert object image (500), and the expert camera unit is arranged to achieve the mixed image based on the user image and the depiction of the object.
6. A system for remote visual interaction between an expert (200) and at least one user (100) in an interaction system comprising:
at least one workstation comprising a presentation device (101) and a user camera (102) for use by the user (100), wherein said workstation is operatively connected to
a support station comprising an expert presentation device (201) and an expert camera unit (202) for use by the expert (200),
wherein the user camera (102) is configured to capture the user's (100) field of vision, which is presented to the expert (200) by means of the expert presentation device (201) as a user image (505) and the expert camera unit (202) is configured to depict an object (502) at the support station as an expert object image (500) and mix together the expert object image (500) with the user image (505) and present these to the user (100) via the user presentation device (101) as a mixed image,
c h a r a c t e r i z e d in that
the expert camera unit comprises:
an object camera (207), designed to register objects which are on or above the work surface, and
a mask camera (208), designed to form a mask in order to mask out at least one of said objects located on or above the work surface from the image shown on the work surface;
wherein the support station is arranged to mix together images of the object from the object camera and the mask camera to form the expert object image (500).
7. A system for synchronization of images in remote visual interaction between an expert (200) and at least one user (100) in an interaction system comprising at least one workstation comprising a presentation device (101) and a camera (102) for use by the user (100), said workstation being operatively connected to a support station comprising a presentation device (201) and a camera (202) for use by the expert (200), wherein the user camera (102) is configured to capture the user's (100) field of vision, which is presented to the expert (200) by means of the expert presentation device (201) as a user image (505), and the expert camera (202) is configured to depict an object (502) at the support station as an expert object image (500) and mix together the expert object image (500) with the user image (505), which is presented to the user (100) via the user presentation device (101) as a mixed image, c h a r a c t e r i z e d in that the system comprises:
- a user computer (120) designed to define a fixed point (507) in the user image (505);
- an expert computer (220) designed to define a fixed point (501) in the expert object image (500);
- a synchronizing unit (508) designed to synchronize the user image (505) and the expert object image (500) by placing the user fixed point (507) and the expert fixed point (501) on top of each other (509) in the mixed image in a predetermined way.
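As a non-limiting sketch of what the synchronizing unit (508) might do, the expert object image could be translated so that the expert fixed point (501) lands on the user fixed point (507) before mixing; the pixel-shift approach below is one conceivable implementation among many, with fixed points given as (row, column) coordinates.

```python
import numpy as np

def align_to_fixed_point(expert_object_image: np.ndarray,
                         expert_fixed_point: tuple,
                         user_fixed_point: tuple) -> np.ndarray:
    """Shift the expert object image so that its fixed point coincides
    with the user fixed point; areas shifted in from outside the frame
    are left black (zero)."""
    dy = user_fixed_point[0] - expert_fixed_point[0]
    dx = user_fixed_point[1] - expert_fixed_point[1]
    h, w = expert_object_image.shape[:2]
    shifted = np.zeros_like(expert_object_image)
    copy_h, copy_w = h - abs(dy), w - abs(dx)
    if copy_h > 0 and copy_w > 0:
        src_y, src_x = max(0, -dy), max(0, -dx)
        dst_y, dst_x = max(0, dy), max(0, dx)
        shifted[dst_y:dst_y + copy_h, dst_x:dst_x + copy_w] = \
            expert_object_image[src_y:src_y + copy_h, src_x:src_x + copy_w]
    return shifted
```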
8. The system as claimed in claim 7, wherein the workstation equipment is either headborne or handheld.
9. The system as claimed in claim 7, wherein the support station equipment is either permanently installed or portable.
10. The system as claimed in claim 7, wherein the support station contains a work surface on which the user image is presented and which is arranged to receive said object.
11. The system as claimed in claim 10, wherein the expert presentation device is an image generating unit which produces an image on the work surface and is one of the following: a display screen; a projector set up to project the user image on the work surface from below; or a projector set up to project the user image on the work surface from above.
12. The system as claimed in claim 7, wherein the support station comprises a microphone and a loudspeaker making verbal communication between the expert and the user possible.
13. The system as claimed in claim 7, wherein the support station comprises an interaction tool, which is at least one of the following: a keyset; a voice control unit; a visual control unit; a gesture control unit; or a physical object which the expert camera is arranged to detect.
14. The system as claimed in claim 7, wherein the system further comprises at least one compression module and at least one decompression module placed at the workstation and/or the support station to compress and decompress, respectively, the user image and/or the expert object image.
15. A method for synchronization of images in remote visual interaction between an expert and at least one user in an interaction system comprising at least one workstation comprising a presentation device and a camera for use by the user, said workstation being operatively connected to a support station comprising a presentation device and a camera for use by the expert, the user camera being configured to capture the user's field of vision, which is presented to the expert by means of the expert presentation device as a user image and the expert camera being configured to depict an object at the support station as an expert object image and mix together the expert object image with the user image, which is presented to the user via the user presentation device as a mixed image,
c h a r a c t e r i z e d in that the method comprises the steps of:
- defining a fixed point in the user image;
- defining a fixed point in the expert object image;
- synchronizing the user image and the expert object image by placing the user fixed point and the expert fixed point on top of each other in the mixed image in a predetermined way.
16. The method as claimed in claim 15, wherein the method further comprises the step of compressing the user image and/or the expert object image before it is transmitted to the support station and the workstation, respectively.
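A minimal sketch of the compression step of claims 14 and 16, assuming JPEG encoding via OpenCV (the image format and quality setting are illustrative assumptions only; the claims do not prescribe a particular codec):

```python
import cv2
import numpy as np

def compress_image(image: np.ndarray, quality: int = 80) -> bytes:
    """Compress an image to a JPEG byte stream before it is sent over
    the link between workstation and support station."""
    ok, buffer = cv2.imencode(".jpg", image,
                              [int(cv2.IMWRITE_JPEG_QUALITY), quality])
    if not ok:
        raise RuntimeError("JPEG encoding failed")
    return buffer.tobytes()

def decompress_image(payload: bytes) -> np.ndarray:
    """Decode a received JPEG byte stream back into an image array."""
    data = np.frombuffer(payload, dtype=np.uint8)
    return cv2.imdecode(data, cv2.IMREAD_COLOR)
```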
17. A mobile unit, comprising:
a presentation device and a camera unit which is turned toward the presentation device in a working position in order to depict objects on or in front of the presentation device,
a processing unit which is arranged to present a first image on the presentation device and to receive a second image from the camera unit, and
a communication unit which is arranged to receive the first image from a remotely situated unit and send at least a portion of the second image to a remotely situated unit,
c h a r a c t e r i z e d in that the presentation device is arranged in a housing and the camera unit is movable relative to the housing between the working position and a transport position.
18. The mobile unit as claimed in claim 17, wherein the processing unit is designed to mix together the first and second image in order to form a third image, wherein the third image is sent to the remotely situated unit.
19. The mobile unit as claimed in claim 17 or 18, wherein the camera unit, in the transport position, is drawn into or toward the housing.
20. The mobile unit as claimed in any one of claims 17-19, wherein, in the working position, the camera unit protrudes from the housing.
21. The mobile unit as claimed in claim 17 or 18, wherein the camera unit, in the transport position, is drawn into or toward an intermediate housing connected to the housing.
22. The mobile unit as claimed in claim 21, wherein, in the working position, the camera unit protrudes from the intermediate housing.
23. The mobile unit as claimed in any one of claims 17-22, wherein the camera unit comprises an object camera and a mask camera.
24. The mobile unit as claimed in any one of claims 17-23, wherein the mobile unit comprises a regenerable power source.
25. The mobile unit as claimed in any one of claims 17-24, wherein the mobile unit has a mass of less than around 20 kg, less than around 10 kg, less than around 5 kg or less than around 1 kg.
26. A station for remote visual interaction with a second station, comprising:
a first presentation device, arranged to present a first image;
a camera unit which is arranged to depict objects on or in front of the first presentation device as a second image;
a second presentation device, arranged to present a third image; and
a processing unit which is arranged to form the third image by mixing together the first image and the second image,
wherein said first and second presentation devices are placed adjacent to each other, and
the second presentation device has a greater angle to a horizontal plane than does the first presentation device.
27. The station as claimed in claim 26, wherein the first presentation device makes an angle of less than around 45 degrees with the horizontal plane, less than around 30 degrees with the horizontal plane, or less than around 15 degrees with the horizontal plane.
28. The station as claimed in claim 26 or 27, wherein the second presentation device makes an angle of more than around 45 degrees with the horizontal plane, more than around 60 degrees with the horizontal plane, or more than around 75 degrees with the horizontal plane.
29. The station as claimed in any one of claims 26-28, wherein the camera unit comprises an object camera and a mask camera.
30. The station as claimed in any one of claims 26-29, wherein the station is arranged to receive the first image from the second station.
31. The station as claimed in any one of claims 26-30, wherein the station is arranged to send the third image to the second station.
32. A mobile communication unit, comprising:
a processing unit,
a communication unit for wireless communication,
a presentation device comprising a microdisplay and magnifying optics, and
a camera unit, which faces in the same direction as the presentation device,
wherein the processing unit is arranged to receive a first image from the camera unit and a second image via the communication unit, to mix together the first and second image in order to form a third image, and to present the third image to the user via the presentation device.
33. The mobile communication unit as claimed in claim 32, wherein at least one of the processing unit and the communication unit is arranged at least partly in a first housing, and wherein the presentation unit is arranged at least partly in a second housing, which can move relative to the first housing between a resting position and a working position.
34. The mobile communication unit as claimed in claim 33, wherein the second housing can fold out, spread out, or be pulled out relative to the first housing.
35. The mobile communication unit as claimed in claim 32 or 33, wherein the camera unit is arranged at least partly in the second housing.
36. The mobile communication unit as claimed in any one of claims 32-35, wherein the camera unit comprises an object camera and a mask camera.
37. A system for remote visual interaction between an expert and at least one user, comprising:
a support station, which consists of a mobile unit as claimed in any one of claims 17-25, a station as claimed in any one of claims 26-31 or a mobile communication unit as claimed in any one of claims 32-36, and
a workstation comprising a user presentation device and a user camera unit, which is arranged to depict at least one portion of a user's field of vision,
wherein the user camera unit is arranged to generate the first image and the user presentation device is arranged to present the second image.
38. The system as claimed in claim 37, wherein the workstation comprises a mobile communication unit as claimed in any one of claims 32-36.
39. The system as claimed in claim 37 or 38, wherein the workstation is arranged to define a fixed point (507) in a user image (505); the support station is set up to define a fixed point (501) in an expert object image (500); and a synchronizing unit (508) is set up to synchronize the user image (505) and the expert object image (500) by placing the user fixed point (507) and the expert fixed point (501) on top of each other (509) in a mixed image in a predetermined way.
PCT/SE2010/050916 2009-08-25 2010-08-25 Methods and systems for visual interaction WO2011025450A1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US23664309P 2009-08-25 2009-08-25
SE0950609-8 2009-08-25
SE0950609A SE537143C2 (en) 2009-08-25 2009-08-25 Visual interaction system and method
US61/236,643 2009-08-25
SE1050496-7 2010-05-19
SE1050496 2010-05-19

Publications (1)

Publication Number Publication Date
WO2011025450A1 true WO2011025450A1 (en) 2011-03-03

Family

ID=43628259

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE2010/050916 WO2011025450A1 (en) 2009-08-25 2010-08-25 Methods and systems for visual interaction

Country Status (1)

Country Link
WO (1) WO2011025450A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1435737A1 (en) * 2002-12-30 2004-07-07 Abb Research Ltd. An augmented reality system and method
EP1739642A1 (en) * 2004-03-26 2007-01-03 Atsushi Takahashi 3d entity digital magnifying glass system having 3d visual instruction function
WO2005124429A1 (en) * 2004-06-18 2005-12-29 Totalförsvarets Forskningsinstitut Interactive method of presenting information in an image
US20060176242A1 (en) * 2005-02-08 2006-08-10 Blue Belt Technologies, Inc. Augmented reality device and method
EP1710002A1 (en) * 2005-04-08 2006-10-11 Canon Kabushiki Kaisha Information processing method and apparatus
WO2009036782A1 (en) * 2007-09-18 2009-03-26 Vrmedia S.R.L. Information processing apparatus and method for remote technical assistance

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2920683A1 (en) * 2012-11-15 2015-09-23 Iversen, Steen Svendstorp Method of providing a digitally represented visual instruction from a specialist to a user in need of said visual instruction, and a system therefor
DE102016003074A1 (en) * 2016-03-12 2017-09-14 Audi Ag Method for operating a virtual reality system and virtual reality system
US10497181B1 (en) 2016-03-12 2019-12-03 Audi Ag Method for operating a virtual reality system, and virtual reality system

Similar Documents

Publication Publication Date Title
US11125996B2 (en) Sedentary virtual reality method and systems
US10642564B2 (en) Display system, display device, information display method, and program
US6046712A (en) Head mounted communication system for providing interactive visual communications with a remote system
US20160269685A1 (en) Video interaction between physical locations
CN108762501B (en) AR display method, intelligent terminal, AR device and AR system
CN205666900U (en) Portable intelligent conference system based on ann is tall and erect
CN108427194A (en) A kind of display methods and equipment based on augmented reality
CN110275602A (en) Artificial reality system and head-mounted display
CN105319716A (en) Display device, method of controlling display device, and program
CN108446011A (en) A kind of medical householder method and equipment based on augmented reality
US6650305B1 (en) Wireless electronic display
CN207051853U (en) A kind of immersive VR experiencing system
WO2011025450A1 (en) Methods and systems for visual interaction
JP6540426B2 (en) Display system, display device, information display method, and program
McKay et al. Membrane-mirror-based autostereoscopic display for tele-operation and telepresence applications
JP6733401B2 (en) Display system, display device, information display method, and program
CN213903982U (en) Novel intelligent glasses and remote visualization system
CN108427195A (en) A kind of information processing method and equipment based on augmented reality
CN208588947U (en) A kind of display system with gesture identification function
CN208903216U (en) A kind of intelligent display systems with eye tracking function
SE0950609A1 (en) Visual interaction system and method
CN106303467B (en) Intelligent wearable device and data transmission method
KR20110082955A (en) Public display device, system and interaction method using the same
KR20110134574A (en) System for augmenting reality using image input-output devise and method provide therefor
JP2019075684A (en) Communication system and communication method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10812409

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10812409

Country of ref document: EP

Kind code of ref document: A1