By Doug Peterson
March 2006
Tom Huang is hoping to go one step further than Tom Cruise.
In the movie Minority Report, Cruise played a futuristic detective who could manipulate objects on a massive virtual-reality screen simply by moving his hands, like a conductor leading an orchestra.
However, Cruise’s character had to wear a glove that interfaced with the computer. Huang, the William L. Everitt Professor of Electrical Engineering at the University of Illinois, is aiming to make it possible for computers to recognize hand movements without the aid of an electronic glove.
It’s this kind of research that has earned Huang another in a long line of honors. He recently flew to San Jose to receive the award for “Electronic Imaging Scientist of the Year” from the International Society for Optical Engineering and the Society for Imaging Science and Technology.
Huang came to the U of I in 1980, about the time he began working in computer vision. Today, much of his research falls into one of two categories: the human/computer interface and data retrieval.
In his human/computer interface work, Huang’s team has taken some of the first steps in enabling a computer to recognize hand gestures. Their system can recognize some basic hand motions, as long as the hand is positioned squarely in front of the camera.
Huang’s hand-tracking algorithm allows people to perform some simple tasks, such as using their fingertips to move “virtual balls” across the computer screen and even playing a simple game of “rock, paper, scissors.” But as he put it, the system is “far from being able to recognize American Sign Language,” which is one of the project’s ultimate goals.
Among the tasks ahead are adding a second camera to provide three-dimensional imagery and enabling the computer to better recognize where the hand’s many joints are located.
In another project, Huang is quite literally putting the “face” into interface -- the interface between humans and computers. His team has developed an “avatar,” a synthetic “talking face” that can interact with the computer user.
The talking face may eventually be used in situations in which people deal with an automated agent, such as at an ATM, Huang said. He cited AT&T research on online shopping, which concluded that customers trust a company more if they can interact with a visual representation of a person rather than just a block of text.
Huang’s team has also been bringing middle-school children to the lab to help test and refine a system for computer-aided learning. But this is unlike any computer-aided learning these students have ever encountered. He is designing a system that can talk with the students as they learn. It will even be able to spot when students are becoming bored or distracted.
If a student appears to be bored, the computer might pipe up with a question to revive interest. But for such a sophisticated exchange to happen, computers need to be able to recognize human emotion, which is exactly what Huang’s team has been working on with its face-processing system.
The researchers have been training the face-processing system to analyze the geometry of specific facial expressions and then determine whether the person is displaying one of six emotions (happiness, sadness, surprise, anger, disgust or fear) or a neutral expression.
For Huang, the key to all of this work is “multi-modal” input. In other words, the computer relies on both visual and audio cues. The team’s speech-recognition system does not depend on hearing alone; it can also read lips. This makes it much easier for the computer to recognize speech in a noisy environment.
As he explained, a computer trying to understand a middle-school student’s speech in a noisy room is going to need all the help it can get.