Translation from one language to another is often like swapping variable values; you need something in the middle. That’s why translators were invented, but the opportunity for them to interject obscene comments made communication between people of different languages uneasy. And since my understanding of other languages is a bit rusty, and my group needed a project with a biomedical bent, we decided to translate to American English from American Sign Language. That’s right, you should be really excited right now.
It currently doesn't work because it's missing a user's hand.
Pretty simple premise, if you ask me. In fact, if it’s on Wikipedia then everybody should already know about it. There are two ways to go about sign language translation, and both need to measure a person’s hand movements. This can be done visually or mechanically; because it’d be a bit lame to have to carry around the video cameras and computers necessary to do the processing, my group went with a glove-based system. By the by, ‘my group’ consisted of a few buds in CMU’s Biomedical Engineering Design Capstone class: Allen Ambulo, Andrew S.D. Tsai, Michelle Lin, Sherry Huang, and Eric Wideburg. We also had the awesome Professors Dr. Conrad Zapanta and Dr. James Antaki.
So, clearly, the idea of a sign-language-recognizing glove is not new, but two things have not been done, or at least we didn’t find evidence of them.
1) With the abundance of iPods and other media devices, why can’t this device also make noise? And if it could make noise that corresponds to whatever is being signed by the user, that’d be extra impressive.
2) Nobody likes to spell, so why do all currently made gloves mainly focus on finger spelling? Damn, you’d have some strong hands if all you were able to do was finger spell… Why not include good gesture recognition? Wiimotes do it, and 3-year-olds are better than me at playing Nintendo Wii.
Anyway, let’s talk about implementation. I won’t go into the seriously boring details in this post; I’ll probably put up guides on specific aspects of the project (the Recognition, the Sensors, etc.) later on.
Like any good embedded system, this glove is merely a system of input and output, with some processing in the middle. Like a kind of mathematical system of equations sandwich. Yum. Input comes from the sensors or from the user via the trackball. Output is the LCD screen and the tiny speaker I stole off one of those cards that sing at you (thanks Grandma for the birthday card! I really like it!).
Accelerometers, flex sensors, and my beautiful sewing.
Trackball for user input. Or to just click. Click click click click
There are two things to look for in sign language: the movement of the hands and the position of the fingers. Thus, we’ve got an accelerometer and flex sensors. While some might see the limitations of just these sensors, I had a few workarounds for this initial proof-of-concept version. The trackball is the same kind as those found on BlackBerrys, and I had an old ominous-looking LCD screen (red on black, ooooh~) lying around.
All this plugs into an Arduino Mega, because it has a lot of inputs and outputs, and looks badass when strapped to your wrist. The output side is a SparkFun-made SpeakJet Arduino shield; think of it as text-to-speech. It can pronounce a list of phonemes, which you can string together in the right order to make words, or gobbledegook. This pushes an audio signal out to the previously mentioned tiny speaker, and the results are mirrored onto the LCD screen.
Temperature outside is directly proportional to the amount of solder vapor I inhale.
There was some prototyping area on the SpeakJet shield, so all the connections were routed through it.
That about does it for a hardware overview; software from here on out. The glove you see is Version 1 because so much time was spent getting the hardware together and reliable that the software is not as robust as a daily-use version would need to be. Don’t get me wrong, this thing can work fine and dandy, but there are some improvements I would like to make. Let’s start from the top of what V.1 is, and then I’ll discuss future improvements.
Sensor data comes in and, due in no small part to how the sensors are attached, is pretty free of movement artifacts. Because the data is reliable and consistent, the only filtering is a simple low-pass filter: an exponential moving average of the readings. Mechanically, the sensors are attached to the glove with small metal brackets made from garden wire (this took forever…) or with button snaps like you’d find on clothing (hey, we are dealing with a glove here). This allowed the flex sensors to remain fixed at their base and slide through the brackets, and also let the stretchiness of the glove act as the spring, and the user’s hand as the damper, in a simple spring-mass-damper system. Fancy words for, “I sewed things on to a stretchy glove. And the glove was on my hand at the time.”
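For the curious, here is roughly what an exponential moving average looks like. This is a minimal sketch in Python rather than the actual Arduino code, and the function name, the smoothing factor `alpha`, and the sample readings are all made up for illustration.

```python
def exponential_moving_average(samples, alpha=0.2):
    """Smooth a sequence of raw sensor readings.

    alpha is an assumed smoothing factor between 0 and 1:
    smaller values smooth more but respond more slowly.
    """
    smoothed = []
    estimate = None
    for raw in samples:
        if estimate is None:
            # Seed the filter with the first reading.
            estimate = float(raw)
        else:
            # Blend the new reading with the running estimate.
            estimate = alpha * raw + (1 - alpha) * estimate
        smoothed.append(estimate)
    return smoothed


# Made-up ADC readings from one flex sensor, with a single spike at 700.
readings = [512, 530, 498, 505, 700, 510]
print(exponential_moving_average(readings))
```

The spike still nudges the output, but only by a fraction of its size, which is the whole point of the filter.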
Future posts: making brackets. sewing. sewing with your off hand.
After the sensor data comes in, is converted to digital values by the Arduino’s onboard ADC, and is lightly filtered, it gets formatted into a simple state vector: 5 values for the flex sensors and 3 for the accelerometer, one per axis. This state vector gets run through a naive Bayes classifier whenever the state has stabilized, i.e. the user has performed a gesture/letter and held that position for a specified amount of time. This hold signals the microcontroller to compute, out of the list of gestures the Arduino knows about, the most likely one given the current state of the sensor data. Because I have no idea what I’m doing when it comes to ASL, I configured the delay to be 2 seconds for myself. Gimme a break, I learned cello on weekends, not ASL.
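To make the “wait until stable, then classify” idea concrete, here is a minimal sketch in Python (not the glove’s real firmware). It assumes a Gaussian naive Bayes model with one mean and standard deviation per sensor value per gesture; the `MODELS` numbers, the `tolerance`, and the function names are invented for illustration, and the window of recent states stands in for the hold time.

```python
import math

# Hypothetical per-gesture models: (mean, std) for each of the 8 state
# values, 5 flex sensors followed by accelerometer x, y, z.
# These numbers are made up, not measured from the glove.
MODELS = {
    "A": [(800, 30), (820, 30), (810, 30), (805, 30), (300, 30),
          (0, 50), (0, 50), (250, 50)],
    "B": [(300, 30), (310, 30), (305, 30), (300, 30), (700, 30),
          (0, 50), (0, 50), (250, 50)],
}


def log_gaussian(x, mean, std):
    """Log of the normal density at x."""
    variance = std * std
    return -0.5 * math.log(2 * math.pi * variance) - (x - mean) ** 2 / (2 * variance)


def classify(state, models=MODELS):
    """Return the gesture whose model gives the state the highest log-likelihood.

    The "naive" part: each sensor value is treated as independent,
    so the per-sensor log-likelihoods are simply added up.
    """
    best_label, best_score = None, -math.inf
    for label, params in models.items():
        score = sum(log_gaussian(x, m, s) for x, (m, s) in zip(state, params))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def is_stable(history, tolerance=15):
    """True if no state value moved more than `tolerance` across the window."""
    if not history:
        return False
    return all(max(col) - min(col) <= tolerance for col in zip(*history))


# A short window of recent states, standing in for the hold period.
window = [
    [795, 818, 806, 803, 305, 2, -1, 248],
    [798, 821, 809, 806, 301, 0, 1, 251],
    [801, 819, 811, 804, 299, -2, 0, 250],
]
if is_stable(window):
    print(classify(window[-1]))
```

Working in log-likelihoods keeps the numbers from underflowing when several small probabilities get multiplied together, which matters even more on a small microcontroller.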
After the classifier has done its duty, the Arduino takes the gesture it thinks was just performed and looks it up in its dictionary; for us, this was the alphabet and like 10 words, due to memory constraints. Each recognizable gesture corresponds to an entry in this recognition dictionary, which translates the gesture into the requisite phoneme commands for the SpeakJet. These phonemes get sent to the SpeakJet chip, and a freakishly robotic voice then says the word. Thankfully, they included a volume dial.
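The lookup step is about as simple as it sounds. Here is a minimal Python sketch of a gesture-to-phoneme dictionary; the phoneme names below are placeholders I made up, not real SpeakJet command codes (those would come from its datasheet), and `send` is a hypothetical stand-in for whatever writes to the chip.

```python
# Hypothetical dictionary: recognized gesture -> phoneme sequence.
# The phoneme names are placeholders, not actual SpeakJet codes.
DICTIONARY = {
    "A": ["EY"],
    "B": ["B", "IY"],
    "HELLO": ["HH", "EH", "L", "OW"],
}


def phonemes_for(gesture, dictionary=DICTIONARY):
    """Return the phoneme sequence for a gesture, or [] if it is unknown."""
    return dictionary.get(gesture, [])


def speak(gesture, send):
    """Send each phoneme of the gesture, in order, through `send`."""
    for phoneme in phonemes_for(gesture):
        send(phoneme)


# Collect the phonemes in a list instead of talking to real hardware.
spoken = []
speak("HELLO", spoken.append)
print(spoken)
```

Every entry costs memory, which is why a dictionary like this stays small on a microcontroller.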
Some optimizations included using letter frequencies in the classifier (an ‘e’ is more likely to show up than a ‘z’), and the code was tuned for performance; it’s pretty slick how much a 16 MHz processor can do. While there were more optimizations that could be done (e.g. letter frequency based on the previous letter), they just were not worth it for this version. The naive Bayes classifier is very limited in capability, but great for a proof-of-concept.
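Here is a minimal Python sketch of how letter frequencies can act as priors. The frequencies are rough English values for just three letters, and the log-likelihoods in the example are invented; the function name is mine, not from the glove’s code.

```python
import math

# Rough English letter frequencies in percent, used as priors.
# Only three letters are listed to keep the example short.
LETTER_FREQ = {"e": 12.7, "t": 9.1, "z": 0.07}


def pick_letter(log_likelihoods, freq=LETTER_FREQ):
    """Pick the letter with the highest log-posterior.

    log-posterior = log(prior) + log(likelihood), up to a constant
    that is the same for every letter and so does not affect the winner.
    """
    total = sum(freq.values())

    def log_posterior(letter):
        return math.log(freq[letter] / total) + log_likelihoods[letter]

    return max(log_likelihoods, key=log_posterior)


# The sensors slightly favor 'z', but the prior tips the decision to 'e'.
print(pick_letter({"e": -10.0, "z": -9.0}))
```

If the sensor evidence for ‘z’ is strong enough, it still wins; the prior only breaks near-ties in favor of common letters.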
Thus, for Version 2, I’ve got Hidden Markov Models planned: the Arduino Mega will be a “training” unit (both for the user and the HMM), and I want to miniaturize the electronics to an Arduino Pro Mini. HMMs are awesome for identifying things that can’t be observed directly; they are frequently used for speech recognition. But yeah, things to do, things to do. I’ll revise this article shortly, as its level of ‘snarky’ is probably too high.
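Since Version 2 doesn’t exist yet, take this as a toy illustration of the HMM idea rather than a plan: one small model per gesture, each scored against a sequence of observations with the forward algorithm, and the best-scoring model wins. Everything here (the two gestures, the two hidden states, the probabilities, and the observation symbols 0 = “hand low” and 1 = “hand high”) is assumed for the example.

```python
def forward_probability(observations, start, trans, emit):
    """Probability of an observation sequence under a discrete HMM.

    start[s]    : probability of starting in hidden state s
    trans[a][b] : probability of moving from state a to state b
    emit[s][o]  : probability of seeing symbol o while in state s
    """
    n = len(start)
    alpha = [start[s] * emit[s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[prev] * trans[prev][s] for prev in range(n)) * emit[s][obs]
            for s in range(n)
        ]
    return sum(alpha)


# Toy two-state, left-to-right models (all numbers invented).
START = [1.0, 0.0]
TRANS = [[0.5, 0.5], [0.0, 1.0]]
GESTURES = {
    "up": [[0.9, 0.1], [0.1, 0.9]],    # starts low, ends high
    "down": [[0.1, 0.9], [0.9, 0.1]],  # starts high, ends low
}


def recognize(observations):
    """Return the gesture whose HMM best explains the observations."""
    return max(
        GESTURES,
        key=lambda name: forward_probability(observations, START, TRANS, GESTURES[name]),
    )


print(recognize([0, 0, 1, 1]))
```

The hidden states are the “things that can’t be observed directly” (which phase of the gesture the hand is in); the model only ever sees the noisy symbols.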