Currently, I am working on a medical project that requires detecting Head Nods (in agreement), Head Shakes (in disagreement), and Head Rolls (an Asian/East Indian head gesture for agreement) within a computer application.
Since I work with the Kinect for Windows device, I figured it would be perfect for this type of application.
This post explains how I built this library, the algorithm used, and how I used the Kinect device and the Kinect for Windows SDK to implement it.
Before we get into the guts of how this all works, let's talk about why the Kinect is perfect for this type of application.
The Kinect v2.0 device has many capabilities. One of them allows the device to capture a person's face in 3-D, that is, in three dimensions:
Envision the Z-axis arrow pointing straight out towards you in one direction, and out towards the back of the monitor/screen in the other direction.
In Kinect terminology, this feature is called HD Face. In HD Face, the Kinect can track the eyes, mouth, nose, eyebrows, and other specific features of the face when a person looks towards the Kinect camera.
So envision a person’s face tracked in 3-D.
We can measure the height, width, and depth of a face. Not only can we measure 3-D values and coordinates on various axes; with a little math and engineering we can also measure movements and rotations over time.
Think about normal head movements for a second. We as humans twist and turn our heads for various reasons. One such reason is proper driving techniques. We twist and turn our heads when driving looking for other cars on the road. We look up at the skies on beautiful days. We look down on floors when we drop things. We even slightly nod our heads in agreement, and shake our heads in disgust.
Question: So from a technical perspective what does this movement look like?
Answer: When a person moves their head, the head rotates around a particular axis: the X, the Y, the Z, or some combination of the three axes. This rotation is perceived from a point on the head. For our purposes, let's use the Nose as the point of perspective.
When a person Nods their head, the nose rotates around the X-axis in a small up-and-down manner, which makes the Y-coordinate values of the Nose point go up and down.
When a person Shakes their head, the nose rotates around the Y-axis in a small left-and-right manner, which makes the X-coordinate values of the Nose point go back and forth.
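To make the idea concrete, here is a rough Python sketch (not the actual C# implementation) of how the dominant coordinate of the nose point separates nod-like from shake-like motion; the sample data and function name are my own illustration:

```python
# Hypothetical sketch: given a short sequence of nose points, decide which
# coordinate oscillates the most. A nod shows up mostly in Y, a shake in X.

def dominant_axis(nose_points):
    """nose_points: list of (x, y) tuples sampled over roughly one second."""
    xs = [p[0] for p in nose_points]
    ys = [p[1] for p in nose_points]
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    return "Y (nod-like)" if y_range > x_range else "X (shake-like)"

# A nodding head: X nearly constant, Y oscillating.
nod = [(100, 200), (100, 195), (101, 205), (100, 196), (100, 204)]
print(dominant_axis(nod))  # Y (nod-like)
```

In practice the real signal is noisier than this, which is exactly why the state-machine filtering described later is needed.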
If we were to graph Nods and Shakes over time, their Y and X graphs would look like this:
Question: So great, we have a graph of Head Nods and Head Shakes… How do we get the Y, X and rotations from the head?
Answer: Luckily for us, the Kinect for Windows SDK provides us engineers with the HD Face coordinates in 3-D; that is, we get the X, Y, and Z coordinates of a face. Using linear algebra and vector math, we can also derive rotational data from this. HD Face gives us facial orientation as well as head pivot data.
Question: Now we’re getting somewhere, so exactly how do you calculate Head Nods/Shakes/Rolls with the Kinect?
Answer: Well, it takes a little creativity and some help from researchers in Japan (Shinjiro Kawato and Jun Ohya), who worked out the mathematical formulas to derive head position deviations.
My implementation is based in part on their paper. Instead of the point "between the eyes", I decided to use the Nose, since the Kinect readily gives me this information.
The implementation concept is simple.
First, let's take from the research paper that a typical Nod/Shake/Roll lasts about 1 to 1.4 seconds.
Next, let's take as fact that the Kinect device produces 30 frames per second. As long as a person is facing the camera, the majority of these frames will yield an HD Face frame for us (roughly 15–20 fps in practice).
Therefore, if I capture about 1–1.5 seconds of frames, I can determine head rotations and pixel coordinates (X, Y, and Z), derive rotation angles, and store this data in a state machine for each measured frame.
I can then set the state of each measured frame to "Extreme", "Stable", or "Transient" based on the algorithms provided by Kawato and Ohya.
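A minimal Python sketch of this per-frame labeling, in the spirit of Kawato and Ohya's classification; the threshold values here are my own illustrative assumptions, not the paper's or the library's:

```python
# Illustrative per-frame state labeling based on angular deviation of the
# nose from its resting position. Thresholds are assumed, not canonical.
STABLE_THRESHOLD = 0.5   # deviations below this (degrees) count as at rest
EXTREME_THRESHOLD = 1.0  # deviations at or above this count as a peak

def frame_state(deviation_degrees):
    d = abs(deviation_degrees)
    if d < STABLE_THRESHOLD:
        return "Stable"
    if d >= EXTREME_THRESHOLD:
        return "Extreme"
    return "Transient"

print([frame_state(d) for d in (0.1, 0.7, 1.6)])
# ['Stable', 'Transient', 'Extreme']
```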
I then use a delayed 5-frame buffer and evaluate the states of the last 3 of the 5 buffered frames.
Next, I continue applying Kawato and Ohya's algorithm to figure out when and precisely how to check for head Nods/Shakes/Rolls within my buffered frame states.
The mechanism to check is simple as well: if the current frame state changes from a non-stable state to "Stable", I evaluate for Nods/Shakes/Rolls.
The evaluation is also simple. During the evaluation process, if the previous frame states contain more than 2 adjacent "Extreme" states, I check whether all of those adjacent states have Nose rotation angles greater than a configurable threshold (1 degree by default). Depending on which axis is involved (Y for Nods, X for Shakes, Z for Rolls), I raise an event that the appropriate head action occurred.
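The trigger-on-Stable evaluation can be sketched like this in Python. This is my simplification of the idea, not the shipped C# code; the data layout and function name are assumptions:

```python
# Sketch of the evaluation step: when the newest buffered frame goes
# "Stable", look back through earlier frames for a run of more than 2
# adjacent "Extreme" states on a single axis, and name the gesture.

def detect_gesture(buffered):
    """buffered: list of (state, axis) tuples, oldest first."""
    if not buffered or buffered[-1][0] != "Stable":
        return None  # only evaluate when the head has just settled
    run_axis, run_len = None, 0
    best = None
    for state, axis in buffered[:-1]:
        if state == "Extreme" and axis == run_axis:
            run_len += 1
        elif state == "Extreme":
            run_axis, run_len = axis, 1
        else:
            run_axis, run_len = None, 0
        if run_len > 2:
            best = {"Y": "Nod", "X": "Shake", "Z": "Roll"}[run_axis]
    return best

frames = [("Stable", None), ("Extreme", "Y"), ("Extreme", "Y"),
          ("Extreme", "Y"), ("Transient", "Y"), ("Stable", None)]
print(detect_gesture(frames))  # Nod
```

In the real library this check would feed an event that application code subscribes to.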
Here’s a graphical view of the process flow:
Frame state depiction:
If you’re interested in testing out this library, please contact me here through this blog.
Here's the library and a sample Windows 8.1 store application using the library in action. In the picture below, I have updated the HD Face Basics XAML sample for visualization. As the HD Face mesh head nods and shakes, I show the confidence of a Head Nod or Head Shake. The left side shows Kinect Studio playing a recorded clip of me testing the application.
Just a quick reminder for those who will be developing applications for the Kinect for Windows v2: the Kinect Human Interface Guidelines v2.0 was released on October 30, 2014. I originally missed the announcement, so I'm putting it here on my blog so I can find it again.
It contains 140 pages of recommendations, specifics, and best practices on how to place, use, develop and interact with Kinect enabled applications.
You can get the latest version of the guide here: http://download.microsoft.com/download/6/7/6/676611B4-1982-47A4-A42E-4CF84E1095A8/KinectHIG.2.0.pdf
This morning I awoke to an email:
Dear Dwight Goins,
Congratulations! We are pleased to present you with the 2015 Microsoft® MVP Award! This award is given to exceptional technical community leaders who actively share their high quality, real world expertise with others. We appreciate your outstanding contributions in Kinect for Windows technical communities during the past year.
The WordPress.com stats helper monkeys prepared a 2014 annual report for this blog.
Here’s an excerpt:
The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 23,000 times in 2014. If it were a concert at Sydney Opera House, it would take about 9 sold-out performances for that many people to see it.
I was fortunate enough to attend the New York City (NYC) Kinect Hackathon held June 21-22, 2014 @GrindSpaces. This posting is about my experiences, along with a self-interview/review of the projects and the experimental hardware Microsoft allowed us to play with. If you're interested, please read on…
Question: So exactly what is a Kinect Hackathon?
The Microsoft Kinect for Windows team has been touring the world to promote the new Kinect for Windows (K4W) v2 device. They have gone to places like San Francisco, Germany, and China. More recently, they stopped off in New York City. At each location, the team introduced new features and hardware to the masses, showing off the device's capabilities and potential. In NYC, the team did this by holding a Hackathon contest.
A Hackathon is simply a gathering of technical minded people ranging from inventors, to designers, to enthusiasts, to hobbyists, to developers, architects, and just plain ole smart people who have an idea. The goal is to take this idea and see if the people in the room can make it a reality by building a Proof of Concept (POC).
The contest part is to see which team can come up with the best working POC for one or more ideas within 24 hours. Food and drinks are supplied all night, and team members are architecting, designing, developing, and testing all night until the cutoff time.
Question: Wow, that sounds like fun, What was it like?
It was very fun!!! Let me explain why. The day started off with Ben Lower, community manager for the Kinect for Windows team, introducing us to various members of the K4W team: Carmine, Chen, Smarth, Ragul, Kevin, David, and Lori (please excuse the name spellings, and if I missed anyone I apologize). He then told us about the new experimental firmware update that gives the K4W v2 device a near mode – potentially down to 1 cm, although the edition at the Hackathon supported down to 10 cm. Ben also talked about Kinect Ripple, a new framework that lets you use the floor to map or calibrate a grid as a pseudo menu/command-control system for an application, while still keeping the K4W's normal functionality – body tracking, audio, etc.
The next thing that transpired was opening the floor for ideas and forming teams. A little feeling-slighted note… the winners of the contest were teams that were pre-planned and prepared prior to the event, but that was OK.
People took the microphone in droves… I wish I had recorded all the great ideas and thoughts people envisioned with this device, because I could just quit my day job and work on each idea, one project at a time. Each idea has the potential to make a profit and benefit humanity. The few ideas I do remember ranged from tracking animals, plants, and people, to robots avoiding obstacles, to field sobriety tests, to automated CAD designs, to virtual orchestras, to playing instruments with your body, to Oculus Rift + Kinect skeleton tracking, to simple funny gestures, to a move-the-virtual-egg-but-don't-wake-the-rooster farm game, to robotic hand tracking, to Kinect Ripple-based Minesweeper, to a Kinect Ripple-based match-that-shape game, and last but not least, my idea for a Windows 8 store app: Kinect Virtual Doctor.
After the idea pitching came the teams. I pitched my idea, others pitched theirs, and we went around forming teams if we didn't already have one. At first I was afraid my heart rate idea (based on my initial code here) would just be a copy-and-paste app for Windows 8, until a young college student named Mansib Rahman decided to pair up with me.
We changed the game…
We started binging (googling in reality – but binging sounds WAYYYY better) potential algorithms for various medical readings using the HD camera, the IR camera, and the new near mode firmware features of the Kinect. We learned a lot. We worked all night, and I re-imagined the idea and realized that the potential for a medical library for use with the K4W v2 device was huge. That's when we decided to create the Kinect Virtual Doctor Windows 8 store app. The application could potentially be placed inside your home: you stand in front of your mirror, and the application tells you your breathing rate, O2 levels, pulse, blood pressure, stress/mood, alertness, and many other things. But first we needed to make sure it was plausible and doable. We took the rest of the night trying to determine which algorithms we could implement in 24 hours. It turns out the heart rate and breathing rate were the easiest, but we only ended up rewriting my heart rate sample for Windows 8, utilizing the algorithm posted here.
One of the funniest stories of the night, in my opinion, was the "Pied Piper" green T's group, at least that's what I call them. Kudos, by the way, for sticking it out and for passing me a working audio sample – thanks to Andras (K4W MVP). Oh, and before I forget – thank you Velvart Andras and James Ashley (K4W MVPs) for helping me out with ideas and coding.
These "Pied Piper" guys started out with the idea of playing musical instruments with your body. For example, if you hit your arm, it plays a drum; if you hit your chest, it changes the octave or plays another instrument. Sitting next to these guys was painful because of the terrible sounds coming from that area of the room. Envision awkward musical notes with no melody constantly sounding off around 3 a.m… Then on the other side of me were the Roosters crowing "Cocka-doodle-doo" right afterwards. I swear I felt like Noah or something. In any case, the Pied Piper guys realized the playing-music-with-your-body routine was a little more difficult than expected, so they started to give up. A couple of them left and had some drinks – and in my opinion came back slightly wasted. That's when the only logical thing for them to do appeared… "Let's make a field sobriety test with the Kinect." The app was simple – walk a straight line and repeat a tongue-twister phrase. If the Kinect tracked you walking the straight line and you said the phrase correctly, you didn't go to jail.
This was hilarious and straight out of the HBO series "Silicon Valley" and its fake Pied Piper website, mixed with the intoxication app from Google's movie "The Internship"… Now we went from 3 a.m. bad music, to rooster crows, to "Ricky Rubio went around the reading rainbow wrong" or something like that – PRICELESS!!!
Question: So what was your experience with the experimental firmware for the Kinect?
I will simply say this: for my application, the 10 cm near mode worked better for obtaining the face and calculating the heart rate; however, not everyone had the same success with their applications during the event.
Question: What was your experience with the Kinect Ripple?
I thought this was another great implementation for the Kinect. I can see museums, science exhibits, schools, convention centers, and the like all utilizing this functionality. In case you're wondering what exactly it does… here's a quick image:
and video: http://youtu.be/RfJggcO7zZ8
Question: So would you say the Kinect Hackathon was a success?
Yes, I most definitely would!
In case you missed my newsgroup session last night, I have attached my PowerPoint slides from the presentation: What's New in Kinect for Windows v2.
The Kinect team is sponsoring a terrific hackathon in New York City June 21-22. The Kinect Team is going to be there with plenty of pre-release v2 sensors for people to use to create interactive applications.
In addition to the Kinect v2 sensors and SDK, the team is going to be bringing two new, cutting-edge things with them: near-field sensors and Kinect Ripple (see below for more info).
My talk at the Boulder.Net UG on June 17th 2014 is now up: http://www.meetup.com/Boulder-NET-User-Group/events/186873732/
Check it out!!!