Part 2: Building the Home Automation App – Tell Windows 10 Cortana to Control your Lights

This weekend I took on a personal endeavour: automating my home with Windows 10 and IoT (the Internet of my Things). You can read a general overview in part 1 here.

This post covers my steps and experiences using Windows 10, an IoT device called the Philips Hue lighting system, and some custom code to control my office lights. First, let's see it in action:

HomeAutomationTurnOnOffLights from Dwight Goins on Vimeo.

In the above video clip, I speak to my Windows 10 personal assistant: Cortana. I tell Cortana to turn my office lights on and off. I can even tell it to change office colors:

HomeAutomationChangeLightColors from Dwight Goins on Vimeo.

This is great!!! So how did you do it?

I created a Windows 10 UWP application and used the Philips Hue lighting REST API to control the lights and light colors.

OK, that's the simple answer, so let me expand on it. Windows 10 lets you create Universal Windows Platform applications that target the Windows 10 operating system. Windows 10 has core components that make it fairly easy to work with IoT devices. One example is controlling the Hue lights, which is done by sending commands over the local wireless network to the lighting system.
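For the curious, the Hue system exposes each light's state through a simple REST endpoint on its bridge. Here's a minimal sketch (in Python rather than the app's C#, with a placeholder bridge IP, API username, and light number) of the request the app ends up building; `on`, `hue`, `bri`, and `sat` are the actual JSON keys the Hue API uses:

```python
import json

def build_light_command(bridge_ip, username, light_id,
                        on, hue=None, bri=None, sat=None):
    """Build the (url, body) pair for a PUT to a Hue light's state endpoint."""
    url = "http://{0}/api/{1}/lights/{2}/state".format(bridge_ip, username, light_id)
    state = {"on": on}
    if hue is not None:
        state["hue"] = hue   # hue: 0-65535
    if bri is not None:
        state["bri"] = bri   # brightness: 1-254
    if sat is not None:
        state["sat"] = sat   # saturation: 0-254
    return url, json.dumps(state)

# Example: turn light 1 blue-ish at medium-high brightness.
url, body = build_light_command("192.168.1.10", "devuser", 1,
                                True, hue=46920, bri=200)
# Send with any HTTP client, e.g. requests.put(url, data=body)
```

The bridge IP and username here are hypothetical; a real bridge hands you a username when you press its link button and register via the API.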

Windows 10 also has a built-in personal assistant like Apple’s Siri on the iPhone. This assistant is called Cortana. Cortana comes with all Windows 10 devices capable of processing speech and accessing the internet. Cortana can do everything Siri can do and a lot more. As you’ve already seen, you can teach Cortana to perform new and custom actions based on speech, custom code, and body gestures. Cortana even supports multiple languages; for those ancient languages that aren’t a part of Cortana, you’ll have to get creative and “englibic” it. Here’s an example:

HomeAutomation_AncientAfricanLanguage from Dwight Goins on Vimeo.

Thus, inside the Home Automation app, I tell Cortana to turn the lights on and off and to change the light colors. Cortana processes my speech commands and informs my Home Automation app which actions to take. The app then sends the commands over the network to the Philips Hue lighting system.


  1. I first started with downloading and installing Visual Studio .Net 2015 on my Windows 10 computer.
  2. Next I downloaded and installed the Windows 10 SDK.
  3. After my environment was set up, I opened Visual Studio .Net 2015 and created a new Universal project for Windows 10. (To learn how to do this, view getting started.)
  4. Next I started researching to find out exactly how to teach Cortana about new speech commands, and how to have Cortana tell my Home Automation App what to do. What I found was a sample project on Github and a nice video explaining how to include Cortana in your UWP apps.
  5. Next I researched how to turn on and off lights in the Hue system from here.
  6. I then just created my custom speech commands and invoked the Hue REST APIs to turn the lights on and off.
  7. Lastly, I looked at the hue, brightness, and saturation fields from the Hue system to get a range of colors and added those colors into my Home Automation app to support changing colors.
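For step 6, teaching Cortana new phrases in a UWP app is done with a Voice Command Definition (VCD) XML file that the app registers at launch. A rough sketch of what mine looks like (the command set, command names, and phrases here are illustrative, not my exact file):

```xml
<?xml version="1.0" encoding="utf-8"?>
<VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.2">
  <CommandSet xml:lang="en-us" Name="HomeAutomationCommandSet">
    <!-- The phrase that routes speech to this app: "Home Automation, ..." -->
    <CommandPrefix>Home Automation</CommandPrefix>
    <Example>Turn on office lights</Example>

    <Command Name="TurnOnLights">
      <Example>Turn on office lights</Example>
      <ListenFor>Turn on office lights</ListenFor>
      <Feedback>Turning on the office lights</Feedback>
      <Navigate />
    </Command>

    <Command Name="TurnOffLights">
      <Example>Turn off office lights</Example>
      <ListenFor>Turn off office lights</ListenFor>
      <Feedback>Turning off the office lights</Feedback>
      <Navigate />
    </Command>
  </CommandSet>
</VoiceCommands>
```

When Cortana matches a `ListenFor` phrase, it activates the app with the command name, and the app maps that name to the matching Hue REST call.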

Overall this took about 4-5 hours to get it all working and I was impressed how easy it all was.

Now on to my next adventure: Controlling the Sonos Wireless Stereo system. I suspect that this is going to be harder, because I know for a fact that the Sonos system does not provide a documented API to control it, so that means I’ll have to hack it.

Stay Tuned for part 3: Automating a Sonos Stereo system with Windows 10 and IoT

Windows10 and IOT: How to automate your home Part 1

IoT stands for the Internet of Things. The core concept is that devices can connect to the internet, communicate with other devices, and transmit and receive events and data. Another concept of IoT is that these devices produce telemetry data, and this data can be analyzed and learned from, providing insight into the events and daily operating usage of each device. If these devices are home-based and personal, the idea is that learning from this data should help me understand how the devices affect my life, and allow me to make better choices about how to use them to better my life.

Windows 10 is Microsoft’s latest operating system, and it provides a single platform for many kinds of devices to connect to the internet. For developers, it introduces the Universal Windows Platform (UWP). This new platform hides the gory details of connecting devices and getting them on the internet; instead, developers focus on the core functionality of their application. That means it should be easy to get devices connected, sending, and receiving, which leads to analyzing and learning.

So why mention IoT and Windows 10 together?

Microsoft touts that we can use Windows 10 to quickly and easily build an IoT solution. Let’s test that claim. The goal: use Windows 10 to build a home automation solution. Basically, I want to turn lights on, change music, and look at a security camera feed at night. Once I’ve accomplished this, I want to see what my favorite music is over time, see what security events I should be aware of happening at night around my house, such as movement and blob detection, and lastly, figure out whether my electric bills are rising due to prolonged use of my office lights.

This post is the first in a multi-part series about Windows 10 and IoT. With all these devices and all this data, I should be able to build a quick and easy home automation solution and learn from my daily routine to make better decisions about my office, music, security, and lighting conditions.

OK, this is a lot of reading; I want to get started now. How do we start?

To get started, I figured I would take some time and talk a little about where the IoT industry is going in light of the many announcements from big-name companies like Microsoft, Amazon, Google, and Apple. The obvious move is that these tech giants are pushing more and more connected devices toward consumers. Notice that these devices are integrated into our homes, schools, and offices, and have even made it into our daily living routines.

What devices are you referring to?

For example, we have the new Microsoft Band 2, which is about to come out and monitors your health and lifestyle. From a home décor standpoint, we have the Philips Hue lighting system, which lets you control the lights in your home. From an entertainment perspective, we have the Sonos stereo system, which lets you control your entertainment system and music. From a security camera standpoint, we have infra-red cameras and depth sensors like the Intel RealSense camera and the Kinect for Windows v2 camera, which can easily provide security video feeds around your house. Lastly, we have the Windows 10 operating system, software that can run with and on various devices to connect them and bring them all together.

As we venture through this home automation solution, I’ll post video snippets to show my progress and timings.

Do you have a diagram of how all these things will work together?

OK, with all that out of the way, let’s draw up the architecture for how this will all work together.

Windows 10 and IOT diagram

In the above diagram, the user (me) can say commands such as “Hey Cortana, Home Automation, turn on office lights”, “Hey Cortana, Home Automation, play music from India Arie”, or “Hey Cortana, Home Automation, view the security video from last night”. A Home Automation Windows 10 application processes the commands and sends and receives data from the connected devices. As the automation works, telemetry data and information is sent to Cortana Analytics. After a few days of automation, I can query the data from Cortana Analytics, analyze it, and learn from my daily usage habits. The theory is I should be able to tell what my favorite music was the previous week. I can also figure out what weird security events, such as movement detection and blob detection, occurred at night, and keep a running log of how long I leave my office lights on for electric-billing purposes. Groovy, huh???

Stay Tuned for part 2

Stay tuned for part 2… Building the Home Automation App – Getting Cortana to understand my commands and control the Philips Hue lighting system.

Windows Hello Rocks!!! Now Why can’t the Kinect for Windows Do This???

Let me start by repeating “Windows Hello Rocks!!!”.

For those of you who don’t quite know what this is: one of the new features of Windows 10 is to get rid of passwords and use biometric sensors to recognize who you are.

Biometric sensors being fingerprint readers, iris scanners, and of course depth cameras such as the Intel RealSense F200. The Intel depth camera is eerily similar to the Kinect for Windows cameras (v1 and v2), so hopefully those with Kinects can use this feature too in the near future. I know some of you may be asking what the big deal is: fingerprint readers have been around for 15 years or more. I know I used to have one when I worked for the Air Force Reserves as a Crystal Reports developer.

Well, the big news is now you can use embedded cameras like the Intel RealSense F200 to simply have your face recognized securely so you don’t need your finger anymore!!!

But I digress. Currently the Kinect is not supported, so I ask: why?

My only guess is that supporting this requires changes in the driver architecture. The current Kinect driver is designed to run in user mode, which loads after a user is logged in. Using the Kinect for sign-in would require a driver that runs in kernel mode, which loads before the user logs in, allowing the device to operate outside user mode and be used for facial recognition.

Well, I just got my Intel RealSense Development Kit in the mail. It contains the F200 camera along with the SDK and drivers for Windows 8.1 and Windows 10.

I installed the drivers and SDK on both my Windows 8.1 machine and my Surface Pro 3, which runs Windows 10 build 10240, and attached the device. Windows recognized it perfectly. I followed the steps here.

Once complete, Windows Hello was working and, “look Ma, no hands,” no more passwords. Windows 10 recognized my face and only my face. I can sign in just by getting in front of the camera.

Great work Microsoft!!!

I’m presenting at the MVP Virtual Conference


For those who would like to hear about using the Microsoft Kinect for Windows v2 to detect facial expressions, I will be presenting a 50 minute session online for the first ever Microsoft MVP Virtual Conference on just that.

Save the Date: May 14th @2pm – 2:50pm EST…


There will be many sessions, not just mine. There will be topics on Windows 10, Microsoft Azure, Microsoft Edge (formerly Project Spartan), SQL Server, Power BI, Office 365, Enterprise Mobility, Surface, PowerShell, Skype for Business, Hyper-V, System Center, OWIN, ASP.Net vNext, Unity 3D, AngularJS, Puppet, Microsoft Band, Xamarin, and much more.

If you’re interested in this free online event, register here: 

Come support me and my fellow MVPs!!! Troll us or simply ask us questions. I hope to hear and see you there.

Denver Kinect + Microsoft Band + Unity Hackathon


Last year I went all around Colorado drumming up support for Kinect development by talking at user groups, breakout sessions, and meetings. I’ve spoken in Fort Collins, Fort Carson, Boulder, Denver, Colorado Springs, Fountain, and many more areas. I’m glad to announce with all the support we’ve received, we are now holding a Kinect hackathon in Denver Colorado, April 24th – 25th 2015.

This event will bring Microsoft’s Kinect engineers to an all-night event, answering any questions you have about the device and how to program it. The Microsoft Kinect team will also bring spare Kinect devices, Microsoft Bands, Unity Pro plugins, and the Unreal Engine 4 plugin, along with other surprises.

Kinect for Windows MVPs such as myself will also be there to help and to tell you about things we’re working on, our experiences, and possibly even opportunities for you to get architecture and coding help.

This event is for designers, gamers, developers, enthusiasts, students, hobbyists, idealists, and medical researchers. If you have an idea you’d like to see built, or a game you want to play that is Kinect-enabled, this is the place to be.

Register for the upcoming #denver #hackathon today at: Space is limited! #kinect #k4wdev #k4wv2

So click on the link above to register. Don’t miss this opportunity to meet the best minds in the NUI, Kinect, Hololens world…

My Kinect told me I have Dark Olive Green Skin…

Did you know the Kinect for Windows v2 can determine your skin pigmentation and your hair color? Yes, I’m telling you the truth. One of the many features of the Kinect device is the ability to read the skin complexion and hair color of a person being tracked by the device.

If you ever need the ability to read a person’s skin complexion or determine the color of the hair on their head, this post will show you how to do just that.


The steps are rather quick and simple. Determining the skin color requires you to access Kinect’s HD Face features.

Kinect can detect facial features in 3-D; this is known as “HD Face”. It can detect depth, height, and width. The Kinect can also use its high-definition camera to detect the red, green, and blue intensities that reflect back and infer the actual skin tone of a tracked face. Along with the skin tone, the Kinect can also detect the hair color on top of a person’s head…

So What’s Your Skin Tone? Click Here to download the source code and try it out.

If you want to include this feature inside your application, the steps you must take are:

1. Create a new WPF or Windows 8.1 Store application

2. Inside the new application, add a reference to the Microsoft.Kinect and Microsoft.Kinect.Face assemblies.


3. Let’s also make sure we set this up for the proper processor architecture. HD Face supports both 32-bit and 64-bit; I want to use 64-bit. Change your build settings to use a 64-bit configuration from the project properties in VS.Net:


The above step is very important. You must choose either the x86 (32-bit) or x64 (64-bit) architecture and build accordingly. “Any CPU” won’t work here. The reason is that the Kinect assemblies are named exactly the same but are compiled separately for each architecture. You can easily get a BadImageFormatException if you use the x86 version with a 64-bit build, and vice versa.

4. Next, copy the correct version of the NuiDatabase folder from the Kinect Redist folder into your project’s \bin\x64\Debug output path. This step is also important. If you mismatch versions by copying the x86 NuiDatabase into a 64-bit compiled application, you’ll see weird errors at runtime: the Kinect.Face assembly can’t be found, BadImageFormat errors, and so on. So make sure you choose the correct architecture.


Note: Optionally, you can use the Kinect NuGet packages, which will basically do the right thing for you. However, you can’t mix and match: you can’t manually add references and then go back and add NuGet packages, or things will quickly get out of sync:


5. Inside your code add the namespaces for Kinect and Kinect HD Face:

using Microsoft.Kinect;
using Microsoft.Kinect.Face;

6. Create some variables to hold references to these artifacts:

        private KinectSensor m_sensor;
        private BodyFrameReader m_bodyReader;
        private HighDefinitionFaceFrameReader m_hdFaceReader;
        private HighDefinitionFaceFrameSource m_hdFaceSource;
        private FaceModel m_faceModel;
        private FaceAlignment m_faceAlignment;
        private FaceModelBuilder m_faceBuilder;
        private ulong m_trackedBodyId;
        private bool m_faceBuilderStarted;
        private bool m_faceBuildComplete;

Here’s what each variable is for:

m_sensor holds a reference to the Kinect device itself. We’ll use it to get access to the body frames, HD Face frames, FaceModel, FaceModelBuilder, and the tracked person.

m_bodyReader is a frame reader for determining when a body is being tracked. The Kinect sends 30 frames per second, and each frame can tell us whether a person is found within it.

m_hdFaceSource is the HD Face source. It keeps track of body tracking IDs and gives us access to the 30-frames-per-second stream of HD Face frames.

m_hdFaceReader is used as each HD Face frame is processed. It lets us get the 3-D face information (FaceModel) and listen for the events that allow us to build a complete 180-degree view of the face.

m_faceModel holds the 3-D face measurements.

m_faceBuilder builds the 180-degree HD Face model that ends up in m_faceModel. It provides the internal mechanism for building a matrix of 3-D face depth values from IR and color (RGB) information, which lets us produce the complete face model with skin color and hair color. It also raises events telling us when the tracked face needs to rotate left, rotate right, or tilt up so the complete matrix can be built.

m_trackedBodyId is the tracking ID that synchronizes the tracked body with the HD Face source. Without a synchronized tracked person, HD Face cannot perform its work.

Lastly, the two flag variables keep track of when the face builder process has started and when it has completed.

Game Plan:

Overall, the application will initialize the Kinect sensor and set the variables to default values. It will then set up the BodyFrameReader to listen for body frames from the Kinect. When a body frame arrives, we determine whether a body is within the frame and whether that body is tracked. If it is, we get the body’s trackingId and set it on the HD Face source. Setting the tracking ID causes HD Face frame events to start firing. Once a valid HD Face frame arrives, we start the face builder process and ask it to begin building the 180-degree face model matrix. At this point the tracked user needs to turn their head slowly left and back to center, right and back to center, then up and down and back to center, until the face builder notifies us that it has finished building the matrix. Once complete, we ask the face builder to produce the 3-D face model, which gives us access to the skin color, hair color, and the 3-D depth data and matrices.

7. Initialize the sensor: get an instance of your Kinect sensor and initialize your bodyReader, hdFaceReader, faceModel, trackingId, and faceAlignment variables:

        public MainWindow()
        {
            InitializeComponent();
            InitializeKinect();
        }

        public void InitializeKinect()
        {
            m_sensor = KinectSensor.GetDefault();

            m_bodyReader = m_sensor.BodyFrameSource.OpenReader();
            m_bodyReader.FrameArrived += m_bodyReader_FrameArrived;

            m_hdFaceSource = new HighDefinitionFaceFrameSource(m_sensor);
            m_hdFaceReader = m_hdFaceSource.OpenReader();
            m_hdFaceReader.FrameArrived += m_hdFaceReader_FrameArrived;

            m_faceModel = new FaceModel();

            // Ask the model builder to collect both hair color and skin color
            m_faceBuilder = m_hdFaceReader.HighDefinitionFaceFrameSource.OpenModelBuilder(
                 FaceModelBuilderAttributes.HairColor
                 | FaceModelBuilderAttributes.SkinColor);
            m_faceBuilder.CollectionCompleted += m_faceBuilder_CollectionCompleted;
            m_faceBuilder.CaptureStatusChanged += m_faceBuilder_CaptureStatusChanged;
            m_faceBuilder.CollectionStatusChanged += m_faceBuilder_CollectionStatusChanged;

            m_faceAlignment = new FaceAlignment();
            m_trackedBodyId = 0;
            m_faceBuilderStarted = false;
            m_faceBuildComplete = false;

            m_sensor.Open();
        }

8. Inside the BodyReader_FrameArrived event handler, add code to determine when the Kinect tracks a body. Once the Kinect finds a tracked body, set the trackingId on the hdFaceReader’s source:

        void m_bodyReader_FrameArrived(object sender, BodyFrameArrivedEventArgs e)
        {
            using (var bodyFrame = e.FrameReference.AcquireFrame())
            {
                if (null != bodyFrame)
                {
                    Body[] bodies = new Body[bodyFrame.BodyCount];
                    bodyFrame.GetAndRefreshBodyData(bodies);

                    foreach (var body in bodies)
                    {
                        if (body.IsTracked)
                        {
                            m_trackedBodyId = body.TrackingId;
                            m_hdFaceReader.HighDefinitionFaceFrameSource.TrackingId = m_trackedBodyId;
                            break;
                        }
                    }
                }
            }
        }

9. Once the trackingId is set on the HD Face frame source, the HD Face FrameArrived event handler will start firing. Just check the flag and start the face builder process:

        void m_hdFaceReader_FrameArrived(object sender, HighDefinitionFaceFrameArrivedEventArgs e)
        {
            if (!m_faceBuilderStarted)
            {
                // Begin collecting the face views needed to build the model
                m_faceBuilder.BeginFaceDataCollection();
                m_faceBuilderStarted = true;
            }
        }


10. In the faceBuilder_CollectionStatusChanged handler, just listen for the Complete status. This lets us set our flag indicating that all the face views have been correctly captured and we can ask the face builder for the model:

        void m_faceBuilder_CollectionStatusChanged(object sender, FaceModelBuilderCollectionStatusChangedEventArgs e)
        {
            var collectionStatus = e.PreviousCollectionStatus;
            switch (collectionStatus)
            {
                case FaceModelBuilderCollectionStatus.Complete:
                    lblCollectionStatus.Text = "CollectionStatus: Complete";
                    m_faceBuildComplete = true;
                    break;
            }
        }


11. In the faceBuilder_CollectionCompleted event handler, check the collection status to make sure it’s Complete, check your flag, and then ask the face builder to produce the FaceModel from the event arguments. The face model exposes the skin color and hair color as an unsigned integer (UINT). To make this an actual drawing color, we need to convert the UINT to a Color structure using some old skool bit shifting; see below.

        private void m_faceBuilder_CollectionCompleted(object sender, FaceModelBuilderCollectionCompletedEventArgs e)
        {
            var status = m_faceBuilder.CollectionStatus;
            if (status == FaceModelBuilderCollectionStatus.Complete && m_faceBuildComplete)
            {
                try
                {
                    m_faceModel = e.ModelData.ProduceFaceModel();
                }
                catch (Exception ex)
                {
                    lblCollectionStatus.Text = "Error: " + ex.ToString();
                    lblStatus.Text = "Restarting...";
                    m_faceBuildComplete = false;
                    m_faceBuilderStarted = false;
                    return;
                }

                var skinColor = UIntToColor(m_faceModel.SkinColor);
                var hairColor = UIntToColor(m_faceModel.HairColor);
                var skinBrush = new SolidColorBrush(skinColor);
                var hairBrush = new SolidColorBrush(hairColor);

                skinColorCanvas.Background = skinBrush;
                lblSkinColor.Text += " " + skinBrush.ToString();

                hairColorCanvas.Background = hairBrush;
                lblHairColor.Text += " " + hairBrush.ToString();

                m_faceBuilderStarted = false;
            }
        }

        private Color UIntToColor(uint color)
        {
            // The Kinect packs the color as A-B-G-R
            // rather than .NET's A-R-G-B, so unpack each byte
            byte a = (byte)(color >> 24);
            byte b = (byte)(color >> 16);
            byte g = (byte)(color >> 8);
            byte r = (byte)(color >> 0);
            return Color.FromArgb(250, r, g, b);
        }
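To sanity-check the bit shifting, here is the same unpacking expressed in Python (standalone, no Kinect needed; the sample color value is made up for illustration):

```python
def uint_to_argb(color):
    """Unpack a packed A-B-G-R unsigned int into the (a, r, g, b)
    byte order that .NET's Color.FromArgb expects."""
    a = (color >> 24) & 0xFF   # alpha is the highest byte
    b = (color >> 16) & 0xFF   # then blue
    g = (color >> 8) & 0xFF    # then green
    r = color & 0xFF           # red is the lowest byte
    return (a, r, g, b)

# 0xFFC08060 -> alpha 0xFF, blue 0xC0, green 0x80, red 0x60
print(uint_to_argb(0xFFC08060))  # (255, 96, 128, 192)
```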


12. Lastly, add the WPF TextBlock and Canvas elements to your app so you can actually see something:

<Window x:Class="KinectFindingSkinTone.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow" Height="350" Width="525">
    <Window.Resources>
        <SolidColorBrush x:Key="MediumGreyBrush" Color="#ff6e6e6e" />
        <SolidColorBrush x:Key="KinectPurpleBrush" Color="#ff52318f" />
        <SolidColorBrush x:Key="KinectBlueBrush" Color="#ff00BCF2" />
    </Window.Resources>

    <Grid Background="White" Margin="10 0 10 0">
        <StackPanel Margin="20">
            <TextBlock x:Name="lblCollectionStatus" Text="CollectionStatus: " Foreground="{StaticResource KinectBlueBrush}" FontSize="20" />
            <TextBlock x:Name="lblStatus" Text="FrameStatus: " Foreground="{StaticResource KinectBlueBrush}" FontSize="20" />

            <TextBlock x:Name="lblSkinColor" Text="Skin Color: " Foreground="{StaticResource KinectBlueBrush}" FontSize="20" />
            <Border BorderBrush="Black">
                <Canvas Width="300" Height="100" x:Name="skinColorCanvas" Background="DarkGray" />
            </Border>

            <TextBlock x:Name="lblHairColor" Text="Hair Color: " Foreground="{StaticResource KinectBlueBrush}" FontSize="20" />
            <Border BorderBrush="Black">
                <Canvas Width="300" Height="100" x:Name="hairColorCanvas" Background="DarkGray" />
            </Border>
        </StackPanel>
    </Grid>
</Window>

Once your application runs it should look similar to this (Minus the FrameStatus):


Try it out on your own.

Using Kinect HD Face to make the MicroHeadGesture Library

Currently, I am working on a medical project which requires detection of Head Nods (in agreement), Head Shakes (in disagreement), and Head Rolls (Asian/East Indian head gesture for agreement) within a computer application.

Being that I work with the Kinect for Windows device, I figured this device is perfect for this type of application.

This posting serves as explanation to how I built this library, the algorithm used, and how I used the Kinect device and Kinect for Windows SDK to implement it.

Before we get into the Guts of how this all works, let’s talk about why the Kinect is the device that is perfect for this type of application.

The Kinect v2.0 device has many capabilities, one of which allows it to capture a person’s face in 3-D… that is, three dimensions:


Envision the Z-axis arrow pointing straight out towards you in one direction, and out towards the back of the monitor/screen in the other direction.

In Kinect terminology, this feature is called HD Face. In HD Face, the Kinect can track the eyes, mouth, nose, eyebrows, and other specific things about the face when a person looks toward the Kinect camera.


So envision a person’s face tracked in 3-D.


We can measure height, width, and depth of a face. Not only can we measure 3-d values and coordinates on various axes, with a little math and engineering we can also measure movements and rotations over time.

Think about normal head movements for a second. We as humans twist and turn our heads for various reasons. One such reason is proper driving techniques. We twist and turn our heads when driving looking for other cars on the road. We look up at the skies on beautiful days. We look down on floors when we drop things. We even slightly nod our heads in agreement, and shake our heads in disgust.

Question: So from a technical perspective what does this movement look like?

Answer: When a person moves their head, the head rotates around a particular axis: either the X, Y, or Z axis, or some combination of the three. This rotation is perceived from a point on the head. For our purposes, let’s use the nose as the point of perspective.


When a person nods their head, the nose rotates around the X-axis in a small up-and-down manner, making the Y-coordinate values of the nose point go up and down.

When a person shakes their head, the nose rotates around the Y-axis in a small left-and-right manner, making the X-coordinate values of the nose point go left and right.

If we were to graph Nods and Shakes over time, their Y and X graphs would look like this:


Question: So great, we have a graph of Head Nods and Head Shakes… How do we get the Y, X and rotations from the head?

Answer: Luckily for us, the Kinect for Windows SDK provides engineers with the HD Face coordinates in 3-D; that is, we get the X, Y, and Z coordinates of a face. Thanks to linear algebra and vector math, we can also derive the rotational data from this. HD Face gives us the facial orientation as well as head pivot data.
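HD Face reports the head orientation as a quaternion, and turning that into pitch/yaw/roll angles is standard vector math. Here is a sketch in Python (axis naming conventions vary between SDKs, so treat the pitch/yaw/roll mapping as illustrative rather than the SDK's exact definition):

```python
import math

def quaternion_to_euler(w, x, y, z):
    """Convert an orientation quaternion (w, x, y, z) to
    (pitch, yaw, roll) in degrees: pitch about X (nod),
    yaw about Y (shake), roll about Z (head roll)."""
    pitch = math.degrees(math.atan2(2 * (w * x + y * z),
                                    1 - 2 * (x * x + y * y)))
    # clamp to avoid domain errors from floating-point drift
    yaw = math.degrees(math.asin(max(-1.0, min(1.0, 2 * (w * y - z * x)))))
    roll = math.degrees(math.atan2(2 * (w * z + x * y),
                                   1 - 2 * (y * y + z * z)))
    return pitch, yaw, roll

# The identity quaternion means the head is facing straight ahead.
print(quaternion_to_euler(1.0, 0.0, 0.0, 0.0))  # (0.0, 0.0, 0.0)
```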

Question: Now we’re getting somewhere, so exactly how do you calculate Head Nods/Shakes/Rolls with the Kinect?

Answer: Well, it takes a little creativity, and some help from researchers in Japan (Shinjiro Kawato and Jun Ohya), who worked out the mathematical formulas for deriving head position deviations.

So my implementation is based in part on this paper. Instead of “Between the eyes”, I decided to use the Nose, since the Kinect readily gives me this information fairly easily.

The implementation concept is simple.

First, let’s assume, from the research paper, that a typical nod/shake/roll lasts about 1 to 1.4 seconds.

Next, let’s take as given that the Kinect produces 30 frames per second, and that as long as a person is facing the camera, the majority of those frames will produce an HD Face frame for us (assume approximately 15-20 fps).

Therefore, if I capture about 1-1.5 seconds of frames, I can determine the head rotations and pixel coordinates (X, Y, and Z), derive the rotation angles, and store this data in a state machine for each measured frame.

I can then change states for each measured frame from “Extreme” to “Stable” to “Transient” based on the algorithms provided by Kawato and Ohya.

I then use a delayed 5-frame buffer and evaluate the states of the last 3 of the 5 buffered frames.

Next thing I do is continue applying the algorithm from Kawato and Ohya to figure out when and precisely how to check for head nods/shakes/rolls inside my buffered frame states.

The mechanism to check is simple as well: if the current frame state changes from a non-Stable state to “Stable”, I go and evaluate for nods/shakes/rolls.

The evaluation is also simple. During the evaluation, if the previous frame states contain more than 2 adjacent “Extreme” states, I check whether those adjacent states have nose rotation angles greater than a configurable threshold (1 degree by default). Depending on which axis the deviation is on (Y for nods, X for shakes, Z for rolls), I raise an event that the appropriate head action occurred.
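To make the state logic concrete, here is a toy sketch in Python, heavily simplified from the Kawato–Ohya scheme my library implements; the threshold values and the frame data are illustrative:

```python
def classify(delta, stable_thresh=0.2, extreme_thresh=1.0):
    """Label one frame's angle change (degrees) as Stable, Transient, or Extreme."""
    mag = abs(delta)
    if mag < stable_thresh:
        return "Stable"
    if mag >= extreme_thresh:
        return "Extreme"
    return "Transient"

def detect_gesture(angle_deltas):
    """Scan per-frame angle deltas along one axis; report a gesture
    when the state returns to Stable after more than two Extreme
    frames in the recent (5-frame) window."""
    states = [classify(d) for d in angle_deltas]
    for i in range(1, len(states)):
        if states[i] == "Stable" and states[i - 1] != "Stable":
            window = states[max(0, i - 5):i]
            if sum(1 for s in window if s == "Extreme") > 2:
                return True
    return False

# An up-down-up-down swing of the nose angle followed by rest reads as a nod:
print(detect_gesture([0.1, 1.5, -1.6, 1.4, -1.3, 0.1]))  # True
```

Run against the Y-angle deltas this would flag nods; the same scan on X deltas flags shakes and on Z deltas flags rolls.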

Here’s a graphical view of the process flow:



Frame state depiction:



If you’re interested in testing out this library, please contact me here through this blog.

Here’s the library and a sample Windows 8.1 Store application using the library in action. In the picture below, I have updated the HD Face Basics XAML sample for visualization. As the HD Face mesh head nods and shakes, I show the confidence of a head nod or head shake. On the left is Kinect Studio with a recorded clip of me testing the application.


!Happi Kinecting!