Steam Powered Data Science for HR

 

We’re back with another installment of our One Model Difference series. On the heels of our One AI announcement, how could we not take this opportunity to highlight it as a One Model difference maker?

In preparation for the One AI launch, I caught up with Taylor on our data science team and got an updated tour of how it all works. I’m going to try to do that justice here. The best analogy I can think of is that this thing is like a steam engine for data science. It takes many tedious, manual steps and lets the machine do the work instead.

It's not wizardry.

It's not a black box system where you have to point at the results, shrug, and say, “It’s magic.” This transparent approach is a difference in its own right, and I’ll cover that in a future installment. For now though, describing it as some form of data wizardry simply would not do it justice. I think it’s more exciting to see it as a giant, ambitious piece of industrial data machinery.

Let me explain. You know the story of John Henry, right? John Henry is an African-American folk hero who, according to legend, challenged a steam-powered hammer in a race to drill holes to make a railroad tunnel. It’s a romantic, heart-breaking story. Literally. It ends with John Henry’s heart exploding from the effort of trying to keep pace. If you need a quick refresher, Bruce Springsteen can fill you in here.

(Pause while you use this excuse to listen to an amazing Bruce Springsteen song at work.)

Data science is quite a bit easier than swinging a 30-pound hammer all day, but I think the comparison is worthwhile. Quite simply, you will not be able to keep pace with One AI. Your heart won’t explode, but you’ll be buried under an exponentially growing number of possibilities to try out.

This is particularly true with people data. The best answer is hiding somewhere in a giant space defined by the data you feed into the model multiplied by the number of techniques you might try out multiplied by (this is the sneaky one) the number of different ways you might prepare your data. Oh, and that’s just to predict one target. There are lots of targets you might want to predict in HR! So you wind up with something like tedious work to the fourth power, and you simply should not do it all by hand.
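To put a rough number on that multiplication, here is a back-of-the-envelope sketch in Python. Every figure and name in it is a made-up placeholder (ten candidate columns, three techniques, three ways of preparing the data, two targets), not anything measured from One AI.

```python
# All counts and names below are illustrative assumptions, not One AI figures.
n_candidate_columns = 10
techniques = ["logistic_regression", "random_forest", "neural_net"]
preparations = ["as_is", "oversampled", "pca_reduced"]
targets = ["attrition", "quality_of_hire"]

# Every non-empty subset of the candidate columns is a distinct set of inputs.
n_feature_subsets = 2 ** n_candidate_columns - 1


def count_experiments(n_subsets, techniques, preparations, targets):
    """Count the (inputs, technique, preparation, target) combinations."""
    return n_subsets * len(techniques) * len(preparations) * len(targets)


total = count_experiments(n_feature_subsets, techniques, preparations, targets)
print(total)  # 18414
```

Even with these small placeholder numbers, that is 18,414 experiments, and each extra candidate column roughly doubles the total.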

All data science is tedious. The first factor, deciding what data to feed in, is something we’re all familiar with from stats class. Maybe you’ve been assigned a regression problem and you need to figure out which factors to include. You know that a smaller number of factors will probably lead to a more robust model, and you need to tinker with them to get the ones that give you the most bang for your buck. This is a pretty well-known problem, and most statistical software will help you with this. This phase might be a little extra tricky to manage over time in your people analytics program, because you’ll likely bring in new data sets and have to retest the new combinations of factors. Still, this is doable. Hammer away.

Of course, One AI will also cycle through all your dimensional data for you. Automatically. And if you add factors to the data set, it will consider those factors too.

But what if you didn’t already know what technique to use? Maybe you are trying to predict which employees will leave the company. This is a classification problem. Data science is a rapidly evolving field. There are LOTS of ways to try to classify things. Maybe you decide to try a random forest. Maybe you decide to try neural nets using TensorFlow. Now you’re going to start to lose ground fast. For each technique you want to try out, you’ve got to cycle through all the different data you might select for that model and evaluate the performance. And you might start cycling through different time frames. Does this model predict attrition using one year of data but become less accurate with two years…? And so on.

Meanwhile, One AI will automatically test different types of models and techniques, over different time periods, while trying out different combinations of variables and evaluating the outcomes. In comparison, you’ll start to fall behind pretty rapidly. But there’s more...
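For a feel of what that cycling looks like when you do it yourself, here is a toy sketch in plain Python. It is not One AI's implementation. The six training rows, the four test rows, the column names, and the two deliberately simple "techniques" (a majority-class baseline and a nearest-centroid classifier) are all invented for illustration. Different time windows would simply be one more nested loop.

```python
from itertools import combinations

# Invented toy rows: (tenure_years, engagement_score, commute_km, left_company).
TRAIN = [
    (0.5, 2.0, 30.0, 1), (1.0, 2.5, 5.0, 1), (0.8, 3.0, 25.0, 1),
    (4.0, 4.0, 10.0, 0), (6.0, 3.5, 28.0, 0), (5.0, 4.5, 8.0, 0),
]
TEST = [
    (0.7, 2.2, 12.0, 1), (5.5, 4.2, 20.0, 0),
    (1.2, 2.8, 27.0, 1), (3.8, 3.9, 6.0, 0),
]
FEATURES = {"tenure_years": 0, "engagement_score": 1, "commute_km": 2}
LABEL = 3


def majority_class(train, cols):
    """Baseline: always predict the most common label in the training rows."""
    labels = [row[LABEL] for row in train]
    winner = max(set(labels), key=labels.count)
    return lambda row: winner


def nearest_centroid(train, cols):
    """Predict the label whose average training row is closest on the chosen columns."""
    centroids = {}
    for label in {row[LABEL] for row in train}:
        members = [row for row in train if row[LABEL] == label]
        centroids[label] = [sum(r[c] for r in members) / len(members) for c in cols]

    def predict(row):
        def squared_distance(label):
            return sum((row[c] - m) ** 2 for c, m in zip(cols, centroids[label]))
        return min(centroids, key=squared_distance)

    return predict


TECHNIQUES = {"majority_class": majority_class, "nearest_centroid": nearest_centroid}


def accuracy(predict, rows):
    """Share of rows whose predicted label matches the true label."""
    return sum(predict(row) == row[LABEL] for row in rows) / len(rows)


def search():
    """Fit every technique on every non-empty feature subset and score it on TEST."""
    results = []
    for technique_name, fit in TECHNIQUES.items():
        for size in range(1, len(FEATURES) + 1):
            for names in combinations(FEATURES, size):
                cols = [FEATURES[name] for name in names]
                model = fit(TRAIN, cols)
                results.append((accuracy(model, TEST), technique_name, names))
    return sorted(results, reverse=True)


results = search()
best_score, best_technique, best_features = results[0]
print(best_technique, best_features, best_score)
```

Two techniques and three columns already mean 14 fits. Note that picking the winner on the same rows you report the score on flatters the result; a real search would hold out a separate set for the final check.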

Now things get kind of meta. HR data can be really problematic for data science. There is a bunch of manual work you need to do to prepare any data set to yield results. This is the standard stuff like weeding out bad columns, weeding out biased predictors, and trying to reduce the dimensionality of your variables. But this is HR DATA. The data sets are tiny and lopsided even after you clean them up. So you might have to start tinkering with them to get them into a form that will work well with techniques like random forests, neural nets, etc. If you’re savvy, you might try doing some adaptive synthetic sampling (making smaller companies appear larger) or principal component analysis. (I’m not savvy, I’m just typing what Taylor said.)
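To make the "lopsided" problem concrete, here is a minimal sketch of the simplest rebalancing trick, plain random oversampling. To be clear, this is a stand-in and not adaptive synthetic sampling itself: it just duplicates existing rows from the under-represented class rather than generating new synthetic ones. The function name, the `left` field, and the 18-to-2 split are assumptions for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, label_of, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are the same size."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)

    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        # Top up the class with random duplicates until it reaches the target size.
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


# Hypothetical lopsided data set: 18 people stayed, 2 left.
rows = [{"id": i, "left": 0} for i in range(18)]
rows += [{"id": 100 + i, "left": 1} for i in range(2)]

balanced = random_oversample(rows, lambda row: row["left"])
counts = Counter(row["left"] for row in balanced)
print(counts[0], counts[1])  # 18 18
```

One caution if you try this by hand: rebalance only the rows you train on, never the rows you evaluate on, or the accuracy numbers will look better than they are.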

So now you’re cycling through different ways of preparing the data, to feed into different types of models, to test out different combinations of predictors. You’ve got tedious work to the third power now.

Meanwhile, One AI systematically hunts through these possibilities as well. Synthetic sampling was a dead end. No problem. On to the next technique and on through all the combinations to test that follow. This is not brute force per se; that would actually introduce new problems around overfitting. The model generation and testing can actually be organized to explore problem spaces in an intelligent way. But from a human vs. machine perspective, yeah, this thing has more horsepower than you do.

And it will keep working the models over, month after month. This is steam powered data science. Not magic. Just mechanical beauty.

And now that we have this machine for HR machine learning, we can point that three-phase cycle at different outcomes that we want to predict. Want to predict terminations? Of course you do. That’s what everyone wants to predict. But what if in the future you want to predict quality of hire based upon a set of pre-hire characteristics? One AI will hunt through different ways to stage that data, through different predictive techniques for each of those potential data sets, and through different combinations of predictors to feed into each of those models…and so on and so on.

You can’t replicate this with human-powered data science alone. And you shouldn’t want to. There’s no reason to try to prove a John Henry point here. Rather than tediously cycling through models, your data science team can think about new data to feed into the machine, can help interpret the results and how they might be applied, or can devise their own, wild one-off models to try because they won’t have to worry about exhaustively searching through every other option. This might turn out similar to the human-computer partnership in chess. (https://www.bloomreach.com/en/blog/2014/12/centaur-chess-brings-best-humans-machines.html)

One AI certainly supports this blended, cooperative approach. Each part of the prediction pipeline can be separated and used on its own. Depending on where you are in your own data science program, you might take advantage of different One AI components. If you just want your data cleaned, we can give you that. Or, if you already have the data set up the way you want it, we can save you time by running a set of state-of-the-art classifiers on it, etc. The goal is to have the cleaning/preprocessing/upsampling/training/etc. pieces all broken out so you can use them individually or in concert. In this way, One AI can deliver value whatever the size and complexity of your data science team, as opposed to an all-or-nothing scenario.
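The "individually or in concert" idea is easy to picture as a chain of small, independent steps. The sketch below is only an illustration of that design, not One AI's actual interface: the step names, the `tenure_years` field, and the crude `flag_risk` rule (a placeholder where a trained classifier would go) are all hypothetical.

```python
def clean(rows):
    """Drop rows that have any missing value."""
    return [row for row in rows if None not in row.values()]


def scale_tenure(rows):
    """Add tenure rescaled to the 0..1 range, relative to the longest tenure."""
    longest = max(row["tenure_years"] for row in rows)
    return [{**row, "tenure_scaled": row["tenure_years"] / longest} for row in rows]


def flag_risk(rows):
    """Placeholder for a trained classifier: flag the shortest-tenured people."""
    return [{**row, "at_risk": row["tenure_scaled"] < 0.25} for row in rows]


def run(rows, steps):
    """Apply each step in order; any subset of steps is a valid pipeline."""
    for step in steps:
        rows = step(rows)
    return rows


raw = [
    {"id": 1, "tenure_years": 1.0},
    {"id": 2, "tenure_years": None},
    {"id": 3, "tenure_years": 8.0},
]

cleaned_only = run(raw, [clean])                      # just the cleaning step
full = run(raw, [clean, scale_tenure, flag_risk])     # everything in concert
print([row["id"] for row in cleaned_only])            # [1, 3]
print([(row["id"], row["at_risk"]) for row in full])  # [(1, True), (3, False)]
```

Because each step takes rows in and hands rows back, a team that already has clean data can skip straight to the later steps, and a team that only wants cleaning can stop after the first one.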

In that regard, our human vs. machine comparison starts to break down. One AI is here to work with you. Imagine what John Henry could have done if they’d just given him the keys to the steam engine?

Book some time on Phil's calendar below
to get your HR data-related questions answered. 



About One Model:
One Model provides a data management platform and comprehensive suite of people analytics directly from various HR technology platforms to measure all aspects of the employee lifecycle. Use our out-of-the-box integrations, metrics, analytics, and dashboards, or create your own. Our newest tool, One AI, integrates cutting-edge machine learning capabilities into its current platform, equipping HR professionals with readily-accessible, unparalleled insights from their people analytics data. Notable customers include Squarespace, PureStorage, HomeAway, and Sleep Number.

 
