I recently made a simple post on LinkedIn which received a crazy amount of views and overwhelmed us with requests to take a look at what we had built. The simple release was that we had managed to take Workday's point in time (snapshot) based reporting and rebuild a data schema that is effective dated, and transactional in nature. The vast majority of organizations and people analytics vendors use snapshots for extracting data from workday because this is really the only choice they've been given to access the data.
We don't like snapshots for several reasons
- They are inaccurate - you will typically miss out on the changes occurring between snapshots, this makes it impossible to track data/attribute changes in between, to pro-rate, and create analysis any deeper than the snapshot's time context.
- They are inflexible - an object or time context has already been applied to the data which you can't change without replacing the entire data set with a new context.
- They don't allow for changes - If data is corrected or changed in history you need to replace the entire data set, urggh.
- External data is difficult to connect - without effective dating joining in any external data means you have to assume a connection point and apply that time context's values to the external data set. This compounds the inaccuracy problem if you end up having to snapshot the external data as well.
- A pain in the #$% - to pull snapshots from Workday now you need to create the report for each snapshot period that you need to provide. Three years of data with a month end snapshot, that's 36 reports to build and maintain.
With our background in working with raw data directly from HR systems this approach wasn't going to cut the mustard and couldn't deliver the accuracy that should be the basis of an HR data strategy.
The solution is not to buy Workday's big data tools because you're going to be living with many of the same challenges. You need to take the existing structure, enhance, and fundamentally reconstruct a data architecture that solves these problems. We do just that, we extract all employee and object data, analyse the data as it flows and generate additional requests to the Workday API that work through the history of each object. Data is materialized into a schema close to the original but has additional effective dated transactional records that you just wouldn't see in a snapshot based schema. This becomes our raw data input into One Model, delivered to your own warehouses to be used any way you wish.
The resulting dataset is perfect for delivering accurate, flexible reporting and analytics. The final structure is actually closer to what you would see with a traditional relational schema used by the HRIS sold by SAP, Oracle, PeopleSoft etc. Say what you will about the interfaces of these systems but for the most part the way that they manage data is better suited for reporting and analytics.
Now don't get me wrong, this is one area most people know Workday lags in, and in my opinion it should be a low priority decision point when selecting an HRIS. Don't compromise the value of a good transactional fit of an HRIS for your business in an attempt to solve for the reporting and analytics capability because ultimately you will be disappointed. Choose the HRIS system that fits how your business operates, solving for the reporting and analytics needs in another solution as needed.
Time to get a little more technical. What I'm going to discuss below is the original availability format of data in comparison to the approach we take at One Model.
Object Oriented - the why of the snapshot
Okay, so we all know that Workday employs an Object Oriented approach to storing data which is impressively effective for it's transactional use case. It's also quite good at being able to store the historical states of the object. You can see what i mean by taking a look at the API references as below:
The above means the history itself is there but the native format for access is a snapshot at a specific point in time. We need to find a way of accessing this history and making the data useful for more advanced reporting and analytics.
In providing a point in time we are applying a time context to the data at the point of extraction. This context is then static and will never change unless you replace the data set with a different time context. Snapshot extractions are simply a collection of records with a time context applied. Often when extracting for analytics, companies will take a snapshot at the end of each month for each person or object. We get a result set similar to the below:
The above is a simple approach but will miss out on the changes that occur between snapshot because they're effectively hidden and ignored. When connecting external data sets that are properly effective dated you will need to make a decision on which snapshot is accurate to report against in the above but you simply don't have enough information available to make this connection correct. This is just simply an inaccurate representation of what is really occurring in the data set, and it's terrible for pro-rating calculations to departments or cost centers and even something as basic as an average headcount is severely limited. Close enough is not good enough.
If you are not starting out with a basis of accuracy then everything you do downstream has the potential to be compromised.
Remove the context of time
There's a better way to represent data for reporting and analytics.
- Connect transactional events into a timeline
- Extract the details associated with the events
- Collapse the record set to provide an effective-dated set of records.
The above distills down the number of records to only that which is needed and matches transactional and other object changes which means you can join to the data set at the correct point in time rather than approximating.
Time becomes a flexible concept
This change requires that you apply a time context at query time providing infinite flexibility for aligning data with different time constructs like the below
- Pay Periods
- Any time construct you can think of
It's a simple enough join to create the linkage
left outer join timeperiods tp on tp.date between employee.effective_date and employee.end_date
We are joining at the day level here which gives us the most flexibility and accuracy but will absolutely explode the number of records used in calculations into the millions and potentially billions of intersections. For us at One Model accuracy is a worthwhile trade off and the volume of data can be dealt with using clever query construction and of course some heavy compute power. We recently moved to a Graphics Processing Unit (GPU) powered database because really why would you have dozens of compute cores when you can have thousands? (And, as a side note, it also allows us to run R and Python directly in the warehouse #realtimedatascience).
More on this in a future post but for a quick comparison take a look at the Mythbusters demonstration
What about other objects?
We also apply the same approach to the related objects within Workday so that we're building a historical effective dated representation over time. Not all objects support this so there are some alternative methods for building history.
Data changes and corrections occur all the time as we know we regularly see volumes of changes being most active in the last six months and can occur several years in the past. Snapshots often ignore these changes unless you replace the complete data set each load. The smarter way is to identify changes and replace only the data that is affected (i.e, replace all historical data for a person who has had a retroactive change). This approach facilitates a changes-only feed and can get you close to a near-real time data set. I say "close to near-real time" because the Workday API is quite slow so speed will differ depending on the amount of changes occurring.
Okay, so how do you accomplish this magic?
We have built our own integration software specifically for Workday that accomplishes all of the above. It follows this sequence:
- Extracts all object data, and for each of them it
- Evaluates the data flow and identifies where additional requests are needed to extract historical data at a different time context, then
- Merges these records, collapses them, and effective-dates each record
- We now have an effective dated historical extract of each object sourced from the Workday API. This is considered the raw input source into One Model, and it is highly normalized and enormous in its scope as most customers have 300+ tables extracted. The pattern in the below image is a representation of each object coming through, you can individually select the object slice itself
- The One Model modelling and calculation engines take over to make sense of the highly normalized schema, connect in any other data sources available, and deliver a cohesive data warehouse built specifically for HR data.
- Data is available in our toolsets or you have the option to plug in your own software like Tableau, PowerBI, Qlik, SAS, etc
- One Model is up and running in a few days. To accomplish all of the above, all we need is a set of authorized API credentials with access provided to the objects you'd like us to access.
- With the data model constructed, the storyboards, dashboards, and querying capabilities are immediately available. Examples:
Flexibility - the biggest advantage you now have
We now have virtually all data extracted from workday in a historically accurate transaction based format that is perfect for integrating additional data sources or generating an output with any desired time context (convert back to snapshots if you require). Successful reporting and analytics with Workday starts with having a data strategy for overcoming the inherent limitations of the native architecture that is just not built for this purpose.
We're HR data and people analytics experts and we do this all day long. If you would like to take a look please feel free to contact us or book some time to talk directly below.