In brief
Before building a turnover dashboard or launching a predictive model, one step is essential: prepare and explore your data. Missing values, relevant variables, chi2 test: find out how to structure the exploratory analysis of your HR data to identify the real drivers of attrition in your organization.
Would you like to understand why you should conduct a turnover analysis in your organization, and which data matters for it? The first part of our study covered the different facets of turnover and listed the employee characteristics and variables to take into account.
Let’s dive into the second part of our analysis. In this section, we’ll prepare our data for analysis. We’re going to look at each variable and understand its influence on the employee’s voluntary departure.
Keep in mind that the variable names reflect our use case; you will need to adapt them to your own.
Why data preparation is the most critical step
In a turnover analysis project, the temptation is to quickly move on to visualizations and insights. This is a common mistake. A dashboard built on poorly prepared data – missing values ignored, irrelevant variables included, biases undetected – will produce erroneous conclusions and counter-productive decisions.
The quality of the analysis depends entirely on the thoroughness of this preparatory stage. Neglecting it is one of the 5 classic pitfalls of HR Data projects, which we detail in our dedicated article.
Data preparation and exploratory turnover analysis
To better understand the factors influencing turnover within your organization, it’s essential to look at HR data analysis.
With a data analysis tool such as Python (which we also use in the rest of the study), we can import this data and prepare our study.
What is exploratory data analysis?
Exploratory Data Analysis (EDA) is used to analyze and study data sets and then summarize their main characteristics, using visualization methods or statistical tests. It enables you to better understand the variables in a dataset and the relationships between them, and thus determine the best way to manipulate data sources to get the answers you need.
1st stage of EDA: analysis of missing values
The first thing to do is to look at the missing values. These gaps can be revealing and need to be treated with care. In our dataset, more than 50% of the employee and manager score data are missing, so we decide to exclude these variables. For the other variables, missing values represent less than 4% of the dataset, so we simply delete the rows that contain them. This rule is just one example; whatever you decide, it’s important to analyze the impact this kind of decision can have on the final results.
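As an illustration, the rule described above (drop a variable when more than half of its values are missing, then drop the few remaining incomplete rows) could be sketched with pandas as follows. The column names, the toy data and the helper drop_sparse_columns_then_rows are hypothetical, and the 50% threshold is the one from our use case, not a universal rule.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns_then_rows(df, col_threshold=0.5):
    """Drop columns whose share of missing values exceeds col_threshold,
    then drop every row that still contains a missing value."""
    missing_share = df.isna().mean()  # share of missing values per column
    sparse_cols = missing_share[missing_share > col_threshold].index.tolist()
    cleaned = df.drop(columns=sparse_cols).dropna()
    rows_lost = 1 - len(cleaned) / len(df)  # share of rows removed
    return cleaned, sparse_cols, rows_lost


# Hypothetical toy dataset: names and values are illustrative only.
hr = pd.DataFrame({
    "age": [25, 34, 41, np.nan, 52, 29, 38, 45, 31, 27],
    "contract_type": ["permanent", "fixed", "permanent", "permanent", "fixed",
                      "permanent", "permanent", "fixed", "permanent", "fixed"],
    "employee_score": [np.nan, 3.5, np.nan, np.nan, 4.0,
                       np.nan, np.nan, 2.5, np.nan, np.nan],
    "manager_score": [np.nan, np.nan, 4.0, np.nan, np.nan,
                      np.nan, 3.0, np.nan, np.nan, 3.5],
    "left_company": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0],
})

cleaned, dropped_cols, rows_lost = drop_sparse_columns_then_rows(hr)
print("Dropped columns:", dropped_cols)
print("Rows kept:", len(cleaned), "| share of rows lost:", round(rows_lost, 2))
```

Returning the share of rows lost makes it easier to check how much data the decision costs before moving on.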
What is chi2 and why use it to analyze turnover data?
Now we want to know whether all the variables selected at the outset are relevant to our analysis. We will therefore apply a chi2 statistical test of independence to each of our variables individually.
We choose the chi2 test because it’s a test of independence for categorical variables: it shows whether one variable depends on another. Here, we want to know whether a variable has an influence on departures, i.e. whether departures depend on that variable. For example, in our case, applying the test to the age variable (grouped into age brackets) shows that age has an influence on departures.
We can therefore model a contingency table:
This contingency table shows that the youngest employees (aged 0-30 and 31-40) account for a high proportion of departures: together, 42% + 35% = 77% of departures fall in these age brackets. This is not the final result, but a complement to the results that may or may not be visible on the dashboard.
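A contingency table of this kind can be built with pandas. The sketch below is only an illustration: the column names, the toy data and the brackets beyond 31-40 are assumptions, so the percentages it prints are not those of our study.

```python
import pandas as pd

# Hypothetical toy data: one row per employee, left_company = 1 for a departure.
hr = pd.DataFrame({
    "age": [24, 28, 29, 33, 36, 38, 44, 47, 52, 58, 26, 35],
    "left_company": [1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1],
})

# Group ages into brackets. The 41-50 and 51+ brackets are assumptions.
# pd.cut uses right-closed intervals by default, so 30 falls into "0-30".
bins = [0, 30, 40, 50, 120]
labels = ["0-30", "31-40", "41-50", "51+"]
hr["age_bracket"] = pd.cut(hr["age"], bins=bins, labels=labels)

# Raw counts: age bracket (rows) against departure status (columns).
counts = pd.crosstab(hr["age_bracket"], hr["left_company"])

# Share of each bracket within each column: the column for 1 gives the
# distribution of departures across age brackets.
share = pd.crosstab(hr["age_bracket"], hr["left_company"], normalize="columns")

print(counts)
print(share.round(2))
```

With normalize="columns", each column sums to 1, which corresponds to reading the table as "what proportion of departures comes from each age bracket".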
The chi2 test therefore enabled us to determine which variables to exclude from our study. In our case, we have deduced that the number of internal transfers and long-term leave have no impact on departures, so we will not take them into account in our study. Apart from these, we will use all other variables.
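To make this screening step concrete, here is a minimal sketch using chi2_contingency from scipy. The helper chi2_screen, the column names, the toy data and the 0.05 significance level are assumptions chosen for the illustration; they are not the exact settings of our study.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def chi2_screen(df, target, candidates, alpha=0.05):
    """Run a chi2 test of independence between each candidate variable
    and the target, and flag variables whose p-value is below alpha."""
    rows = []
    for col in candidates:
        table = pd.crosstab(df[col], df[target])
        # For 2x2 tables, scipy applies Yates' continuity correction by default.
        chi2, p_value, dof, expected = chi2_contingency(table)
        rows.append({
            "variable": col,
            "chi2": chi2,
            "p_value": p_value,
            "dof": dof,
            "min_expected": expected.min(),  # small expected counts weaken the test
            "keep": p_value < alpha,
        })
    return pd.DataFrame(rows).set_index("variable")


# Hypothetical toy data: (contract_type, long_term_leave, left_company, count).
cells = [
    ("fixed", "yes", 1, 8), ("fixed", "no", 1, 7),
    ("fixed", "yes", 0, 2), ("fixed", "no", 0, 3),
    ("permanent", "yes", 1, 2), ("permanent", "no", 1, 3),
    ("permanent", "yes", 0, 8), ("permanent", "no", 0, 7),
]
records = [
    {"contract_type": c, "long_term_leave": leave, "left_company": left}
    for c, leave, left, n in cells
    for _ in range(n)
]
hr = pd.DataFrame(records)

results = chi2_screen(hr, "left_company", ["contract_type", "long_term_leave"])
print(results)
```

In this toy dataset, contract type is associated with departures while long-term leave is not, so only the former is flagged as kept. A common rule of thumb is to be cautious when expected counts fall below 5, which is why the sketch reports min_expected.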
Conclusion on data preparation and exploratory turnover analysis
To conclude this second part, exploratory data analysis has proved to be a valuable tool for deciphering the complexities inherent in our data set. In this way, we have been able to identify the variables that merit particular attention in the next stages of the study.
The chi2 test, applied systematically to all variables over an extended period, provides a rich and relevant dataset for the study of turnover.
SQORUS is your partner of choice in this process, providing expertise, innovation and personalized support. Contact us today to discuss your specific requirements.
In our third and final part, we show you how to use Power BI to easily visualize turnover data and find solutions.
This tool will enable you to immediately grasp trends, identify significant turnover factors and make informed decisions to optimize payroll management.
FAQ – Exploratory analysis of turnover data
What is exploratory data analysis (EDA) and why is it essential?
EDA is the first step before any modeling or visualization. It enables us to assess data quality, understand the distribution of variables and identify those which are really linked to the phenomenon under study, in this case, voluntary turnover. Without this step, the risk is to build dashboards on biased or unrepresentative data, leading to poor decisions.
Why use the chi2 test to analyze turnover?
The chi2 test is a statistical test of independence adapted to categorical variables (age, gender, type of contract, department). It can be used to determine whether a variable is statistically linked to departures, or whether the observed relationship is due to chance. It is a rigorous filter that avoids including irrelevant variables in the final analysis.
What tools should you use to prepare a turnover analysis?
Python (with the pandas, scipy and matplotlib libraries) is the reference tool for advanced statistical analysis. R is a powerful alternative for statistical testing. For less technical HR teams, Power BI or Tableau enable exploratory visualizations that are accessible without programming skills. The choice depends on your organization's data maturity and the skills available in-house.



