
Multivariate Regression Insights in Practice!

A few weeks ago, I wrote this newsletter: Why You Need Multivariate Regression for True Pay Equity. A few days later, I saw a follow-up post from Thomas de Haas. He took the theory from the article and turned it into a practical example by running the analysis on a company and showing the results. His post perfectly illustrated the topic, and I thought it would be great to share it so you can see a practical example, and because I really hope you will apply it in the future.

I reached out to Thomas and he graciously agreed to share his example in this newsletter! But first, let’s meet Thomas!

Please tell us a bit about yourself:

My name is Thomas de Haas, and I am currently a Master’s student in HR and Change. I have a background in Social and Organizational Psychology, and statistics was one of the topics in which I excelled from the start, earning top-percentile grades in the program. I am currently interning at ASML as an HRBP. My role is centered on fostering well-being and nurturing leadership skills within my sector, while leveraging data to create effective solutions. I also love traveling and will be going on an exchange semester to Taiwan to further develop my international leadership and management skills.

How did you come up with the idea of adding a practical example?

Since earning my bachelor’s degree, I have been interested in research, particularly how it establishes a foundation for building proof and applying findings from studies to achieve more and accomplish tasks in proven ways. However, much of today’s research is unnecessarily complex with vague conclusions and no practical applications. There should be better connections between research and practice because many companies are decades behind research findings. I chose this example because it is simple enough to illustrate how to make research more tangible and practical.

What would you say to readers who think statistical analysis is too difficult and complex?

I agree that some advanced analyses are difficult and complex, and this is not helped by professionals’ complex explanations (they are too distant from non-experts). At the same time, I think the most useful analyses are not difficult to understand. For example, explaining that a regression analysis examines how a variable, such as gender or age, affects someone’s salary is not a hard idea to grasp. Additionally, advanced tools like Copilot and AI make it easier than ever to ask, “How do I perform a regression analysis to predict Y based on X in Excel?” Therefore, I believe there is a significant opportunity to learn how to use analytics in our daily lives to make better, more efficient decisions than ever before.

And here’s Thomas’ example:

Many companies still use simple averages to compare gender and ethnicity pay gaps. While this method may seem straightforward, it fails to account for the various factors influencing salary differences. From a statistical standpoint, simple averages do not explain the variance caused by multiple variables, such as age, gender, ethnicity, tenure, experience, education, job responsibilities, and performance. To truly understand pay equity, it’s essential to conduct multivariate regression analysis, which allows us to determine how much each factor influences the pay gap and whether people are being paid fairly when all relevant factors are considered.
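To see why a simple average can mislead, here is a minimal sketch in Python. The records, the job levels, and the helper `average_salary` are all invented for illustration; they are not data from any real company.

```python
from statistics import mean

# Invented records: (gender, job level, salary).
# Within each job level, women and men are paid exactly the same.
staff = [
    ("Female", "Junior", 40000), ("Female", "Junior", 40000),
    ("Female", "Junior", 40000), ("Female", "Senior", 70000),
    ("Male", "Junior", 40000), ("Male", "Senior", 70000),
    ("Male", "Senior", 70000), ("Male", "Senior", 70000),
]

def average_salary(gender, level=None):
    """Mean salary for a gender, optionally restricted to one job level."""
    return mean(s for g, l, s in staff if g == gender and level in (None, l))

# Gap based on simple averages across the whole company.
raw_gap = average_salary("Female") - average_salary("Male")

# Gap when comparing like with like (same job level).
junior_gap = average_salary("Female", "Junior") - average_salary("Male", "Junior")
senior_gap = average_salary("Female", "Senior") - average_salary("Male", "Senior")

print(raw_gap, junior_gap, senior_gap)
```

In this made-up data the simple average shows women earning 15,000 less than men, yet the gap within each job level is zero. The whole difference comes from who holds which job, which is exactly the kind of factor a regression can account for. (Whether the distribution across job levels is itself fair is a separate question.)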

Here’s a practical example to illustrate how regression analysis works:

For Company A, we want to analyze the influence of the following variables on salary: Age, Gender, Tenure at the company, Education level, and Experience.

For categorical variables like gender and education level, we’ll convert them to numeric values (e.g., Female = 1, Male = 0; Master’s Degree = 1, Bachelor’s Degree = 0).
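As a rough sketch, the conversion could look like this in Python. The employee records and the `encode` helper are hypothetical; only the coding scheme (Female = 1, Male = 0; Master’s = 1, Bachelor’s = 0) comes from the text.

```python
# Hypothetical employee records, invented for illustration.
employees = [
    {"gender": "Female", "education": "Master's"},
    {"gender": "Male", "education": "Bachelor's"},
    {"gender": "Female", "education": "Bachelor's"},
]

# Coding scheme from the text.
GENDER_CODES = {"Female": 1, "Male": 0}
EDUCATION_CODES = {"Master's": 1, "Bachelor's": 0}

def encode(record):
    """Return the numeric (dummy-coded) version of one employee record."""
    return {
        "gender": GENDER_CODES[record["gender"]],
        "education": EDUCATION_CODES[record["education"]],
    }

encoded = [encode(e) for e in employees]
print(encoded)
```

The category coded as 0 becomes the reference group, so the coefficient for the variable reads as "difference compared with the 0 group". With this coding, a negative Gender coefficient would mean women are paid less than comparable men.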

This leads to the regression formula:

Salary = β₀ + β₁·Gender + β₂·Age + β₃·Tenure + β₄·Education + β₅·Experience

Where:
β₀ (Intercept): The predicted salary when all variables are zero.
β₁ (Gender): The average salary difference between women and men (Female = 1, Male = 0), assuming all other factors are the same.
β₂ (Age): The salary change per additional year of age.
β₃ (Tenure): The salary change for each additional year at the company.
β₄ (Education): The salary difference between a Master’s and a Bachelor’s degree (Master’s = 1, Bachelor’s = 0).
β₅ (Experience): The salary change per additional year of work experience.

Once you’ve collected your data, you can load it into a tool such as Excel, R, Python, or SPSS to run the analysis. The output will include the coefficients (β), p-values (to assess whether each coefficient is statistically significant), and R-squared (the share of the variation in salary that the model explains).
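For readers who want to try this in Python, here is a minimal sketch using NumPy. It is not the analysis from Thomas’ post: the data are simulated, and the "true" coefficients, the sample size, and the noise level are assumptions chosen only so that there is something to estimate. The sketch computes the coefficients and R-squared; NumPy does not report p-values, so for those you would normally use a statistics package.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 500  # number of simulated employees (assumption)

# Simulated data for a hypothetical Company A; all values are invented.
gender = rng.integers(0, 2, size=n)        # Female = 1, Male = 0
education = rng.integers(0, 2, size=n)     # Master's = 1, Bachelor's = 0
age = rng.uniform(22, 60, size=n)
experience = (age - 22) * rng.uniform(0.5, 1.0, size=n)
tenure = experience * rng.uniform(0, 1, size=n)

# Design matrix: a column of ones for the intercept, then one column per variable.
X = np.column_stack([np.ones(n), gender, age, tenure, education, experience])

# Assumed "true" coefficients used to simulate salaries, plus random noise.
true_beta = np.array([30000.0, -2000.0, 100.0, 500.0, 4000.0, 800.0])
salary = X @ true_beta + rng.normal(0, 2000, size=n)

# Ordinary least squares: the beta values that minimise squared prediction error.
beta, *_ = np.linalg.lstsq(X, salary, rcond=None)

# R-squared: share of the variance in salary explained by the model.
residuals = salary - X @ beta
r_squared = 1 - residuals.var() / salary.var()

names = ["Intercept", "Gender", "Age", "Tenure", "Education", "Experience"]
for name, b in zip(names, beta):
    print(f"{name:<11} {b:>10.0f}")
print(f"R-squared   {r_squared:>10.3f}")
```

Because the salaries were simulated with a Gender coefficient of -2000, the estimate printed for Gender should land close to that value. With real data you do not know the true value, which is why the p-values and the size of the sample matter when you interpret the result.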

[Image: regression analysis]

In addition to including individual variables in a regression analysis, it’s also valuable to consider interaction effects, which examine how the relationship between one variable and salary changes depending on the value of another variable. Statistically, this involves creating a new term by multiplying two variables (continuous variables are often mean-centered or standardized first) and testing whether that product term significantly influences salary. For instance, the impact of tenure on salary might not be the same for men and women: the gender pay gap could widen or narrow as tenure increases, which would indicate an interaction between gender and tenure. To capture this, we would include an additional term such as Gender × Tenure in the regression model. The revised formula would then be:

Salary = β₀ + β₁·Gender + β₂·Age + β₃·Tenure + β₄·Education + β₅·Experience + β₆·(Gender × Tenure)

Including interaction terms helps uncover nuanced patterns that may be missed in a model with only main effects, offering deeper insights into the dynamics of pay disparities and helping organizations make more informed, equitable decisions.
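Here is a minimal sketch of the Gender × Tenure interaction in Python with NumPy. Again, the data are simulated and the "true" coefficients are assumptions: the gap is set to start at -1000 and widen by 300 for every year of tenure. The variables are left uncentered so that β₁ can be read as the gap at zero years of tenure, and `estimated_gap` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 500  # number of simulated employees (assumption)

# Simulated, invented data; Female = 1, Male = 0 as in the text.
gender = rng.integers(0, 2, size=n)
education = rng.integers(0, 2, size=n)
age = rng.uniform(22, 60, size=n)
experience = (age - 22) * rng.uniform(0.5, 1.0, size=n)
tenure = experience * rng.uniform(0, 1, size=n)

# Interaction term: the product of the two variables.
gender_x_tenure = gender * tenure

X = np.column_stack(
    [np.ones(n), gender, age, tenure, education, experience, gender_x_tenure]
)

# Assumed "true" coefficients; the last one is the interaction effect.
true_beta = np.array([30000.0, -1000.0, 100.0, 500.0, 4000.0, 800.0, -300.0])
salary = X @ true_beta + rng.normal(0, 2000, size=n)

# Fit the model that includes the interaction term.
beta, *_ = np.linalg.lstsq(X, salary, rcond=None)
b_gender, b_interaction = beta[1], beta[6]

def estimated_gap(tenure_years):
    """Estimated female-minus-male salary gap at a given tenure."""
    return b_gender + b_interaction * tenure_years

for years in (0, 5, 10):
    print(f"Gap at {years:>2} years of tenure: {estimated_gap(years):>8.0f}")
```

Once an interaction term is in the model, the gender gap is no longer a single number. It equals β₁ + β₆ × Tenure, so in this simulated example the estimated gap grows with every additional year at the company.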

Hope you found it helpful to see the theory in practice! And a big Thank You to Thomas!