The Gender Pay Gap in Data Science Salaries

A painting of coins on a table by Josef Wagner-Höhenberg.

The gender pay gap is a contentious issue, especially in tech, where women have historically been excluded. We can explore the gap in Data Science salaries a little using the same Insight data I used last time to look at Data Science salaries in general.

Others have looked into the same question before: Florian Lindstaedt used a much larger (but less clean) dataset from Kaggle to look at the issue on his blog. He found that for data scientists younger than 30, women earned slightly more, but in the 30–35 age group men earned more.

My data is much smaller, but better curated. However, it has some biases in that it is collected from Insight alumni who are mostly:

The question about the respondent’s gender was added to the survey late, so around a third of the responses lack that information. This leaves us with 79 men and 28 women. Not a huge sample, but better than nothing.

Of course, this low number of women might itself be a further bias: Insight generally has pretty gender-balanced cohorts, so the fact that far fewer women have filled out the survey is worrying. It is possible that non-response is correlated with the underlying pay distribution; for example, perhaps people who are paid less are more likely to decline to report.
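To make the non-response worry concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the log-normal pay distribution, the response rates (30% below the true median, 60% above), and the function name `simulate_nonresponse`. None of it comes from the Insight data.

```python
import random
from statistics import median

def simulate_nonresponse(n=10_000, seed=0):
    """Compare the true median pay with the median among people who respond."""
    rng = random.Random(seed)
    # Hypothetical "true" pay in $k, drawn from a log-normal distribution
    true_pay = [rng.lognormvariate(5.0, 0.3) for _ in range(n)]
    cutoff = median(true_pay)
    # Assumed response model: people paid below the median answer less often
    observed = [p for p in true_pay
                if rng.random() < (0.3 if p < cutoff else 0.6)]
    return median(true_pay), median(observed)

true_median, observed_median = simulate_nonresponse()
print(f"true median: {true_median:.0f}k, observed median: {observed_median:.0f}k")
```

Under this assumed response model the observed median comes out above the true one, which is the kind of distortion that differing response rates could introduce into a survey like this.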

The data used in this post is available here. The notebook with all the code is here (rendered on GitHub).

Pay: Men Vs. Women

Here is total recurring compensation[1] by gender. I have removed all non-data scientists (like the MLEs I looked at last time) because there are very few responses from them. I have also removed the one data scientist who responded “transgender” without further indicating their gender identity.
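The selection and the compensation measure described above can be sketched as follows. The field names, the role and gender labels, and the example rows are all hypothetical; the actual survey columns may be named differently.

```python
from dataclasses import dataclass

@dataclass
class Response:
    # Hypothetical field names; amounts in dollars per year
    role: str
    gender: str
    salary: float
    yearly_bonus: float
    yearly_stock: float
    signing_bonus: float = 0.0

def total_recurring_compensation(r: Response) -> float:
    """Salary + yearly bonus + yearly stock grant; signing bonus is left out."""
    return r.salary + r.yearly_bonus + r.yearly_stock

def select_sample(responses):
    """Keep data scientists who reported their gender as male or female."""
    return [r for r in responses
            if r.role == "Data Scientist" and r.gender in ("Male", "Female")]

# Made-up rows for illustration only
responses = [
    Response("Data Scientist", "Female", 120_000, 10_000, 20_000, signing_bonus=15_000),
    Response("Machine Learning Engineer", "Male", 130_000, 15_000, 30_000),
    Response("Data Scientist", "Male", 115_000, 5_000, 10_000),
]
sample = select_sample(responses)
totals = [total_recurring_compensation(r) for r in sample]
print(totals)
```

The signing bonus is stored but never added, so a one-off payment does not inflate the recurring figure.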

So, how is pay equality in data science?

A swarm plot showing salaries for male and female data scientists.

Pretty equal, actually! The median woman in the sample earns more than the median man, but of course the sample is really small.

Gender    Median Total Compensation
Female    $149k
Male      $139k
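A table like this is a per-group median. Because the sample is small, it can also help to put a rough uncertainty range on the gap. Below is a sketch using a percentile bootstrap, which is my choice for illustration and not necessarily what the notebook does. The compensation figures are made up and the helper names are hypothetical.

```python
import random
from collections import defaultdict
from statistics import median

def median_by_group(rows):
    """rows: iterable of (group, value) pairs -> {group: median value}."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {g: median(v) for g, v in groups.items()}

def bootstrap_median_gap(a, b, n_resamples=2000, seed=0):
    """Approximate 95% percentile interval for median(a) - median(b)."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        gaps.append(median(resample_a) - median(resample_b))
    gaps.sort()
    lo = gaps[int(0.025 * n_resamples)]
    hi = gaps[int(0.975 * n_resamples) - 1]
    return lo, hi

# Made-up compensation figures in $k, for illustration only
women = [120, 135, 149, 160, 175]
men = [105, 110, 125, 139, 150, 170, 185]
pairs = [("Female", w) for w in women] + [("Male", m) for m in men]
medians = median_by_group(pairs)
gap_lo, gap_hi = bootstrap_median_gap(women, men)
print(medians, (gap_lo, gap_hi))
```

If the resulting interval comfortably spans zero, the data cannot distinguish a small gap in either direction from no gap at all.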

There are lots of things I would like to explore, such as whether women see the same benefit from seniority that I observed last time, but I just do not have enough women in the sample to say anything conclusive.

Instead I will look at salaries by region (which I know drives large pay differences) and age, which Florian looked at.

By Region

Only California (LA, San Francisco, and Silicon Valley) and the Northeast (New York, Boston, and DC) have enough respondents to form any reasonable conclusions, so I limit my sample to those regions.

A swarm plot showing salaries for male and female data scientists in California and the Northeast.

Again, these look pretty equal, with the median woman earning slightly more than the median man in both regions.

Region        Gender    Median Total Compensation
California    Female    $168k
California    Male      $162k
Northeast     Female    $145k
Northeast     Male      $136k
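The regional breakdown amounts to mapping each city to a region, dropping everything else, and taking the median per (region, gender) cell. Here is a sketch; the city strings, the tuple layout, and the example rows are assumptions, and only the city-to-region grouping comes from the text.

```python
from collections import defaultdict
from statistics import median

# City-to-region grouping from the text; the exact label strings are assumed
REGION_OF = {
    "LA": "California", "San Francisco": "California", "Silicon Valley": "California",
    "New York": "Northeast", "Boston": "Northeast", "DC": "Northeast",
}

def median_by_region_and_gender(records):
    """records: (city, gender, compensation) tuples; other cities are dropped."""
    cells = defaultdict(list)
    for city, gender, comp in records:
        region = REGION_OF.get(city)
        if region is not None:
            cells[(region, gender)].append(comp)
    return {cell: median(vals) for cell, vals in sorted(cells.items())}

# Made-up rows in $k, for illustration only
region_rows = [
    ("San Francisco", "Female", 170), ("LA", "Female", 160),
    ("Silicon Valley", "Male", 165), ("Boston", "Female", 145),
    ("New York", "Male", 130), ("DC", "Male", 140), ("Seattle", "Male", 150),
]
by_region = median_by_region_and_gender(region_rows)
for (region, gender), value in by_region.items():
    print(region, gender, value)
```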

By Age

Finally, I can check what Florian found: that women under 30 earned more than men in the same age range, but men out-earned women in the 30–35 age range. I use the same selection as above, but now partition by age instead of region.

A swarm plot showing salaries for male and female data scientists in California and the Northeast by age.

I do not see Florian’s trend; instead the salaries look roughly equal, with the median woman earning more in every age group, as shown below:

Age      Gender    Median Total Compensation
0–30     Female    $155k
0–30     Male      $140k
31–35    Female    $164k
31–35    Male      $148k
36+      Female    $180k
36+      Male      $138k
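The age partition can be sketched the same way: bucket each age, then take the median per (age group, gender) cell. The bucket edges follow the table, with age 30 assumed to fall in the youngest group. The function names and example rows are made up.

```python
from collections import defaultdict
from statistics import median

def age_group(age):
    """Bucket an age into the three groups used in the table."""
    if age <= 30:
        return "0-30"
    if age <= 35:
        return "31-35"
    return "36+"

def median_by_age_and_gender(records):
    """records: (age, gender, compensation) tuples -> median per cell."""
    cells = defaultdict(list)
    for age, gender, comp in records:
        cells[(age_group(age), gender)].append(comp)
    return {cell: median(vals) for cell, vals in sorted(cells.items())}

# Made-up rows in $k, for illustration only
age_rows = [(27, "Female", 150), (30, "Female", 160), (29, "Male", 140),
            (33, "Female", 165), (35, "Male", 150), (41, "Male", 138)]
by_age = median_by_age_and_gender(age_rows)
for (group, gender), value in by_age.items():
    print(group, gender, value)
```

Cells with a single respondent, like the 36+ men in this toy data, still produce a median, which is a reminder of how little such a number can mean in a small sample.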

Conclusion

In my small dataset, women in data science earn about the same as men (the median woman earns slightly more), and this holds across regions and age groups. I wish I could have explored more slices of the data to look at things like seniority, percent of compensation in stock, etc., but slicing the data very quickly reduces the number of data points below the point of usefulness.


  1. Salary, yearly bonus, and yearly stock grant. Signing bonus is not included.