Equalising race distances for men and women in XC - a statistical analysis 

Preface:
I’m not an expert, just an athlete with nothing better to do
This data was pulled from publicly available sources
This is only a case study, not a comprehensive investigation





CASE STUDY: Scottish National XC Champs 2011-2020
Reasons for choosing this race:
Encapsulates both mass participation and elite competition in one race
Recently equalised race distances for senior men and women, providing sufficient data from both before and after the distance change



FACTORS BEING CONSIDERED:
Participation
Duration of race (with regard to officials and time for slowest runners)
Race excitement (with regard to spectators and the race for title/medals)
Race quality (with regard to participation levels from elite to good club level athletes)






Participation
Pre-change:
Men - 538
Women - 231
Total - 769

Post-change:
Men - 666
Women - 294
Total - 960

Change:
Men +23.8%
Women +27.3%
Total +24.8%
It might appear that the change in distance caused an increase in participation. However, viewed as a whole, the data shows participation was already trending upwards, and that trend does not noticeably steepen at the time of the race distance change.
There appears to be a marked increase in participation in 2019, which could be attributed to a lagged effect of the distance change. Given the decrease in 2020, though, it seems more likely that this was due to other external factors.


Duration of race (min:sec)
Pre-change:
Men - 74:20
Women - 49:58
Total - 124:18

Post-change:
Men - 69:58
Women - 71:51
Total - 141:49

Change:
Men -5.87%
Women +43.8%
Total +14.1%
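
The percentage changes above can be reproduced by converting each min:sec duration to seconds and comparing the post-change value with the pre-change value. A minimal sketch in Python, using only the figures quoted above; the helper names `to_seconds` and `pct_change` are illustrative, not part of any library:

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '74:20' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def pct_change(before: float, after: float) -> float:
    """Percentage change going from `before` to `after`."""
    return (after - before) / before * 100


# Race durations quoted above (min:sec) as (pre-change, post-change).
durations = {
    "Men": ("74:20", "69:58"),
    "Women": ("49:58", "71:51"),
    "Total": ("124:18", "141:49"),
}

changes = {
    group: pct_change(to_seconds(pre), to_seconds(post))
    for group, (pre, post) in durations.items()
}

for group, change in changes.items():
    print(f"{group}: {change:+.1f}%")
```

The same `pct_change` helper works on the participation counts, for example `pct_change(538, 666)` for the men.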



Race excitement
For this factor, I have analysed the spread of the top 10 athletes in each race, as the closer this is, the more exciting it is for the spectators.
(range, standard deviation)
Pre-change top 10 spread:
Men - 1:51, 36.3s
Women - 2:06, 41.4s
Post-change top 10 spread:
Men - 1:44, 32.4s
Women - 2:35, 54.6s
Change:
Men -7s, -3.9s
Women +29s, +13.2s
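
For anyone wanting to repeat this on another race, the two top 10 measures can be sketched as below. The finishing times are hypothetical, not real results from this race, and the sketch assumes the sample standard deviation (`statistics.stdev`); the figures above may have been computed with a different convention.

```python
import statistics


def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def top_n_spread(finish_times: list[str], n: int = 10) -> tuple[int, float]:
    """Return (range, sample standard deviation) in seconds for the first n finishers."""
    top = sorted(to_seconds(t) for t in finish_times)[:n]
    return top[-1] - top[0], statistics.stdev(top)


# Hypothetical finishing times (min:sec), NOT real results from the race.
example_times = [
    "32:10", "32:25", "32:31", "32:48", "33:02",
    "33:05", "33:20", "33:41", "33:50", "34:05", "34:30",
]

time_range, std_dev = top_n_spread(example_times)
print(f"range: {time_range // 60}:{time_range % 60:02d}, std dev: {std_dev:.1f}s")
```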




Race quality
For this factor, I have analysed the spread of the top 50, the top 100 and the entire field.




(top 50 spread, top 100 spread)
Pre-change:
Men - 4:59, 7:15
Women - 5:18, 7:55

Post-change:
Men - 4:05, 5:43
Women - 6:53, 10:42

Change:
Men -18.1%, -21.1%
Women +29.9%, +35.2%




The direction of these changes is to be expected given the change in race duration, but it is worth noting that the change has also made the top 50 and top 100 spreads less equal between men and women

Entire field spread (standard deviation, interquartile range)
Pre-change:
Men - 6:44, 8:58
Women - 4:34, 6:18

Post-change:
Men - 6:34, 8:53
Women - 7:05, 10:10

Change:
Men -2.46%, -0.9%
Women +55.1%, +61.4%
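
The two whole-field measures can be sketched as below. The times are hypothetical, not real race results, and the quartile convention is an assumption: `statistics.quantiles` defaults to its "exclusive" method, which may differ slightly from the one used for the figures above.

```python
import statistics


def field_spread(times_s: list[float]) -> tuple[float, float]:
    """Return (sample standard deviation, interquartile range) in seconds."""
    q1, _median, q3 = statistics.quantiles(times_s, n=4)
    return statistics.stdev(times_s), q3 - q1


# Hypothetical finishing times in seconds, NOT real race results.
example_field = [1950, 2040, 2110, 2190, 2260, 2330, 2420, 2550, 2700, 2900, 3200]

std_dev, iqr = field_spread(example_field)
print(f"std dev: {std_dev:.0f}s, IQR: {iqr:.0f}s")
```

The interquartile range ignores the fastest and slowest quarters of the field, so it is less affected by a handful of very slow finishers than the standard deviation is.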




It is worth noting that the change in distances has made the spread of the entire field more equal between men and women
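
One way to put a number on "more equal" and "less equal" is the ratio of the women's spread to the men's spread, where 1.0 means identical spreads. This ratio is an assumption made for illustration, not a standard statistic; the sketch below uses only the top 50 and entire-field standard deviation figures quoted above, and the helper names are hypothetical.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def spread_ratio(women: str, men: str) -> float:
    """Women's spread divided by men's spread; 1.0 means equal spreads."""
    return to_seconds(women) / to_seconds(men)


# (women, men) spreads in min:sec, taken from the figures above.
spreads = {
    "top 50": {"pre": ("5:18", "4:59"), "post": ("6:53", "4:05")},
    "entire field (std dev)": {"pre": ("4:34", "6:44"), "post": ("7:05", "6:34")},
}

ratios = {
    label: {period: spread_ratio(*pair) for period, pair in periods.items()}
    for label, periods in spreads.items()
}

for label, r in ratios.items():
    print(f"{label}: pre {r['pre']:.2f}, post {r['post']:.2f}")
```

With these figures the top 50 ratio moves from about 1.06 to about 1.69 (further from 1), while the entire-field ratio moves from about 0.68 to about 1.08 (closer to 1).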
Conclusion
In this case study, the equalising of race distances has been shown to:
Have little effect on participation levels
Increase event duration for officials
Reduce equality in the spread of the top 50/100
Increase equality in the spread of the entire field





Further considerations 
What this study does not consider:
How does this affect age groups other than senior?
How does this affect the athletes’ transition from junior to senior and retention levels through the age groups?
Is this data replicated in other events?





This was not meant to provide a solution, and I am neither qualified nor in a position to make this decision. It is meant to provide insight and evidence into the effects the decision may have.
Thanks for the inspiration & help with the data/analysis @CordyParker https://twitter.com/cordyparker/status/1352259722108395521