This week I was invited to take part in a roundtable discussion about the role of “big data” in sports, since big data seems to have become a hype in sports (but only among people who do not understand sport or do not understand big data). Many clubs, federations, companies and coaches jump on this bandwagon. Their enthusiasm mainly comes from the well-marketed success stories of Google, Facebook and other companies with really large datasets. Yes, in some sports, like baseball or basketball in the US, there is a long history of gathering statistical data from competition results, teams and players. But are these data really of any help in improving sports performance?
The real experts in big data gathering and processing are banks and intelligence services like the NSA. But is their effort really as effective as they want us to believe? The “supernerds”, the mathematicians and statisticians, have been hired by banks and traders for the longest time, yet they still failed to foresee the tremendous economic crisis. Thousands of well-paid, brilliant people had been working day and night, and nobody saw this rollercoaster coming our way? Remember that nobody forecasted the fall of the Soviet Union, the Arab Spring or 9/11, despite the terabytes of data gathered precisely in order to predict events like these.
We unravelled the human genome some time ago, but did it really bring us closer to a deeper understanding of human life? Still, on a monthly basis the scientific journals report that scientists are puzzled by the fact that very simple organisms possess many more genes than we humans do, even though we are considered to be the most advanced and complex organism around. A simple pine tree has seven times more DNA than we do.
In my opinion, one of the underlying factors is the eternal and universal tension between quality and quantity. “More” does not necessarily translate into “better”. More does not automatically mean more relevant or more significant. The same applies to information and data: more data simply makes it harder to see or find the meaningful data or patterns. This is already the case with “tiny data”, like doing all kinds of tests on a single athlete.
Data gathering starts with a choice: which data do you gather and which do you not? A mistake here will leave you clueless from the start. The easiest way out, it seems, is to gather as much data as you can and then try to find something useful in it. It’s like collecting more bathwater in the hope of finding more babies. An example to make this clearer: I work with talent coaches from all kinds of sports. When I ask them what makes the difference between a really big talent and a merely reasonable athlete, they come up with twenty qualities like: the talent is more dedicated, more motivated, more flexible, has higher sport-specific intelligence, is more goal-oriented, is better at self-regulation, etc. Perfect!
So one might assume that a talent-scouting system would try to collect data about exactly these factors, since they seem to make the real difference… But nothing is further from the truth: such systems collect data about VO2 max, explosive strength, height, fat percentage, etc. Now you can collect all the data in the world about physical factors alone, but they have very little predictive value for how an athlete’s total performance will develop in the future.
One clear example comes to mind from when I spoke with former GDR sprint coaches about their sophisticated and comprehensive talent-scouting system of that era. They told me that my own athlete, Nelli Cooman, would have had no chance of being embraced by the GDR development system, since she was considered too short; still, she beat the very best GDR sprinters of that time, who were dominating the field. I am pretty sure that Usain Bolt would not have stood a chance in that system either, for being too tall…
So even if you have the data, you can still draw the wrong conclusions, and that happens easily, because big data is about the middle part of the Gaussian curve or the inverted-U curve: that is where you find a lot of data, in the range of the average and the median, not at the extremes. How big is the dataset of world record holders in the sprints? Apart from that, we have very little data from this group that has been systematically collected and shared anyway. Big data might also lead one to drift away from personalization (as in personalized medicine and personalized nutrition) and the individualization of training.
Another limitation: we have also been collecting weather data for hundreds of years, but it is still impossible to predict the weather more than a few days in advance, despite the sophisticated models applied nowadays.
Still, I am a firm believer in data, being an obsessive-compulsive data collector myself, but at an early stage I found out that collecting more data does not necessarily and automatically give me more or better information to work with. The secret is not in big data, but rather in small data: well chosen, well collected, adequately processed, followed by drawing the right conclusions and applying them optimally.
Henk’s rule applies here too: collect as much data as necessary, not as much as possible…
More data can help turn good coaches into better coaches, but will turn mediocre coaches into confused coaches.