That's Not What I Call Data.

Jul 31 2009

I don’t want to be a jerk. Both Garmin-Slipstream’s vocal anti-doping stance and Wiggins’ readiness to reveal the intimate details of his oxygen transport system are laudable. But guys, you have to do better than this:

First of all, I have no idea when any of these tests occurred. The specified dates on the x-axis have no correlation to the testing dates, and the time scales on each graph (4 months, 5 months, and one month) make them useless for comparison. Was Wiggins only tested 4 times in all of 2008? If so, why did three of those tests come in what appears to be less than a single month?

Then there’s the inferred line on each chart—on the “Pre 2009 TdF” chart, how can the off-score rise through the month of March even when there are no data points from that time period? Similar time gaps on the other charts reflect no corresponding rise, and data that could suggest a rise do not receive one.
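One way a drawn line can rise where nothing was measured is a quadratic trend line. That is an assumption here, since the charts do not say how the line was produced. The sketch below fits y = a*x^2 + b*x + c by least squares to four invented points: one early in the year and three clustered months later. The sample values, the day numbering, and the helper names `solve_3x3` and `fit_quadratic` are all hypothetical and are not taken from Wiggins' charts.

```python
# Hypothetical sample points, invented for illustration only.
# They are NOT Wiggins' values: (day of year, off-score-like number).
samples = [(10, 90.0), (115, 95.0), (125, 85.0), (135, 80.0)]


def solve_3x3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    rows = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(3):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, 3):
            factor = rows[r][col] / rows[col][col]
            for k in range(col, 4):
                rows[r][k] -= factor * rows[col][k]
    # Back-substitute from the last row upward.
    solution = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        known = sum(rows[r][k] * solution[k] for k in range(r + 1, 3))
        solution[r] = (rows[r][3] - known) / rows[r][r]
    return solution


def fit_quadratic(points):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c)."""
    power_sums = [sum(x ** p for x, _ in points) for p in range(5)]
    moments = [sum(y * x ** p for x, y in points) for p in range(3)]
    # Normal equations for the unknowns (a, b, c).
    normal = [
        [power_sums[4], power_sums[3], power_sums[2]],
        [power_sums[3], power_sums[2], power_sums[1]],
        [power_sums[2], power_sums[1], power_sums[0]],
    ]
    return tuple(solve_3x3(normal, [moments[2], moments[1], moments[0]]))


a, b, c = fit_quadratic(samples)
peak_day = -b / (2 * a)  # vertex of the fitted parabola
peak_value = a * peak_day ** 2 + b * peak_day + c
highest_measured = max(y for _, y in samples)

print(f"fitted peak: day {peak_day:.0f}, value {peak_value:.1f}")
print(f"highest measured value: {highest_measured:.1f}")
```

With these made-up points, the fitted parabola peaks inside the gap between the first and second samples, at a value higher than anything actually measured. A rise like that is a property of the curve's assumed shape rather than of any test result, which is why the underlying numbers matter.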

This absence of hard numerical data is aggravated severely by the lack of context. Who ordered these tests? What did they test for? Why were off-score and hemoglobin concentration the only variables measured? Were they taken at altitude? Is this a full set of test data, or selected points?
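For illustration, here is a minimal sketch of what a raw-data release that answers those questions might look like as a CSV. The column names are assumptions chosen to mirror the questions above, and the rows are placeholders invented for this example, not real test results.

```python
import csv
import io

# Column names are assumptions, chosen to mirror the questions in the text.
FIELDS = [
    "test_date",
    "ordered_by",
    "in_competition",
    "at_altitude",
    "hemoglobin_g_dl",
    "off_score",
]

# Placeholder rows, invented for illustration; NOT real test results.
rows = [
    {"test_date": "2009-01-15", "ordered_by": "team", "in_competition": "no",
     "at_altitude": "no", "hemoglobin_g_dl": 14.5, "off_score": 90.0},
    {"test_date": "2009-05-10", "ordered_by": "governing body", "in_competition": "yes",
     "at_altitude": "no", "hemoglobin_g_dl": 14.2, "off_score": 85.0},
]


def to_csv(records):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)
```

A file like this, posted next to each chart, would let anyone re-plot the points on a common time scale and check any fitted line against the actual measurements.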

For all the drama surrounding Lance Armstrong’s testing results (which, by the way, have not been updated since the end of the Giro), the Texan and his handlers have done an excellent job of cataloging the time and purpose of each test. It may not look as pretty, but for the skeptics, it’s the difference between disclosing your VO2 max and “otra pregunta”.

My point here isn’t to suggest, even slightly, that Wiggins is doing anything untoward. My point is that handing in data like this would fail you out of any 10th grade bio course.

The idea behind releasing this information is to reassure fans and sponsors, and set an example for other cyclists. Presenting some charts that anyone could fabricate with 10 minutes and a copy of iWork is a lousy way to do either.


15 Responses to “That's Not What I Call Data.”

  1. Miro 31 July 2009 at 3:29 pm #

    It’s fine, Cosmo. Ultimately, it shows that hemoglobin is normal throughout, which means that, at least for blood doping, his biopassport looks even.

    What I would like to see are tests for anything else that could speed recovery.

  2. John 31 July 2009 at 4:22 pm #

    I’m not so sure it really matters. Was anyone caught doping on this year’s Tour? None that I can think of, meaning that either there is a new drug that can’t be detected yet or people have finally gotten the message. In Bradley Wiggins’ case, personally I believe it’s just one of those fluke experiences some athletes have where nothing goes wrong. As to your question, Miro, I really don’t think there are many drugs that would aid recovery in such a way that they would be illegal; all you really need is protein supplements of some kind and a proper massage and you should be fine in that regard.

  3. Andy MJ 31 July 2009 at 4:23 pm #

    The rising arc in the month of March is a byproduct of curve-fitting, nothing more. It is just a default Excel thing; if they were trying to make a better point, they’d have used a better curve fit.

  4. cosmo 31 July 2009 at 4:38 pm #

    The problem with a sloppy-fitting curve, Andy, is that this picture is all we have for data. There are no numbers to go to and say “Ah—no data there. Must have just been a sloppy fitting curve”. We just have some blue and red dots.

    Also, sloppiness is a fantastic way to hide things; a dirty window isn’t really transparent. Every I dotted and T crossed is one more level of reassurance.

    As for Miro’s assertion that it shows “normal” hemoglobin throughout, I’d counter that it shows a consistent level of hemoglobin during the Giro and the Tour, and at two points outside competition. How can cynics know he didn’t just dope up before each Grand Tour, and once in Feb 08 and Jan 09?

    Some comparative data for what manipulated profiles would look like might also be nice—anyone have links for that?

  5. iworedettos 31 July 2009 at 4:41 pm #

    whenever anyone asks me what it took to win the tour i tell them “protein supplements of some kind and a proper massage. pretty straightforward.”

    and they go, “okay.”

  6. rainbow 31 July 2009 at 5:43 pm #

    Too many X-Files reruns again, Cosmo. Miro and Andy are spot on with their comments. More testing would be lovely, but it’s a random situation that the riders are subject to. The dates, times and purposes of the tests are background data that is incorporated in the graphs but not displayed, for clarity. Not everyone can afford the best testing regime the world has ever seen, not even “you know who” (thanks JKR)

  7. joran 31 July 2009 at 6:15 pm #

    I’m not qualified to comment on the actual relevance of these numbers to questions of doping, but, being a statistician (nominally at least, if not gainfully employed) I can comment on the graph itself. So I will!

    As per my chosen profession, I hate excel with a passion. So there’s that. Most of the aesthetic criticisms Cosmo made I agree with. I would add the following:

    The left vertical axis needs a label. The whole thing needs a caption (I don’t know what an off score is). I don’t know what the purpose of the solid horizontal blue line is. The title (Giro d’Italia?) should go on top of the graph, rather than the key, which should either be moved in graph or to one side. If indeed there are only 4 data points, the “curve” you fit probably shouldn’t be quadratic. If there are more data points, they should be plotted. Finally, on behalf of statisticians everywhere, I must strenuously advocate for the practice of posting a csv of your raw data whenever you post a graph. I mean, for christ’s sake, it takes you like 3 extra clicks in excel. Why is that so hard?

    Also, sadly, I have to say that this chart would probably seem pretty impressive to your average 10th grade bio teacher. In my experience.

  8. Josh 1 August 2009 at 9:43 am #

    Ouch. Tough love here.

    There is additional context and explanation of terms here (joran): Note that the graphs are appropriately labeled on the primary and secondary y-axis.

    This data certainly does have some gaps and leaves some questions unanswered, as Cosmo illustrated. Keep in mind that Team Garmin-Slipstream isn’t obligated to produce any of this data. This is an encouraging first step. How often do businesses provide more transparency than required? Not often.

  9. Josh 1 August 2009 at 9:58 am #

    P.S. It’s nice to see some skepticism here. It’s a refreshing change from the apparent professional stenographers who staff many of our current media outlets.

  10. Andy MJ 1 August 2009 at 11:04 am #

    I’m not saying they’re OK to leave out data points, but I was pointing out what the “rise in March, when no data was taken” was due to – there wasn’t enough data in the center to make that rise mean anything.

  11. Matt 1 August 2009 at 2:08 pm #

    It’s quite possible no testing was done between Jan and Apr. There are 3 measurements in April/May because that’s when the Giro, which Wiggins took part in, took place. However, Wiggins didn’t do that great in the Giro. What they have to show (hopefully the data is coming) is the blood profile for Jun/July (i.e. TdF).

  12. Miro 2 August 2009 at 2:40 am #

    My educated guess on the subject of missing data points is that it doesn’t matter all that much. Hemoglobin is necessary to deliver oxygen during endurance exercise. What a lot of readers here appear to be criticizing is the “way” the data was presented, while what ultimately counts is the biological significance of the points on the graph during his competition. You can assume that the Jan point is his starting value, while later values are the same (or lower, which is expected during the race).

    If Wiggins didn’t race in Feb and March, but had high hematocrit/hemoglobin at that time, it wouldn’t be helpful later, when the effects of the erythropoiesis stimulant have come down for his races in April and May. The whole point is to have as many RBCs as possible at the time of the event (and not get caught). My feeling is that having high hematocrit for riding base is useless at best, and detrimental at worst, since you would want to train your muscles to work with less oxygen (thus the whole idea of living/recovering at altitude). If January was his “building” month, he could ‘push’ thru his workouts easier with higher hematocrit, but the worth of doing this is debatable imho.

    I don’t think, though I don’t know for certain, that there are any well-designed studies that specifically tested performance later in the season waaaay after using erythropoietin. It is quite possible that using erythropoietin can ‘reset’ your various parameters to a lower than normal level because you boost your hemoglobin beyond what is necessary. This in turn will be detrimental to your training. For instance, in my studies with mice, their hematocrit crashes 8 days later to a lower than normal level (and stays there for a number of days) after one dose of epo on day 0. In other words, it’s like having coffee for a month, and then suddenly stopping: your body gets used to it, and once you are off, you are really sleepy for a while.

    I do read a fair bit of scientific literature. I don’t mean to sound like a snob/knowitall, but personally I found that the LINK to Wiggins’ data (graphs AND figure legend that explains what the blue line is) contained a very simple layout that didn’t confuse the reader, and most readers are not scientific literature readers. It is silly to question whether something is hidden just because data points for February are missing. The figure legend explained everything you wanted to know in a few sentences. When you present complex concepts to the everyday reader, you cannot overwhelm them with data. This was simple and showed what they wanted to say.

    Cosmo, I think you should find something better to write about.

  13. Keith 28 November 2009 at 6:41 pm #

    The “rise” due to curve fitting may not be *statistically* the best idea – when in doubt, plot a straight line because that’s the simplest function and requires the least amount of information. But taking into account the physics of the situation … a moment’s thought makes it clear that a straight line makes no sense for curves about an athlete’s fitness! One expects any such curve to rise and then decline over the course of a year.

    Now in order to do a fit, one *has* to assume the form of the function being fit (a straight line or power law are the usual suspects, but they are still assumptions). The simplest curve that fits the “rise and fall” heuristic is a second order polynomial, or “quadratic.” So … they assume y = ax^2 + bx + c and everything follows from there.

    Makes perfect sense to me.

    As far as missing data … the dude got tested an average of roughly once a month at the beginning of 09, more frequently as we approached the Tour. I don’t see anything untoward there either.

  14. Miro 30 November 2009 at 1:44 am #

    forget what i said about the mice, their hct returns to normal, but their progenitors crash.

  15. cycling jerseys 7 September 2011 at 10:21 pm #

    While the cloth should be featherweight, for cold weather riding, the Livestrong Cycling jerseys should be long-sleeved and warm.

Trackbacks and Pingbacks

  1. The Spanish Cycling Bubble | Cyclocosm - Pro Cycling Blog - 24 January 2012

    […] I’ve found the specific efforts at transparency occasionally disappointing, there’s a general air of openness with the post-Lance generation of American cyclists. Fully […]

  2. The New Reality | Cyclocosm - Pro Cycling Blog - 24 July 2012

    […] it or not, I like Brad Wiggins. Sure, I took issue with his testing data a few years ago, but I stressed then that I think he’s racing clean about as much as I think […]
