Wednesday, January 15, 2014

Graph Fraud

Happy Wednesday BlogLand!

There are actually a bunch of things I want to post about today, but I'm going to save some of it for a later post.

First of all, go check out this article about clave and percussion rhythms. It is SO cool! I couldn't justify dedicating a whole post to this one awesome article, but I can at least throw it in at the beginning of this one. I absolutely LOVE this sort of stuff that combines math with arts and culture. Plus, it has awesome visual representations AND it's about the clave, which I also LOVE. This might be the best thing ever. Go read it!

OK, on to the real post. We have to talk about a serious problem plaguing the world of science. Graph Fraud! Even if your data are accurate, there is still so much manipulation that can occur in how you choose to represent that data. Often, we let our biases and expectations get in the way of presenting clear data. Here's a fun little story as an example.

In the spring of 2013, the Tap Dancing Engineer went to visit some of her Canadian Tap friends. Recently, the weather had been strangely warm (or maybe it had been strangely cold, I don't recall.) The TDE and the Canadian Tap Dancers hit a conversational stumble. You see, we Americans simply refuse to convert to the logical system of measurement that just about everyone else in the world uses. This system of measurement includes using Celsius instead of Fahrenheit. The Canadians have little reason to learn our system of measuring temperature (their only motivation is small talk with Americans) and vice versa. So suddenly we could not fully communicate and I could not understand exactly HOW unusually warm (or maybe cold) it had been there.

Being the scientist that I am, I of course know that to convert from C to F you simply multiply by 1.8 and then add 32. I generally do this in my head by A) multiplying by 2, B) dividing that number by 10 (giving me 0.2 of the original number), C) subtracting B from A and finally D) adding 32. It's a fairly involved process and all of us spaced out for a minute to try to do the conversion in our heads. This absurdity made us talk about how to quickly convert between the two numbers.
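For anyone who likes to see the steps written out, here is a minimal Python sketch of the textbook conversion next to my mental shortcut. The function names are just ones made up for illustration; the only point is that both routes land on the same number.

```python
def c_to_f_exact(celsius):
    """Textbook conversion: multiply by 1.8, then add 32."""
    return celsius * 1.8 + 32


def c_to_f_mental(celsius):
    """The same conversion, done the way I do it in my head."""
    a = celsius * 2      # A) multiply by 2
    b = a / 10           # B) divide that by 10 (0.2 of the original number)
    return a - b + 32    # C) subtract B from A, then D) add 32


# Print both versions side by side for a few temperatures in Celsius.
for temp_c in (0, 10, 22):
    print(temp_c, c_to_f_exact(temp_c), c_to_f_mental(temp_c))
```

The two columns should agree, apart from possible tiny floating point rounding differences.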

Canadian Tap Dancer Kylie said "Oh, I usually just multiply by 2 and add 30." Could it be that simple? We did the math both ways and they were pretty close. Something like 72F vs 74F (yep, must have been unusually warm.) As soon as I had access to a computer and a spreadsheet I quickly made a graph to be able to understand the full picture visually. Here is what I found:


WOAH! They are almost identical! Amazing! It's so simple and I can't believe I've been doing all that mental math for all these years.


...


.......


............



Or, actually, it might have gone something more like this:

Canadian Tap Dancer Kylie said "Oh, I usually just multiply by 2 and add 30." Could it be that simple? We did the math both ways and they were pretty close. Something like 72F vs 74F (yep, must have been unusually warm.) As soon as I had access to a computer and a spreadsheet I quickly made a graph to be able to understand the full picture visually. Here is what I found:

Well, it's pretty good at lower temperatures, but it's WAY off at higher temps and it's just getting worse and worse as it goes. I don't know. Maybe I do need to do all that math. Dang.



So, which one is the "real" data? They BOTH are. Yep, it's true.

What's the difference here? The ranges used for both the x axis and the y axis. First, let's look at the x axis. In the first graph it only includes 0C to 20C (32F to 68F.) Where I live, that covers a pretty large part of the year, but definitely leaves out some critical data. Now look at the second graph. Its x axis range is 10C to 60C (50F to 140F.) If we're discussing the weather, then this graph isn't even applicable. It leaves off way too much on the cold side and, for the US and Canada at least, goes too high on the hot side.

Now, let's look at the y axis. Notice how the first graph has "white space" above and below the highest and lowest points on the curves. The y axis range has been increased in order to reduce the appearance of the white space between the two lines and make them look more similar. In the second graph the y axis range has been reduced as much as possible without the curves disappearing off the top or bottom of the graph. This magnifies the differences (white space between the curves.)

And finally, I also used a different line thickness for the two graphs. By using a 3pt line thickness for the first graph (as opposed to the 2pt thickness default that was used for the second graph) I was able to make the lines look even more "on top of each other." 
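To put rough numbers on the axis trick, here is a small Python sketch (an illustration only, not what the spreadsheet does). It measures the biggest gap between the two lines as a share of the y axis height. The x ranges are the ones from the two graphs. The y ranges of 0F to 100F and 50F to 150F are assumptions picked to mimic a padded axis and a tight axis, since the exact spreadsheet settings aren't listed in this post.

```python
def exact(c):
    """Official conversion from Celsius to Fahrenheit."""
    return 1.8 * c + 32


def kylie(c):
    """Kylie conversion: multiply by 2 and add 30."""
    return 2 * c + 30


def gap_share(x_min, x_max, y_min, y_max):
    """Largest gap between the two lines, as a fraction of the y axis height."""
    biggest_gap = max(abs(kylie(c) - exact(c)) for c in range(x_min, x_max + 1))
    return biggest_gap / (y_max - y_min)


# First graph: 0C to 20C, with an ASSUMED padded y axis of 0F to 100F.
print(f"padded axis: {gap_share(0, 20, 0, 100):.0%} of the plot height")
# Second graph: 10C to 60C, with an ASSUMED tight y axis of 50F to 150F.
print(f"tight axis:  {gap_share(10, 60, 50, 150):.0%} of the plot height")
```

With those assumed ranges, the biggest gap goes from about 2% of the plot height to about 10%, even though the two formulas never changed.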

So, always check your ranges. Are they reasonable for the data that is being represented? Is there a bunch of unnecessary white space on the graph? If you are comparing two (or more) graphs ALWAYS be sure to check that the axis ranges are the same. You cannot compare graphs with different ranges! And, if you ever see an article making a point with graphs that have different ranges, that is a very good sign that the author is not a good scientist and is presenting data that is skewed to support their cause. It may not be intentionally skewed and even if you unskew it, it may still support their cause, but you should still take all information from that source lightly and skeptically. 

So, let's unskew this data! I did some quick Wikipedia searching for average January temps in various Canadian cities (~-20C) as well as the average July high in various Arizona cities (~110F.) I have decided to set my x axis range to be -20C to 45C and set my y axis range to -20F to 130F (this is 10 degrees F outside of my lowest and highest points.) I also left all line thicknesses at the default of 2pt. Here's the graph I made:


Oh good! They are pretty darn similar. Not exactly the same, but close enough. Right?

Or ARE they?


I think it's pretty obvious what I did here, right? The only difference is the size and shape of the two graphs, but they look really different. Now imagine you are writing a scientific article and trying to fit a graph in without messing up the formatting of the rest of the paper. You want badly for that graph to fit in the small amount of white space you have for it. If you are trying to show that these two lines are the same, you probably wouldn't think twice about smushing it down to be only a few lines of text high. If, however, you were trying to show that these two lines are different, you wouldn't dare smush your graph and risk reducing the effect of your visual. You'd probably just make it HUGE and dedicate a whole page to it instead. See how easy it can be to unintentionally skew your visuals?
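Here is a rough Python sketch of that smushing effect. It first rebuilds the y axis range I described above (10 degrees F outside the lowest and highest points between -20C and 45C), then works out how tall the biggest gap would be drawn. The 100 and 600 pixel heights are made-up numbers standing in for a smushed graph and a full-page one.

```python
def exact(c):
    """Official conversion from Celsius to Fahrenheit."""
    return 1.8 * c + 32


def kylie(c):
    """Kylie conversion: multiply by 2 and add 30."""
    return 2 * c + 30


X_MIN_C, X_MAX_C = -20, 45   # x axis range from the unskewed graph
PADDING_F = 10               # breathing room above and below the lines

# Both lines are straight, so their extremes sit at the ends of the x range.
points = [f(c) for f in (exact, kylie) for c in (X_MIN_C, X_MAX_C)]
y_min = min(points) - PADDING_F
y_max = max(points) + PADDING_F


def gap_pixels(c, plot_height_px):
    """Drawn height of the gap between the lines at c, in pixels."""
    gap_f = abs(kylie(c) - exact(c))
    return gap_f / (y_max - y_min) * plot_height_px


print("y axis range:", y_min, "to", y_max)
for height in (100, 600):   # HYPOTHETICAL plot heights in pixels
    print(height, "px tall:", round(gap_pixels(X_MAX_C, height), 1), "px gap at 45C")
```

Same data, same axis ranges, but the gap at 45C is drawn six times taller in the big version.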

The takeaway? Pay attention. Especially pay attention when multiple graphs are being used to tell a story. Good scientists with solid data don't need to resort to these tricks. Good scientists with less than solid data will still present that data properly even if it means their visuals are not as clear as they'd like them to be.

Finally, a quick little note about how far off the Kylie conversion is from the actual conversion. By simple subtraction we see that for every 5 degrees C, the Kylie conversion gains 1 degree F in error. 
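If you want to check that subtraction yourself, here is a minimal Python sketch (the function name is made up for this post). Subtracting the official formula from the Kylie one leaves 0.2 times the Celsius temperature, minus 2, so the two agree at 10C and drift apart by 1 degree F for every 5 degrees C from there.

```python
def kylie_error_f(celsius):
    """Kylie conversion minus the official conversion, in degrees F."""
    kylie = 2 * celsius + 30
    official = 1.8 * celsius + 32
    return kylie - official


# Walk from -30C up to 45C in 5 degree steps and print the error at each stop.
for temp_c in range(-30, 50, 5):
    print(f"{temp_c:>4}C  error {kylie_error_f(temp_c):+.1f}F")
```

A negative error means the Kylie conversion reads colder than the real temperature, and a positive error means it reads hotter.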


So there you have it. For general weather use, you'll probably be off by less than 5 degrees F. If you live someplace like Arizona or Minnesota, use the Kylie conversion with caution during extreme temperatures. Also note that using the Kylie conversion at very high and very low temps just makes you exaggerate the extremes. If it's -22 and you claim it to be "about -30," I still see that and think "so it's really REALLY cold." Same for the hot side. People who do not live in areas that regularly experience these weather extremes have a hard time wrapping their heads around what -22 or 110 feels like, so being off by a few degrees doesn't change the conversation that much if you're talking to someone from outside of your area. To be safe, however, you should probably shave a few degrees off your statements at extreme temps (perhaps say "It's about -25" and then you're not off by much at all.)

Hopefully none of us will have to deal with temperatures quite that cold again for a while... 

Is the Kylie conversion good enough? When discussing the weather with her Canadian friends, the TDE will be using the Kylie conversion from now on. If you're writing a research paper or lab report, you should use the official conversion.

I hope I've made you a little more aware of Graph Fraud. Together, we can help others learn to eliminate Graph Fraud from their lives. Pass it on!





