QM Assignment: Two Lies and a Truth
“There are three kinds of lies: lies, damned lies, and statistics.”
Due Date: November 4, 2010, to be handed in at the beginning of class
This assignment asks you to consider the ways that statistics are frequently misused to prove falsehoods. The assignment has three parts. First, you will find your data sources. You must use two, and one of them must be either www.worldbank.org or the United Nations Human Development Index (find this via LEA). You might compare that data set with FIFA.com, if you wish. Second, you will use two different tools that we have learned in this class (graphing, central tendency, scatterplot, correlation) to prove something you know to be untrue. Third, you will use another method of statistical analysis to show the ‘truth.’ In addition to providing statistical analysis in this third part, you will write a conclusion paragraph in which you explain why the measures used in part two were misleading and how the measures used in part three correct for those errors.
Points will be awarded for:
a/ the ridiculousness of the lie (examples of types of lies can be found in Part III and will be discussed in class)
b/ the degree to which the lying ‘report’ shows mastery of two skills—one visual and one using a measure of central tendency
c/ the degree to which the measure chosen in Part III corrects the mistakes in Part II
d/ the clarity of your written analysis
Presentation should conform to the following:
Part I (one page)
The lie: Write a paragraph (modeled after a newspaper report) that outlines an outlandish discovery. For example: that soccer teams with more blue-eyed players do better at the World Cup than those with more green-eyed players.
Part II (one page)
Using real data sources (you will list these in your bibliography), provide “proof” of the lie in Part I. The proof must include a frequency table, an analysis of central tendency, and a chart.
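If you want to check your Part II numbers outside of Excel, the following is a minimal sketch of a frequency count and the three measures of central tendency. It assumes Python, which is not required for this assignment, and the `goals` values are made up for illustration; substitute figures from your own data source.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical values for illustration only (goals scored per match).
# Replace these with figures from your own data source.
goals = [0, 1, 1, 1, 2, 2, 3, 7]

# Frequency count: how many times each value occurs.
frequency = Counter(goals)
for value in sorted(frequency):
    print(f"{value} goals: {frequency[value]} matches")

# Three measures of central tendency for the same data.
print("mean:", mean(goals))
print("median:", median(goals))
print("mode:", mode(goals))
```

Notice that the three measures disagree for this made-up data. Which one a report chooses to quote can change the story it tells.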
Part III (one page)
Using a different method of statistical analysis, show how the first parts were able to lie and then use statistics to illustrate a more reasonable or truthful analysis of the data. You might use standard deviation for this part. You must categorize the lies of Part II into one of the following categories:
1. The misrepresentative measure of central tendency (e.g., the mode when the mean says more about the data set, or the median when the mode would say more)
2. The graph without numbers or units, or with a well-chosen break in one axis
3. The sample with a built-in bias
4. Post hoc fallacy – If A follows B, A must have been caused by B.
5. Statisticulation (Statistical Manipulation)
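As one illustration of category 1, and of how standard deviation can expose it, here is a minimal sketch in Python (an assumption; Excel works equally well). Both groups are hypothetical and were made up so that their means match.

```python
from statistics import mean, median, pstdev

# Hypothetical incomes (in thousands) for two made-up groups.
group_a = [48, 49, 50, 51, 52]
group_b = [10, 10, 10, 10, 210]

for name, data in (("A", group_a), ("B", group_b)):
    # pstdev is the population standard deviation.
    print(name,
          "mean:", mean(data),
          "median:", median(data),
          "std dev:", round(pstdev(data), 1))
```

The two groups share the same mean, so a report that quotes only the mean makes them look alike. The median and the standard deviation show that group B is dominated by a single extreme value.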
1. The lie: Asia has seen increased unemployment over the past 10 years.
The data: Malaysia and Singapore, the only two countries for which this is true
The truth: Asia, as a whole, has had increased employment; those two saw increased unemployment because of their unique political situations.
2. The lie: Increased CO2 emissions cause increased life expectancy.
The data: Averaged CO2 emissions and life expectancy reports
The truth: Both of these variables correlate with increased industrialization. When we plot the two together, it appears as if one causes the other, but it doesn’t. Both are effects of increased industrialization and increased wealth.
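The second example can be sketched with a few lines of Python (an assumption; the same correlations can be computed in Excel). All three lists are hypothetical numbers constructed for illustration, not real World Bank data. The `partial_correlation` helper uses the standard first-order partial correlation formula, which goes beyond the tools named in this assignment and is shown only to make the point concrete.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after controlling for zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical data, constructed so that both variables depend on
# industrialization and on nothing else they have in common.
industrialization = [1, 2, 3, 4, 5, 6, 7, 8]
co2_emissions = [3, 3, 5, 9, 11, 11, 13, 17]
life_expectancy = [64, 67, 68, 71, 74, 77, 82, 85]

raw = pearson(co2_emissions, life_expectancy)
controlled = partial_correlation(co2_emissions, life_expectancy,
                                 industrialization)
print("CO2 vs. life expectancy:", round(raw, 2))
print("Same, controlling for industrialization:", round(controlled, 2))
```

In this made-up data the raw correlation is very strong, yet it all but vanishes once industrialization is held constant. That is the pattern to look for when two variables share a common cause.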
For example, can you figure out what is wrong with the conclusion below and which of the above categories of misrepresentation it falls under?
Q: What effect does going to college have on your chances of remaining unmarried?
A: If you’re a woman, it skyrockets your chances of becoming an old maid. But if you’re a man, it has the opposite effect: it minimizes your chances of staying a bachelor.
McGill University made a study of 1,500 typical middle-aged college graduates. Of the men, 93 per cent were married (compared to 83 per cent for the general population). But of the middle-aged women graduates only 65 per cent were married. Spinsters were relatively three times as numerous among college graduates as among women of the general population.
1. Go to http://www.worldbank.org and find a set of data that is measured over time.
2. Download it into Excel.
3. Make it into a graph that charts change over time.
4. Remake it twice to emphasize different perspectives. Include a text box that ‘explains’ the different interpretations.
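To see why remaking the same graph changes its message, the sketch below measures how much of the plot's height a change occupies under two different choices of y-axis. It assumes Python rather than Excel, and both the `apparent_rise` helper and the `series` values are hypothetical, invented for illustration.

```python
def apparent_rise(values, y_min, y_max):
    """Fraction of the plot's height covered by the change from the
    first value to the last, for a y-axis running from y_min to y_max."""
    return (values[-1] - values[0]) / (y_max - y_min)

# Hypothetical yearly figures, made up for illustration.
series = [50, 51, 52, 53, 54]

full_axis = apparent_rise(series, 0, 100)     # axis runs from 0 to 100
cropped_axis = apparent_rise(series, 50, 54)  # axis hugs the data

print(f"Axis 0 to 100: the rise fills {full_axis:.0%} of the chart height")
print(f"Axis 50 to 54: the rise fills {cropped_axis:.0%} of the chart height")
```

The data are identical in both cases. Only the axis range changes, and with it the impression of a flat line or a dramatic climb.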