It is hard to deny that we are surrounded by statistics every day. Whether in the results of a study, in the newspaper or even on the label of our favourite juice, we are constantly faced with statistics. However, most of us are used to overlooking them. Sure, people read them. They may even comment if a percentage seems high, but they do so without truly understanding it.
The truth is not many people think statistically because it is a whole lot of numbers and percentages. However, with a deeper understanding of statistics, we will be able to get a much clearer picture of what these numbers are actually telling us. Furthermore, we might be able to see through the sometimes skewed statistics that we are faced with.
In this book summary readers will discover:
- Why achieving a random sample is hard
- Different types of averages
- Things to be wary of when it comes to statistics
- How to defend yourself against bad statistics
Key lesson one: Why achieving a random sample is hard
When it comes to statistics, the best results come from studies with an adequate sample size that is random. This may seem like a strange term if you are not familiar with statistics but ensuring that the sample is random is crucial for eliminating any bias in the study. This is actually pretty hard to get right. Further adding to the difficulty, the cost involved to obtain a truly random sample may be quite high.
For example, if your study involved interviewing 30-year-old men to find out how often they went to a bar, you would have to randomly pick 30-year-old men for the study. You would not look at the social status, how much money they earned, the area they lived in or if they were employed. This becomes difficult if researchers do not find creative ways to obtain a randomized sample.
Ignoring the importance of a truly randomized sample leads to sample bias and skewed results. Stratified random sampling is one technique that can help. Using our example from above, once you have a registry of all the 30-year-old men that you would like to interview, you divide it into subgroups based on common factors. This could be race, location or any other factor that they might share. Once you have these subgroups, you choose a random sample from each one, typically in proportion to its size, and combine them into your final sample.
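The stratified sampling steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the registry, the "region" field and the 10 per cent sampling fraction are all hypothetical, and the function name stratified_sample is made up for this example.

```python
import random
from collections import defaultdict

def stratified_sample(people, key, fraction, seed=None):
    """Draw the same fraction at random from every subgroup (stratum)."""
    rng = random.Random(seed)
    # Step 1: divide the registry into subgroups that share the given factor.
    strata = defaultdict(list)
    for person in people:
        strata[person[key]].append(person)
    # Step 2: pick a random sample from each subgroup, sized in proportion to it.
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical registry: 60 men living in the north and 40 in the south.
registry = [{"id": i, "region": "north" if i < 60 else "south"} for i in range(100)]
sample = stratified_sample(registry, key="region", fraction=0.1, seed=42)
print(len(sample), "people selected")
```

Because each subgroup contributes in proportion to its size, the sample keeps the same 60/40 split as the registry, while the choice of individuals within each subgroup is still left to chance.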
Key lesson two: Different types of averages
What does the word average mean to you? In statistics, there are three different types of averages and they mean very different things. It is therefore important to know which one you are dealing with, because anyone reporting an average can pick the one that best suits their message.
The mean represents the arithmetic average and is the one we are most familiar with. If someone said that the average annual salary in their company was $100 000, this would mean that they took the annual salaries of every employee in the company, added them together and then divided the total by the number of employees.
In contrast, for the same company, the median would represent the middle value from the dataset. So, in terms of the employee salaries, they would list all the salaries from lowest to highest and the median would be the value in the middle. To explain this further, if there were only 5 employees in the company and they earned $50 000, $50 000, $150 000, $250 000 and $500 000, the median would be $150 000.
Using the same company again, the mode would describe the most common income. With the values given above, the mode would be $50 000, while the mean would be $200 000. This example clearly shows how averages can differ and why it is important to know which one is being reported.
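The three averages for the five-employee example can be checked with Python's standard statistics module. This is a minimal sketch that uses only the salary figures given above; the variable names are chosen for illustration.

```python
import statistics

# The five salaries from the example, in dollars.
salaries = [50_000, 50_000, 150_000, 250_000, 500_000]

mean_salary = statistics.mean(salaries)      # sum of all salaries / number of employees
median_salary = statistics.median(salaries)  # middle value once the list is sorted
mode_salary = statistics.mode(salaries)      # most frequently occurring value

print("mean:  ", mean_salary)    # 200000
print("median:", median_salary)  # 150000
print("mode:  ", mode_salary)    # 50000
```

The same five numbers produce three quite different "averages", which is why a report should say which one it is using.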
Key lesson three: Things to be wary of when it comes to statistics
For a research study to be considered statistically sound, it must have an adequate sample size. If the sample is too small, chance alone can produce almost any result. You just have to consider a coin toss to appreciate this concept. In a coin toss, the probability of getting tails is 50 per cent. However, if you tossed a coin six times right now, how many tails would you get? Landing on tails exactly three times is less likely than you might expect: it happens only about 31 per cent of the time. Why? Because the coin was not tossed enough times. The more you toss the coin, the closer the share of tails tends to get to 50 per cent.
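The coin-toss argument can be tried out with a short simulation. The sketch below assumes a fair coin; the function name tails_share and the particular numbers of tosses are illustrative choices. It first computes the exact chance of getting three tails in six tosses, then shows how the share of tails behaves as the number of tosses grows.

```python
import random
from math import comb

# Exact probability of exactly three tails in six fair tosses:
# 20 favourable sequences out of 2**6 = 64 possible ones.
p_exactly_three = comb(6, 3) / 2**6
print("P(exactly 3 tails in 6 tosses) =", p_exactly_three)  # 0.3125

def tails_share(n_tosses, rng):
    """Simulate n_tosses fair coin tosses and return the fraction that were tails."""
    tails = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return tails / n_tosses

rng = random.Random(0)  # fixed seed so repeated runs give the same output
for n in (6, 60, 600, 60_000):
    print(n, "tosses -> share of tails:", round(tails_share(n, rng), 3))
```

With only a handful of tosses the share of tails can land far from one half, whereas with tens of thousands of tosses it usually sits very close to it.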
That is why studies need an appropriate sample size: without one, results can easily be exaggerated. This is exactly what some researchers exploit when they want to impress the public. Think about the label on a carton of milk stating that people who drink this milk reported 25% higher levels of calcium. How many people were included in the study? Did they take vitamins in addition to drinking milk? Researchers may present the best result obtained from their sample groups regardless of these questions.
But besides sample size and sensational results, you should also look out for missing values. The standard error, in particular, indicates how far an estimate taken from the sample is likely to be from the true value for the whole group. This number will never quite equal zero but gets closer to it as the sample grows. A large standard error means that the estimate is imprecise and should be treated with caution.
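One common form of the standard error is the standard error of the mean: the sample standard deviation divided by the square root of the sample size. The sketch below assumes that is the quantity meant here, and the measurements are made up purely for illustration.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(sample size)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical measurements; the larger sample simply repeats the same pattern
# so that the spread stays similar and only the sample size changes.
small_sample = [4, 6, 5, 7, 3]
large_sample = small_sample * 20  # 100 values

print("n = 5:  ", round(standard_error(small_sample), 3))
print("n = 100:", round(standard_error(large_sample), 3))
```

The spread of the values is about the same in both cases, yet the standard error shrinks as the sample grows, which is why a study that omits it, or reports a large one, deserves a closer look.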
Another diversion tactic used by researchers is the semi-attached figure. It is used to link two things that are totally different. The most common examples come from the pharmaceutical sector. When advertising a new medication, companies often don't explicitly state its effectiveness but instead cite some other, arbitrary fact and try to link the two. For example, the new medication was successful in removing 99 per cent of germs in the laboratory in just ten seconds. As much as this may be true, is it effective in treating illness? Who knows?
This is exactly the same method employed when products claim to be 50 per cent more effective or to contain 75 per cent more of some ingredient: they don't tell you what they are comparing it to. Therefore, you need to be wary of comparisons and question whether the things being compared are comparable in the first place.
This brings us to the post-hoc fallacy. It occurs when a causal relationship is assumed just because one thing follows, or happens alongside, another. Basically, a correlation is mistaken for a cause. A simple example: ocean temperatures have risen over the same period in which the number of divorces increased. These two have no impact on each other, and we cannot assume that they do. Therefore, you must always remember that correlation does not imply causation, so you should not jump to conclusions.
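To see how easily two unrelated quantities can appear linked, the sketch below generates two synthetic series that both simply drift upwards over time and then measures their correlation. Every number here is invented for illustration (they are not real ocean temperature or divorce figures), and pearson_r is a small helper written for this example.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

rng = random.Random(0)  # fixed seed so repeated runs give the same output
years = range(20)

# Two synthetic series generated independently of each other.
# The only thing they share is an upward trend over time.
ocean_temp = [15 + 0.02 * t + rng.gauss(0, 0.02) for t in years]
divorces = [1000 + 5 * t + rng.gauss(0, 5) for t in years]

r = pearson_r(ocean_temp, divorces)
print("correlation:", round(r, 2))
```

The correlation comes out strongly positive even though neither series was built from the other. A shared trend over time is enough to produce it, which is exactly why a high correlation on its own says nothing about cause.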
Key lesson four: How to defend yourself against bad statistics
As much as some people lie with statistics, others may simply have made statistical errors unintentionally. This usually occurs when they don't fully understand the statistics or don't realise the error they have made. It is important to be wary and not to believe everything straight away.
There are three things you can do to defend yourself from potential statistical lies. Firstly, remember two questions: who ran the study, and what are their motives? This is important because companies that fund studies usually want a result favourable to their cause. So, in these cases, it is worthwhile to scrutinize the results carefully.
Secondly, look at both the data that is presented and the data that is missing. Knowing what you do now about statistics, check that the study ticks all the boxes. Is the sample size adequate, and were the participants selected correctly? Were any factors overlooked, and are the correlations sound? Also, be sure that the standard error and the type of average used are reported and appropriate for the study. When everything is presented and there are no missing parameters, you can be more confident that the researchers have nothing to hide.
Lastly, when results are presented there should be a gradual flow of information. There should not be abrupt changes in the subject matter. If the study jumps straight from the raw data collected to the results without an explanation, that serves as a red flag. Once again, if there is nothing to hide, everything will be presented accordingly.
These safeguards will help prevent you from being swept up in incorrect reporting. As confusing as statistics can be, asking these questions can help sort out the truth from the lies.
The key takeaway from How to Lie With Statistics is:
Statistics is an important part of research. However, it can be easily misinterpreted if you are not familiar with common problems, and it can also be manipulated to produce exciting results for the benefit of those in charge. Therefore, it is important not to get caught up in impressive statistics and instead to keep an eye out for biased reports. Once you know the basics and understand what tactics are used to deceive and divert attention, you will be far less likely to be tricked into believing anything and everything reported.
How can I implement the lessons learned in How to Lie With Statistics:
Don't be afraid to ask questions. If you are presented with a statistical report that sports all the right numbers but lacks any detail, feel free to ask questions. Was the sample size big enough? Did they consider the randomness of the sample, and what numbers are they actually reporting? Asking makes it clear that you will not be fooled and that you want the study conducted and reported accurately.