User-friendly statistics

The perspective here attempts to make statistical methods simpler but no less powerful

Michael Wood (MichaelWoodSLG@gmail.com)

How sure are we? Suppose you are told that your life expectancy is another 40 years, or that women have better emotional intelligence than men. How accurate is the 40 years, and how certain should we be that the assertion about women is correct? This article explains two statistical approaches to questions like these - testing hypotheses and estimating confidence levels - from first principles, explaining both how to get the answers and why the answers are sensible. Both approaches can be carried out by "resampling" or "bootstrapping" using this spreadsheet. The article also includes a very brief discussion of probability in terms of stories about equally likely possible worlds. The same spreadsheet can be used to explore how this idea works in practice using Monte Carlo simulation.

Making predictions. Suppose you wanted to devise a formula for predicting the price of a house from its size, location, etc. One method for doing this is known as regression, although I think "straight line prediction formula" would be clearer. This horrible video explains how it works, and this video explains the rationale behind it. There's a similar explanation using a different example in Chapter 9 of my book, and a spreadsheet here to explain how regression works (this is an improved version of the one referred to in my book).

A more general perspective on the idea of making knowledge simpler and more efficient.

The basic idea of statistics is to take a lot of data and analyze it to show patterns that may not be clear from a superficial glance. It enables us to learn about the dangers of smoking and of living near a busy road, to assess the effectiveness of innovations in various domains, and to discover that in 2011 there were 7.8 road deaths per 100,000 of population in Belgium compared to the UK average of 3.1 (why?), and so on. Life would be much poorer without the improvements brought about by statistics in so many domains.

However ... statistics is notorious for being a difficult subject to master. This reputation is deserved: the concepts involved are inevitably subtle, but the difficulties are magnified massively by unhelpful jargon, dense mathematical techniques covering very restricted situations, and the promotion of the wrong concepts (e.g. null hypothesis significance tests). Statistics needs simplifying by focusing on user-friendly approaches which clarify the rationale behind concepts, by ignoring superfluous concepts and techniques, and by taking care to avoid unhelpful jargon. I think this is possible without sacrificing its power and usefulness; in fact a fresh, simpler perspective is likely to make the process of learning and using statistics more efficient, effective and useful.

I have been trying to make statistics more user-friendly for a long time. I used to teach an introductory course for students on an MBA and similar courses, which I tried to make as simple as possible, using more intuitive methods whenever I could. I took these ideas a bit further in my book. The philosophy behind this book is summed up in the first chapter.

What I want to do here is to take this philosophy a bit further still. The results are in the links in the box on the right. Each of these should make reasonable sense on its own. Chapter 3 of my book may be helpful as background on how to summarize statistical data by graphs, averages, correlations, etc. (The data sets mentioned in this chapter are on the web at drink20.xls, drink.xls, iofm.xls, shares.xls.)

But what is statistics? From my perspective the core concepts are averages, probabilities and randomization. It is impossible to predict with certainty whether a particular person who smokes will develop lung cancer, or exactly how many more years they will live, but statistics can estimate the probability of developing lung cancer, and an average life expectancy for people in a similar position. It usually does this by taking a sample of data, and it is obviously important that this sample represents the whole population as accurately as possible. In practice the easiest way of doing this is often to use a random sample – this is chosen so that every member of the population is equally likely to be selected, so there shouldn’t be any consistent bias.
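As a rough illustration of the random sampling idea, here is a minimal Python sketch. The population is made up (10,000 simulated ages at death, not real data), and the function name sample_mean and the sample size of 200 are assumptions chosen purely for the example. It draws a sample in which every member of the population is equally likely to be chosen, and uses the sample average as an estimate of the population average.

```python
import random
import statistics

def sample_mean(population, n, seed=None):
    """Average of a simple random sample of size n.

    rng.sample picks n distinct members, each member of the
    population being equally likely to be selected.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    return statistics.mean(sample)

# Hypothetical population: made-up ages at death for 10,000 people.
rng = random.Random(1)
population = [rng.gauss(80, 10) for _ in range(10_000)]

estimate = sample_mean(population, 200, seed=2)   # from 200 people only
true_mean = statistics.mean(population)           # needs everyone

print(f"Sample estimate: {estimate:.1f}, population average: {true_mean:.1f}")
```

With a sample chosen like this, the estimate should usually land close to the population average, even though only a small fraction of the population was looked at.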

Randomization is also important in experiments or trials such as drug trials. Suppose you wanted to compare a drug treatment for a disease with a placebo, and you decided to ask patients whether they wanted the drug or the placebo. The difficulty would be that the two groups would be different in ways that would almost certainly affect the result (e.g. those who were more seriously ill might opt for the drug). The solution is to allocate patients to the drug or placebo group at random.
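Random allocation can be sketched in a few lines of Python. The patient labels, the group sizes and the function name allocate_at_random are hypothetical, invented for this illustration; real trials typically use more elaborate schemes. The point is only that chance, not the patient or the doctor, decides who goes in which group.

```python
import random

def allocate_at_random(patients, seed=None):
    """Split patients into two groups purely by chance.

    The list is shuffled, then cut in half, so no characteristic of
    a patient (such as how ill they are) can influence the group.
    """
    rng = random.Random(seed)
    shuffled = list(patients)      # copy, so the original order is kept
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical list of 20 patients.
patients = [f"patient_{i}" for i in range(1, 21)]
drug_group, placebo_group = allocate_at_random(patients, seed=42)

print("Drug:   ", drug_group)
print("Placebo:", placebo_group)
```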

There is a more detailed discussion of the nature of statistics, and its relation to fuzzy logic and the idea of chaos, in Chapter 4 of my book, Making sense of statistics: a non-mathematical approach, and some thoughts on designing and interpreting practical investigations using statistical methods in Chapter 10.

The twin sister of statistics is probability theory. As well as being essential to statistics itself, it also enables you to go from simple assumptions (like each of the 59 numbered balls in the UK Lotto draw is equally likely to be drawn) to conclusions about how likely particular results are (e.g. the probability of winning the jackpot can be calculated as about 1 in 45 million) or about averages in a large population (e.g. the average number of winners of a particular prize).
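The jackpot figure can be checked with a short calculation. The sketch below assumes that the jackpot requires matching all six main numbers drawn from the 59 balls (the number six is not stated above), and the figure for tickets sold is entirely hypothetical, included only to show how an average number of winners follows from the probability.

```python
from math import comb

BALLS = 59    # numbered balls, each equally likely to be drawn
DRAWN = 6     # assumption: the jackpot needs all six main numbers

# Number of equally likely sets of six balls.
combinations = comb(BALLS, DRAWN)
jackpot_probability = 1 / combinations

print(f"Possible draws: {combinations:,}")        # 45,057,474
print(f"Chance of the jackpot: 1 in {combinations:,}")

# Hypothetical: if 20 million tickets were sold with numbers chosen
# at random, the average number of jackpot winners per draw would be:
tickets_sold = 20_000_000
expected_winners = tickets_sold * jackpot_probability
print(f"Average number of jackpot winners: {expected_winners:.2f}")
```

Every set of six balls is equally likely, so the chance of any one ticket matching is one divided by the number of possible sets, which is where "about 1 in 45 million" comes from.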

The boxes below, and above right, list some articles and other resources on the theme of simplifying statistics and making it more useful.

Making statistical methods more useful: some suggestions from a case study (Sage Open, vol. 3, no. 1).
User-friendly statistical concepts for process monitoring (Journal of the Operational Research Society, 1998).
Beyond p values: practical methods for analyzing uncertainty in research (draft article, July 2016).
Simple Methods for Estimating Tentative Probabilities for Hypotheses Instead of P Values (draft article, February 2017).

Computer simulation is a powerful technique for making statistics both more transparent and more powerful:
The role of simulation approaches in statistics (Journal of Statistics Education, 2005).
Bootstrapped confidence intervals as an approach to statistical inference (Organizational Research Methods, 2005).
Statistical inference using bootstrap confidence intervals (Significance, 2004).
Video on a simulation approach to hypothesis testing, and an improved version of the resampling spreadsheet used in this video.

Statistics is, of course, an important tool in Research methods.